
ImageCLEF 2008: WikipediaMM Task

ImageCLEF's wikipediaMM task provides a testbed for the system-oriented evaluation of visual information retrieval from a collection of Wikipedia images. The aim is to investigate retrieval approaches in the context of a larger scale and heterogeneous collection of images (similar to those encountered on the Web) that are searched for by users with diverse information needs.

In 2008, ImageCLEF wikipediaMM will use the image collection created and employed by the INEX Multimedia (MM) Track (2006-2007). This (INEX MM) wikipedia image collection contains approximately 150,000 images that cover diverse topics of interest. These images are associated with unstructured and noisy textual annotations in English.

This is an ad-hoc image retrieval task; the evaluation scenario is thereby similar to the classic TREC ad-hoc retrieval task and the ImageCLEFphoto task: simulation of the situation in which a system knows the set of documents to be searched, but cannot anticipate the particular topic that will be investigated (i.e. topics are not known to the system in advance). The goal of the simulation is: given a textual query (and/or sample images and/or concepts) describing a user's (multimedia) information need, find as many relevant images as possible from the (INEX MM) wikipedia image collection. In this first year of the task, the focus is on monolingual retrieval.

Any method can be used to retrieve relevant documents. We encourage the use of both concept-based and content-based retrieval methods and, in particular, multimodal approaches that investigate the combination of evidence from different modalities.


The wikipediaMM task adopts the user model followed in INEX, whereby the participants in the various tracks create the topics and perform the relevance assessments themselves.

Therefore, participation in ImageCLEF's wikipediaMM task requires that each participating group:

  • creates topics
  • performs the relevance assessments on the created topics

Our experience with the INEX MM track indicates that the creation of topics does not require much effort, whereas the assessments usually take around 1-2 working days per topic. This procedure is also reflected in the schedule of the task (see below).

Note that only those who participate in the topic development and assessment process will be granted access to the relevance assessments.

Data: Images & Metadata

The (INEX MM) wikipedia image collection consists of approximately 150,000 wikipedia images (in JPEG and PNG formats) provided by wikipedia users. Each image is associated with user-generated alphanumeric, unstructured metadata in English. These metadata usually contain a brief caption or description of the image, the Wikipedia user who uploaded the image, and the copyright information. These descriptions are highly heterogeneous and of varying length. The figure below provides an example image and its associated metadata.

[Figure: example image (Anne Frank House) and its associated metadata]

Further information about the image collection can be found in:

T. Westerveld and R. van Zwol. The INEX 2006 Multimedia Track. In N. Fuhr, M. Lalmas, and A. Trotman, editors, Advances in XML Information Retrieval: Fifth International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence (LNCS/LNAI). Springer-Verlag, 2007.

DOWNLOAD (participants only)

  • The wikipediaMM image collection (151,519 .jpeg and .png images) can be downloaded: here (14 GB)

    If you have problems downloading the above file, you can download the images in smaller batches here

  • The thumbnails of the wikipediaMM image collection can be downloaded: here (2 GB)
  • The metadata of the images in wikipediaMM image collection can be downloaded: here (20 MB)
  • Additional information on the wikipediaMM collection can be downloaded here (1.7 MB). This contains:
    • A README-wikipediaMM file describing the provided data.
    • An imagesIDs.txt file listing all image identifiers.
    • An imagefile2metadatafile.txt file listing the correspondence between image and metadata files.
Data: Image Features & Concepts

Additional sources of information are also provided to help participants in the retrieval tasks. These resources are:

  • Image classification scores: For each image, the classification scores for the 101 different MediaMill concepts are provided by the University of Amsterdam (UvA). The UvA classifier is trained on manually annotated TRECVID video data, and the concepts are selected for the broadcast news domain.

    More details can be found in:

    C. G. M. Snoek, M. Worring, J. C. van Gemert, J.-M. Geusebroek, and A. W. M. Smeulders. The challenge problem for automated detection of 101 semantic concepts in multimedia. In MULTIMEDIA ’06: Proceedings of the 14th annual ACM international conference on Multimedia, pages 421–430, New York, NY, USA, 2006. ACM Press.

    DOWNLOAD (participants only)

    • The list of the 101 MediaMill concepts can be found here.
    • The classification scores of (most of) the wikipediaMM images for the MediaMill concepts can be found here.
  • Image features: For each image, the set of the 120D feature vectors that has been used to derive the above image classification scores is available. Participants can use these feature vectors to custom-build a CBIR system, without having to pre-process the image collection.

    More details can be found in:

    J. C. v. Gemert, J.-M. Geusebroek, C. J. Veenman, C. G. M. Snoek, and A. W. M. Smeulders. Robust scene categorization by learning image statistics in context. In CVPRW ’06: Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop, page 105, Washington, DC, USA, 2006. IEEE Computer Society.

    DOWNLOAD (participants only)

    • The feature vectors of (most of) the wikipediaMM images can be found here.
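
As an illustration of how the provided feature vectors could feed a simple CBIR baseline, here is a minimal sketch of nearest-neighbour ranking by Euclidean distance. It assumes the vectors have already been loaded into memory as one vector per image; the on-disk format of the feature files is not described here, and the names `euclidean`, `rank_by_similarity` and the `features` mapping are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query_vec, features, top_k=1000):
    """Return up to top_k (image_id, distance) pairs, nearest first.

    `features` maps an image id to its feature vector. This in-memory
    layout is an assumption made for the sketch, not the distributed format.
    """
    scored = [(image_id, euclidean(query_vec, vec))
              for image_id, vec in features.items()]
    # Sort by distance; break ties by image id for a deterministic order.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:top_k]


if __name__ == "__main__":
    toy_features = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 0.0]}
    for image_id, distance in rank_by_similarity([0.0, 0.0], toy_features, top_k=2):
        print(image_id, distance)
```

Euclidean distance is only one possible choice; other distance measures or a learned similarity may suit these features better.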

The topics for the 2008 ImageCLEF wikipediaMM task will include (i) topics previously used in INEX MM and ImageCLEF photo tasks and (ii) topics created by this year's task participants.

DOWNLOAD (participants only)

  • The guidelines for the topic development by the participants are available here.
  • The Candidate Topic Submission Form is available here.

  • Topic release:
    • The list of the 75 topics can be found here.
    • The list of the 75 topics (with their narratives) can be found here.
    • The list of the image examples in the topics that are not part of the wikipediaMM collection can be found here.

The topics are multimedia queries that can consist of a textual, a visual and a conceptual part, with the latter two parts being optional. An example topic in the appropriate format is the following:

  <number> 1 </number>
  <title> cities by night </title>
  <concept> building </concept>
  <image> </image>
  <narrative> I am decorating my flat and as I like photos of cities at night, I would like to find some that I could possibly print into posters. I would like to find photos of skylines or photos that contain parts of a city at night (including streets and buildings). Photos of cities (or the earth) from space are not relevant. </narrative>

Therefore, the topics include the following fields:

  • title: query by keywords
  • concept: query by one or more concepts (optional)
  • image: query by one or more images (optional)
  • narrative: description of the information need, where the definitive definition of relevance and irrelevance is given
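
The field list above can be turned into a small parser. The following is a minimal sketch, assuming each topic is available as a string with properly closed tags; since a topic is shown without a single enclosing root element, it uses regular expressions rather than an XML parser. The function name `parse_topic` and the shortened sample narrative are hypothetical.

```python
import re

# Field names as listed in the topic description.
FIELDS = ("number", "title", "concept", "image", "narrative")


def parse_topic(text):
    """Extract the topic fields from one topic given as a string.

    Optional fields that are missing or empty come back as an empty string.
    """
    topic = {}
    for field in FIELDS:
        match = re.search(r"<{0}>(.*?)</{0}>".format(field), text, re.DOTALL)
        topic[field] = match.group(1).strip() if match else ""
    if not topic["title"]:
        raise ValueError("a topic needs a non-empty title")
    return topic


if __name__ == "__main__":
    sample = """
    <number> 1 </number>
    <title> cities by night </title>
    <concept> building </concept>
    <image> </image>
    <narrative> Photos of skylines at night are relevant. </narrative>
    """
    print(parse_topic(sample))
```

A topic with several concepts or example images may need further splitting of the `concept` and `image` values; how multiple values are delimited is not specified here, so the sketch leaves them as raw strings.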
Baseline Retrieval Systems
To help you with the topic development process, we will provide baseline retrieval systems on the wikipediaMM data.
  • A text-based retrieval system powered by PF/Tijah can be found here *** NO LONGER AVAILABLE ***
    • Place the mouse over a retrieved image (thumbnail) to display its metadata
    • Click a retrieved image (thumbnail) to view the full-size image
  • A concept-based retrieval system can be found here *** NO LONGER AVAILABLE ***
    • Click a retrieved image (thumbnail) to view the full-size image
    • The classifiers are trained on TRECVID data, so the performance may not be optimal
    • The demo may return images that were part of the INEX MM collection, but are not part of the wikipediaMM collection. For these images, only their ids are displayed.
Data: Past Topics & Relevance Assessments

DOWNLOAD (participants only)

  • The topics used in INEX MM 2006-2007 and their relevance assessments are provided here (1.9 MB).

The topics are in the format outlined above, with an additional <nexi> field that expresses the topic in the NEXI query language used in INEX. The relevance assessments (qrels) are in TREC format.
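
For readers unfamiliar with TREC-format qrels, here is a minimal sketch of reading them. It assumes the standard four whitespace-separated columns (topic id, iteration, document id, relevance judgment) and treats any judgment greater than zero as relevant; the function name `parse_qrels` is hypothetical.

```python
def parse_qrels(lines):
    """Map each topic id to the set of document ids judged relevant.

    Topics whose judged documents are all non-relevant map to an empty set.
    """
    relevant = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 4:
            raise ValueError("expected 4 columns, got: %r" % line)
        topic, _iteration, doc_id, judgment = parts
        docs = relevant.setdefault(topic, set())
        if int(judgment) > 0:
            docs.add(doc_id)
    return relevant


if __name__ == "__main__":
    toy_qrels = ["1 0 100 1", "1 0 101 0", "2 0 200 0"]
    print(parse_qrels(toy_qrels))
```

The same function can be applied to an open file object, since iterating over a file yields its lines.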

Some statistics on the querying paradigms employed in the INEX MM topics can be found below:

                                                    2006   2007   2006-2007
Number of topics                                      13     20          33
Number of topics with multimedia hints                 7     10          17
Number of topics with image (query-by-example)         6      7          13
Number of topics with concept (query-by-concept)       2      6           8
Number of topics with both image and concept           1      3           4
Evaluation Objectives

The characteristics of the (INEX MM) wikipedia image collection allow for the investigation of the following objectives:

  • how well do the retrieval approaches cope with larger scale image collections?
  • how well do the retrieval approaches cope with noisy and unstructured textual annotations?
  • how well do the content-based retrieval approaches cope with images that cover diverse topics and are of varying quality?
  • how well can systems exploit and combine different modalities given a user's multimedia information need? Can they outperform monomodal approaches like query-by-text, query-by-concept or query-by-image?

In the context of INEX MM 2006-2007, mainly text-based retrieval approaches have been examined. Here, we hope to attract more visually-oriented approaches and most importantly, multimodal approaches that investigate the combination of evidence from different modalities.
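
One simple way to combine evidence from two modalities is late fusion of their retrieval scores. The sketch below is an illustration only, not a method prescribed by the task: it min-max normalises the scores of a text run and an image run and takes a weighted sum. The function names and the `text_weight` parameter are hypothetical.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All scores are equal, so every document gets the same weight.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def fuse(text_scores, image_scores, text_weight=0.5):
    """Weighted sum of normalised scores, best first.

    A document missing from one modality contributes 0 for that modality.
    """
    t, v = min_max(text_scores), min_max(image_scores)
    fused = {}
    for doc in set(t) | set(v):
        fused[doc] = (text_weight * t.get(doc, 0.0)
                      + (1 - text_weight) * v.get(doc, 0.0))
    # Sort by descending fused score; break ties by document id.
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    text_run = {"a": 2.0, "b": 1.0}
    image_run = {"b": 10.0, "c": 0.0}
    print(fuse(text_run, image_run))
```

The weight would normally be tuned on held-out topics, for instance the past INEX MM topics and their assessments.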

Retrieval Experiments

Experiments are performed as follows: participants are given topics, from which they create queries that are used to perform retrieval on the image collection. This process may iterate (e.g. involving relevance feedback) until the participants are satisfied with their runs. Participants might try different methods (e.g. query expansion) to increase the number of relevant images in the top N rank positions.

Participants are free to experiment with whatever methods they wish for image retrieval, e.g., query expansion based on thesaurus lookup or relevance feedback, indexing and retrieval on only part of the image caption, different models of retrieval, and combining text and content-based methods for retrieval. Given the many different possible approaches which could be used to perform the ad-hoc retrieval, rather than list all of these we will ask participants to indicate which of the following applies to each of their runs (we consider these the "main" dimensions which define the query for this ad-hoc task):

Dimension             Available Codes
Topic language        EN
Annotation language   EN
Query/run type        AUTO
Feedback/expansion    FB, QE, FBQE, NOFB
Modality              TXT, IMG, CON, TXTIMG, TXTCON, IMGCON, TXTIMGCON

Query language:
Used to specify the query language used in the run. Only English queries will be provided this year, so the language code indicating the query language should be English (EN).

Annotation language:
Used to specify the target language (i.e., the annotation set) used for the run. Only English annotations will be provided this year, so the language code indicating the target language should be English (EN).

Query/run type:
We distinguish between manual (MAN) and automatic (AUTO) submissions. Automatic runs involve no user interaction, whereas manual runs are those in which a human has been involved in query construction and the iterative retrieval process, e.g. manual relevance feedback is performed. We encourage groups who want to investigate manual intervention further to participate in the interactive evaluation (iCLEF) task.

Feedback or Query Expansion:
Used to specify whether the run involves query expansion (QE) or feedback (FB) techniques, both of them (FBQE) or none of them (NOFB).

Modality:
This describes the use of visual (image) features, text features or concepts in your submission. A text-only run will have modality text (TXT); a purely visual run will have modality image (IMG); a concept-based run will have modality concept (CON); and a combined submission (e.g. initial text search followed by a possibly combined visual search) will have as modality any combination thereof: text+image (TXTIMG), text+concept (TXTCON), image+concept (IMGCON), and text+image+concept (TXTIMGCON).


Participants can submit as many system runs as they require via the wikipediaMM submission site. *** NO LONGER AVAILABLE ***

Participants are required to submit ranked lists of (up to) the top 1000 images ranked in descending order of similarity (i.e. the highest nearer the top of the list). The format of submissions for this ad-hoc task can be found here and the filenames should distinguish different types of submission according to the table above.

Participants can submit a run in any of the permutations detailed in the previous table (above), e.g., EN-EN-AUTO-NOFB-TXT for the English-English monolingual run using fully automatic text-based retrieval methods.
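
To make the naming scheme concrete, here is a minimal sketch that assembles and validates a run identifier such as EN-EN-AUTO-NOFB-TXT from one code per dimension. The feedback codes follow the table of available codes and the modality codes follow the modality description; MAN is included because manual runs are described in the text, although the table lists only AUTO. The names `DIMENSIONS` and `run_identifier` are hypothetical.

```python
# One (name, allowed codes) pair per dimension, in identifier order.
DIMENSIONS = (
    ("topic language", ("EN",)),
    ("annotation language", ("EN",)),
    # MAN is described in the text; the table of codes lists only AUTO.
    ("query/run type", ("AUTO", "MAN")),
    ("feedback/expansion", ("FB", "QE", "FBQE", "NOFB")),
    ("modality", ("TXT", "IMG", "CON", "TXTIMG", "TXTCON",
                  "IMGCON", "TXTIMGCON")),
)


def run_identifier(*codes):
    """Join one code per dimension with hyphens after validating each."""
    if len(codes) != len(DIMENSIONS):
        raise ValueError("expected %d codes, got %d"
                         % (len(DIMENSIONS), len(codes)))
    for code, (name, allowed) in zip(codes, DIMENSIONS):
        if code not in allowed:
            raise ValueError("unknown %s code: %r" % (name, code))
    return "-".join(codes)


if __name__ == "__main__":
    print(run_identifier("EN", "EN", "AUTO", "NOFB", "TXT"))
```

Validating the codes before submission helps avoid file names that cannot be mapped back to the dimensions above.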

It is extremely important that we can get a detailed description of the techniques used for each submitted run.

When the topic contains an image example that is part of the wikipediaMM collection, this image should not be part of the retrieval results, i.e., we are seeking relevant images that the users are not familiar with (as they are with the images they provided as examples).

Please note that there should be at least 1 document entry in your results for each topic (i.e. if your system returns no results for a query then insert a dummy entry, e.g. 25 1 16019 0 4238 xyzT10af5 ). The reason for this is to make sure that all systems are compared with the same number of topics and relevant documents. Submissions not following the required format will not be evaluated.
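
The submission rules above (at most 1000 images per topic, descending order of similarity, no example images from the topic, and a dummy entry for topics without results) can be sketched as follows. The six-column layout is inferred from the dummy entry shown above and should be checked against the official submission format; starting ranks at 0, the dummy score of 0, and the names `format_run`, `example_images` and `dummy_image` are assumptions of this sketch.

```python
def format_run(results, run_tag, dummy_image, example_images=None, limit=1000):
    """Build the lines of a run file.

    results:        {topic_id: [(image_id, score), ...]}
    example_images: {topic_id: [image_id, ...]} images to leave out
    dummy_image:    image id used when a topic has no results
    Column layout (assumed): topic, constant 1, image id, rank, score, run tag.
    """
    example_images = example_images or {}
    lines = []
    for topic, scored in results.items():
        excluded = set(example_images.get(topic, ()))
        kept = [(img, s) for img, s in scored if img not in excluded]
        # Descending score; the sort is stable, so ties keep their input order.
        kept.sort(key=lambda pair: -pair[1])
        kept = kept[:limit]
        if not kept:
            kept = [(dummy_image, 0)]  # every topic needs at least one entry
        for rank, (img, score) in enumerate(kept):
            lines.append("%s 1 %s %d %s %s" % (topic, img, rank, score, run_tag))
    return lines


if __name__ == "__main__":
    toy_results = {"1": [("10", 0.5), ("11", 0.9), ("12", 0.7)], "2": []}
    for line in format_run(toy_results, "myrun", dummy_image="16019",
                           example_images={"1": ["12"]}):
        print(line)
```

Filtering the example images before truncating to the limit keeps the list as full as possible for topics with many candidates.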

Relevance Assessments
Assessors can perform their work starting from the wikipediaMM assessment page.

The page contains an explanation of the assessment system and links to the pools for the different groups. To access and assess the pools, you need your username and password (emailed by the wikipediaMM organisers).

DOWNLOAD (participants only)

  • The relevance assessments for the 75 topics can be found here. They are in TREC format.
The schedule can be found here:

  • 20.2.2008: registration opens for all CLEF tasks
  • 17.3.2008: data release (images + metadata)
  • 18.3.2008: data release (past INEX MM topics)
  • 19.3.2008: instructions and formatting criteria for candidate topics/queries provided to participants
  • 14.4.2008: submission deadline for candidate topics
  • 22.4.2008: topic release
  • 5.6.2008: submission of runs
  • 10.6.2008: distribution of merged results to participants for relevance assessments
  • 19.7.2008: submission deadline for relevance assessments
  • 19.7.2008: release of results
  • 15.8.2008: submission of working notes papers
  • 17.-19.9.2008: CLEF workshop in Aarhus, Denmark