
Wikipedia retrieval task 2011


Registration is now closed.

ImageCLEF's Wikipedia Retrieval task provides a testbed for the system-oriented evaluation of visual information retrieval from a collection of Wikipedia images and articles. The aim is to investigate retrieval approaches in the context of a large and heterogeneous collection of images with noisy text annotations (similar to those encountered on the Web), searched by users with diverse information needs. This diversity is simulated both by the variety of topics covered by the queries and by the different types of queries, which are expected to be better served by textual, visual, or multimodal retrieval.

In 2011, the task uses the ImageCLEF 2010 Wikipedia Collection (Popescu et al., 2010), which contains 237,434 Wikipedia images that cover diverse topics of interest. These images are associated with unstructured and noisy textual annotations in English, French, and German.

(Popescu et al., 2010) A. Popescu, T. Tsikrika, and J. Kludas. Overview of the Wikipedia Retrieval Task at ImageCLEF 2010. In CLEF (Notebook Papers/LABs/Workshops), 2010.

  • ad-hoc image retrieval task:

    Given a textual, multilingual query and sample images describing a user's (multimedia) information need, find as many relevant images as possible from the Wikipedia image collection. To strengthen the visual modality, up to 5 example images will be given.

    Any method can be used to retrieve relevant documents. We encourage the use of both text-based and content-based retrieval methods and, in particular, multi-modal and multi-lingual approaches that investigate the combination of evidence from different modalities and language resources.

ImageCLEF 2010 Wikipedia Collection

The ImageCLEF 2010 Wikipedia collection consists of 237,434 images and associated user-supplied annotations. The collection was built to cover similar topics in English, German, and French. Topical similarity was obtained by selecting only Wikipedia articles that have versions in all three languages and are illustrated with at least one image in each version: 44,664 such articles were extracted from the September 2009 Wikipedia dumps, containing a total of 265,987 images. Since the collection is intended to be freely distributed, we decided to remove all images with unclear copyright status. After this operation, duplicate elimination, and some additional clean-up, 237,434 images remain in the collection, with the following language distribution:

  • English only: 70,127
  • German only: 50,291
  • French only: 28,461
  • English and German: 26,880
  • English and French: 20,747
  • German and French: 9,646
  • English, German and French: 22,899
  • Language undetermined: 8,144
  • No textual annotation: 239
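As a quick sanity check, the per-language counts above do add up to the stated total of 237,434 images:

```python
# Per-language image counts from the list above.
counts = {
    "English only": 70127,
    "German only": 50291,
    "French only": 28461,
    "English and German": 26880,
    "English and French": 20747,
    "German and French": 9646,
    "English, German and French": 22899,
    "Language undetermined": 8144,
    "No textual annotation": 239,
}

total = sum(counts.values())
print(total)  # 237434
```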

Two examples that illustrate the images in the collection and their metadata were provided here (the example images are no longer shown).
The data are no longer available here; they can be downloaded from the Resources page for the ImageCLEF Wikipedia Image Retrieval Datasets.

Search Engines:

  • Cross-Modal Search Engine (CMSE by UniGe) that allows you to search the ImageCLEF 2010 Wikipedia image collection through a web interface using text queries, example images or both at once.
  • Multimodal Retrieval (by DUTH), an experimental multimodal search engine, which allows multimedia and multi-language queries, and makes use of the total available information in a multimodal collection. Both the 2010 and the 2011 topics are uploaded and the results are also provided in TREC format.
Evaluation Objectives
The characteristics of the new Wikipedia collection allow for the investigation of the following objectives:

  • how well do the retrieval approaches cope with larger scale image collections?
  • how well do the retrieval approaches cope with noisy and unstructured textual annotations?
  • how well do the content-based retrieval approaches cope with images that cover diverse topics and are of varying quality?
  • how well can systems exploit and combine different modalities given a user's multimedia information need? Can they outperform mono-modal approaches such as query-by-text or query-by-image?
  • how well can systems exploit the multiple language resources? Can they outperform mono-lingual approaches that use, for example, only the English text annotations?

The results of Wikipedia Retrieval at ImageCLEF 2010 showed that the best multimedia retrieval approaches outperformed the text-based approaches. To promote research on multi-modal approaches, this year a subtask focused on late-fusion approaches is introduced. In this subtask, which will take place after the announcement of the main task results, all participants have access to text- and content-based runs submitted by other participants and are free to combine them in whatever way they consider suitable in order to obtain multi-modal runs. Similarly to 2010, a second focus will be the effectiveness of multilingual approaches for multimedia document retrieval.


The topics for ImageCLEF 2011 Wikipedia Retrieval task were developed based on the analysis of an image search engine's logs.

The data are no longer available here; they can be downloaded from the Resources page for the ImageCLEF Wikipedia Image Retrieval Datasets.

The topics are multimedia queries that can consist of a textual and a visual part. Concepts that might be needed to constrain the results should be added to the title field. An example topic in the appropriate format is the following:


  <number>1</number>
  <title xml:lang="en">historic castle</title>
  <title xml:lang="de">historisches schloss</title>
  <title xml:lang="fr">château fort historique</title>
  <image>castle.jpg</image>


Therefore, the topics include the following fields:

  • title: query by keywords in each of the 3 languages (English, French, German)
  • image: query by 4 or 5 example images
  • narrative: description of the information need, where the definitive definitions of relevance and irrelevance are given; this will be provided during the assessment phase.
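As a sketch, a topic in the format above could be parsed as follows. This assumes each topic is wrapped in a `<topic>` element; the exact wrapper used in the released topic files may differ.

```python
# Minimal sketch of parsing a topic in the format shown above.
# The <topic> wrapper element is an assumption for illustration.
import xml.etree.ElementTree as ET

# ElementTree expands the xml:lang attribute to its namespaced form.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

topic_xml = """
<topic>
  <number> 1 </number>
  <title xml:lang="en">historic castle</title>
  <title xml:lang="de">historisches schloss</title>
  <title xml:lang="fr">château fort historique</title>
  <image>castle.jpg</image>
</topic>
"""

def parse_topic(xml_text):
    """Return the topic number, per-language titles, and example images."""
    root = ET.fromstring(xml_text)
    return {
        "number": int(root.findtext("number").strip()),
        "titles": {t.get(XML_LANG): t.text.strip() for t in root.findall("title")},
        "images": [i.text.strip() for i in root.findall("image")],
    }

topic = parse_topic(topic_xml)
print(topic["titles"]["fr"])  # château fort historique
```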

Retrieval Experiments

Experiments are performed as follows: the participants are given topics, from which they create queries that are run against the image collection. This process may iterate (e.g., involving relevance feedback) until they are satisfied with their runs. Participants might try different methods to increase the number of relevant documents in the top N rank positions (e.g., query expansion).

Participants are free to experiment with whatever methods they wish for image retrieval, e.g., query expansion based on thesaurus lookup or relevance feedback, indexing and retrieval on only part of the image caption, different models of retrieval, and combining text and content-based methods for retrieval. Given the many different possible approaches which could be used to perform the ad-hoc retrieval, rather than list all of these we ask participants to indicate which of the following applies to each of their runs (we consider these the "main" dimensions which define the query for this ad-hoc task):

Dimension                   Available codes
Annotation language         EN, DE, FR, EN+DE, EN+FR, FR+DE, EN+FR+DE
Comment                     YES, NO
Topic language              EN, DE, FR, EN+DE, EN+FR, FR+DE, EN+FR+DE
Run type                    AUTO, MAN
Feedback/expansion          FB, QE, FBQE, NOFB
Retrieval type (Modality)   IMG, TXT, TXTIMG
Topic field                 TITLE, IMG_Q, TITLEIMG_Q

Annotation language:
Used to specify the target language (i.e., the annotation set) used for the run: English (EN), German (DE), French (FR) and their combinations.

Comment:
Used to specify whether the <comment> field in the text annotation has been used: YES/NO.

Topic language:
Used to specify the query language used in the run: English (EN), German (DE), French (FR) and their combinations.

Run type:
We distinguish between manual (MAN) and automatic (AUTO) submissions. Automatic runs involve no user interaction, whereas manual runs are those in which a human has been involved in query construction and the iterative retrieval process, e.g., manual relevance feedback is performed. A good description of the differences between these types of runs is provided by TRECVID here.

Feedback or Query Expansion:
Used to specify whether the run involves query expansion (QE), feedback (FB) techniques, both (FBQE), or neither (NOFB).

Retrieval type (Modality):
This describes the use of visual (image) or text features in your submission. A text-only run will have modality textual (TXT) and a purely visual run will have modality visual (IMG). Combined submissions (e.g., an initial text search followed by a possibly combined visual search) will have as modality: text+visual (TXTIMG), also referred to as "mixed".

Topic field:
This specifies the topic fields employed in the run: only the title field of the topic (TITLE); only the example images in the topic (IMG_Q); both the title and image fields (TITLEIMG_Q).
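The run descriptor dimensions above can be sketched as a simple validation check. The dictionary keys and function name here are illustrative, not an official tool; the code values are taken from the table above.

```python
# Sketch of validating a run's descriptor against the dimension codes
# listed above. Names are illustrative, not part of any official tooling.
LANG_CODES = {"EN", "DE", "FR", "EN+DE", "EN+FR", "FR+DE", "EN+FR+DE"}

RUN_DIMENSIONS = {
    "annotation_language": LANG_CODES,
    "comment": {"YES", "NO"},
    "topic_language": LANG_CODES,
    "run_type": {"AUTO", "MAN"},
    "feedback_expansion": {"FB", "QE", "FBQE", "NOFB"},
    "retrieval_type": {"IMG", "TXT", "TXTIMG"},
    "topic_field": {"TITLE", "IMG_Q", "TITLEIMG_Q"},
}

def validate_run(descriptor):
    """Return a list of error messages; empty means the descriptor is valid."""
    errors = []
    for dim, codes in RUN_DIMENSIONS.items():
        value = descriptor.get(dim)
        if value not in codes:
            errors.append(f"{dim}: {value!r} not in {sorted(codes)}")
    return errors

# Example: a text-only automatic run over the English annotations.
run = {
    "annotation_language": "EN",
    "comment": "NO",
    "topic_language": "EN+FR+DE",
    "run_type": "AUTO",
    "feedback_expansion": "NOFB",
    "retrieval_type": "TXT",
    "topic_field": "TITLE",
}
print(validate_run(run))  # []
```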


Participants can submit up to 20 system runs. The submission system is now open at the ImageCLEF registration system (select Runs > Submit a Run).

Participants are required to submit ranked lists of (up to) the top 1000 images, ranked in descending order of similarity (i.e., the most similar image at the top of the list). The format of submissions for this ad-hoc task is the TREC format. It can be found here.

Please note that there should be at least 1 document entry in your results for each topic (i.e., if your system returns no results for a query, insert a dummy entry, e.g., 25 1 16019 0 4238 xyzT10af5). This ensures that all systems are compared over the same number of topics and relevant documents. Submissions not following the required format will not be evaluated.
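A minimal sketch of writing such a run file, padding topics with no results with a dummy entry so every topic appears at least once. The function name and the `dummy` document identifier are illustrative; the field order (topic id, iteration, document id, rank, score, run tag) follows the example entry above.

```python
# Sketch of writing a run in TREC format. Field order per line:
# topic-id iteration doc-id rank score run-tag
def write_trec_run(results, topic_ids, run_tag, path):
    """results: dict mapping topic id -> list of (doc_id, score) pairs,
    already sorted by descending score."""
    with open(path, "w") as out:
        for topic in topic_ids:
            ranked = results.get(topic, [])[:1000]  # at most 1000 per topic
            if not ranked:
                # Dummy entry so the topic still appears in the run file.
                ranked = [("dummy", 0.0)]
            for rank, (doc_id, score) in enumerate(ranked):
                out.write(f"{topic} 1 {doc_id} {rank} {score} {run_tag}\n")
```

For example, `write_trec_run({1: [("img_7", 3.5)]}, [1, 2], "myrun", "run.txt")` would emit one real entry for topic 1 and a dummy entry for topic 2.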

Information to be provided during submission

  • Method Description: A brief description of the approach employed in the submitted run.
  • Retrieval type: The modality of the employed features: textual, visual, or a combination of both (mixed).
  • Language: The annotation language(s) used: "English", "French", "German", "English & French", "English & German", "French & German", "English, French & German".
  • Run type: Select either Automatic or Manual. Whether the run includes the use of relevance feedback should be specified below.
  • Primary Run: Indicate whether this is the group's primary run.
  • Other information: Specify the remaining aspects of the submitted run by deleting as appropriate:

    Topic language: EN, DE, FR, EN+DE, EN+FR, FR+DE, EN+FR+DE

    Comment used: YES, NO

    Feedback/expansion : FB, QE, FBQE, NOFB

    Topic field: TITLE, IMG_Q, TITLEIMG_Q

  • Additional resources used: Optional field to describe any additional resources employed.
The tentative schedule is as follows:

  • 1.2.2011: registration opens for all ImageCLEF tasks
  • 18.3.2011: data release (images + metadata + article)
  • 21.4.2011: topic release
  • 15.5.2011: registration closes for all ImageCLEF tasks
  • 22.6.2011: submission of runs
  • 26.7.2011: release of results
  • 14.8.2011: submission of working notes papers
  • 19.09.2011-22.09.2011: CLEF 2011 Conference, Amsterdam, The Netherlands