ImageCLEF 2009 Photo Retrieval Task

The photo retrieval task of ImageCLEF 2009 continues the study, begun at ImageCLEF 2008, of the importance of diversity in image search results. This year's task broadens the scope of the research by using a new data set containing half a million images.

Diversity is especially needed when users submit poorly specified or ambiguous queries. Because the search engine knows nothing about a user's preferences among relevant images, it can increase the probability of returning relevant images by presenting different images that cover all possible interpretations of the query. Duplicate images should be reduced, as users do not consider them useful. Diversity is also needed when different users submit the same query but require different sets of results. These reasons show why diversity matters, and an image search engine should be able to address these problems by presenting a diverse result list. This task requires participants to find the best way to improve diversity while still presenting highly relevant images.

The Task – Promote Diversity

Participants will run each provided topic on their image search system and produce a ranking whose top 20 contains as many relevant images as possible, representing the different sub-topics within the results. The definition of what constitutes diversity will vary across the topics, but each topic will give a clear indication of the clustering criteria the evaluators will use.

For each topic in the ImageCLEFPhoto set, relevant images will be manually clustered into sub-topics, and relevance judgements will be augmented to indicate which cluster an image belongs to. Relevance assessors will be instructed to look for simple clusters based on the form of a topic. For example, if a topic asks for images of beaches in Brazil, clusters will be formed based on location; if a topic asks for photos of animals, clusters will be formed based on animal type.

For each topic, participating groups will return a ranked list of image IDs. We will determine which images are relevant and count how many clusters are represented in the ranking. We do not require you to identify or label clusters in the ranked list; how you choose to do the clustering is an internal matter for you.
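
Since the clustering method is left to each group, the following is only one illustration and not a method prescribed by the task: a baseline ranking can be diversified by assigning each retrieved image to a cluster (with whatever clustering a group prefers) and then interleaving the clusters round-robin, so that the top 20 covers as many clusters as possible. The function name round_robin_rerank and the cluster_of mapping are hypothetical; the sketch assumes the cluster labels have already been computed.

```python
from collections import deque


def round_robin_rerank(ranking, cluster_of):
    """Interleave clusters so that early ranks cover as many clusters as possible.

    ranking    : image IDs in baseline relevance order (best first)
    cluster_of : hypothetical mapping from image ID to a cluster label,
                 produced by whatever clustering method a group chooses
    """
    # Group images by cluster, keeping the baseline order inside each cluster.
    # Dicts keep insertion order, so clusters are visited in the order of
    # their best-ranked image.
    queues = {}
    for image_id in ranking:
        queues.setdefault(cluster_of[image_id], deque()).append(image_id)

    reranked = []
    # Repeatedly take the best remaining image from each non-empty cluster.
    while any(queues.values()):
        for queue in queues.values():
            if queue:
                reranked.append(queue.popleft())
    return reranked
```

For example, round_robin_rerank(["a1", "a2", "a3", "b1", "c1"], {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "c1": "C"}) yields ["a1", "b1", "c1", "a2", "a3"]. Plain round-robin favours coverage over relevance: an image from a low-ranked, possibly noisy cluster can move ahead of more relevant images, so a group may want to limit how far an image can be promoted.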

Evaluation will be based on two measures: precision at 20 and instance recall at rank 20 (also called S-recall), which measures the percentage of the topic's clusters represented in the top 20. It will be important to maximise both measures: simply retrieving many relevant images from one cluster, or filling the ranking with diverse but non-relevant images, will result in a poor overall effectiveness score.
Note that it is quite possible to submit runs from a "standard" non-clustering image search system, although we would expect clustering systems to outperform standard systems in producing a diverse ranked list in the top 20.
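
The two measures described above can be sketched as follows. This is not the official evaluation script; it assumes the relevance judgements for a topic are available as a mapping from each relevant image ID to its cluster label (images absent from the mapping count as non-relevant), and that the number of clusters is the number of distinct labels unless given explicitly. The function names are hypothetical.

```python
def precision_at(ranking, qrels, k=20):
    """Fraction of the top-k positions filled by relevant images."""
    return sum(1 for image_id in ranking[:k] if image_id in qrels) / k


def s_recall_at(ranking, qrels, k=20, n_clusters=None):
    """Fraction of the topic's clusters represented by relevant images in the top k."""
    if n_clusters is None:
        # Assume every cluster of the topic has at least one judged image.
        n_clusters = len(set(qrels.values()))
    found = {qrels[image_id] for image_id in ranking[:k] if image_id in qrels}
    return len(found) / n_clusters
```

With ranking = ["x1", "x2", "x3", "x4"] and qrels = {"x1": "A", "x2": "A", "x4": "B", "y9": "C"}, the top 4 give a precision of 0.75 but an S-recall of only 2/3, because cluster C is missing. This is exactly the trade-off the task asks participants to balance.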

Query Topics

ImageCLEFPhoto 2009 will use new topics developed from an analysis of Belga's 2008 query logs. These topics therefore closely reflect real-life diversity needs in information retrieval. There will be around 50 topics in this year's ImageCLEFPhoto evaluation. All topics are in English and have a variable number of clusters, ranging from 2 to 10.

The format of this year's queries differs from last year's, as we decided to eliminate the cluster tag. Instead, we give cluster titles that describe what the relevant images are about. Each cluster title comes with an example image. An example of this year's topics is shown below:

<num> Number: 0 </num>
<title> soccer </title>
<clusterTitle> soccer belgium </clusterTitle>
<clusterDesc> Relevant images contain photographs of the Belgium team in a soccer match. </clusterDesc>
<image> belga38/00704995.jpg </image>
<clusterTitle> spain soccer </clusterTitle>
<clusterDesc> Relevant images contain photographs of the Spain team in a soccer match. </clusterDesc>
<image> belga6/00110574.jpg </image>
<clusterTitle> beach soccer </clusterTitle>
<clusterDesc> Relevant images contain photographs of a soccer beach match. </clusterDesc>
<image> belga33/06278068.jpg </image>
<clusterTitle> italy soccer </clusterTitle>
<clusterDesc> Relevant images contain photographs of the Italy team in a soccer match. </clusterDesc>
<image> belga20/1027435.jpg </image>
<clusterTitle> soccer netherlands </clusterTitle>
<clusterDesc> Relevant images contain photographs of the Netherlands team in a soccer match or the teams in Netherlands' league. </clusterDesc>
<image> belga10/01214810.jpg </image>
<clusterTitle> soccer -belgium -spain -beach -italy -netherlands </clusterTitle>
<clusterDesc> Relevant images contain photographs of any aspects or subtopics of soccer which are not related to the above clusters. </clusterDesc>
<image> belga20/01404831.jpg </image>

The topic above contains five main clusters and one 'other' cluster (the last one, whose title excludes the terms of the main clusters). The 'other' cluster exists because the search results showed further diversity needs, but the number of queries for each of them was too small to justify a separate cluster. This last cluster encourages participants to find other images that are relevant to the initial query but are not covered by the main clusters.
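
The topic fragment shown above has no single root element, so it is not a well-formed XML document on its own. A participant might therefore read it with a tolerant pattern match instead of an XML parser. The sketch below assumes exactly the tags used in the example (num, title, clusterTitle, clusterDesc, image) and that each clusterDesc and image follows its clusterTitle; the function name parse_topic and the output layout are hypothetical. It also collects every example image in one list, which is convenient because example images must not appear in the results, and it tolerates images that appear outside any cluster.

```python
import re

# One opening tag, its content, and the matching closing tag (via backreference).
TAG = re.compile(r"<(num|title|clusterTitle|clusterDesc|image)>\s*(.*?)\s*</\1>", re.S)


def parse_topic(text):
    """Parse one topic in the example format into a dict."""
    topic = {"num": None, "title": None, "clusters": [], "images": []}
    for tag, value in TAG.findall(text):
        if tag == "num":
            # The example reads "<num> Number: 0 </num>"; keep the trailing integer.
            topic["num"] = int(value.split(":")[-1])
        elif tag == "title":
            topic["title"] = value
        elif tag == "clusterTitle":
            topic["clusters"].append({"title": value, "desc": None, "image": None})
        elif tag == "clusterDesc" and topic["clusters"]:
            topic["clusters"][-1]["desc"] = value
        elif tag == "image":
            topic["images"].append(value)  # every example image, for later exclusion
            if topic["clusters"]:
                topic["clusters"][-1]["image"] = value
    return topic
```

Applied to the soccer topic, this yields num 0, title "soccer" and six clusters, the last being the 'other' cluster "soccer -belgium -spain -beach -italy -netherlands".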

Topics Download

Fifty topics are available for this year's photo retrieval task. They were constructed from Belga's 2008 query logs. The coordinators are preparing a document describing how these queries were constructed, which will be released soon. You can download the topics using the links below:

  • Topics - part 1
    All topics in this list follow the format shown in the previous section, where each query title has several cluster titles, cluster descriptions and their example images.
  • Topics - part 2
    In part 2 of the topics, participants are given only the query title and example images. Three relevant example images are given for each query, but this number does not indicate the number of clusters in that query. Participants are encouraged to decide how broad the results should be for each of these topics.

IMPORTANT: Please note that the image examples given in these lists must not be included in the search results.
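
A run can be checked against this rule before submission. The sketch below assumes, hypothetically, that image identifiers are compared by file name without directory or extension (so "belga38/00704995.jpg" matches "00704995") and that file names are unique across the belga* directories; the actual ID format is defined by the submission guidelines, so the normalisation may need adjusting. The function name drop_example_images is ours.

```python
from pathlib import PurePosixPath


def drop_example_images(ranking, example_images):
    """Remove topic example images from a ranked list, preserving the order."""

    def key(image_id):
        # Assumption: compare by bare file name, e.g. "belga6/00110574.jpg" -> "00110574".
        return PurePosixPath(image_id).stem

    excluded = {key(p) for p in example_images}
    return [image_id for image_id in ranking if key(image_id) not in excluded]
```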

Data Collections

This year's ImageCLEFPhoto task uses a new collection of 498,920 images from Belga News Agency, which provides an image search engine for news photographs. Each photograph is at most 512 pixels in width or height and is accompanied by an English caption of up to a few sentences. Unlike last year's data, captions are provided without a specific format, which makes the task more challenging for participants. A caption may contain the date and place where the image was captured.

Invalid Images

Due to an unknown format, we unfortunately had to eliminate 881 images from the collection. Please update your collection by deleting these invalid images. Thanks to Gao Sheng for the list.
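
Removing the invalid images from a local copy of the collection can be scripted. The sketch below assumes, hypothetically, that the list of invalid images is a plain-text file with one path per line, relative to the collection root (for example belga20/...jpg); the actual layout of the list may differ. Blank lines and already-missing files are skipped, and the function returns how many files it deleted so the count can be compared with the 881 expected.

```python
from pathlib import Path


def remove_invalid_images(collection_root, invalid_list_file):
    """Delete every image named in the invalid list; return how many were removed."""
    root = Path(collection_root)
    removed = 0
    for line in Path(invalid_list_file).read_text().splitlines():
        rel = line.strip()
        if not rel:
            continue  # skip blank lines
        target = root / rel  # assumed: paths are relative to the collection root
        if target.is_file():
            target.unlink()
            removed += 1
    return removed
```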

An example of an image annotation (caption) is shown below:

20090126 - DENDERMONDE, BELGIUM: Lots of people pictured during a commemoration for the victims of the knife attack in Sint-Gilles, Dendermonde, Belgium, on Monday 26 January 2009. Last friday 20-Year old Kim De Gelder killed three people, one adult and two childs, in a knife attack at the chidlren's day care center "Fabeltjesland" in Dendermonde. BELGA PHOTO BENOIT DOPPAGNE
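
Since captions have no fixed format, any extraction of the date and place is heuristic. The sketch below only recognises the prefix seen in the single example above ("YYYYMMDD - PLACE: text") and falls back to returning the caption unchanged when that prefix is absent or the digits are not a valid date; the function name parse_caption and the pattern are assumptions, and other captions in the collection may use different conventions.

```python
import re
from datetime import date

# Prefix pattern taken from the one example caption: "YYYYMMDD - PLACE: text".
PREFIX = re.compile(r"^\s*(\d{4})(\d{2})(\d{2})\s*-\s*([^:]+):\s*(.*)$", re.S)


def parse_caption(caption):
    """Return (date or None, place or None, remaining caption text)."""
    m = PREFIX.match(caption)
    if m is None:
        return None, None, caption
    year, month, day, place, text = m.groups()
    try:
        shot = date(int(year), int(month), int(day))
    except ValueError:  # digits that do not form a real calendar date
        return None, None, caption
    return shot, place.strip(), text
```

On the caption above this returns date(2009, 1, 26), "DENDERMONDE, BELGIUM" and the text beginning "Lots of people".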

Submission Format and Guidelines

The submission format and guidelines are available here.

Important Dates

  • 15 January 2009: Registration opens for Photo Retrieval Task
  • 3 April 2009: Release of data collection and topic examples
    Release of topic examples has been cancelled. Participants can, however, see an example of the new topics in the "Query Topics" section on this web page.
  • 8 May 2009: Topic release
  • 8 June 2009: Submission of runs
  • 5 August 2009: Release of Results
  • 23 August 2009: Submission of Working Notes Papers
  • 30 September - 2 October 2009: CLEF Workshop in Corfu, Greece

