
The photo retrieval task of ImageCLEF 2009 continues the study of diversity in image search results that began in last year's ImageCLEF. This year's task broadens the research scope by using a new data set containing half a million images.
Diversity is most needed when users submit poorly specified or ambiguous queries. Because a search engine knows nothing about a user's preferences among the relevant images, it can increase the probability of finding relevant images by presenting different images that cover all possible interpretations of the query. Duplicate images should be reduced, as users do not consider them useful. Diversity is also needed when users submit the same query but require different sets of results. These cases show why diversity matters, and an image search engine should be able to handle them by presenting a diverse result list. This task asks participants to analyse the best way to improve diversity while still presenting images with high relevance.
Participants will run each provided topic on their image search system and produce a ranking whose top 20 contains as many relevant images as possible that are representative of the different sub-topics within the results. The definition of what constitutes diversity will vary across topics, but each topic will give a clear indication, in the tag "cluster", of the clustering criterion the evaluators will use.
For each topic in the ImageCLEFPhoto set, relevant images will be manually clustered into sub-topics, and the relevance judgements will be augmented to indicate which cluster an image belongs to. Relevance assessors will be instructed to look for simple clusters based on the form of a topic. For example, if a topic asks for images of beaches in Brazil, clusters will be formed based on location; if a topic asks for photos of animals, clusters will be formed based on animal type.
Participating groups will return to us, for each topic, a ranked list of image IDs. We will determine which images are relevant and count how many clusters are represented in the ranking. We do not require you to identify or label clusters in the ranked list; how you choose to do the clustering is an internal matter for you.
Evaluation will be based on two measures: precision at 20 and instance recall at rank 20 (also called S-recall), which calculates the percentage of different clusters represented in the top 20. It will be important to maximise both measures: simply retrieving many relevant images from one cluster, or filling the ranking with diverse but non-relevant images, will result in a poor overall effectiveness score.
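The two measures can be sketched as follows. This is a minimal illustration under stated assumptions, not the official evaluation script: the function names, the `cluster_of` mapping (one cluster label per relevant image), and the toy topic are all made up for the example.

```python
from typing import Dict, List


def precision_at_k(ranking: List[str], cluster_of: Dict[str, str], k: int = 20) -> float:
    """Fraction of the top-k ranked images that are relevant.

    `cluster_of` maps every relevant image ID of a topic to its cluster
    label (an assumed representation of the augmented relevance judgements);
    an image ID missing from the mapping is treated as non-relevant.
    """
    hits = sum(1 for image_id in ranking[:k] if image_id in cluster_of)
    return hits / k


def cluster_recall_at_k(ranking: List[str], cluster_of: Dict[str, str], k: int = 20) -> float:
    """Fraction of the topic's clusters represented in the top k (S-recall)."""
    all_clusters = set(cluster_of.values())
    if not all_clusters:
        return 0.0
    seen = {cluster_of[image_id] for image_id in ranking[:k] if image_id in cluster_of}
    return len(seen) / len(all_clusters)


# Toy topic (hypothetical IDs and cluster labels), evaluated at k = 4.
judgements = {"img1": "rio", "img2": "rio", "img4": "bahia", "img5": "recife"}
run = ["img1", "img2", "img3", "img4"]

p = precision_at_k(run, judgements, k=4)
r = cluster_recall_at_k(run, judgements, k=4)
```

In the toy run, three of the four returned images are relevant, and they cover two of the topic's three clusters, so a run with high precision can still miss part of the cluster recall.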
Note that it is quite possible to submit runs from a "standard" non-clustering image search system, though we would expect clustering systems to outperform the standard systems in producing a diverse ranked list in the top 20.
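As an illustration of how a clustering system might turn a standard ranking into a more diverse one, the sketch below interleaves results round-robin across clusters. This is only one possible approach and is not prescribed by the task; the function name and the `predicted_cluster` mapping (the system's own guess at each image's cluster) are assumptions for the example.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def round_robin_rerank(ranking: List[str], predicted_cluster: Dict[str, str]) -> List[str]:
    """Re-rank so that each predicted cluster appears as early as possible.

    Images are grouped by predicted cluster, keeping the original order
    inside each group; groups are then visited in turn, taking one image
    per visit. Images without a predicted cluster form singleton groups.
    """
    groups: Dict[Tuple[str, str], Deque[str]] = {}
    for image_id in ranking:
        if image_id in predicted_cluster:
            key = ("cluster", predicted_cluster[image_id])
        else:
            key = ("unclustered", image_id)
        groups.setdefault(key, deque()).append(image_id)

    queues = list(groups.values())  # dicts keep insertion order
    reranked: List[str] = []
    while queues:
        for queue in queues:
            reranked.append(queue.popleft())
        queues = [queue for queue in queues if queue]
    return reranked


# Hypothetical baseline ranking dominated by cluster "a".
baseline = ["a1", "a2", "a3", "b1", "c1", "b2"]
guess = {"a1": "a", "a2": "a", "a3": "a", "b1": "b", "b2": "b", "c1": "c"}
diverse = round_robin_rerank(baseline, guess)
```

Because the interleaving ignores relevance scores beyond the original order, it can trade precision for cluster coverage; a real system would need to balance the two.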
ImageCLEFPhoto 2009 will use new topics developed from an analysis of Belga query logs in 2008. These topics are therefore highly representative of real-life diversity needs in information retrieval.
Topics will be defined in the same format as in previous years, with a cluster tag showing the aspects of diversity that should be covered in the results. All topics are in English.
Topics will be released on 11 May 2009.
This year's ImageCLEFPhoto task will use a new collection of 500,000 images from Belga News Agency, which is an image search engine for news photographs. Each photograph will be at most 512 pixels in either width or height and is accompanied by an English caption of up to a few sentences. Unlike last year's data, captions are provided without a specific format, which makes the task more challenging for participants. A caption might contain the date and place where the image was captured.
An example of an image and its annotations is shown below:
BELGIUM DENDERMONDE COMMEMORATION KNIFE ATTACK
20090126 - DENDERMONDE, BELGIUM: Lots of people pictured during a commemoration for the victims of the knife attack in Sint-Gilles, Dendermonde, Belgium, on Monday 26 January 2009. Last friday 20-Year old Kim De Gelder killed three people, one adult and two childs, in a knife attack at the chidlren's day care center "Fabeltjesland" in Dendermonde. BELGA PHOTO BENOIT DOPPAGNE
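The example caption opens with a date and place before the free text. A participant might try to pull these out as in the sketch below. The `YYYYMMDD - PLACE: text` pattern is inferred from this single example only; since captions have no guaranteed format, the hypothetical `parse_caption_header` helper returns `None` whenever the pattern does not match.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional


class CaptionHeader(NamedTuple):
    captured_on: date
    place: str
    text: str


# Assumed pattern: eight digits, a dash, a place, a colon, then free text.
_HEADER = re.compile(r"^\s*(\d{8})\s*-\s*([^:]+?)\s*:\s*(.*)$", re.DOTALL)


def parse_caption_header(caption: str) -> Optional[CaptionHeader]:
    """Return the date, place and remaining text, or None if absent."""
    match = _HEADER.match(caption)
    if match is None:
        return None
    raw_date, place, text = match.groups()
    try:
        captured_on = datetime.strptime(raw_date, "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a valid calendar date.
        return None
    return CaptionHeader(captured_on, place, text)


header = parse_caption_header(
    "20090126 - DENDERMONDE, BELGIUM: Lots of people pictured during a commemoration"
)
```

Captions that do not match should still be indexed as plain text rather than discarded.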
| : | Registration opens for Photo Retrieval Task |
| : | Data collection and training topic release |
| : | Topic release |
| : | Submission of runs |
| : | Release of results |
| : | Submission of Working Notes Papers |
Primary Contact : Monica Lestari Paramita, Department of Information Studies, University of Sheffield (m.paramita@shef.ac.uk)
Mark Sanderson, Department of Information Studies, University of Sheffield (m.sanderson@shef.ac.uk)
Paul Clough, Department of Information Studies, University of Sheffield (p.d.clough@shef.ac.uk)
We have set up a mailing list for participants: imageclef@sheffield.ac.uk. Please contact Paul Clough to be added to the list.