The photo retrieval task of ImageCLEF 2008 will take a different approach to evaluation by studying image clustering. A good search engine ensures that duplicate or near-duplicate documents retrieved in response to a query are hidden from the user. Providing this functionality is particularly important when a user types in a query that is poorly specified or ambiguous, a common situation in image search. Given such a query, a search engine that retrieves a diverse yet relevant set of images is more likely to satisfy its users.
Promoting diversity is desirable because different users often issue the same query yet wish to see different results. If a search engine knows nothing about the user entering the query, a good strategy is to return results that are both diverse and relevant: in effect, the engine spreads its bets on what the user might want to retrieve.
Perhaps surprisingly, almost no test collection exists that examines this important aspect of search; ImageCLEF will be the first evaluation campaign to look at this problem in over a decade. To make participation in the task as easy as possible, we will use an existing ImageCLEF collection and its topics, and keep both the topic and run formats the same as in previous years. (In future years we plan to extend the task so that systems return image clusters, and even to explore cluster labelling.)
From a subset of existing topics on the IAPR TC-12 collection, relevant images will be manually clustered, and the relevance judgements will be augmented to indicate which cluster each image belongs to. Participants will run the topic subset on their image search system and produce a ranking whose top 10 results contain relevant images from as many of the clusters as possible.
A version of the collection will also be made available that allows participants to explore cross-language aspects of image clustering. In this version, members of the clusters will be captioned in different languages.
Relevance assessors will be instructed to look for simple image clusters based on the form of a topic. For example, if a topic asks for images of beaches in Brazil, clusters will be formed based on location; if a topic asks for photos of animals, clusters will be formed based on the type of animal.
Evaluation will be based on precision at 10 and on a measure of cluster recall, which calculates the proportion of a topic's clusters represented among the retrieved images.
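The two measures can be sketched as follows. This is a minimal illustration, not the official scoring code: the function names, the qrels format (a mapping from relevant image ID to cluster label), and the toy data are all assumptions made for the example.

```python
def precision_at_k(ranking, qrels, k=10):
    """Fraction of the top-k retrieved images that are relevant."""
    return sum(1 for img in ranking[:k] if img in qrels) / k

def cluster_recall_at_k(ranking, qrels, k=10):
    """Fraction of the topic's clusters represented in the top-k results."""
    all_clusters = set(qrels.values())
    found = {qrels[img] for img in ranking[:k] if img in qrels}
    return len(found) / len(all_clusters)

# Toy topic: relevant beach images, clustered by (hypothetical) location.
qrels = {"img1": "rio", "img2": "rio", "img3": "bahia", "img4": "ceara"}
ranking = ["img1", "img9", "img3", "img2", "img8",
           "img4", "img7", "img6", "img5", "img0"]

print(precision_at_k(ranking, qrels))       # 4 relevant in top 10 -> 0.4
print(cluster_recall_at_k(ranking, qrels))  # 3 of 3 clusters found -> 1.0
```

Note that the two measures pull in different directions: a run can achieve high precision at 10 by filling the top ranks with images from a single large cluster, yet score poorly on cluster recall.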
Note that it is entirely possible to submit runs from a "standard" non-clustering image search system, though we would expect clustering-aware systems to outperform the standard systems.
Participants will need to sign an end-user licence agreement (EULA) prior to obtaining the collection.