The visual concept detection and annotation task is a multi-label classification challenge. It aims at automatically annotating a large number of consumer photos with multiple labels.
The task can be solved by following three different approaches:
1) Automatic annotation with visual information only
2) Automatic annotation with Flickr user tags (tag enrichment)
3) Multi-modal approaches that consider visual information and/or Flickr user tags and/or EXIF information
In all cases the participants are asked to annotate the photos of the test set with a predefined set of keywords (the concepts). This fixed vocabulary allows for automatic evaluation and comparison of the different approaches.
This year the task focuses on comparing the strengths and limitations of the different approaches:
* Do multi-modal approaches outperform text-only or visual-only approaches?
* Which concepts can be annotated more accurately with which approach?
* Can image classifiers scale to the large number of concepts and the amount of data?
The concepts include, for example, abstract categories like Family&Friends or Partylife, the time of day (Day, Night, Sunny, …), Persons (no, single, small or big group), Quality (blurred, underexposed, …) and Aesthetics. The 53 concepts used in the ImageCLEF 2009 benchmark are utilized again, but we plan to extend the set and will be able to provide annotations for around 60 concepts. In contrast to last year's annotations of the 53 concepts, the new annotations will be obtained via Amazon Mechanical Turk.
The task uses the MIR Flickr 25.000 image dataset for the annotation challenge. The training and test set will consist of at most 8000 and 10000 images, respectively. We will try to acquire annotations for the whole dataset. The annotations are provided as plain txt files.
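To make the file handling concrete, here is a minimal parsing sketch in Python. Note that the exact column layout is an assumption for illustration only (one line per image: an image identifier followed by one 0/1 flag per concept); the files actually distributed for the task may be organized differently.

```python
# Minimal sketch for reading a plain-text annotation file.
# ASSUMPTION: one line per image, holding an image identifier followed
# by one 0/1 flag per concept, separated by whitespace. The actual
# files distributed for the task may use a different layout.

def read_annotations(path, num_concepts):
    annotations = {}
    with open(path) as handle:
        for line in handle:
            parts = line.split()
            if len(parts) != num_concepts + 1:
                continue  # skip empty or malformed lines
            image_id, flags = parts[0], parts[1:]
            annotations[image_id] = [int(flag) for flag in flags]
    return annotations
```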
The MIR Flickr collection supplies all original tag data provided by the Flickr users (further denoted as Flickr user tags). The collection contains 1386 tags that occur in at least 20 images, with an average of 8.94 tags per image. (For comparison: the manual annotations of last year had a label cardinality of 9 annotations per photo on average.) These Flickr user tags are made available for the textual and multi-modal approaches. For most of the photos the EXIF data is included and may be used.
The evaluation follows the concept-based and example-based evaluation paradigms. For the concept-based evaluation the Mean Average Precision (MAP) will be utilized; this measure showed better characteristics than the EER and the AUC in a recent study. For the example-based evaluation we will apply the example-based F-measure. Additionally, we plan to extend last year's Ontology Score with a different cost map (based on Flickr distances) and investigate whether this adaptation can cope with the limitations of the ontology-based score.
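For concreteness, the following sketch illustrates how the two paradigms differ: MAP averages a per-concept average precision over all concepts (columns of the score matrix), while the example-based F-measure averages a per-image F1 over all test images (rows). The function names and the handling of edge cases are our own choices, not the official evaluation code.

```python
# Minimal sketch of the two evaluation paradigms; not the official tool.

def average_precision(scores, labels):
    """AP for one concept: scores and 0/1 labels over all test images."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

def mean_average_precision(score_matrix, label_matrix):
    """Concept-based MAP: mean of per-concept APs (columns = concepts)."""
    num_concepts = len(label_matrix[0])
    aps = [average_precision([row[c] for row in score_matrix],
                             [row[c] for row in label_matrix])
           for c in range(num_concepts)]
    return sum(aps) / num_concepts

def example_based_f1(pred_matrix, label_matrix):
    """Example-based F-measure: F1 per image, averaged over images."""
    f1_scores = []
    for pred, gold in zip(pred_matrix, label_matrix):
        true_pos = sum(1 for p, g in zip(pred, gold) if p and g)
        denom = sum(pred) + sum(gold)
        # Convention choice: an image with no predicted and no true
        # labels scores a perfect 1.0.
        f1_scores.append(2 * true_pos / denom if denom else 1.0)
    return sum(f1_scores) / len(f1_scores)
```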
Due to the database restrictions it is necessary to sign a user agreement to get access to the data. Please print the document, sign it and send it via fax to Henning Müller (for detailed instructions see the explanation in the document). You will then receive the username and password for an FTP account where all data for the task is stored.
ImageCLEF also has its own registration interface. Here you can choose a user name and a password. This registration interface is used, e.g., for the submission of runs. If you already have a login from the ImageCLEF 2009 or the ICPR competition, you can migrate it to ImageCLEF 2010 here.
Please don't use the annotation information that is delivered with the MIR Flickr 25.000 image dataset. We renamed all files and trust that you will not try to recover the original filenames.
The submission format is identical to the annotation format of the training data, except that you are expected to provide a confidence score for each concept being present or absent. That means you have to submit a file containing the same number of columns, but each value can be an arbitrary floating-point number between 0 and 1, where higher numbers denote higher confidence in the presence of a particular concept. Please submit your results for all images in a single txt file.
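As a sanity check before submitting, a run file could be written along these lines. This again assumes the hypothetical one-line-per-image layout sketched above; adapt it to the actual training annotation format.

```python
# Sketch for writing a run file with one confidence score per concept.
# ASSUMPTION: same hypothetical layout as the annotation sketch above.

def write_run_file(path, confidences):
    """confidences: dict mapping image id -> list of floats in [0, 1]."""
    with open(path, "w") as handle:
        for image_id, scores in sorted(confidences.items()):
            if not all(0.0 <= s <= 1.0 for s in scores):
                raise ValueError("score outside [0, 1] for " + image_id)
            handle.write(image_id + " "
                         + " ".join("%.6f" % s for s in scores) + "\n")
```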
For the example-based evaluation measure we need a binary mapping of the confidence scores to 0 or 1. Please provide an additional file in which you define a threshold for each concept. An example of this threshold file will be provided in time.
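Applying such per-concept thresholds is then straightforward, for instance (a sketch with names of our own choosing):

```python
# Map confidence scores to 0/1 decisions using one threshold per concept.

def binarize(scores, thresholds):
    """scores and thresholds have one entry per concept."""
    return [1 if score >= threshold else 0
            for score, threshold in zip(scores, thresholds)]
```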
Please note that the number of runs per group is restricted to a maximum of 5 submissions.
Stefanie Nowak, Fraunhofer Institute for Digital Media Technology, Ilmenau, Germany, Stefanie.Nowak[at]idmt.fraunhofer.de
Mark Huiskes, Leiden Institute of Advanced Computer Science, Leiden University, The Netherlands, mark.huiskes[at]liacs.nl