Visual Concept Detection and Annotation Task


The test collection of the Visual Concept Detection and Annotation Task 2010 is now freely available here.


The Visual Concept Detection and Annotation Task is a multi-label classification challenge: it aims at the automatic annotation of a large number of consumer photos with multiple concept labels.

The task can be solved by following three different approaches:
1) Automatic annotation with visual information only
2) Automatic annotation with Flickr user tags (tag enrichment)
3) Multi-modal approaches that consider visual information and/or Flickr user tags and/or EXIF information

In all cases the participants are asked to annotate the photos of the test set with a predefined set of keywords (the concepts). This defined set of keywords allows for an automatic evaluation and comparison of the different approaches.

This year the focus of the task lies on the comparison of the strengths and limitations of the different approaches:
* Do multi-modal approaches outperform text only or visual only approaches?
* Which approaches are best for which concepts?
* Can image classifiers scale to the large number of concepts and data?

The concepts include, for example, abstract categories such as Family&Friends or Partylife, the time of day (day, night, sunny, …), persons (no, single, small or big group), quality (blurred, underexposed, …) and aesthetics. The 53 concepts used in the ImageCLEF 2009 benchmark are utilized again, and we plan to extend the number of concepts to around 80. In contrast to last year's annotations, the new annotations will be obtained via Amazon Mechanical Turk.

Data Sets

The task uses the MIR Flickr 25,000 image dataset for the annotation challenge. The training and test set consist of 8000 and 10000 images, respectively. The annotations are provided as plain text files.

The MIR Flickr collection supplies all original tag data provided by the Flickr users (further denoted as Flickr user tags). The collection contains 1386 tags which occur in at least 20 images, with an average of 8.94 tags per image. (For comparison: last year's manual annotations had a label cardinality of 9 annotations per photo on average.) These Flickr user tags are made available for the textual and multi-modal approaches. For most of the photos the EXIF data is also included and may be used.
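Statistics of the kind quoted above (the number of tags occurring in at least 20 images, and the average number of tags per image, i.e. the label cardinality) can be recomputed from the raw tag lists. A minimal sketch in Python; the `tag_statistics` helper is our own illustration, not part of the dataset tooling:

```python
from collections import Counter

def tag_statistics(image_tags, min_images=20):
    """Compute simple tag statistics for a photo collection.

    image_tags: list of tag lists, one list per image.
    Returns (number of tags occurring in >= min_images images,
             average number of tags per image).
    """
    # Count in how many images each tag occurs (set() deduplicates per image).
    counts = Counter(tag for tags in image_tags for tag in set(tags))
    frequent = sum(1 for c in counts.values() if c >= min_images)
    cardinality = sum(len(tags) for tags in image_tags) / len(image_tags)
    return frequent, cardinality
```

Running this over the full MIR Flickr tag lists should reproduce the 1386 / 8.94 figures quoted above.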

Evaluation Measures

The evaluation follows the concept-based and the example-based evaluation paradigm. For the concept-based evaluation the Mean Average Precision (MAP) is utilized; this measure showed better characteristics than the EER and the AUC in a recent study. For the example-based evaluation we apply the example-based F-measure. Additionally, we extend last year's Ontology Score with a different cost map (based on Flickr metadata) and investigate whether this adaptation can cope with the limitations of the ontology-based score.
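To make the two main measures concrete, here is a minimal Python sketch of per-concept average precision (averaged into MAP) and the example-based F-measure. The function names and the non-interpolated AP definition are our assumptions for illustration, not the official evaluation code:

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one concept.

    scores: confidence values over all test images.
    labels: 0/1 ground truth for the same images.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    hits, precisions = 0, []
    for rank, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

def mean_average_precision(score_matrix, label_matrix):
    """MAP: mean of per-concept APs (concepts are the columns)."""
    n_concepts = len(score_matrix[0])
    aps = [average_precision([row[c] for row in score_matrix],
                             [row[c] for row in label_matrix])
           for c in range(n_concepts)]
    return sum(aps) / n_concepts

def example_f_measure(pred, truth):
    """Example-based F1 for one photo: pred/truth are binary concept vectors."""
    tp = sum(p and t for p, t in zip(pred, truth))
    if sum(pred) == 0 or sum(truth) == 0 or tp == 0:
        return 0.0
    precision, recall = tp / sum(pred), tp / sum(truth)
    return 2 * precision * recall / (precision + recall)
```

The example-based score is then averaged over all test photos.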

How to register for the task

Due to the database restrictions it is necessary to sign a user agreement to get access to the data. Please print the document, sign it and send it via fax to Henning Müller. (For detailed instructions, see the explanation in the document.) You will then receive the username and password for an ftp account where all data for the task is stored.

ImageCLEF also has its own registration interface, where you can choose a user name and a password. This registration interface is used, for example, for the submission of runs. If you already have a login from the ImageCLEF 2009 or the ICPR competition, you can migrate it to ImageCLEF 2010 here.

Training Set

The training set is now available. We provide manual annotations for 93 concepts (see the list of all concepts). You can access the training set at our ftp with the username and password that can be found in the "Detail" view of the Photo Annotation collection in the ImageCLEF registration system. It is available to all participants that signed the license agreement.

Test Set

The test set is now available. You can access the test set at our ftp with the username and password that can be found in the "Detail" view of the Photo Annotation test set collection in the ImageCLEF registration system. It is available to all participants that signed the license agreement.


How to cheat (but please don't)

Please don't use the annotation information that is delivered with the MIR Flickr 25,000 image dataset. We renamed all files and trust that you will not try to find out the original filenames.

Submission Format

The submission format is the same as the annotation format of the training data, except that you are expected to give a confidence score for each concept being present or absent. That means you have to submit a file containing the same number of columns, but each value can be an arbitrary floating-point number between 0 and 1, where higher numbers denote higher confidence in the presence of a particular concept. Please submit all results in one txt file.
For the example-based evaluation measure, we need a binary mapping of the confidence scores to 0 or 1. Please append the binary annotations for all concepts and images below the confidence values.

So, the submission file should look like:
imageID00001 0.34 0.67 0.78 .... 0.7 (altogether 93 float values between 0 and 1)
imageID10000 0.34 0.67 0.78 .... 0.7 (altogether 93 float values between 0 and 1)
imageID00001 0 1 1 0 1 1 1 1 .... 1 (altogether 93 binary values, 0 or 1)
imageID10000 0 1 1 1 1 0 0 1 .... 0 (altogether 93 binary values, 0 or 1)
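A run file in this layout (all confidence rows first, then the corresponding binary rows) could be produced as in the following sketch. The helper name, the number formatting, and the 0.5 threshold for the binary mapping are illustrative assumptions; the task does not prescribe how you derive the binary decisions:

```python
def write_submission(path, confidences, n_concepts=93):
    """Write a run file: confidence rows for all images, then binary rows.

    confidences: dict mapping image ID -> list of n_concepts floats in [0, 1].
    The binary annotations are derived here by thresholding at 0.5
    (an assumption of this sketch, not a requirement of the task).
    """
    with open(path, "w") as f:
        # First block: one confidence row per image.
        for image_id, scores in confidences.items():
            assert len(scores) == n_concepts
            assert all(0.0 <= s <= 1.0 for s in scores)
            f.write(image_id + " " + " ".join("%.4f" % s for s in scores) + "\n")
        # Second block: the binary mapping, appended below the confidences.
        for image_id, scores in confidences.items():
            binary = ["1" if s >= 0.5 else "0" for s in scores]
            f.write(image_id + " " + " ".join(binary) + "\n")
```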

The submission system automatically checks whether the submission file is correct. If it is not, you will get an error message during upload.

Please note that we restrict the number of runs per group to a maximum of 5 submissions.


Results

The results for the Visual Concept Detection and Annotation Task are now available. We received submissions from 17 groups with altogether 63 runs: 45 runs use visual information only, 2 runs use textual information only, and 16 runs utilize multi-modal approaches.
We applied three measures to determine the quality of the annotations: one for the evaluation per concept and two for the evaluation per photo. The evaluation per concept was performed with the Mean Average Precision (MAP). The evaluation per example was performed with the example-based F-measure (F-ex) and the Ontology Score with the Flickr Context Similarity cost map (OS-fcs). For the evaluation per concept the submitted confidence scores were used, while the evaluation per example was conducted on the binary annotation scores.

On the following pages you can find the results:
* MAP Results
* Results for Example-based Evaluation


Tentative Schedule