The ImageCLEF 2010 Wikipedia collection consists of 237,434 images and their associated user-supplied annotations. The collection was built to cover similar topics in English, German and French. Topical similarity was obtained by selecting only Wikipedia articles that have versions in all three languages and are illustrated with at least one image in each version: 44,664 such articles were extracted from the September 2009 Wikipedia dumps, containing a total of 265,987 images. Since the collection is intended to be freely distributed, we decided to remove all images with unclear copyright status. After this step, duplicate elimination and some additional cleanup, 237,434 images remain in the collection, with the following language distribution:
- English only: 70,127
- German only: 50,291
- French only: 28,461
- English and German: 26,880
- English and French: 20,747
- German and French: 9,646
- English, German and French: 22,899
- Language undetermined: 8,144
- No textual annotation: 239
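As a quick consistency check on the figures above, the following minimal sketch sums the listed categories and derives how many images carry an annotation in each language. The counts are copied from the list; the variable and function names (`distribution`, `images_with_language`, and so on) and the language codes are illustrative assumptions, not part of the collection's distribution format.

```python
# Counts copied from the language distribution listed above.
# Keys are the sets of languages in which an image is annotated;
# the "en"/"de"/"fr" codes are illustrative labels only.
distribution = {
    frozenset({"en"}): 70_127,
    frozenset({"de"}): 50_291,
    frozenset({"fr"}): 28_461,
    frozenset({"en", "de"}): 26_880,
    frozenset({"en", "fr"}): 20_747,
    frozenset({"de", "fr"}): 9_646,
    frozenset({"en", "de", "fr"}): 22_899,
}
undetermined = 8_144   # annotation present, language not determined
no_annotation = 239    # no textual annotation at all

# The nine categories are disjoint, so their sum is the collection size.
total = sum(distribution.values()) + undetermined + no_annotation


def images_with_language(lang):
    """Number of images annotated in `lang`, alone or with other languages."""
    return sum(count for langs, count in distribution.items() if lang in langs)


per_language = {lang: images_with_language(lang) for lang in ("en", "de", "fr")}

# Images dropped between extraction (265,987) and the final collection,
# covering copyright filtering, duplicate elimination and cleanup together.
removed = 265_987 - total

print(total)
print(per_language)
print(removed)
```

The categories add up to the stated collection size of 237,434. The per-language totals overlap, since an image annotated in two or three languages is counted once for each of them.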
The main difference between the ImageCLEF 2010 Wikipedia collection and the INEX MM collection (Westerveld and van Zwol, 2007) used in the previous WikipediaMM tasks is that the multilingual aspect has been reinforced, so that both mono- and cross-lingual evaluations can be carried out. Another difference is that this year, participants will receive for each image both its user-provided annotation and links to the article(s) that contain the image. Finally, to encourage multimodal approaches, several types of low-level image features will be provided to participants.
(Popescu et al., 2010) A. Popescu, T. Tsikrika and J. Kludas. Overview of the Wikipedia Retrieval Task at ImageCLEF 2010. In CLEF (Notebook Papers/LABs/Workshops) 2010.
(Westerveld and van Zwol, 2007) T. Westerveld and R. van Zwol. The INEX 2006 Multimedia Track. In N. Fuhr, M. Lalmas, and A. Trotman, editors, Advances in XML Information Retrieval: Fifth International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence (LNCS/LNAI). Springer-Verlag, 2007.