Revision of ImageCLEF 2009 wikipediaMM task from Tue, 02/24/2009 - 16:14

Introduction
ImageCLEF's wikipediaMM task provides a testbed for the system-oriented evaluation of visual information retrieval from a collection of Wikipedia images. The aim is to investigate retrieval approaches in the context of a larger-scale, heterogeneous collection of images (similar to those encountered on the Web) that are searched by users with diverse information needs.

In 2008, ImageCLEF wikipediaMM used the image collection created and employed by the INEX Multimedia (MM) Track (2006-2007). This (INEX MM) wikipedia image collection contains approximately 150,000 images that cover diverse topics of interest. These images are associated with unstructured and noisy textual annotations in English. In this first year of the task, the focus was on monolingual retrieval.

In 2009, ImageCLEF wikipediaMM will use a similar collection of Wikipedia images. More details will be provided soon.

This is an ad-hoc image retrieval task; the evaluation scenario is thereby similar to the classic TREC ad-hoc retrieval task and the ImageCLEF photo retrieval task: simulation of the situation in which a system knows the set of documents to be searched, but cannot anticipate the particular topic that will be investigated (i.e. topics are not known to the system in advance). The goal of the simulation is: given a textual query (and/or sample images and/or concepts) describing a user's (multimedia) information need, find as many relevant images as possible from the Wikipedia image collection.

Any method can be used to retrieve relevant documents. We encourage the use of both concept-based and content-based retrieval methods and, in particular, multimodal approaches that investigate the combination of evidence from different modalities.
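
As one illustration of combining evidence from different modalities, the sketch below fuses a text-based and a content-based result list by a weighted sum of normalised scores (often called late fusion). This is only one of many possible combination strategies and is not prescribed by the task. The function names, the image identifiers, the example scores and the default weight of 0.7 are all assumptions made for illustration.

```python
def min_max_normalise(scores):
    """Rescale a {image_id: score} dict to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so every image gets the same normalised score.
        return {image_id: 1.0 for image_id in scores}
    return {image_id: (s - lo) / (hi - lo) for image_id, s in scores.items()}


def late_fusion(text_scores, visual_scores, text_weight=0.7):
    """Rank images by a weighted sum of normalised text and visual scores.

    text_weight is a hypothetical tuning parameter; an image missing from
    one modality contributes a score of 0 for that modality.
    """
    text_n = min_max_normalise(text_scores)
    visual_n = min_max_normalise(visual_scores)
    fused = {}
    for image_id in set(text_n) | set(visual_n):
        fused[image_id] = (text_weight * text_n.get(image_id, 0.0)
                           + (1.0 - text_weight) * visual_n.get(image_id, 0.0))
    # Highest fused score first; ties are broken by image id for determinism.
    return sorted(fused, key=lambda image_id: (-fused[image_id], image_id))


# Made-up scores from a text run and a visual run for the same topic.
text_scores = {"img1": 2.0, "img2": 1.0, "img3": 0.0}
visual_scores = {"img2": 0.9, "img3": 0.8, "img4": 0.1}
ranking = late_fusion(text_scores, visual_scores)
print(ranking)
```

Normalising each run before summing keeps one modality from dominating merely because its raw scores live on a larger scale; the weight then controls how much each modality is trusted.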

Data: Images & Metadata

The (INEX MM) wikipedia image collection consists of approximately 150,000 Wikipedia images (in JPEG and PNG formats) provided by Wikipedia users. Each image is associated with user-generated, unstructured alphanumeric metadata in English. These metadata usually contain a brief caption or description of the image, the Wikipedia user who uploaded the image, and the copyright information. The descriptions are highly heterogeneous and of varying length. The figure below provides an example image and its associated metadata.


Anne Frank house

Further information about the image collection can be found in:
T. Westerveld and R. van Zwol. The INEX 2006 Multimedia Track. In N. Fuhr, M. Lalmas, and A. Trotman, editors, Advances in XML Information Retrieval: Fifth International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence (LNCS/LNAI). Springer-Verlag, 2007.

Data: Image/Text Features & Concepts

Additional sources of information, such as concepts and image and text features, are provided to help participants in the retrieval tasks and to encourage multimodal approaches.
Further details will be provided soon.
Topics
The topics for the 2009 ImageCLEF wikipediaMM task will include (i) topics previously used in INEX MM and WikipediaMM, (ii) topics from the ImageCLEF photo tasks, and (iii) topics created by this year's task participants.

As an innovation this year, we do not 'force' our participants to help develop the topics. If you wish to participate in topic creation, however, you can send us your proposals until 31 March. The topic development guideline from last year can be found here.

The topics are multimedia queries that can consist of a textual, a visual and a conceptual part, with the latter two parts being optional. An example topic in the appropriate format is the following:

<topic>
  <number> 1 </number>
  <title> cities by night </title>
  <concept> building </concept>
  <image> http://www.bushland.de/hksky2.jpg </image>
  <narrative> I am decorating my flat and as I like photos of cities at night, I would like to find some that I could possibly print into posters. I would like to find photos of skylines or photos that contain parts of a city at night (including streets and buildings). Photos of cities (or the earth) from space are not relevant. </narrative>
</topic>

Therefore, the topics include the following fields:
    • title: query by keywords
    • concept: query by one or more concepts (optional)
    • image: query by one or more images (optional)
    • narrative: description of the information need, where the definitive definitions of relevance and irrelevance are given
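
The topic format described above can be read with a standard XML parser. The following is a minimal sketch, assuming each topic is well-formed XML in which every field is closed by a matching end tag and in which the concept and image fields may be absent or repeated. The function name parse_topic and the returned dictionary layout are hypothetical, and the narrative in the sample string is abbreviated for brevity.

```python
import xml.etree.ElementTree as ET

# Sample topic in the format shown above (narrative abbreviated).
TOPIC_XML = """
<topic>
  <number> 1 </number>
  <title> cities by night </title>
  <concept> building </concept>
  <image> http://www.bushland.de/hksky2.jpg </image>
  <narrative> I would like to find photos of skylines at night. </narrative>
</topic>
"""


def parse_topic(xml_text):
    """Parse one <topic> element into a plain dictionary (hypothetical layout)."""
    root = ET.fromstring(xml_text.strip())

    def values(tag):
        # Collect the stripped text of every child element with this tag.
        return [el.text.strip() for el in root.findall(tag)
                if el.text and el.text.strip()]

    return {
        "number": int((root.findtext("number") or "").strip()),
        "title": (root.findtext("title") or "").strip(),
        "concepts": values("concept"),   # optional, possibly several
        "images": values("image"),       # optional, possibly several
        "narrative": (root.findtext("narrative") or "").strip(),
    }


topic = parse_topic(TOPIC_XML)
print(topic["number"], topic["title"], topic["concepts"], topic["images"])
```

Returning empty lists for missing optional fields lets the same code handle text-only topics and fully multimodal ones.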
    Evaluation Objectives

    The characteristics of the (INEX MM) wikipedia image collection allow for the investigation of the following objectives:
    • how well do the retrieval approaches cope with larger scale image collections?
    • how well do the retrieval approaches cope with noisy and unstructured textual annotations?
    • how well do the content-based retrieval approaches cope with images that cover diverse topics and are of varying quality?
    • how well can systems exploit and combine different modalities given a user's multimedia information need? Can they outperform monomodal approaches like query-by-text, query-by-concept or query-by-image?
    In the context of INEX MM 2006-2007, mainly text-based retrieval approaches were examined. Here, we hope to attract more visually-oriented approaches and, most importantly, multimodal approaches that investigate the combination of evidence from different modalities. The results of WikipediaMM at ImageCLEF 2008 showed that multimedia retrieval approaches outperformed the text-based approaches for certain topics, but overall text-based retrieval remained unbeaten. The retrieval of multimedia documents will remain the focus of attention in 2009.
    Schedule
    The tentative schedule is as follows:
    • 1.2.2009: registration opens for all CLEF tasks
    • 15.3.2009: data release (images + metadata)
    • 31.3.2009: topic proposals
    • 15.4.2009: topic release
    • 15.5.2009: submission of runs
    • 15.7.2009: release of results
    • 15.8.2009: submission of working notes papers
    • 30.9-2.10.2009: CLEF workshop in Corfu, Greece
    Organisers
