
ImageCLEF 2009 lung nodule detection and medical annotation task

In 2009, ImageCLEF will offer two medically oriented tasks that require visual techniques.

Lung nodule detection task


This year will be the inaugural year for the lung nodule detection task.
The goal of this task is to compare the performance of lung nodule detection techniques against a gold standard of manually identified nodules. Submitted runs can be fully automatic, use minimal user interaction, or rely on interactive segmentation. Participants are requested to provide information about the nature of their runs.


The data for this task will be a subset of the LIDC database, a set of images that have been manually annotated for nodules by multiple radiologists. We will provide training, development and test data consisting of CT scans and associated annotations indicating the presence of nodules.

Link to data and submission site: ImageCLEF 2009 Lung Nodule Detection task

Medical Image Annotation Task

ImageCLEF will again offer a Medical Image Annotation Task this year, for the last time. It will be a survey of the last four years' experience, and we warmly invite all groups that participated in previous editions to take part in the challenge. We are going to contact an editor to prepare and publish a book describing the most interesting and successful approaches developed for this task over the last years.

Automatic image annotation or image classification can be an important step when searching for images in a database. Automatic techniques that identify acquisition modality, body orientation, body region, and biological system examined from the images themselves could be used for multilingual image annotation as well as for DICOM header correction in routine medical image acquisition.


The training and test image sets are based on the IRMA project as usual.
A database of 12729 fully classified radiographs, taken randomly from medical routine, is made available and can be used to train a classification system. Images are labelled according to four classification label sets considering:

- 57 classes as in 2005 (12683 images) + a "clutter" class C (46 images)
- 116 classes as in 2006 (12371 images) + a "clutter" class C (358 images)
- 116 IRMA codes as in 2007 (12371 images) + a "clutter" class C (358 images)
- 196 IRMA codes as in 2008 (12729 images)

In the test phase, participants will be asked to classify about 2000 unlabelled radiographs according to the four different schemes. The final ranking will be based on the sum of the error scores computed on the classification outputs for the four years.
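As a small illustration of the ranking rule, the sketch below sums four per-year error scores for each run and orders the runs by that total. The run names and score values are made up for the example, and the assumption that a lower total ranks higher follows only from the scores being error scores; it is not the organisers' evaluation tool.

```python
YEARS = ("2005", "2006", "2007", "2008")

def total_error(scores):
    """Sum the error scores of one run over the four label sets."""
    return sum(scores[year] for year in YEARS)

def rank_runs(runs):
    """Return run names ordered by total error, lowest first (assumed ordering)."""
    return sorted(runs, key=lambda name: total_error(runs[name]))

# Hypothetical scores, for illustration only.
runs = {
    "run_a": {"2005": 400, "2006": 350, "2007": 90, "2008": 200},
    "run_b": {"2005": 380, "2006": 360, "2007": 80, "2008": 260},
}
ranking = rank_runs(runs)
```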

We expect classification as in the following example:

imageNo; <2005>; <2006>; <2007>; <2008>
1887; 52; -; -; 1121-117-722-442
1888; 52; -; -; 1121-117-722-442
1891; 17; 14; 1121-115-700-400; 1121-115-700-400
1892; 51; 94; 1121-127-700-500; 1121-127-700-500
1893; 51; 94; 1121-127-700-500; 1121-127-700-500
1894; -; -; -; 1121-127-720-512

- For 2005 and 2006, labels range from 1 to 57 and from 1 to 116, respectively. A wildcard (*) can be used to say "don't know".

- For 2007 and 2008, labels correspond to the complete IRMA code. The wildcard (*) can be used in single positions of the code to declare ignorance at a particular level of the hierarchy.

In the example above, "-" marks images belonging to the clutter class C. It indicates that whatever label is assigned to those images, they will not contribute to the error evaluation. Even if the test set contains images belonging to the clutter class, it will NOT matter WHICH class is assigned to them: their labels WILL NOT influence the error evaluation for 2005, 2006 and 2007.

For 2008 the clutter class does not exist, so in the test phase the label assigned to every image WILL contribute to the error evaluation.
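The submission format described above can be checked mechanically before uploading a run. The following is a minimal sketch, not the official validator. It assumes that an IRMA code is written as four hyphen-separated groups of 4, 3, 3 and 3 characters, each a digit, a lowercase letter or the wildcard, and that a real run carries a label in every field, so the "-" placeholder used in the example is rejected. All function names are hypothetical.

```python
import re

# Assumed shape of a complete IRMA code: groups of 4-3-3-3 characters,
# each a digit, a lowercase letter, or the wildcard "*".
IRMA_CODE = re.compile(r"^[0-9a-z*]{4}-[0-9a-z*]{3}-[0-9a-z*]{3}-[0-9a-z*]{3}$")

def valid_class_number(label, n_classes):
    """True for the wildcard "*" or an integer in 1..n_classes."""
    if label == "*":
        return True
    return label.isascii() and label.isdigit() and 1 <= int(label) <= n_classes

def valid_irma_code(label):
    """True if the label matches the assumed IRMA code shape."""
    return IRMA_CODE.match(label) is not None

def parse_submission_line(line):
    """Split one 'imageNo; <2005>; <2006>; <2007>; <2008>' line and check each label."""
    fields = [field.strip() for field in line.split(";")]
    if len(fields) != 5:
        raise ValueError("expected 5 fields, got %d" % len(fields))
    image_no, l2005, l2006, l2007, l2008 = fields
    if not valid_class_number(l2005, 57):
        raise ValueError("bad 2005 label: %r" % l2005)
    if not valid_class_number(l2006, 116):
        raise ValueError("bad 2006 label: %r" % l2006)
    if not valid_irma_code(l2007):
        raise ValueError("bad 2007 label: %r" % l2007)
    if not valid_irma_code(l2008):
        raise ValueError("bad 2008 label: %r" % l2008)
    return {"image": image_no, "2005": l2005, "2006": l2006,
            "2007": l2007, "2008": l2008}

row = parse_submission_line("1891; 17; 14; 1121-115-700-400; 1121-115-700-400")
```

If the organisers' format turns out to accept a placeholder for some fields, the corresponding check can simply be relaxed.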

A complete description of the error counting scheme for 2009 is here

We are not going to release a development set. If you need data to tune your system, you must split the training data into training and development sets yourself. When doing this, please note the distribution of images over the classes of the training set: for 2005, 2006 and 2007 every class has more than 6 images, while in 2008 there are classes with only 1 to 5 images. Using the hierarchy of the IRMA code should be particularly useful for the 2008 labels, because a large share of the test images will belong to classes that are poorly represented in the training data, and fewer will belong to the well-represented classes.
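One way to make such a split while respecting the small 2008 classes is a per-class (stratified) split that leaves very small classes entirely in the training part. The sketch below is only one possible approach. The function name, the 20% development fraction and the size threshold of 6 are assumptions chosen for illustration, not values prescribed by the task.

```python
import random
from collections import defaultdict

def split_train_dev(labels, dev_fraction=0.2, min_class_size=6, seed=0):
    """Split image ids per class; classes below min_class_size stay in training.

    labels maps an image id to its class label (hypothetical input layout).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for image_id, label in labels.items():
        by_class[label].append(image_id)

    train, dev = [], []
    for label in sorted(by_class):
        ids = sorted(by_class[label])
        rng.shuffle(ids)
        if len(ids) < min_class_size:
            # Too few examples to spare any for development.
            train.extend(ids)
            continue
        n_dev = max(1, int(len(ids) * dev_fraction))
        dev.extend(ids[:n_dev])
        train.extend(ids[n_dev:])
    return train, dev

# Toy data: one class with 10 images, one rare class with 3 images.
toy_labels = {i: "A" for i in range(10)}
toy_labels.update({100 + i: "B" for i in range(3)})
train_ids, dev_ids = split_train_dev(toy_labels)
```

Keeping the rare classes out of the development set means they cannot be used for tuning directly, which is one reason the code hierarchy may be worth exploiting for them.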

Each group is allowed to submit several runs, but each of them should be based on only ONE algorithm, optimized to face the four different classification problems. The aim is to understand how each algorithm responds to the increasing number of classes and to class imbalance. It will also be possible to evaluate the best way to exploit the hierarchy.

Data Download

To be able to download any of these data, you need to be registered for ImageCLEF and use the ImageCLEF 2009 data access login and password which you should have received for this task.

While checking the database we found some questionable images. We ask all participants to discard these images when defining their annotation algorithms.

Note that the number of classes in the 2008 database drops from 196 to 193, because we are now discarding all images belonging to the classes 1121-120-450-700, 1121-120-700-400 and 1121-490-913-700.
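Removing the images of the three discarded classes from a 2008 label table takes only a few lines. This sketch assumes the labels are held in a dictionary mapping image ids to 2008 IRMA codes; that layout and the function name are illustrative, not part of the released data format.

```python
# The three 2008 classes named above as discarded.
DISCARDED_2008_CODES = {
    "1121-120-450-700",
    "1121-120-700-400",
    "1121-490-913-700",
}

def drop_discarded(labels_2008):
    """Return a copy of the label table without images of the discarded classes."""
    return {image_id: code
            for image_id, code in labels_2008.items()
            if code not in DISCARDED_2008_CODES}

# Toy table with hypothetical image ids.
toy_2008 = {
    "img1": "1121-115-700-400",
    "img2": "1121-120-450-700",
    "img3": "1121-127-700-500",
}
kept = drop_discarded(toy_2008)
```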

Evaluation Tool

We release two evaluation tools, to calculate

Remember that the final ranking will be defined by summing the error scores for all four years.

You can run the scripts on arbitrary subsets of the training data and evaluate the error.
We provide example files (sub_) to test the scripts and readme files which describe the output of the error evaluation on the example files.


You can find here an example of the submission format. The submission deadline is June 25, midnight GMT.
Submission Site

Tentative Schedule

15.1.2009: registration opens for all CLEF tasks (done)
18.3.2009: training data and task release (done)
10.6.2009: test data release (done)
25.6.2009: submission of runs (done)
06.7.2009: release of results: they are available here.
23.8.2009: submission of working notes papers
30.9-2.10.2009: CLEF workshop in Corfu, Greece.


Barbara Caputo, Idiap Research Institute, Martigny, Switzerland
Tatiana Tommasi, Idiap Research Institute, Martigny, Switzerland
Henning Müller, University and University Hospitals of Geneva, Switzerland
Thomas M. Deserno, RWTH Aachen University, Medical Informatics, IRMA group
Jayashree Kalpathy-Cramer, Oregon Health & Science University, Portland, OR, USA