ImageCLEF 2008: Medical Automatic Image Annotation Task

Following the success of the medical image annotation tasks in 2005, 2006, and 2007, ImageCLEF will offer a medical image annotation task in 2008.

Task Description

The 2008 task will closely follow the protocol of the ImageCLEF 2007 medical image annotation task.

Automatic image annotation or image classification can be an important step when searching for images in a database. Based on the IRMA project, a database of 12,089 fully classified radiographs, taken randomly from medical routine, is made available and can be used to train a classification system. A further 1,000 radiographs, for which classification labels are not available to the participants, have to be classified. The aim is to find out how well current techniques can identify image modality, body orientation, body region, and biological system examined, based on the images alone. The results of the classification step can be used for multilingual image annotations as well as for DICOM header corrections.

The difference between this year's task and last year's task is the distribution of images. To foster use of the class hierarchy, the images in the 2008 test set will come mainly from classes that have only a few examples in the training data. It will therefore be significantly harder to treat this task as a flat classification task, as most of the successful techniques did in 2007. Instead, exploiting the hierarchy is expected to lead to large improvements.

The error counting scheme will be similar to that from 2007. A description of the error counting scheme for 2008 is here.

Differences to the error counting scheme from 2007 are:

the error is normalised axis wise, not over the whole code, so that each axis has the same weight
setting a wildcard (*) instead of a 0 is not considered a mistake

A description of the error counting scheme from 2007 is here.

Schedule

Release of training data: April 3, 2008
Release of test data: June 9, 2008
Submission of results: June 22, midnight GMT (no extension to this deadline will be given)
Submission site
Release of results and classification of test data: Results were released on July 8 and are available here.

Beyond this schedule, we encourage participants to run further experiments to learn how far the hierarchy can be exploited to improve results.

Outline

We will release a dataset of approximately 12,000 images, labelled according to the IRMA code. In June, 1,000 test images will be released, which have to be classified according to the IRMA code. In contrast to past years, this year the test images may belong to classes that are not fully represented in the training database, and thus hierarchical classification techniques will be essential for good performance.
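
One simple way to use the hierarchy, sketched here as an assumption and not as a method prescribed by the task, is to combine several candidate codes (for example, the codes of the nearest training images) axis by axis: keep the digits on which all candidates agree and write wildcards from the first disagreement onwards. The function names `consensus_axis` and `consensus_code` are hypothetical, and the sketch assumes codes are written as four axes joined by hyphens.

```python
def consensus_axis(candidates):
    """Keep the prefix shared by all candidates of one axis.

    From the first position where the candidates disagree, every
    remaining position is replaced by the wildcard '*'.
    """
    out = []
    agree = True
    for pos in range(len(candidates[0])):
        chars = {c[pos] for c in candidates}
        if agree and len(chars) == 1:
            out.append(candidates[0][pos])
        else:
            agree = False
            out.append("*")
    return "".join(out)


def consensus_code(codes):
    """Apply consensus_axis to each hyphen-separated axis of the codes."""
    axes = [code.split("-") for code in codes]
    return "-".join(consensus_axis(list(axis)) for axis in zip(*axes))


# Hypothetical candidate codes for one test image.
neighbours = ["1124-410-620-625", "1124-410-620-621", "1124-410-620-625"]
print(consensus_code(neighbours))  # 1124-410-620-62*
```

Whether backing off to a wildcard pays off depends on how confident the classifier is at each position, so the strict "all candidates agree" rule above is only a starting point.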

Contrary to previous years, we do NOT have development data this year. If you need data to tune your system you are required to split the training data into training and testing data yourself.
When doing this, please note that the distribution of the classes in the evaluation test data will NOT match the class distribution of the training data. To foster use of the hierarchy, the test data will have a large share of images from classes that are poorly represented in the training data, and fewer images from classes that are well represented in the training data.
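
A held-out set with a similar skew can be carved out of the training data. The following is a minimal sketch under stated assumptions: the function `rare_heavy_split`, its thresholds, and the fractions are all hypothetical choices, not values given by the organisers. It holds out a larger fraction of images from small classes than from large ones, and always leaves at least one image of every class in the training part.

```python
import random
from collections import defaultdict


def rare_heavy_split(labels, rare_threshold=10, rare_fraction=0.5,
                     common_fraction=0.05, seed=0):
    """Split image ids into (train_ids, heldout_ids).

    labels maps image id -> class code. Classes with at most
    rare_threshold images contribute rare_fraction of their images to
    the held-out set; larger classes contribute common_fraction.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for image_id, code in labels.items():
        by_class[code].append(image_id)

    train, heldout = [], []
    for code in sorted(by_class):
        ids = sorted(by_class[code])
        rng.shuffle(ids)
        fraction = rare_fraction if len(ids) <= rare_threshold else common_fraction
        # Never hold out a whole class: keep at least one training image.
        n_held = min(int(len(ids) * fraction), len(ids) - 1)
        heldout.extend(ids[:n_held])
        train.extend(ids[n_held:])
    return train, heldout


# Toy example: one large class and one small class (made-up ids and codes).
toy_labels = {f"a{i:03d}": "A" for i in range(100)}
toy_labels.update({f"b{i:03d}": "B" for i in range(4)})
toy_train, toy_heldout = rare_heavy_split(toy_labels)
print(len(toy_train), len(toy_heldout))  # 97 7
```

In the toy example the small class makes up 2 of the 7 held-out images, although it accounts for only 4 of the 104 images overall.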

Data Download

To be able to download any of these data, you need to be registered for ImageCLEF and use the ImageCLEF 2008 data access login and password which you should have received for this task.

At a later date, the data will be available from the IRMA Website.

Training data
Labels of the Training data
Test data
Labels of the test data will be released on July 2 (or slightly later)

Evaluation Tool

28 Apr 2008: A new version of the evaluation tool is available that fixes two bugs. Download here!

You can use arbitrary subsets of the training data and evaluate your results on them.
We provide a very short example file (t.t) to test the tool.

normal run:

./evaluate.py t.t
Filename: t.t Error Score: 0.475 Error rate: 0.833333333333 Classified: 6

verbose run:

./evaluate.py -v t.t
[1124 - 1124] 0.0 [410 - 410] 0.0 [620 - 620] 0.0 [625 - 62*] 0.1 error count: 0.025000
[1124 - 1124] 0.0 [410 - 410] 0.0 [620 - 620] 0.0 [625 - 621] 0.2 error count: 0.050000
[1124 - 1124] 0.0 [410 - 410] 0.0 [620 - 620] 0.0 [625 - 615] 0.8 error count: 0.200000
[1124 - 1124] 0.0 [410 - 410] 0.0 [620 - 620] 0.0 [625 - 6*5] 0.4 error count: 0.100000
[1124 - 1124] 0.0 [410 - 410] 0.0 [620 - 620] 0.0 [625 - 6**] 0.4 error count: 0.100000
[1124 - 1124] 0.0 [410 - 410] 0.0 [620 - 620] 0.0 [625 - 625] 0.0 error count: 0.000000
Filename: t.t Error Score: 0.475 Error rate: 0.833333333333 Classified: 6
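
The summary line can be reproduced from the per-axis errors shown in the verbose run. The sketch below is inferred from this sample output only and is not the code of evaluate.py: it assumes that the error count of an image is the mean of its four axis errors, that the error score is the sum of the error counts, and that the error rate is the share of images with a non-zero error count. The names `image_error` and `summarise` are hypothetical.

```python
# Per-axis errors copied from the verbose run above, one row per image.
axis_errors = [
    [0.0, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.4],
    [0.0, 0.0, 0.0, 0.4],
    [0.0, 0.0, 0.0, 0.0],
]


def image_error(axes):
    """Mean of the axis errors, so that every axis has the same weight."""
    return sum(axes) / len(axes)


def summarise(rows):
    """Return (error score, error rate, number of classified images)."""
    per_image = [image_error(row) for row in rows]
    error_score = sum(per_image)
    error_rate = sum(1 for e in per_image if e > 0) / len(per_image)
    return error_score, error_rate, len(per_image)


score, rate, classified = summarise(axis_errors)
print(f"Error Score: {score:.3f} Error rate: {rate:.4f} Classified: {classified}")
# Error Score: 0.475 Error rate: 0.8333 Classified: 6
```

Under these assumptions, the prediction 62* for the true code 625 still counts towards the error rate, while it adds only half as much to the error score as the wrong digit in 621.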

Organisers

Thomas Deselaers, RWTH Aachen University, Aachen, Germany
Thomas M. Deserno, RWTH Aachen University Hospital, Aachen, Germany

Attachment	Size
hierarchical2008.pdf	77.28 KB
evalscript.tgz	54.34 KB
