
ImageCLEF 2009 medical retrieval task


The medical retrieval task of ImageCLEF 2009 will use a database similar to that of 2008, but with a larger number of images.
There will be two types of tasks in 2009: the retrieval of similar images for a precise information need, and the retrieval of similar cases (a test task). Retrieval of cases is more complex to evaluate, so we will provide only a small number of test cases in order to plan a potentially larger task for 2010.

The data set contains all images from articles published in Radiology and Radiographics, including the text of the captions and a link to the html of the full-text articles. Over 70,000 images are currently available in this way.


Schedule

31.1.2009: registration for CLEF (done)
1.4.2009: release of the data set (done)
1.5.2009: release of the topics (done)
26.6.2009: submission deadline for runs (done)
31.7.2009: end of the relevance judgment process (done)
10.8.2009: distribution of results to the participants (done)
23.8.2009: submission deadline for working notes papers
29.9.2009: pre-CLEF workshop on visual information retrieval evaluation
30.9.-2.10.2009: CLEF workshop

Data Download

Our database distribution includes an xml file with the image id, the captions of the images, the titles of the journal articles in which the images appeared, and the PubMed ID of each journal article. In addition, a compressed file containing the over 70,000 images is provided.
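
The metadata can be processed with standard XML tooling. A minimal sketch in Python follows; the element and attribute names used here (image, id, caption, articleTitle, pmid) are assumptions for illustration only, so check them against the released xml file.

import xml.etree.ElementTree as ET

def load_metadata(xml_path):
    # Map image id -> (caption, article title, PubMed ID).
    records = {}
    for image in ET.parse(xml_path).getroot().iter("image"):  # assumed element name
        image_id = image.get("id")                            # assumed attribute
        caption = image.findtext("caption", default="")       # assumed element
        title = image.findtext("articleTitle", default="")    # assumed element
        pmid = image.findtext("pmid", default="")             # assumed element
        records[image_id] = (caption, title, pmid)
    return records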

The data is now available for download.
The download includes the data for the five test topics for case-based retrieval.
Please login using the user id and password provided during registration.


The 25 ad-hoc topics are now available for download.
Please login using the user id and password provided during registration.
The topics are provided in xml format, with descriptions in English, French and German. 2-4 sample images (in jpeg format) are also provided for each topic.

5 case-based topics (numbered 26-30) have been made available for download.
The topics are provided in xml format, with descriptions only in English as this is a test task. 4-5 images per topic are available for the visual retrieval part.
Results for these topics are not images but entire research articles, and they need to be submitted in separate files from the image-based topics.
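
A topic file can be read in the same way as the metadata. The sketch below is a rough illustration only; the element names (topic, number, the per-language description tags and image) are assumptions, so adapt them to the released files.

import xml.etree.ElementTree as ET

def load_topics(xml_path, lang="en"):
    # Map topic number -> (description in the requested language, sample images).
    topics = {}
    for topic in ET.parse(xml_path).getroot().iter("topic"):
        number = int(topic.findtext("number"))          # assumed element name
        description = topic.findtext(lang, default="")  # assumed per-language tag
        images = [img.text for img in topic.findall("image")]
        topics[number] = (description, images)
    return topics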

Data Submission

Please ensure that your submissions are compliant with the trec_eval format prior to submission.
We will reject any runs that do not meet the required format.
Also, please note that each group is allowed a maximum of 10 runs each for the image-based and the case-based topics. The qrels will be distributed to the participants, so
further runs can be evaluated by the participants themselves for the working notes papers.
Do not hesitate to ask if you have questions regarding the trec_eval format.

At the time of submission, the following information about each run will be requested. Please let us know if you would like clarifications on how to classify your runs.

1. What was used for the retrieval: Image, text or mixed (both)
2. Was other training data used?
3. Run type: Automatic, Manual, Interactive
4. Query Language

trec_eval format

The format for submitting results is based on the output format of the trec_eval program, as follows:

1 1 27431 1 0.567162 OHSU_text_1
1 1 27982 2 0.441542 OHSU_text_1
1 1 52112 1000 0.045022 OHSU_text_1
2 1 43458 1 0.9475 OHSU_text_1
25 1 28937 995 0.01492 OHSU_text_1

• The first column contains the topic number, in our case from 1-25 (or 26-30 for the case-based topics)
• The second column is always 1
• The third column is the image identifier without the extension jpg and without any image path (or the full article URL for the case-based topics)
• The fourth column is the ranking for the topic (1-1000)
• The fifth column is the score assigned by the system
• The sixth column is the identifier for the run and should be the same in the entire file
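
As an illustration, the following Python sketch writes a result list in this format. The helper name and the results structure (a topic number mapped to a list of (image id, score) pairs, sorted by decreasing score) are assumptions made for this example, not part of the official tooling.

def write_run(results, run_id, out_path):
    # results: {topic_number: [(image_id, score), ...]}, sorted by decreasing score
    with open(out_path, "w") as out:
        for topic in sorted(results):
            for rank, (image_id, score) in enumerate(results[topic][:1000], start=1):
                # topic, literal 1, image id, rank, score, run identifier
                out.write(f"{topic} 1 {image_id} {rank} {score:.6f} {run_id}\n")

write_run({1: [("27431", 0.567162), ("27982", 0.441542)]}, "OHSU_text_1", "OHSU_text_1.txt")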

Several key points for submitted runs are:
• The topic numbers should be consecutive from 1-25 (or 26-30 for the case-based topics)
• Case-based and image-based topics have to be submitted in separate files
• The scores should be in decreasing order (i.e. the image at the top of the list should have a higher score than the images at the bottom of the list)
• Up to (but not necessarily) 1000 images can be submitted for each topic.
• Each topic must have at least one image.
• Each run must be submitted in a single file. Files should be pure text files and not be zipped or otherwise compressed.
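
A rough self-check against these points can save a rejected submission. The following Python sketch is an informal sanity check under the assumptions above, not the official validation.

def check_run(path, first_topic=1, last_topic=25):
    topics, run_ids = {}, set()
    with open(path) as f:
        for line in f:
            topic, _one, _doc, _rank, score, run_id = line.split()
            run_ids.add(run_id)
            topics.setdefault(int(topic), []).append(float(score))
    assert len(run_ids) == 1, "run identifier must be identical in the entire file"
    assert sorted(topics) == list(range(first_topic, last_topic + 1)), "topic numbers must be consecutive"
    for topic, scores in topics.items():
        assert 1 <= len(scores) <= 1000, f"topic {topic}: 1-1000 results required"
        assert all(a >= b for a, b in zip(scores, scores[1:])), f"topic {topic}: scores must decrease"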


The results are now available for download.

Please login using the user id and password provided during registration. Additional runs can be evaluated using trec_eval and the qrels provided. Please contact us if you have any questions regarding this process.
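
For reference, trec_eval is normally invoked as trec_eval <qrels_file> <run_file>. A small Python wrapper is sketched below; the file names are placeholders, and it assumes trec_eval is on the PATH.

import subprocess

# Evaluate one run against the distributed qrels (placeholder file names).
result = subprocess.run(
    ["trec_eval", "qrels_2009.txt", "OHSU_text_1.txt"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)  # MAP, precision at fixed ranks and other standard measures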