Interpreting and summarizing the insights gained from medical images such as radiology output is a time-consuming task that involves highly trained experts and often represents a bottleneck in clinical diagnosis pipelines. Consequently, there is a considerable need for automatic methods that can approximate this mapping from visual information to condensed textual descriptions. In this task, we cast the problem of image understanding as a cross-modality matching scenario in which visual content and textual descriptors need to be aligned and concise textual interpretations of medical images are generated. We work on the basis of a large-scale collection of figures from open access bio-medical journal articles (PubMed Central). Each image is accompanied by its original caption, constituting a natural testbed for this image captioning task.


  • 6.2.2017: Training data set is released.
  • 18.10.2016: ImageCLEFCaption Website goes live.

Concept Detection Task

As a first step to automatic image captioning and scene understanding, participating systems are tasked with identifying the presence of relevant biomedical concepts in medical images. Based on the visual image content, this subtask provides the building blocks for the scene understanding step by identifying the individual components from which full captions will be composed.

Caption Prediction Task

On the basis of the concept vocabulary detected in the first subtask as well as the visual information of their interaction in the image, participating systems are tasked with composing coherent captions for the entirety of an image. In this step, rather than the mere coverage of visual concepts, detecting the interplay of visible elements is crucial for recreating the original image caption.


The training set for both subtasks contains 164,614 biomedical images extracted from scholarly articles on PubMed Central.
For the concept detection subtask, a file containing image ID and corresponding UMLS concepts is provided.
For the caption prediction subtask, a file containing image ID - caption pairs is provided.
Additionally, a validation set of 10,000 images is provided for both subtasks.
The test set will contain 10,000 images for both subtasks.

Evaluation methodology

Concept detection

Evaluation is conducted in terms of F1 scores between system-predicted and ground-truth concepts.
The ground truth for the test set was generated based on the UMLS Full Release 2016AB and the labels obtained using the QuickUMLS concept extraction tool.
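As an illustration of the metric, the following minimal sketch computes a set-based F1 score per image and averages it over all images. The function names are hypothetical, and both the per-image averaging and the handling of empty concept sets are assumptions; the description above does not specify how the official evaluation aggregates scores.

```python
def f1_score(predicted, truth):
    """F1 between predicted and ground-truth concept IDs of one image."""
    predicted, truth = set(predicted), set(truth)
    if not predicted and not truth:
        return 1.0  # assumption: two empty sets count as full agreement
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions, ground_truth):
    """Average the per-image F1 over every image in the ground truth.

    Images missing from `predictions` are scored as if no concept
    had been predicted (assumption).
    """
    scores = [
        f1_score(predictions.get(figure_id, ()), concepts)
        for figure_id, concepts in ground_truth.items()
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    ground_truth = {
        "1743-422X-4-12-1-4": {"C1", "C6", "C100"},
        "1743-422X-4-12-1-3": {"C89", "C374"},
    }
    predictions = {
        "1743-422X-4-12-1-4": {"C1", "C6"},
        "1743-422X-4-12-1-3": {"C89", "C374"},
    }
    print(mean_f1(predictions, ground_truth))
```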

Caption prediction

Evaluation is based on BLEU scores, using the following methodology and parameters:

  • The default implementation of the Python NLTK (v3.2.2) (Natural Language ToolKit) BLEU scoring method is used. It is documented here and based on the original article describing the BLEU evaluation method
  • A Python (3.6) script loads the candidate run file, as well as the ground truth (GT) file, and processes each candidate-GT caption pair
  • Each caption is pre-processed in the following way:
    • The caption is converted to lower-case
    • All punctuation is removed and the caption is tokenized into its individual words
    • Stopwords are removed using NLTK's "english" stopword list
    • Stemming is applied using NLTK's Snowball stemmer
  • The BLEU score is then calculated. Note that the caption is always considered as a single sentence, even if it actually contains several sentences. No smoothing function is used.
  • All BLEU scores are summed and averaged over the number of captions (10,000), giving the final score.
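The steps above can be sketched with the standard library alone. This is not the official evaluation tool: the stopword list is a tiny hypothetical stand-in for NLTK's "english" list, the default stemmer is an identity function standing in for NLTK's Snowball stemmer, and `bleu` is a simplified single-reference re-implementation (uniform weights over 1- to 4-grams, no smoothing) rather than NLTK's own method. Deleting punctuation outright, rather than replacing it with spaces, is also an assumption.

```python
import math
import string
from collections import Counter

# Hypothetical stand-in for NLTK's "english" stopword list.
STOPWORDS = {"a", "an", "the", "of", "in", "and", "is", "with"}


def preprocess(caption, stopwords=STOPWORDS, stem=lambda word: word):
    """Lower-case, strip punctuation, tokenize, drop stopwords, stem."""
    caption = caption.lower()
    caption = caption.translate(str.maketrans("", "", string.punctuation))
    return [stem(token) for token in caption.split() if token not in stopwords]


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU of one candidate against one reference token list."""
    if not candidate:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clipped n-gram matches (modified precision).
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: any zero precision zeroes the score
        log_precision += math.log(overlap / total) / max_n
    # Brevity penalty for candidates shorter than the reference.
    if len(candidate) > len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(reference) / len(candidate))
    return penalty * math.exp(log_precision)


def mean_bleu(candidates, references):
    """Average BLEU over all ground-truth captions; each caption is one sentence."""
    scores = [
        bleu(preprocess(candidates.get(figure_id, "")), preprocess(reference))
        for figure_id, reference in references.items()
    ]
    return sum(scores) / len(scores)
```

Because no smoothing is applied, a caption with fewer than four tokens after pre-processing, or one without any matching 4-gram, scores 0 in this sketch. NLTK's handling of that case differs between versions, so the official script remains the reference for exact scores.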

NOTE: The source code of the evaluation tool is available here. It must be executed using Python 3.6.x, on a system where the NLTK (v3.2.2) Python library is installed. The script should be run like this:

/path/to/python3.6 /path/to/evaluation/script /path/to/candidate/file /path/to/ground-truth/file

Preliminary Schedule

  • 15.11.2016: registration opens for all ImageCLEF tasks (until 22.04.2017)
  • 01.02.2017: development data release starts
  • 15.03.2017: test data release starts
  • 04.05.2017: deadline for submission of runs by the participants
  • 15.05.2017: release of processed results by the task organizers
  • 26.05.2017: deadline for submission of working notes papers by the participants
  • 17.06.2017: notification of acceptance of the working notes papers
  • 01.07.2017: camera ready working notes papers
  • 11.-14.09.2017: CLEF 2017, Dublin, Ireland

Participant registration

Registration for ImageCLEF 2017 is now open and will stay open until at least 21.04.2017. To register please follow the steps below:
Once you have registered and your signature has been validated, data access details can be found in the ImageCLEF system -> Collections. Please note that, depending on the task, you may be required to sign additional data usage agreements before downloading the data. Should you have any questions about the registration process, please contact Mihai Dogariu <dogariu_mihai8(at)>.

Submission instructions

Concept detection

For the submission of the concept detection task we expect the following format:
  • <Figure-ID><whitespace><Concept-ID-1>,<Concept-ID-2>,<Concept-ID-n>
  • 1743-422X-4-12-1-4 C1,C6,C100
  • 1743-422X-4-12-1-3 C89,C374
  • 1743-422X-4-12-1-2 C8374
You need to respect the following constraints:
  • The separator between the figure ID and the concepts has to be a whitespace
  • The separator between the UMLS concepts has to be a comma (,)
  • Each figure ID of the testset must be included in the runfile exactly once
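Before submitting, a run file can be checked against these constraints with a short script. The sketch below is one possible check, not an official validator; the function name `parse_concept_run` is hypothetical, and accepting a line that lists no concepts at all is an assumption.

```python
def parse_concept_run(lines, test_ids):
    """Parse concept run-file lines and enforce the format constraints.

    Returns a dict mapping each figure ID to its list of concept IDs.
    """
    run = {}
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # The first run of whitespace separates the figure ID from the concepts.
        parts = line.split(None, 1)
        figure_id = parts[0]
        concepts = parts[1].split(",") if len(parts) == 2 else []
        for concept in concepts:
            if not concept or any(ch.isspace() for ch in concept):
                raise ValueError(
                    f"line {number}: concepts must be separated by commas only"
                )
        if figure_id in run:
            raise ValueError(
                f"line {number}: figure ID {figure_id} appears more than once"
            )
        run[figure_id] = concepts
    missing = set(test_ids) - set(run)
    unknown = set(run) - set(test_ids)
    if missing or unknown:
        raise ValueError(
            f"{len(missing)} test figure IDs missing, {len(unknown)} unknown IDs"
        )
    return run


if __name__ == "__main__":
    example = [
        "1743-422X-4-12-1-4 C1,C6,C100",
        "1743-422X-4-12-1-3 C89,C374",
        "1743-422X-4-12-1-2 C8374",
    ]
    ids = {line.split()[0] for line in example}
    print(parse_concept_run(example, ids))
```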

Caption prediction

For the submission of the caption prediction task we expect the following format:
  • <Figure-ID><TAB><description>
  • 1743-422X-4-12-1-4   description of the first image in one single line
  • 1743-422X-4-12-1-3   description of the second image....
  • 1743-422X-4-12-1-2   description of the third image...
You need to respect the following constraints:
  • The separator between the figure ID and the description has to be a tab character
  • Each figure ID of the testset must be included in the runfile exactly once
  • You should not include special characters in the description.
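A caption run file in this format can be written and checked as sketched below. Both helper names are hypothetical, and the sketch does not test for "special characters" because the constraint above does not define them; it only guarantees one tab-separated line per figure and exactly one entry per test figure ID.

```python
def format_caption_line(figure_id, description):
    """Build one run-file line: figure ID, a tab, a single-line description."""
    # Collapse newlines, tabs and repeated spaces so the caption stays on one line.
    single_line = " ".join(description.split())
    return f"{figure_id}\t{single_line}"


def parse_caption_run(lines, test_ids):
    """Parse caption run-file lines and enforce the format constraints.

    Returns a dict mapping each figure ID to its description.
    """
    run = {}
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines are ignored
        figure_id, separator, description = line.partition("\t")
        if not separator:
            raise ValueError(f"line {number}: missing tab after the figure ID")
        if figure_id in run:
            raise ValueError(
                f"line {number}: figure ID {figure_id} appears more than once"
            )
        run[figure_id] = description
    if set(run) != set(test_ids):
        raise ValueError("run file must contain every test figure ID exactly once")
    return run


if __name__ == "__main__":
    captions = {
        "1743-422X-4-12-1-4": "description of the first image\nin one single line",
        "1743-422X-4-12-1-3": "description of the second image",
    }
    run_lines = [format_caption_line(fid, text) for fid, text in captions.items()]
    print(parse_caption_run(run_lines, captions.keys()))
```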


  • When referring to the ImageCLEFcaption 2017 task general goals, general results, etc. please cite the following publication which will be published by September 2017:
    • Carsten Eickhoff, Immanuel Schwall, Alba García Seco de Herrera and Henning Müller. Automatic Biomedical Image Understanding –an Overview of the Medical Image Captioning Task at ImageCLEF 2017 (2017), in: Computerized Medical Imaging and Graphics
    • BibTeX:
        Title = {Automatic Biomedical Image Understanding –an Overview of the Medical Image Captioning Task at {ImageCLEF} 2017},
        Author = {Eickhoff, Carsten and Schwall, Immanuel and García Seco de Herrera, Alba and M\"uller, Henning},
        Journal = {Computerized Medical Imaging and Graphics},
        Year = {2017}
