
ImageCLEFmed Caption

Welcome to the 4th edition of the Caption Task!



Interpreting and summarizing the insights gained from medical images such as radiology output is a time-consuming task that involves highly trained experts and often represents a bottleneck in clinical diagnosis pipelines.

Consequently, there is a considerable need for automatic methods that can approximate this mapping from visual information to condensed textual descriptions. The more image characteristics are known, the more structured the radiology scans are and, hence, the more efficiently radiologists can interpret them. The task is based on a large-scale collection of figures from open access biomedical journal articles in PubMed Central. All images in the training data are accompanied by UMLS concepts extracted from their original captions.

Lessons learned:

  • In the first and second editions of this task, held at ImageCLEF 2017 and ImageCLEF 2018, participants noted a broad variety of content and situations among the training images. In 2019, the training data was restricted to radiology images only.
  • ImageCLEF 2020 focuses on radiology images and additionally provides imaging modality information, which can be used for pre-processing and multi-modal approaches.
  • A large number of concepts was used in previous years. This year, the captions are pre-processed before concept extraction, which reduces the number of concepts.
  • Rarely occurring concepts will be removed.
  • As there was uncertainty regarding the use of additional data sources, systems that use only the official training data will be clearly separated from those that incorporate additional sources of evidence.
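The removal of rarely occurring concepts can be illustrated with a short sketch. This is not the organizers' preprocessing: the function name `drop_rare_concepts`, the input layout (figure ID mapped to the UMLS concept IDs extracted from its caption) and the `min_count` threshold are hypothetical, since the page does not state the cut-off that is applied.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def drop_rare_concepts(
    annotations: Dict[str, Iterable[str]], min_count: int
) -> Dict[str, Set[str]]:
    """Remove concepts that annotate fewer than `min_count` images.

    `annotations` maps a figure ID to its extracted UMLS concept IDs.
    `min_count` is a hypothetical threshold, not the official one.
    """
    # Use a set per image so that a concept repeated in one caption counts once
    per_image = {figure_id: set(concepts) for figure_id, concepts in annotations.items()}
    # Count in how many images each concept occurs
    counts = Counter(concept for concepts in per_image.values() for concept in concepts)
    # Keep only concepts that reach the threshold
    return {
        figure_id: {concept for concept in concepts if counts[concept] >= min_count}
        for figure_id, concepts in per_image.items()
    }


if __name__ == "__main__":
    # Toy annotations for illustration only
    toy = {
        "ROCO_CLEF_41341": ["C0033785", "C0035561"],
        "ROCO_CLEF_07563": ["C0043299", "C0033785"],
    }
    print(drop_rare_concepts(toy, min_count=2))
    # {'ROCO_CLEF_41341': {'C0033785'}, 'ROCO_CLEF_07563': {'C0033785'}}
```

Counting images rather than raw mentions keeps a concept that is repeated within a single caption from passing the threshold on its own.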

News

  • 12.11.2019: website goes live
  • 31.01.2020: development dataset is released on AICrowd
  • 30.03.2020: test dataset is released on AICrowd
  • 23.04.2020: preliminary schedule extended

Task Description

Concept Detection Task

The first step towards automatic image captioning and scene understanding is identifying the presence and location of relevant concepts in a large corpus of medical images. Based on the visual image content, this subtask provides the building blocks for the scene understanding step by identifying the individual components from which captions are composed. The detected concepts can also be used for context-based image and information retrieval.

Evaluation is conducted in terms of set coverage metrics such as precision, recall, and combinations thereof. This task will be run using a subset of the extended Radiology Objects in COntext (ROCO) dataset [1], with additional imaging modality information.


From the PubMed Open Access subset containing 1,828,575 archives, a total of 6,031,814 image-caption pairs were extracted. To focus on radiology images and non-compound figures, automatic filtering with deep learning systems as well as manual revisions were applied. In ImageCLEF 2020, additional information regarding the modalities of all 80,747 images will be distributed.

NOTE: If you intend to use an additional source for training, it should not be a subset of PubMed Central Open Access (archiving date: 01.02.2019 - 15.02.2020), to avoid overlap with the test data.
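Assuming you know the PubMed Central archiving date of each article in an additional source, the overlap rule above can be checked per article with a sketch like the following. The helper names are hypothetical, obtaining the archiving dates is outside its scope, and treating both ends of the window as inclusive is an assumption.

```python
from datetime import date, datetime

# Archiving-date window from the note above (both ends assumed inclusive)
TEST_WINDOW_START = date(2019, 2, 1)
TEST_WINDOW_END = date(2020, 2, 15)


def parse_ddmmyyyy(text: str) -> date:
    """Parse a date written as DD.MM.YYYY, the format used on this page."""
    return datetime.strptime(text, "%d.%m.%Y").date()


def usable_for_training(archived_on: date) -> bool:
    """True if a PMC Open Access article was archived outside the test-data window."""
    return not (TEST_WINDOW_START <= archived_on <= TEST_WINDOW_END)


if __name__ == "__main__":
    for archived in ("15.01.2019", "01.06.2019", "16.02.2020"):
        print(archived, usable_for_training(parse_ddmmyyyy(archived)))
    # 15.01.2019 True / 01.06.2019 False / 16.02.2020 True
```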

Evaluation Methodology

Evaluation is conducted in terms of F1 scores between system-predicted and ground-truth concepts, using the following methodology and parameters:

  • The default implementation of the F1 scoring method in the Python scikit-learn library (v0.17.1-2) is used. It is documented here.
  • A Python (3.x) script loads the candidate run file as well as the ground truth (GT) file and processes each pair of candidate and GT concept sets.
  • For each candidate-GT pair, the y_pred and y_true arrays are generated. They are binary arrays indicating, for each concept contained in the candidate set or the GT set (i.e., their union), whether it is present (1) or not (0) in the respective set.
  • The F1 score is then calculated using the default 'binary' averaging method.
  • All F1 scores are summed and averaged over the number of elements in the test set (10,000), giving the final score.
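The steps above can be sketched in plain Python. This is not the official evaluation script: the function names are hypothetical, and `binary_f1` reimplements the binary F1 formula 2·TP / (2·TP + FP + FN) so that the sketch runs without scikit-learn; with scikit-learn installed, `sklearn.metrics.f1_score(y_true, y_pred, average='binary')` can be used in its place. How the official script scores an image for which both sets are empty, or a figure ID missing from the run, is not stated here, so the choices below are assumptions.

```python
from typing import Dict, List, Set, Tuple


def binary_arrays(candidate: Set[str], ground_truth: Set[str]) -> Tuple[List[int], List[int]]:
    """Build y_true and y_pred over the union of both concept sets."""
    concepts = sorted(candidate | ground_truth)
    y_true = [int(concept in ground_truth) for concept in concepts]
    y_pred = [int(concept in candidate) for concept in concepts]
    return y_true, y_pred


def binary_f1(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 = 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p and not t)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denominator = 2 * tp + fp + fn
    # The denominator is 0 only when both sets are empty; scoring that as 1.0 is an assumption
    return 2 * tp / denominator if denominator else 1.0


def mean_f1(candidates: Dict[str, Set[str]], ground_truth: Dict[str, Set[str]]) -> float:
    """Average the per-image F1 over every figure ID in the ground truth."""
    total = 0.0
    for figure_id, true_concepts in ground_truth.items():
        # Assumption: a figure ID missing from the run is scored as an empty prediction
        predicted = candidates.get(figure_id, set())
        total += binary_f1(*binary_arrays(predicted, true_concepts))
    return total / len(ground_truth)


if __name__ == "__main__":
    gt = {"img1": {"C0033785", "C1306645"}, "img2": {"C0043299"}}
    run = {"img1": {"C0033785", "C0035561"}, "img2": {"C0043299"}}
    print(mean_f1(run, gt))  # (0.5 + 1.0) / 2 = 0.75
```

For img1 the union holds three concepts, giving y_true = [1, 0, 1] and y_pred = [1, 1, 0], so TP = FP = FN = 1 and F1 = 2/4 = 0.5. Dividing by the number of ground-truth images (10,000 for the real test set) means that every omitted image pulls the average down.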

The ground truth for the test set was generated based on the UMLS Full Release 2019AB.

NOTE: The source code of the evaluation tool is available here. It must be executed with Python 3.x on a system where the scikit-learn (>= v0.17.1-2) Python library is installed. The script should be run as follows:

/path/to/python3 /path/to/evaluation/script /path/to/candidate/file /path/to/ground-truth/file

Participant registration

Please refer to the general ImageCLEF registration instructions

Preliminary Schedule

  • 31.01.2020: development data release starts
  • 27.03.2020: test data release starts
  • 05.06.2020 (originally 11.05.2020): deadline for submitting the participants' runs
  • 12.06.2020 (originally 18.05.2020): release of the processed results by the task organizers
  • 10.07.2020 (originally 25.05.2020): deadline for submission of working notes papers by the participants
  • 07.08.2020 (originally 15.06.2020): notification of acceptance of the working notes papers
  • 21.08.2020 (originally 29.06.2020): camera-ready working notes papers
  • 22-25.09.2020: CLEF 2020, Thessaloniki, Greece


Submission Instructions

Please note that each group is allowed a maximum of 10 runs per subtask.

For the concept detection task, we expect submissions in the following format:

  • <Figure-ID><TAB><Concept-ID-1>;<Concept-ID-2>;<Concept-ID-n>

For example:

  • ROCO_CLEF_41341 C0033785;C0035561
  • ROCO_CLEF_07563 C0043299;C1306645;C1548003;C1962945
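A run file in this format could be produced with a helper like the one below. It is a sketch with hypothetical names; writing an image without concepts as its figure ID followed by a tab and an empty concept list is an assumption, as the page does not show that case.

```python
from typing import Iterable, Mapping


def format_run_line(figure_id: str, concepts: Iterable[str]) -> str:
    """Render one line: <Figure-ID><TAB><Concept-ID-1>;<Concept-ID-2>;..."""
    # dict.fromkeys drops repeated concepts while preserving their order
    unique_concepts = list(dict.fromkeys(concepts))
    return figure_id + "\t" + ";".join(unique_concepts)


def write_run_file(path: str, predictions: Mapping[str, Iterable[str]]) -> None:
    """Write one line per figure ID, in the order of `predictions`."""
    with open(path, "w", encoding="utf-8") as run_file:
        for figure_id, concepts in predictions.items():
            run_file.write(format_run_line(figure_id, concepts) + "\n")


if __name__ == "__main__":
    # The repeated C0033785 is written only once
    print(format_run_line("ROCO_CLEF_41341", ["C0033785", "C0035561", "C0033785"]))
```

Joining with an explicit "\t" avoids the space-separated rendering seen in the examples above, which would not satisfy the tab requirement.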

You need to respect the following constraints:

  • The separator between the figure ID and the concepts has to be a tab character
  • The separator between the UMLS concepts has to be a semicolon (;)
  • Each figure ID of the test set must be included in the submitted file exactly once (even if it has no concepts)
  • The same concept cannot be specified more than once for a given figure ID
  • The maximum number of concepts per image is 100.
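Before uploading, a run can be checked against these constraints with a small validator. This is a hypothetical helper, not an official tool: `test_ids` must be filled with the figure IDs of the test set, and treating an empty concept field as zero concepts is an assumption.

```python
from typing import Iterable, List, Set

MAX_CONCEPTS_PER_IMAGE = 100


def validate_run(lines: Iterable[str], test_ids: Set[str]) -> List[str]:
    """Return a list of problems with a run file; an empty list means it passed."""
    problems: List[str] = []
    seen: Set[str] = set()
    for line_number, raw_line in enumerate(lines, start=1):
        line = raw_line.rstrip("\r\n")
        if "\t" not in line:
            problems.append(f"line {line_number}: no tab between figure ID and concepts")
            continue
        figure_id, concept_field = line.split("\t", 1)
        if figure_id not in test_ids:
            problems.append(f"line {line_number}: {figure_id} is not a test-set figure ID")
        if figure_id in seen:
            problems.append(f"line {line_number}: {figure_id} appears more than once")
        seen.add(figure_id)
        # An empty concept field yields no concepts
        concepts = [concept for concept in concept_field.split(";") if concept]
        if len(concepts) != len(set(concepts)):
            problems.append(f"line {line_number}: repeated concept for {figure_id}")
        if len(concepts) > MAX_CONCEPTS_PER_IMAGE:
            problems.append(f"line {line_number}: more than {MAX_CONCEPTS_PER_IMAGE} concepts")
    # Every test-set figure ID must appear exactly once
    for missing_id in sorted(test_ids - seen):
        problems.append(f"{missing_id} is missing from the run")
    return problems


if __name__ == "__main__":
    ids = {"ROCO_CLEF_41341", "ROCO_CLEF_07563"}
    run = ["ROCO_CLEF_41341\tC0033785;C0035561;C0033785\n"]
    for problem in validate_run(run, ids):
        print(problem)
```

Because `validate_run` accepts any iterable of lines, an open file can be passed directly, e.g. `with open("run.txt", encoding="utf-8") as run_file: problems = validate_run(run_file, test_ids)`, where "run.txt" is a placeholder path.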

Results

Rank  Group                F1 score            Submission run
1     AUEB_NLP_Group       0.394008918511068   InterceptCheXNetCheckpoints.csv
2     AUEB_NLP_Group       0.393315952596593   BestOf.csv
3     PwC_MedCaption_2020  0.392385594508549   folderwise_KNN_resnet101_test_pred.csv
4     PwC_MedCaption_2020  0.388937751196626   combined_test_pred_v1.csv
5     PwC_MedCaption_2020  0.388937751196626   folder_wise_test_pred_v1.csv
6     AUEB_NLP_Group       0.386955997558695   UnionCheXNetCheckpoints.csv
7     essexgp2020          0.380778265751415   submit_run3.csv
8     essexgp2020          0.380465977377446   submit_run5.csv
9     essexgp2020          0.379667212793968   submit_run1.csv
10    essexgp2020          0.378518454747941   cp99_all_modified.txt
11    essexgp2020          0.377697817205294   c99_all_man.txt
12    iml                  0.374525478882926   imageclefmed2020-test-vgg16-f1-bce-nomissing-iml.txt
13    iml                  0.374402134956526   imageclefmed2020-test-vgg16-f1-bce-iml.txt
14    PwC_MedCaption_2020  0.368091961270053   combined_test_pred_new.csv
15    PwC_MedCaption_2020  0.366817554326238   NLP_clusters_test_pred.csv
16    PwC_MedCaption_2020  0.366611908483629   knn_t117_test_pred.csv
17    iml                  0.365168555515581   imageclefmed2020-test-resnet50-iml.txt
18    iml                  0.363067945861981   imageclefmed2020-test-vgg16-iml.txt
19    iml                  0.360156086299303   imageclefmed2020-test-densenet169-iml.txt
20    TUC_MC               0.351209087821515   model_thr0_18.csv
21    TUC_MC               0.34863172975173    streamlined1_thr0_25.csv
22    TUC_MC               0.348603019442078   streamlined1_thr0_20.csv
23    TUC_MC               0.348603019442078   streamlined1.csv
24    TUC_MC               0.347419093345578   basemodel_thr0_20.csv
25    TUC_MC               0.345465313853767   model_low_lr_thr0_20.csv
26    essexgp2020          0.344923744230078   submit_run2.csv
27    TUC_MC               0.344773997122607   streamlined1_nomax.csv
28    TUC_MC               0.343469093000168   basemodel.csv
29    TUC_MC               0.342252692871525   streamlined1_thr0_12.csv
30    PwC_MedCaption_2020  0.33791900466725    f1_band_test_t025_pred.csv
31    essexgp2020          0.336938604100577   cp98_all.txt
32    TUC_MC               0.332499200648449   model_weighting.csv
33    PwC_MedCaption_2020  0.316288624915404   NLP_test_pred_fixed.csv
34    essexgp2020          0.280402036768753   canberra_all_modified.txt
35    PwC_MedCaption_2020  0.26548478858872    combined_wo_folder_test.csv
36    essexgp2020          0.24594203686794    cp95_all.txt
37    Morgan_CS            0.167327401832357   MSU_dense_fcn.txt
38    Morgan_CS            0.159077515631665   MSU_dense_fcn_4.txt
39    Morgan_CS            0.15340326842221    MSU_dense_resnet_fcn_1.txt
40    Morgan_CS            0.14467291414794    MSU_dense_resnet_fcn_1.txt
41    Morgan_CS            0.139523355736718   MSU_dense_feat.txt
42    saradadevi           0.134677453646311   captions_output.txt
43    Morgan_CS            0.128436832871501   MSU_dense_feat.txt
44    Morgan_CS            0.0943443294401692  MSU_dense_fcn_2.txt
45    Morgan_CS            0.0894177582060959  MSU_dense_fcn_3.txt
46    Morgan_CS            0.0633624177027157  MSU_autoenc_fcn.txt
47    Morgan_CS            0.0624977988457857  MSU_lstm_dense_fcn.txt

CEUR Working Notes

  • All participating teams with at least one graded submission, regardless of the score, should submit a CEUR working notes paper.
  • The working notes paper should be submitted using this link:
    and select track "ImageCLEF - Multimedia Retrieval in CLEF".
    Add author information, paper title/abstract and keywords, select "Task 3 - ImageCLEFmedical" and upload your working notes paper as a PDF.
  • The working notes are prepared using the LNCS template available at:

    In addition, CEUR-WS asks authors to include the following copyright box in each paper:

    Copyright © 2020 for this paper by its authors. Use permitted under
    Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CLEF 2020, 22-25 September 2020, Thessaloniki, Greece.

    To help authors, we have prepared a LaTeX template, which can be downloaded at:


When referring to the general goals, results, etc. of the ImageCLEFmed 2020 concept detection task, please cite the following publication:

  • Obioma Pelka, Christoph M. Friedrich, Alba García Seco de Herrera and Henning Müller. Overview of the ImageCLEFmed 2020 Concept Prediction Task: Medical Image Understanding. CLEF2020 Working Notes, CEUR Workshop Proceedings (CEUR-WS.org).
  • BibTex:
    author = {Pelka, Obioma and Friedrich, Christoph M and Garc\'ia Seco de Herrera, Alba and M\"uller, Henning},
    title = {Overview of the {ImageCLEFmed} 2020 Concept Prediction Task: Medical Image Understanding},
    booktitle = {CLEF2020 Working Notes},
    series = {{CEUR} Workshop Proceedings},
    year = {2020},
    volume = {1166},
    publisher = {CEUR-WS.org},
    month = {September 22-25},
    address = {Thessaloniki, Greece}

When referring to the general goals, results, etc. of the ImageCLEF 2020 lab, please cite the following publication (also referred to as the ImageCLEF general overview):

  • Bogdan Ionescu, Henning Müller, Renaud Péteri, Asma Ben Abacha, Vivek Datla, Sadid A. Hasan, Dina Demner-Fushman, Serge Kozlovski, Vitali Liauchuk, Yashin Dicente Cid, Vassili Kovalev, Obioma Pelka, Christoph M. Friedrich, Alba García Seco de Herrera, Van-Tu Ninh, Tu-Khiem Le, Liting Zhou, Luca Piras, Michael Riegler, Pål Halvorsen, Minh-Triet Tran, Mathias Lux, Cathal Gurrin, Duc-Tien Dang-Nguyen, Jon Chamberlain, Adrian Clark, Antonio Campello, Dimitri Fichou, Raul Berari, Paul Brie, Mihai Dogariu, Liviu Daniel Ștefan, Mihai Gabriel Constantin. Overview of the ImageCLEF 2020: Multimedia Retrieval in Medical, Lifelogging, Nature, and Internet Applications. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 11th International Conference of the CLEF Association (CLEF 2020), Thessaloniki, Greece, Lecture Notes in Computer Science (LNCS), vol. 12260, Springer (September 22-25, 2020).
  • BibTex:
      author = {Bogdan Ionescu and Henning M\"uller and Renaud P\'{e}teri and Asma Ben Abacha and Vivek Datla and Sadid A. Hasan and Dina Demner-Fushman and Serge Kozlovski and Vitali Liauchuk and Yashin Dicente Cid and Vassili Kovalev and Obioma Pelka and Christoph M. Friedrich and Alba Garc\'{\i}a Seco de Herrera and Van-Tu Ninh and Tu-Khiem Le and Liting Zhou and Luca Piras and Michael Riegler and P\aa l Halvorsen and Minh-Triet Tran and Mathias Lux and Cathal Gurrin and Duc-Tien Dang-Nguyen and Jon Chamberlain and Adrian Clark and Antonio Campello and Dimitri Fichou and Raul Berari and Paul Brie and Mihai Dogariu and Liviu Daniel \c{S}tefan and Mihai Gabriel Constantin},
      title = {{Overview of the ImageCLEF 2020}: Multimedia Retrieval in Medical, Lifelogging, Nature, and Internet Applications},
      booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction},
      series = {Proceedings of the 11th International Conference of the CLEF Association (CLEF 2020)},
      year = {2020},
      volume = {12260},
      publisher = {{LNCS} Lecture Notes in Computer Science, Springer},
      pages = {},
      month = {September 22-25},
      address = {Thessaloniki, Greece}

Organizers

  • Obioma Pelka <obioma.pelka(at)>, University of Applied Sciences and Arts Dortmund, Germany
  • Christoph M. Friedrich <christoph.friedrich(at)>, University of Applied Sciences and Arts Dortmund, Germany
  • Alba García Seco de Herrera <alba.garcia(at)>, University of Essex, UK
  • Henning Müller <henning.mueller(at)>, University of Applied Sciences Western Switzerland, Sierre, Switzerland



[1] O. Pelka, S. Koitka, J. Rückert, F. Nensa and C. M. Friedrich, "Radiology Objects in COntext (ROCO): A Multimodal Image Dataset", Proceedings of the MICCAI Workshop on Large-scale Annotation of Biomedical data and Expert Label Synthesis (MICCAI LABELS 2018), Granada, Spain, September 16, 2018, Lecture Notes in Computer Science (LNCS), vol. 11043, pp. 180-189, DOI: 10.1007/978-3-030-01364-6_20, Springer Verlag, 2018.