ImageCLEFcaption

Welcome

Description
Interpreting and summarizing the insights gained from medical images such as radiology output is a time-consuming task that involves highly trained experts and often represents a bottleneck in clinical diagnosis pipelines. Consequently, there is a considerable need for automatic methods that can approximate this mapping from visual information to condensed textual descriptions. In this task, we cast the problem of image understanding as a cross-modality matching scenario in which visual content and textual descriptors need to be aligned and concise textual interpretations of medical images are generated. We work on the basis of a large-scale collection of figures from open access bio-medical journal articles (PubMed Central). Each image is accompanied by its original caption, constituting a natural testbed for this image captioning task.

News

  • 6.2.2017: Training data set is released.
  • 18.10.2016: ImageCLEFcaption website goes live.

Concept Detection Task

As a first step to automatic image captioning and scene understanding, participating systems are tasked with identifying the presence of relevant biomedical concepts in medical images. Based on the visual image content, this subtask provides the building blocks for the scene understanding step by identifying the individual components from which full captions will be composed.

Caption Prediction Task

On the basis of the concept vocabulary detected in the first subtask as well as the visual information of their interaction in the image, participating systems are tasked with composing coherent captions for the entirety of an image. In this step, rather than the mere coverage of visual concepts, detecting the interplay of visible elements is crucial for recreating the original image caption.

Data

The training set for both subtasks contains 164,614 biomedical images extracted from scholarly articles on PubMed Central.
For the concept detection subtask, a file containing image ID and corresponding UMLS concepts is provided.
For the caption prediction subtask, a file containing image ID - caption pairs is provided.
Additionally, a validation set of 10,000 images is provided for both subtasks.
The test set will contain 10,000 images for both subtasks.

Evaluation methodology

Concept detection

Evaluation is conducted in terms of F1 scores between system predicted and ground truth concepts, using the following methodology and parameters:

  • The default implementation of the Python scikit-learn (v0.17.1-2) F1 scoring method is used. It is documented here.
  • A Python (3.x) script loads the candidate run file as well as the ground truth (GT) file and processes each pair of candidate and GT concept sets.
  • For each candidate-GT pair, the y_pred and y_true arrays are generated. They are binary arrays indicating, for each concept appearing in the candidate or GT set, whether it is present (1) or not (0) in the respective set.
  • The F1 score is then calculated. The default 'binary' averaging method is used.
  • All F1 scores are summed and averaged over the number of elements in the test set (10,000), giving the final score.

The ground truth for the test set was generated based on the UMLS Full Release 2016AB.

NOTE: The source code of the evaluation tool is available here. It must be executed using Python 3.x, on a system where the scikit-learn (>= v0.17.1-2) Python library is installed. The script should be run like this:

/path/to/python3 evaluate-f1.py /path/to/candidate/file /path/to/ground-truth/file
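
For illustration, the per-image F1 computation described above can be approximated with the short Python sketch below. The concept sets in the example are placeholders; the official evaluate-f1.py script remains the authoritative implementation.

# Sketch of the per-image F1 score, assuming binary indicator vectors built
# over every concept that appears in either the candidate or the GT set.
from sklearn.metrics import f1_score

def image_f1(candidate_concepts, gt_concepts):
    vocabulary = sorted(set(candidate_concepts) | set(gt_concepts))
    y_true = [1 if c in gt_concepts else 0 for c in vocabulary]
    y_pred = [1 if c in candidate_concepts else 0 for c in vocabulary]
    # Default 'binary' averaging, as described above.
    return f1_score(y_true, y_pred, average='binary')

# Example with placeholder concept IDs: two of the three predictions are correct.
print(image_f1({'C1', 'C6', 'C100'}, {'C1', 'C6', 'C374'}))  # ~0.667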

Caption prediction

Evaluation is based on BLEU scores, using the following methodology and parameters:

  • The default implementation of the Python NLTK (v3.2.2) (Natural Language ToolKit) BLEU scoring method is used. It is documented here and is based on the original article describing the BLEU evaluation method.
  • A Python (3.6) script loads the candidate run file as well as the ground truth (GT) file and processes each candidate-GT caption pair.
  • Each caption is pre-processed in the following way:
    • The caption is converted to lower-case
    • All punctuation is removed and the caption is tokenized into its individual words
    • Stopwords are removed using NLTK's "english" stopword list
    • Stemming is applied using NLTK's Snowball stemmer
  • The BLEU score is then calculated. Note that the caption is always considered as a single sentence, even if it actually contains several sentences. No smoothing function is used.
  • All BLEU scores are summed and averaged over the number of captions (10,000), giving the final score.

NOTE: The source code of the evaluation tool is available here. It must be executed using Python 3.6.x, on a system where the NLTK (v3.2.2) Python library is installed. The script should be run like this:

/path/to/python3.6 evaluate-bleu.py /path/to/candidate/file /path/to/ground-truth/file
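
For illustration, the pre-processing and per-caption BLEU computation can be approximated with the Python/NLTK sketch below. The exact tokenization and punctuation handling are assumptions; the official evaluate-bleu.py script remains the authoritative implementation.

# Sketch of the per-caption BLEU score (requires the NLTK 'stopwords' corpus).
import string
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer
from nltk.translate.bleu_score import sentence_bleu

STOPWORDS = set(stopwords.words('english'))
STEMMER = SnowballStemmer('english')

def preprocess(caption):
    # Lower-case, remove punctuation, tokenize, drop stopwords, stem.
    caption = caption.lower().translate(str.maketrans('', '', string.punctuation))
    return [STEMMER.stem(tok) for tok in caption.split() if tok not in STOPWORDS]

def caption_bleu(candidate, reference):
    # The whole caption is treated as a single sentence; no smoothing is applied.
    return sentence_bleu([preprocess(reference)], preprocess(candidate))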

Preliminary Schedule

  • 15.11.2016: registration opens for all ImageCLEF tasks (until 22.04.2017)
  • 01.02.2017: development data release starts
  • 15.03.2017: test data release starts
  • 05.05.2017: deadline for submission of runs by the participants
  • 15.05.2017: release of processed results by the task organizers
  • 26.05.2017: deadline for submission of working notes papers by the participants
  • 17.06.2017: notification of acceptance of the working notes papers
  • 01.07.2017: camera ready working notes papers
  • 11.-14.09.2017: CLEF 2017, Dublin, Ireland

Participant registration

Registration for ImageCLEF 2017 is now open and will stay open until at least 21.04.2017. To register please follow the steps below:

Once registered and your signature has been validated, data access details can be found in the ImageCLEF system -> Collections. Please note that, depending on the task, you may be required to sign additional data usage agreements before downloading the data. Should you have any questions about the registration process, please contact Mihai Dogariu <dogariu_mihai8(at)yahoo.com>.

Submission instructions

Please note that each group is allowed a maximum of 10 runs per subtask.

Concept detection

For the submission of the concept detection task we expect the following format:

  • <Figure-ID><TAB><Concept-ID-1>,<Concept-ID-2>,<Concept-ID-n>

e.g.:

  • 1743-422X-4-12-1-4 C1,C6,C100
  • 1743-422X-4-12-1-3 C89,C374
  • 1743-422X-4-12-1-2 C8374

You need to respect the following constraints:

  • The separator between the figure ID and the concepts has to be a tab character
  • The separator between the UMLS concepts has to be a comma (,)
  • A maximum of 50 UMLS concepts per figure is accepted
  • Each figure ID of the test set must be included in the run file exactly once (even if there are no concepts)
  • The name of the run file has to start with "DET"
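
As an illustration, the short Python sketch below writes a run file in this format; the figure IDs and concept lists are placeholders taken from the example above.

# Sketch that writes a concept detection run file; predictions are placeholders.
predictions = {
    '1743-422X-4-12-1-4': ['C1', 'C6', 'C100'],
    '1743-422X-4-12-1-3': ['C89', 'C374'],
    '1743-422X-4-12-1-2': [],  # figures without concepts still get a line
}

with open('DET_example_run.txt', 'w') as run_file:
    for figure_id, concepts in predictions.items():
        # Tab between figure ID and concepts, comma between concepts, at most 50.
        run_file.write('{}\t{}\n'.format(figure_id, ','.join(concepts[:50])))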


Caption prediction

For the submission of the caption prediction task we expect the following format:

  • <Figure-ID><TAB><description>

e.g.:

  • 1743-422X-4-12-1-4   description of the first image in one single line
  • 1743-422X-4-12-1-3   description of the second image....
  • 1743-422X-4-12-1-2   description of the third image...

You need to respect the following constraints:

  • The separator between the figure ID and the description has to be a tab character
  • Each figure ID of the test set must be included in the run file exactly once
  • You should not include special characters in the description.
  • The name of the run file has to start with "PRED"
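
As an illustration, the short Python sketch below writes a run file in this format; the figure IDs and captions are placeholders.

# Sketch that writes a caption prediction run file; captions are placeholders.
predictions = {
    '1743-422X-4-12-1-4': 'description of the first image in one single line',
    '1743-422X-4-12-1-3': 'description of the second image',
}

with open('PRED_example_run.txt', 'w') as run_file:
    for figure_id, caption in predictions.items():
        # One line per figure: a tab separates the ID from the single-line caption.
        run_file.write('{}\t{}\n'.format(figure_id, ' '.join(caption.split())))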

Results

DISCLAIMER: The results presented below have not yet been analyzed in-depth and are shown "as is". Due to differences in the methods used by the different groups, the results are shown in three different rankings:

  • one for runs where no external resources were used
  • one for runs where external resources were used, but it is certain that none of the test data was included
  • one for runs that used external resources which may include parts of the test data

The tables and rankings will be updated as new information is provided on the methods used in the various runs.

Caption Prediction - No External Resources Used
Group name Run Run Type Mean BLEU score Rank
NLM 1494038340934__PRED_run_4_CNN_comb.txt Automatic 0.2247 1
NLM 1494038056289__PRED_run_3_CNN_239.txt Automatic 0.1384 2
NLM 1494037493960__PRED_run_2_CNN_92.txt Automatic 0.1131 3
Caption Prediction - External Resources Used, No Test Data Included
Group name Run Run Type Mean BLEU score Rank
NLM 1495446212270__PRED_X_Caption_run_1_baseline.txt Automatic 0.2646 1
Caption Prediction - External Resources Used, Test Data Potentially Included
Group name Run Run Type Mean BLEU score Rank
NLM 1494014231230__PRED_run_1_OpeniMethod.txt Automatic 0.5634 1
NLM 1494081858362__PRED_run_5_comb_all.txt Automatic 0.3317 2
Caption Prediction - Unknown
Group name Run Run Type Mean BLEU score Rank
AILAB 1493825734124__PRED_prna_run4.txt Automatic 0.3211 1
AILAB 1493824027725__PRED_prna_run1.txt Automatic 0.2638 2
isia 1493921574200__PRED test_13_svm_3_nn_dist_25_normal_noUNK Automatic 0.2600 3
isia 1493666388885__PRED test_5_svm_nn_dist_3000_nounk_modified_2 Automatic 0.2507 4
isia 1493922473076__PRED test_12_svm_3_nn_dist_25_normal Automatic 0.2454 5
isia 1494002110282__PRED test_11_svm_2_nn_dist_25_normal_noUNK Automatic 0.2386 6
isia 1493922527122__PRED test_10_svm_2_nn_dist_25_normal Automatic 0.2315 7
isia 1493831729114__PRED test_9_svm_three_nn_3000_noUNK Automatic 0.2240 9
isia 1493745561070__PRED test_6_svm_three_parts Automatic 0.2193 10
isia 1493715950351__PRED test_2_svm_two Automatic 0.1953 11
isia 1493528631975__PRED test_1_wc5sl70 Automatic 0.1912 12
AILAB 1493825504037__PRED_prna_run3.txt Automatic 0.1801 13
isia 1493831517474__PRED test_8_svm_two_remove_UNK Automatic 0.1684 14
AILAB 1493824818237__PRED_prna_run2.txt Automatic 0.1107 17
BMET 1493702564824__PRED_merge_01.txt Automatic 0.0982 18
BMET 1493698682901__PRED_3layer_998981.txt Automatic 0.0851 19
BMET 1494020619666__PRED_437805.txt Automatic 0.0826 20
Biomedical Computer Science Group 1493885614229__PRED_BCSG_Sub09.csv Automatic 0.0749 21
Biomedical Computer Science Group 1493885575289__PRED_BCSG_Sub08.csv Automatic 0.0675 22
BMET 1493701062845__PRED_1499176.txt Automatic 0.0656 23
Biomedical Computer Science Group 1493885210021__PRED_BCSG_Sub01.csv Automatic 0.0624 24
Biomedical Computer Science Group 1493885397459__PRED_BCSG_Sub04.csv Automatic 0.0537 25
Biomedical Computer Science Group 1493885352146__PRED_BCSG_Sub03.csv Automatic 0.0527 26
Biomedical Computer Science Group 1493885286358__PRED_BCSG_Sub02.csv Automatic 0.0411 27
Biomedical Computer Science Group 1493885541193__PRED_BCSG_Sub07.csv Automatic 0.0375 28
Biomedical Computer Science Group 1493885499624__PRED_BCSG_Sub06.csv Automatic 0.0365 29
Biomedical Computer Science Group 1493885708424__PRED_BCSG_Sub10.csv Automatic 0.0326 30
Biomedical Computer Science Group 1493885450000__PRED_BCSG_Sub05.csv Automatic 0.0200 31

Concept Detection - No External Resources Used
Group Name Run Run Type Mean F1 Score Rank
Aegean AI Lab 1491857120689__DET_ConceptDetectionTesting2017-results.txt Automatic 0.1583 1
Information Processing Laboratory 1494006128917__DET_LFS_PKNN_DSIFT_GBOC Automatic 0.1436 2
Information Processing Laboratory 1494006074473__DET_LFS_PKNN_CEDD4x4_DSIFT_GBOC Automatic 0.1418 3
Information Processing Laboratory 1494009510297__DET_LFS_RWR_DSIFT_GBOC Automatic 0.1417 4
Information Processing Laboratory 1494006054264__DET_LFS_PKNN_FCTH4x4_DSIFT_GBOC Automatic 0.1415 5
Information Processing Laboratory 1494009412127__DET_LFS_RWR_CEDD4x4_DSIFT_GBOC Automatic 0.1414 6
Information Processing Laboratory 1494009455073__DET_LFS_RWR_FCTH4x4_DSIFT_GBOC Automatic 0.1394 7
Information Processing Laboratory 1494006225031__DET_RWR_DSift_Top100_L2_SqrtNorm_L1Norm.txt Automatic 0.1365 8
Information Processing Laboratory 1494006181689__DET_PKNN_DSift_Top100_L2_SqrtNorm_L1Norm.txt Automatic 0.1364 9
Information Processing Laboratory 1494006414840__DET_RWR_gboc_Top100_L2_SqrtNorm_L1Norm.txt Automatic 0.1212 10
Information Processing Laboratory 1494006360623__DET_PKNN_gboc_Top100_L2_SqrtNorm_L1Norm.txt Automatic 0.1208 11
MEDGIFT UPB 1496826981029__DET_CORRECTED_medgift_baseline.txt Automatic 0.0893 12
NLM 1494013963830__DET_run_8_comb1_CNN2.txt Automatic 0.0880 13
NLM 1494014008563__DET_run_9_comb2_CNN2Meka.txt Automatic 0.0868 14
NLM 1494013621939__DET_run_6_CNN_GoogLeNet_92Cuis.txt Automatic 0.0811 15
NLM 1494013664037__DET_run_7_CNN_GoogLeNet_239Cuis.txt Automatic 0.0695 16
mami 1496127572481__DET_CORRECTED_mami_resulat.txt Feedback or/and human assistance 0.0462 17
MEDGIFT UPB 1493803509469__DET_ResNet152_SCEL_t_0.06.txt Automatic 0.0028 18
NLM 1494012725738__DET_run_5_Meka_CEDD.txt Automatic 0.0012 19
mami 1493631868847__DET_submisionlotof0.txt Feedback or/and human assistance 0.0000 20
Concept Detection - External Resources Used, No Test Data Included
Group Name Run Run Type Mean F1 Score Rank
NLM 1495446212270__DET_X_Concept_run_1_baseline.txt Automatic 0.0162 1
Concept Detection - External Resources Used, Test Data Potentially Included
Group Name Run Run Type Mean F1 Score Rank
NLM 1494012568180__DET_run_1_openI_MetaMapLite_1.txt Automatic 0.1718 1
NLM 1494012586539__DET_run_2_openI_MetaMapLite_2.txt Automatic 0.1648 2
NLM 1494014122269__DET_run_10_comb3_CNN2MekaOpenI.txt Automatic 0.1390 3
NLM 1494012605475__DET_run_3_openI_MetaMapLite_3.txt Automatic 0.1228 4
Concept Detection - Unknown
Group Name Run Run Type Mean F1 Score Rank
AILAB 1493823116836__DET_prna_run1_processed.txt Automatic 0.1208 13
BMET 1493791786709__DET_merge_01.txt Automatic 0.0958 15
BMET 1493791318971__DET_3616832.txt Automatic 0.0880 16
BMET 1493698613574__DET_958069.txt Automatic 0.0838 19
Morgan CS 1494060724020__DET_Morgan_result_concept_from_train_Kmean300_top15.csv Manual 0.0498 22
BioinformaticsUA 1493841144834__DET_0503192045.txt Not applicable 0.0488 23
BioinformaticsUA 1493995613907__DET_0504234124-0.txt Not applicable 0.0463 24
Morgan CS 1494049613114__DET_Morgan_result_concept_from_val_Kmean50_top15.csv Not applicable 0.0461 25
Morgan CS 1494048615677__DET_Morgan_result_concept_from_train_Kmean_top20.csv Not applicable 0.0434 26
BioinformaticsUA 1493976564810__DET_0505041340-0.txt Not applicable 0.0414 27
Morgan CS 1494048330426__DET_Morgan_result_concept_from_CBIR.csv Automatic 0.0273 28
AILAB 1493823633136__DET_prna_run2_processed.txt Automatic 0.0234 29
AILAB 1493823760708__DET_prna_run3_processed.txt Automatic 0.0215 30

Citations

  • When referring to the ImageCLEFcaption 2017 task general goals, general results, etc. please cite the following publication which will be published by September 2017:
    • Carsten Eickhoff, Immanuel Schwall, Alba García Seco de Herrera and Henning Müller. Overview of ImageCLEFcaption 2017 - the Image Caption Prediction and Concept Extraction Tasks to Understand Biomedical Images, CLEF working notes, CEUR, 2017.
    • BibTex:

      @Inproceedings{ImageCLEFoverview2017,
        author = {Eickhoff, Carsten and Schwall, Immanuel and Garc\'ia Seco de Herrera, Alba and M\"uller, Henning},
        title = {Overview of {ImageCLEFcaption} 2017 - the Image Caption Prediction and Concept Extraction Tasks to Understand Biomedical Images},
        booktitle = {CLEF2017 Working Notes},
        series = {{CEUR} Workshop Proceedings},
        year = {2017},
        volume = {},
        publisher = {CEUR-WS.org $<$http://ceur-ws.org$>$},
        pages = {},
        month = {September 11-14},
        address = {Dublin, Ireland},

      }

Contact

Join our mailing list: https://groups.google.com/d/forum/imageclefcaption

Acknowledgements