
Visual Question Answering in the Medical Domain

Welcome to the 3rd edition of the Medical Domain Visual Question Answering Task!



With the increasing interest in artificial intelligence (AI) to support clinical decision making and improve patient engagement, opportunities to generate and leverage algorithms for automated medical image interpretation are currently being explored. Since patients can now access structured and unstructured data related to their health via patient portals, there is also a need to help them better understand their conditions in light of the available data, including medical images.

The clinicians’ confidence in interpreting complex medical images could also be enhanced by a “second opinion” provided by an automated system. In addition, patients may be interested in the morphology/physiology and disease status of anatomical structures around a lesion that has been well characterized by their healthcare providers, and they may not be willing to pay significant amounts for a separate office or hospital visit just to address such questions. Although patients often turn to search engines (e.g., Google) to disambiguate complex terms or obtain answers to confusing aspects of a medical image, search results may be nonspecific, erroneous, misleading, or overwhelming in volume.


News

  • 10/04/2020: Test sets released (VQA and VQG).
  • 25/02/2020: Training and validation datasets released.
  • 31/01/2020: AICrowd projects launched (VQA and VQG).
  • 10/12/2019: VQA-Med website goes live.

Tasks Description

In continuation of the two previous editions, this year’s task on visual question answering (VQA) consists of answering natural language questions based on the visual content of associated radiology images. This year, we focus particularly on questions about abnormalities.

The visual question generation (VQG) task is introduced for the first time in this third edition of the VQA-Med challenge. It consists of generating relevant natural language questions about radiology images based on their visual content.

  • VQA-Med 2020 Overview Paper:
  • Github Project (data & evaluation code):
  • Schedule

    • 25 February 2020: Release of the training and validation datasets
    • 10 April 2020: Release of the test sets 
    • 05 June 2020: Run submission deadline
    • 15 June 2020: Release of the processed results by the task organizers
    • 10 July 2020: Deadline for submitting the working notes papers by the participants
    • 07 August 2020: Notification of acceptance of the working notes papers
    • 21 August 2020: Camera-ready copy of the working notes 
    • 22-25 September 2020: The CLEF conference will be online

    Participant Registration

    Please refer to the general ImageCLEF registration instructions

    Tasks and Data

    Task 1: Visual Question Answering (VQA)

    The datasets are available on the AICrowd project under the “Resources” tab:

    • The training set includes 4,000 radiology images with 4,000 associated Question-Answer (QA) pairs.
    • The validation set includes 500 radiology images with 500 QA pairs.
    • The VQA test set includes 500 radiology images with 500 associated questions.
    • The VQA-Med-2019 datasets may be used as additional training data.

    Task 2: Visual Question Generation (VQG)

    The datasets are available on the AICrowd project under the “Resources” tab:

    • The training set: 780 radiology images with 2,156 associated questions.
    • The validation set: 141 radiology images with 164 questions.
    • The VQG test set: 80 radiology images.
    • NB: We provide the answers as additional annotations in case they are needed to train the VQG systems; we will NOT provide the answers in the test set.
    • Test set: Participants will be tasked with generating distinct questions that are relevant to the visual content of each test image (minimum 1 and maximum 7 questions per image).

    The VQA-Med-2019 and VQA-Med-2018 datasets may be used as additional training data.

    Evaluation Methodology

    The following preprocessing will be applied before running the evaluation metrics on each answer for the visual question answering task:

    • Each answer is converted to lower case
    • All punctuation is removed and the answer is tokenized into individual words
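
    The two preprocessing steps above can be sketched as follows (an illustrative sketch; the exact tokenizer used in the official evaluation code may differ):

```python
import string

def normalize_answer(answer: str) -> list[str]:
    """Lower-case the answer, strip punctuation, and tokenize into words."""
    answer = answer.lower()
    # remove all ASCII punctuation characters
    answer = answer.translate(str.maketrans("", "", string.punctuation))
    # whitespace tokenization into individual words
    return answer.split()
```

    For example, normalize_answer("Chest X-Ray, PA view.") yields ["chest", "xray", "pa", "view"], so answers differing only in case or punctuation compare as equal.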

    The evaluation will be conducted based on the following metrics:

    1. Accuracy (Strict)
      We use an adapted version of the accuracy metric from the general-domain VQA task, which considers exact matching of a participant-provided answer and the ground-truth answer.
    2. BLEU
      We use the BLEU metric [1] to capture the similarity between a system-generated answer and the ground-truth answer. The overall methodology and resources for the BLEU metric follow those of the ImageCLEF 2017 caption prediction task.
    References:

      [1] Papineni, K.; Roukos, S.; Ward, T.; Zhu, W. J. (2002). BLEU: a method for automatic evaluation of machine translation (PDF). ACL-2002: 40th Annual meeting of the Association for Computational Linguistics. pp. 311–318.
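
      To illustrate the two metrics, here is a sketch of strict accuracy and a smoothed, single-reference sentence-level BLEU over pre-tokenized answers. The evaluation code on the GitHub project is authoritative; the add-one smoothing used here is a simplification of [1]:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def strict_accuracy(candidates, references):
    """Fraction of answers that exactly match the ground truth after preprocessing."""
    hits = sum(c == r for c, r in zip(candidates, references))
    return hits / len(references)

def sentence_bleu(candidate, reference, max_n=4):
    """Smoothed BLEU for one tokenized candidate against one reference."""
    precisions = []
    for n in range(1, max_n + 1):
        if len(candidate) < n or len(reference) < n:
            break  # skip n-gram orders longer than the shorter answer
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        overlap = sum((cand & ref).values())  # clipped n-gram matches
        # add-one smoothing so one empty n-gram order does not zero the score
        precisions.append((overlap + 1) / (sum(cand.values()) + 1))
    if not precisions:
        return 0.0
    # brevity penalty for candidates shorter than the reference
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / len(precisions))
```

      An identical candidate and reference score 1.0 under both metrics; a partially matching candidate scores 0 on strict accuracy but can still receive partial BLEU credit.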

      Submission Instructions

      Task 1: Visual Question Answering

    • Each team is allowed to submit a maximum of 5 runs.
    • We expect the following format for the result submission file: <Image-ID><|><Answer>

      For example:

      rjv03401|answer of the first question in one single line
      AIAN-14-313-g002|answer of the second question
      wjem-11-76f3|answer of the third question

    • You need to respect the following constraints:

      • The separator between <Image-ID> and <Answer> has to be the pipe character (|).
      • Each <Image-ID> of the test set must be included in the run file exactly once.
      • All <Image-ID> must be present in a participant’s run file in the same order as the given test file.
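
      The three constraints above can be checked before submitting; the validator below is an illustrative helper, not part of the official tooling:

```python
def validate_run_file(lines, test_ids):
    """Validate Task 1 run lines of the form '<Image-ID>|<Answer>' against
    the ordered list of test-set image IDs. Returns a list of error messages."""
    errors = []
    seen_ids = []
    for i, line in enumerate(lines, start=1):
        if "|" not in line:
            errors.append(f"line {i}: missing '|' separator")
            continue
        image_id, _, answer = line.partition("|")
        if not answer.strip():
            errors.append(f"line {i}: empty answer for '{image_id}'")
        seen_ids.append(image_id)
    # every test image ID exactly once, in the same order as the test file
    if seen_ids != list(test_ids):
        errors.append("image IDs must match the test file exactly, in order")
    return errors
```

      For instance, validate_run_file(["rjv03401|no acute findings"], ["rjv03401"]) returns an empty list, while a line without the pipe separator is reported.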

    Task 2: Visual Question Generation

    • Each team is allowed to submit a maximum of 5 runs.
    • Each run can include from 1 to 7 questions per image.
    • We expect the following format for the result submission file: <Image-ID><|><Question1><|><Question2><|>...<|><QuestionN> (N≤7)
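
    A Task 2 run line can be assembled as below (an illustrative helper, not official tooling); note that, since the pipe character is the field separator, the questions themselves must not contain it:

```python
def format_vqg_line(image_id: str, questions: list[str]) -> str:
    """Build one VQG run line: '<Image-ID>|<Question1>|...|<QuestionN>', 1 <= N <= 7."""
    if not 1 <= len(questions) <= 7:
        raise ValueError("each test image needs between 1 and 7 questions")
    if any("|" in field for field in [image_id, *questions]):
        raise ValueError("fields must not contain the '|' separator")
    return "|".join([image_id, *questions])
```

    For example, format_vqg_line("img1", ["what abnormality is seen?"]) produces "img1|what abnormality is seen?".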

    For both tasks, participants may use resources other than the official training/validation datasets; however, the use of any additional resources must be explicitly stated. For a meaningful comparison, we will separately group systems that exclusively use the official training data and systems that incorporate additional sources.

    CEUR Working Notes

    All participating teams with at least one graded submission should submit a CEUR working notes paper.

    All papers must be submitted at:
    ("ImageCLEF - Multimedia Retrieval in CLEF" track)

    The working notes should be prepared using the LNCS template available at:
    Note that CEUR-WS requires the following copyright box in each paper:
    Copyright © 2020 for this paper by its authors. Use permitted under
    Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CLEF 2020, 22-25 September 2020, Thessaloniki, Greece.

    To facilitate authors, we have prepared a LaTeX template that you can download at:


    When referring to the ImageCLEF VQA-Med 2020 task general goals, evaluation, dataset, results, etc., please cite the following publication, which will be published by September 2020:

    @inproceedings{VQA-Med2020,
      author = {Asma {Ben Abacha} and Vivek V. Datla and Sadid A. Hasan and Dina Demner-Fushman and Henning M\"uller},
      title = {Overview of the VQA-Med Task at ImageCLEF 2020: Visual Question Answering and Generation in the Medical Domain},
      booktitle = {CLEF 2020 Working Notes},
      series = {{CEUR} Workshop Proceedings},
      year = {2020},
      volume = {},
      publisher = {},
      pages = {},
      month = {September 22-25},
      address = {Thessaloniki, Greece}
    }

    When referring to the ImageCLEF 2020 tasks in general, please cite the following publication which will be published by September 2020:

    @inproceedings{ImageCLEF2020,
      author = {Bogdan Ionescu and Henning M\"uller and Renaud P\'{e}teri and Asma {Ben Abacha} and Vivek Datla and Sadid A. Hasan and Dina Demner-Fushman and Serge Kozlovski and Vitali Liauchuk and Yashin Dicente Cid and Vassili Kovalev and Obioma Pelka and Christoph M. Friedrich and Alba Garc\'{\i}a Seco de Herrera and Van-Tu Ninh and Tu-Khiem Le and Liting Zhou and Luca Piras and Michael Riegler and P\aa l Halvorsen and Minh-Triet Tran and Mathias Lux and Cathal Gurrin and Duc-Tien Dang-Nguyen and Jon Chamberlain and Adrian Clark and Antonio Campello and Dimitri Fichou and Raul Berari and Paul Brie and Mihai Dogariu and Liviu Daniel \c{S}tefan and Mihai Gabriel Constantin},
      title = {{Overview of the ImageCLEF 2020}: Multimedia Retrieval in Medical, Lifelogging, Nature, and Internet Applications},
      booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction},
      series = {Proceedings of the 11th International Conference of the CLEF Association (CLEF 2020)},
      year = {2020},
      volume = {12260},
      publisher = {{LNCS} Lecture Notes in Computer Science, Springer},
      pages = {},
      month = {September 22-25},
      address = {Thessaloniki, Greece}
    }


    Organizers

    • Asma Ben Abacha <asma.benabacha(at)>, National Library of Medicine, USA
    • Vivek Datla <vivek.datla(at)>, Philips Research Cambridge, USA
    • Sadid A. Hasan <sadidhasan(at)>, CVS Health, USA
    • Dina Demner-Fushman <ddemner(at)>, National Library of Medicine, USA
    • Henning Müller <henning.mueller(at)>, University of Applied Sciences Western Switzerland, Sierre, Switzerland

    Join our mailing list: