
Visual Question Answering in the Medical Domain

Welcome to the 3rd edition of the Medical Domain Visual Question Answering Task!



With the increasing interest in artificial intelligence (AI) to support clinical decision making and improve patient engagement, opportunities to generate and leverage algorithms for automated medical image interpretation are currently being explored. Since patients can now access structured and unstructured data related to their health via patient portals, there is also a growing need to help them better understand their conditions in light of the available data, including medical images.

The clinicians' confidence in interpreting complex medical images could also be enhanced by a “second opinion” provided by an automated system. In addition, patients may be interested in the morphology/physiology and disease status of anatomical structures around a lesion that has been well characterized by their healthcare providers – and they may not necessarily be willing to pay for a separate office or hospital visit just to address such questions. Although patients often turn to search engines (e.g., Google) to disambiguate complex terms or obtain answers to confusing aspects of a medical image, search results may be nonspecific, erroneous, misleading, or overwhelming in volume.


  • 10/12/2019: VQA-Med website goes live.
  • 31/01/2020: AICrowd projects VQA & VQG go public.

Tasks Description

In continuation of the two previous editions, this year’s task on visual question answering (VQA) consists of answering natural language questions based on the visual content of associated radiology images, with a particular focus on questions about abnormalities.

The visual question generation (VQG) task is introduced for the first time in this third edition of the VQA-Med challenge. It consists of generating relevant natural language questions about radiology images based on their visual content.


  • 25/02/2020: Release of the training and validation datasets
  • 10/04/2020: Release of the test sets 
  • 10/05/2020: Run submission deadline
  • 17/05/2020: Release of the processed results by the task organizers
  • 24/05/2020: Deadline for submitting the working notes papers by the participants
  • 14/06/2020: Notification of acceptance of the working notes papers
  • 28/06/2020: Camera-ready copy of the working notes 
  • 22-25/09/2020: CLEF 2020, Thessaloniki, Greece

Participant Registration

Please refer to the general ImageCLEF registration instructions.



Evaluation Methodology

The following preprocessing will be applied before running the evaluation metrics on each answer for the visual question answering task:

  • Each answer is converted to lower case
  • All punctuation is removed and the answer is tokenized into individual words
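The two preprocessing steps above can be sketched as follows. This is a minimal illustration, not the organizers' exact code; in particular, the tokenizer is not specified in the task description, so whitespace splitting after punctuation removal is assumed here:

```python
import re

def normalize_answer(answer: str) -> list:
    """Lower-case an answer, strip punctuation, and split into words.

    Sketch of the stated preprocessing; the organizers' exact
    tokenizer is unspecified, so whitespace splitting is assumed.
    """
    answer = answer.lower()                  # 1) convert to lower case
    answer = re.sub(r"[^\w\s]", "", answer)  # 2) remove punctuation
    return answer.split()                    #    tokenize into words

# Example: "Yes, it is an X-ray." -> ['yes', 'it', 'is', 'an', 'xray']
```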

The evaluation will be conducted based on the following metrics:

  1. Accuracy (Strict)
    We use an adapted version of the accuracy metric from the general-domain VQA task, which considers exact matching of a participant-provided answer and the ground-truth answer.
  2. BLEU
    We use the BLEU metric [1] to capture the similarity between a system-generated answer and the ground-truth answer. The overall methodology and resources for the BLEU metric are essentially the same as in the ImageCLEF 2017 caption prediction task.
References

[1] Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.-J. (2002). BLEU: a Method for Automatic Evaluation of Machine Translation. ACL-2002: 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318.
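For illustration, the two metrics could be computed roughly as below on answers that have already been preprocessed into token lists. The BLEU function is a simplified single-reference variant in the spirit of [1] (modified n-gram precision up to 4-grams, geometric mean, brevity penalty), not the organizers' exact scorer:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, candidate, max_n=4):
    """Simplified single-reference BLEU [1]: modified n-gram precision
    up to max_n, geometric mean, brevity penalty. An illustrative
    sketch, not the official evaluation code."""
    if not candidate:
        return 0.0
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0  # any empty n-gram level zeroes the geometric mean
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = (1.0 if len(candidate) >= len(reference)
          else math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)

def strict_accuracy(references, candidates):
    """Fraction of preprocessed answers matching the ground truth exactly."""
    return sum(r == c for r, c in zip(references, candidates)) / len(references)
```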

Submission Instructions

  • Each team is allowed to submit a maximum of 10 runs.
  • We expect the following format for the result submission file: <Image-ID><|><Answer>

    For example:

    rjv03401|answer of the first question in one single line
    AIAN-14-313-g002|answer of the second question
    wjem-11-76f3|answer of the third question

  • You need to respect the following constraints:

    • The separator between <Image-ID> and <Answer> has to be the pipe character (|).
    • Each <Image-ID> of the test set must be included in the run file exactly once.
    • All <Image-ID> must be present in a participant’s run file in the same order as the given test file.

  • Participants are allowed to use resources other than the official training/validation datasets; however, the use of any additional resources must be explicitly stated. For a meaningful comparison, we will separately group systems that exclusively use the official training data and systems that incorporate additional sources.
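The format constraints above lend themselves to a quick local check before uploading a run. The helper below is hypothetical (not part of the official tooling); `test_ids` is assumed to be the ordered list of <Image-ID> values taken from the given test file:

```python
def validate_run(run_path, test_ids):
    """Check a run file against the submission constraints.

    Hypothetical helper, not official tooling. `test_ids` is the
    ordered list of <Image-ID> values from the given test file.
    """
    with open(run_path) as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    run_ids = []
    for line in lines:
        if "|" not in line:  # separator must be the pipe character
            raise ValueError(f"missing '|' separator in line: {line!r}")
        image_id, _answer = line.split("|", 1)
        run_ids.append(image_id)
    # each test <Image-ID> exactly once, in the same order as the test file
    if run_ids != list(test_ids):
        raise ValueError("image IDs must match the test file exactly, in order")
    return True
```

Running such a check locally catches separator, missing-ID, and ordering mistakes before they invalidate a submission.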

CEUR Working Notes



Organizers

  • Asma Ben Abacha <asma.benabacha(at)>, National Library of Medicine, USA
  • Vivek Datla <vivek.datla(at)>, Philips Research Cambridge, USA
  • Sadid A. Hasan <sadidhasan(at)>, CVS Health, USA
  • Joey Liu <joey.liu(at)>, Philips Research Cambridge, USA
  • Dina Demner-Fushman <ddemner(at)>, National Library of Medicine, USA
  • Henning Müller <henning.mueller(at)>, University of Applied Sciences Western Switzerland, Sierre, Switzerland

Join our mailing list: