


While deep neural network methods have proven their predictive power in a large number of tasks, there are still domains where a single deep learning network is not enough to attain high precision. This limitation has further repercussions, as it may impede future development, or even market integration and adoption, of methods that target those tasks and domains. Late fusion (also called ensembling or decision-level fusion) is one of the methods that machine learning researchers employ to increase the performance of single-system approaches. It uses a series of weaker learners, called inducers, that are trained and tested on the dataset and whose prediction outputs are combined in a final step, via a fusion method (also called an ensembling method or strategy), to create a new and improved set of predictions. These systems have a long history and have been shown to be particularly useful in scenarios where the performance of single-system approaches is not considered satisfactory.

The usefulness of ensembling methods has been demonstrated in a large number of tasks in the current literature. To date, these approaches have been successful even with a low number of inducers, in traditional tasks such as video action recognition [Sudhakaran2020]. However, the ImageCLEFfusion 2023 task differs from these approaches in two respects.

First, we propose to focus this task on the prediction of subjective concepts, where the ground truth is not absolute and may differ between annotators. Therefore, we choose the prediction of media interestingness and search result diversification. Second, we wish to explore the power of late fusion approaches as much as possible, and therefore provide a set of prediction results extracted from a very large number of inducers.

We are interested in exploring a number of aspects of fusion approaches for this task, including but not limited to: the performance of different fusion methods, methods for selecting inducers from a larger given set of inducers, and the exploitation of positive and negative correlations between inducers.
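As a rough illustration of inducer selection and of inspecting correlations between inducers, the sketch below runs a greedy forward selection over synthetic data. Everything in it is an assumption made for illustration: the function name `greedy_select`, the use of mean fusion, the Pearson correlation as the selection score, and the randomly generated predictions. It is not the organizers' method or the official metric.

```python
import numpy as np

def greedy_select(dev_preds, dev_truth, max_inducers=5):
    """Greedy forward selection (hypothetical helper): repeatedly add the
    inducer that most improves the Pearson correlation between the
    mean-fused output and the ground truth on the development set."""
    dev_preds = np.asarray(dev_preds, dtype=float)
    dev_truth = np.asarray(dev_truth, dtype=float)
    selected, best_score = [], -np.inf
    remaining = list(range(dev_preds.shape[1]))
    while remaining and len(selected) < max_inducers:
        scores = []
        for j in remaining:
            fused = dev_preds[:, selected + [j]].mean(axis=1)
            scores.append(np.corrcoef(fused, dev_truth)[0, 1])
        k = int(np.argmax(scores))
        if scores[k] <= best_score:
            break  # no remaining candidate improves the fused score
        best_score = scores[k]
        selected.append(remaining.pop(k))
    return selected, best_score

# Synthetic development data (made up): four noisy inducers that follow the
# ground truth and one inducer that is negatively correlated with it.
rng = np.random.default_rng(0)
truth = rng.random(200)
noise_levels = [0.1, 0.2, 0.3, 0.5]
columns = [truth + rng.normal(0, s, 200) for s in noise_levels]
columns.append(1.0 - truth + rng.normal(0, 0.2, 200))
preds = np.column_stack(columns)            # shape (M samples, N inducers)

pairwise = np.corrcoef(preds, rowvar=False)  # (N, N) inducer correlations
selected, score = greedy_select(preds, truth)
print("selected inducers:", selected, "fused correlation:", round(score, 3))
```

The `pairwise` matrix exposes both positive and negative correlations; a negatively correlated inducer could, for instance, be inverted before fusion rather than discarded, although that choice is not explored here.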

For this second edition of the task, we propose going forward with the two datasets we previously explored, based on regression and retrieval, and adding new data that represents a new machine learning paradigm, namely multi-class labeling.

Recommended reading

There are many published works on general fusion systems; however, we recommend two valuable works that analyze the current literature on ensembling [Gomes2017, Sagi2018]. Also, a number of works analyze a set of novel approaches that directly use deep neural networks as the primary ensembling methods [Ştefan2020, Constantin2021a]. Finally, [Constantin2022] presents an in-depth analysis of the way ensembling methods can be applied to the prediction of media interestingness. This is by no means an exhaustive list of works on late fusion, but we believe it is a strong starting point for studying this domain.


More information will be added soon!

Task Description

In a general sense, an ensembling system is represented by an algorithm or function F that, given a set S composed of M samples and a set A composed of N inducers that output a vector of N predictions for each sample, creates a new and better set of predictions for the samples by combining the inducer outputs for each individual sample.
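A minimal sketch of such a function F is given below, assuming the inducer outputs are arranged as an M x N matrix of scores. The function name `fuse`, the three fusion strategies shown (mean, weighted mean, median), and the toy values are illustrative assumptions; they are simple baselines and not a prescribed approach for the task.

```python
import numpy as np

def fuse(predictions, weights=None, method="mean"):
    """Combine inducer outputs of shape (M samples, N inducers)
    into a vector of M fused predictions."""
    predictions = np.asarray(predictions, dtype=float)
    if method == "mean":
        # With weights=None this is the plain average over inducers.
        return np.average(predictions, axis=1, weights=weights)
    if method == "median":
        return np.median(predictions, axis=1)
    raise ValueError(f"unknown fusion method: {method}")

# Toy example: M = 3 samples, N = 4 inducers (values are made up).
P = np.array([
    [0.2, 0.4, 0.3, 0.1],
    [0.9, 0.7, 0.8, 0.6],
    [0.5, 0.5, 0.1, 0.9],
])
fused_mean = fuse(P)
fused_weighted = fuse(P, weights=[0.4, 0.3, 0.2, 0.1])
fused_median = fuse(P, method="median")
print(fused_mean, fused_weighted, fused_median)
```

In practice the weights would typically be fitted on the development set, for example in proportion to each inducer's individual performance.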

Fusion scheme

As mentioned, we provide three types of data for ImageCLEFfusion, generating three tasks, related to:

  • media interestingness (ImageCLEFfusion-int), a regression task
  • result diversification (ImageCLEFfusion-div), a retrieval task
  • medical image captioning (ImageCLEFfusion-cap), a multi-class labeling task

For ImageCLEFfusion, we will provide the outputs of the inducers associated with the media samples. The use of external data is prohibited for all tasks, as is the development or use of any inducers other than the ones we provide. In this way, we want to ensure a fair comparison of the fusion methods and inducer selection principles without any variations in the inducer set.


ImageCLEFfusion-int. The data for this task is extracted from the Interestingness10k dataset [Constantin2021b]. We will provide output data from 33 inducers; 1826 samples will be used for the development set, and 609 samples will be used for the testing set.

ImageCLEFfusion-div. The data for this task is extracted from the Retrieving Diverse Social Images Task dataset [Ionescu2020]. We will provide output data from 117 inducers; 104 queries will be used for the development set, and 35 queries will be used for the testing set.
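For a retrieval-style task, inducer outputs can be fused at the level of ranked lists rather than scores. The sketch below shows reciprocal rank fusion, a generic rank-aggregation baseline. It assumes that each inducer returns an ordered list of image identifiers for a query; the identifiers, the function name, and the constant `k=60` are illustrative assumptions, the actual data format may differ, and the sketch does not model diversity in any way.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each item receives 1 / (k + rank) from
    every list it appears in, and items are sorted by their total score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by descending score; break ties by identifier for determinism.
    return sorted(scores, key=lambda item: (-scores[item], str(item)))

# Hypothetical outputs of three inducers for a single query.
rankings = [
    ["img_a", "img_b", "img_c", "img_d"],
    ["img_b", "img_a", "img_d", "img_c"],
    ["img_b", "img_c", "img_a", "img_d"],
]
fused_ranking = reciprocal_rank_fusion(rankings)
print(fused_ranking)
```

Because it relies only on ranks, this kind of fusion does not require the inducers' scores to be calibrated against each other.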

ImageCLEFfusion-cap. More information will be added soon!

Evaluation methodology

More information will be added soon!

Participant registration

Please refer to the general ImageCLEF registration instructions.

Preliminary Schedule

More information will be added soon!

Submission Instructions

More information will be added soon!



CEUR Working Notes

All participating teams with at least one graded submission, regardless of the score, should submit a CEUR working notes paper.






  • Liviu-Daniel Ștefan, <liviu_daniel.stefan(at)>, Politehnica University of Bucharest, Romania
  • Mihai Gabriel Constantin, <mihai.constantin84(at)>, Politehnica University of Bucharest, Romania
  • Mihai Dogariu, <mihai.dogariu(at)>, Politehnica University of Bucharest, Romania
  • Bogdan Ionescu, <bogdan.ionescu(at)>, Politehnica University of Bucharest, Romania


This task is supported under project AI4Media, A European Excellence Centre for Media, Society and Democracy, H2020 ICT-48-2020, grant #951911.