Motivation
While deep neural networks have proven their predictive power in a large number of tasks, there are still domains where a single deep learning network is not enough to attain high precision. This limitation has further repercussions, as it may impede future development, or even market integration and adoption, of methods that target those tasks and domains. Late fusion (also called ensembling or decision-level fusion) is one of the methods that machine learning researchers employ to increase the performance of single-system approaches. It uses a series of weaker learners, called inducers, that are trained and tested on the dataset; their prediction outputs are combined in a final step, via a fusion method (also called an ensembling method or strategy), to create a new and improved set of predictions. These systems have a long history and have been shown to be particularly useful in scenarios where the performance of single-system approaches is not considered satisfactory.
The usefulness of ensembling methods has been demonstrated in a large number of tasks in the current literature. These approaches have been successful even with a low number of inducers, in traditional tasks such as video action recognition [Sudhakaran2020]. However, the ImageCLEFfusion 2023 task differs from these approaches in some respects.
First of all, we propose to focus this task on the prediction of subjective concepts, where the ground truth is not absolute and may differ between annotators. We therefore choose the prediction of media interestingness and search result diversification. Secondly, we wish to explore the power of late fusion approaches as much as possible, and therefore provide a set of prediction results extracted from a very large number of inducers.
We are interested in exploring a number of aspects of fusion approaches for this task, including but not limited to: the performance of different fusion methods, methods for selecting inducers from a larger given set of inducers, the exploitation of positive and negative correlations between inducers, etc.
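As one possible illustration of inducer selection driven by correlations, the sketch below ranks inducers by their Pearson correlation with the development-set ground truth and then greedily skips any inducer that is very strongly positively correlated with one already selected. The function names, the `max_redundancy` threshold, and the toy scores are assumptions made for this example; this is not an official baseline or a recommended method.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_inducers(
    outputs: Sequence[Sequence[float]],
    ground_truth: Sequence[float],
    max_redundancy: float = 0.95,  # hypothetical threshold
) -> List[int]:
    """Return indices of selected inducers.

    outputs: N inducer columns, each holding M development-set predictions.
    """
    # Rank inducers by how well they track the ground truth.
    relevance = [pearson(col, ground_truth) for col in outputs]
    order = sorted(range(len(outputs)), key=lambda i: relevance[i], reverse=True)

    selected: List[int] = []
    for i in order:
        # Keep an inducer only if it is not a near-duplicate (strong
        # positive correlation) of an inducer that was already kept.
        if all(pearson(outputs[i], outputs[j]) < max_redundancy for j in selected):
            selected.append(i)
    return selected


# Toy example with made-up scores: inducer 1 is a rescaled copy of inducer 0.
truth = [0.1, 0.4, 0.6, 0.9]
inducers = [
    [0.2, 0.4, 0.5, 0.8],
    [0.4, 0.8, 1.0, 1.6],
    [0.3, 0.1, 0.7, 0.6],
]
chosen = select_inducers(inducers, truth)
```

In this toy case only one of the two near-identical inducers is kept, alongside the third, less redundant one. Treating negatively correlated inducers as non-redundant is a design choice of this sketch; other selection criteria are equally plausible.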
For this second edition of the task, we propose going forward with the two datasets we previously explored, based on regression and retrieval, and adding new data that represents a new machine learning paradigm, namely multi-class labeling.
Recommended reading
There are many published works on general fusion systems; we recommend two valuable works that analyze the current literature on ensembling [Gomes2017, Sagi2018]. In addition, a number of works analyze novel approaches that directly use deep neural networks as the primary ensembling method [Ştefan2020, Constantin2021a]. Finally, [Constantin2022] presents an in-depth analysis of how ensembling methods can be applied to the prediction of media interestingness. This is by no means an exhaustive list of works on late fusion, but we believe it is a strong starting point for studying this domain.
News
More information will be added soon!
Task Description
In a general sense, an ensembling system is an algorithm or function F that, given a set S of M samples and a set A of N inducers that together output a vector of N predictions for each sample, creates a new and better set of predictions for the samples by combining the inducer outputs for each individual sample.
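As a minimal sketch of this definition, the function below plays the role of F for a regression-style task: it takes an M x N table of inducer outputs and returns M fused predictions using a (weighted) average. The name `late_fusion`, the optional weights, and the toy scores are illustrative assumptions, not part of the task data or of any official baseline.

```python
from typing import List, Optional, Sequence


def late_fusion(
    predictions: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Fuse per-sample inducer outputs with a (weighted) average.

    predictions: M rows (samples), each holding N inducer outputs.
    weights: optional N non-negative weights; uniform if omitted.
    """
    if not predictions:
        return []
    n_inducers = len(predictions[0])
    if weights is None:
        weights = [1.0] * n_inducers
    if len(weights) != n_inducers:
        raise ValueError("expected one weight per inducer")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = []
    for row in predictions:
        if len(row) != n_inducers:
            raise ValueError("every sample needs exactly N predictions")
        fused.append(sum(w * p for w, p in zip(weights, row)) / total)
    return fused


# Toy example: M = 3 samples, N = 4 inducers (made-up scores).
scores = [
    [0.2, 0.4, 0.3, 0.1],
    [0.9, 0.7, 0.8, 0.6],
    [0.5, 0.5, 0.4, 0.6],
]
fused_mean = late_fusion(scores)
fused_weighted = late_fusion(scores, weights=[2.0, 1.0, 1.0, 0.0])
```

Averaging is only the simplest choice of F; setting a weight to zero, as in the second call, effectively removes an inducer from the ensemble, which is one way to combine fusion with inducer selection.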

As mentioned, we provide three types of data for ImageCLEFfusion, generating three tasks, related to:
- media interestingness (ImageCLEFfusion-int), a regression task
- result diversification (ImageCLEFfusion-div), a retrieval task
- medical image captioning (ImageCLEFfusion-cap), a multi-class labeling task
For ImageCLEFfusion, we will provide the outputs of the inducers associated with the media samples. The use of external data is prohibited for these tasks, as is the development or use of any inducers other than the ones we provide. In this way, we want to ensure a fair comparison of fusion methods and inducer selection principles, without any variation in the inducer set.
Data
ImageCLEFfusion-int. The data for this task is extracted from the Interestingness10k dataset [Constantin2021b]. We will provide output data from 33 inducers; 1826 samples will be used for the development set, and 609 samples will be used for the testing set.
ImageCLEFfusion-div. The data for this task is extracted from the Retrieving Diverse Social Images Task dataset [Ionescu2020]. We will provide output data from 117 inducers; 104 queries will be used for the development set, and 35 queries will be used for the testing set.
ImageCLEFfusion-cap. More information will be added soon!
Evaluation methodology
More information will be added soon!
Participant registration
Please refer to the general ImageCLEF registration instructions.
Preliminary Schedule
More information will be added soon!
Submission Instructions
More information will be added soon!
Results
More information will be added soon!
CEUR Working Notes
All participating teams with at least one graded submission, regardless of the score, should submit a CEUR working notes paper.
Citations
More information will be added soon!
References
- [Sudhakaran2020] Sudhakaran, S., Escalera, S., & Lanz, O. (2020). Gate-shift networks for video action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 1102-1111).
- [Sagi2018] Sagi, O., & Rokach, L. (2018). Ensemble learning: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4), e1249.
- [Gomes2017] Gomes, H. M., Barddal, J. P., Enembreck, F., & Bifet, A. (2017). A survey on ensemble learning for data stream classification. ACM Computing Surveys (CSUR), 50(2), 1-36.
- [Ştefan2020] Ştefan, L. D., Constantin, M. G., & Ionescu, B. (2020, June). System Fusion with Deep Ensembles. In Proceedings of the 2020 International Conference on Multimedia Retrieval (pp. 256-260).
- [Constantin2021a] Constantin, M. G., Ştefan, L. D., & Ionescu, B. (2021, June). DeepFusion: Deep Ensembles for Domain Independent System Fusion. In the International Conference on Multimedia Modeling (pp. 240-252). Springer, Cham.
- [Constantin2022] Constantin, M. G., Ştefan, L. D., & Ionescu, B. (2022). Exploring Deep Fusion Ensembling for Automatic Visual Interestingness Prediction. In Human Perception of Visual Information (pp. 33-58). Springer, Cham.
- [Constantin2021b] Constantin, M. G., Ştefan, L. D., Ionescu, B., Duong, N. Q., Demarty, C. H., & Sjöberg, M. (2021). Visual Interestingness Prediction: A Benchmark Framework and Literature Review. International Journal of Computer Vision, 1-25.
- [Ionescu2020] Ionescu, B., Rohm, M., Boteanu, B., Gînscă, A. L., Lupu, M., & Müller, H. (2020). Benchmarking Image Retrieval Diversification Techniques for Social Media. IEEE Transactions on Multimedia, 23, 677-691.
Contact
Organizers:
- Liviu-Daniel Ștefan, <liviu_daniel.stefan(at)upb.ro>, Politehnica University of Bucharest, Romania
- Mihai Gabriel Constantin, <mihai.constantin84(at)upb.ro>, Politehnica University of Bucharest, Romania
- Mihai Dogariu, <mihai.dogariu(at)upb.ro>, Politehnica University of Bucharest, Romania
- Bogdan Ionescu, <bogdan.ionescu(at)upb.ro>, Politehnica University of Bucharest, Romania
Acknowledgments
This task is supported under project AI4Media, A European Excellence Centre for Media, Society and Democracy, H2020 ICT-48-2020, grant #951911.