Welcome to the 3rd edition of the ToPicto Task!
Motivation
Several genetic diseases, such as Rett syndrome, can result in language impairment, thereby interfering with the development of language skills such as speaking, listening, reading, and writing. Both language production and comprehension are impaired. Language impairment may also arise from incidents such as a car accident or a stroke, leading to aphasia — a partial or complete loss of the ability to express oneself or understand written and spoken language. In these particular cases, Augmentative and Alternative Communication (AAC) can be implemented. AAC involves the use of pictograms to help individuals accurately convey their messages [1].
In AAC, a pictogram is an image representing a more or less concrete concept. It can stand for a single word, a named entity, or a polylexical expression, among others (see the example with pictograms taken from ARASAAC, a collection featuring over 25,000 pictograms freely available under a Creative Commons CC BY-NC-SA license).
Using pictograms as a communication aid has proven effective in visualizing syntax, manipulating words, and facilitating language access [2,3]. Moreover, the use of AAC has a positive social impact for individuals with language impairment. The “Croix-Rouge” (French Red Cross) has identified a reduction in stress, an improvement in autonomy and health, and greater serenity and enjoyment in daily life [4]. However, not everyone has prior knowledge of AAC and pictograms. In a situation where a “verbal” person aims to communicate with an AAC user, a tool that converts the two modalities, speech and text, into a sequence of pictograms is therefore essential. By providing a relevant and comprehensible sequence of pictograms to the impaired person, communication between the two parties can be initiated.
The goal of ToPicto is to bring together linguists, computer scientists, and translators to develop new translation methods to translate either speech or text into a corresponding sequence of pictograms.
Preliminary Schedule
- 26.01.2026: Registration opens for all ImageCLEF tasks
- 13.02.2026: Development dataset released
- 09.03.2026: Test dataset released
- 23.04.2026: Registration closes for all ImageCLEF tasks
- 07.05.2026: Deadline for submitting participant runs
- 14.05.2026: Release of the processed results by the task organizers
- 28.05.2026: Submission of participant papers [CEUR-WS]
- 30.06.2026: Notification of acceptance
- 21.09.2026: CLEF 2026, Jena, Germany
Task Description
ImageCLEFToPicto 2026 consists of two subtasks:
Text-to-Picto
Participants will develop solutions for translating text into a sequence of pictogram terms, each linked to a unique pictogram image from ARASAAC.

The Text-to-Picto subtask focuses on the automatic generation of a corresponding sequence of pictogram terms from an English text. This challenge can be seen as a translation problem, where the source language is English and the target language is English pictogram terms.
The provided translation must follow the specifications for translation into pictograms so that it is understandable by AAC users.
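As an illustration of the task format (not a competitive approach), a naive word-by-word baseline could map each source word to a pictogram term through a hand-written lexicon. The lexicon entries below are purely hypothetical; a real system would be trained on the released parallel data.

```python
# Naive word-by-word Text-to-Picto baseline using a hand-written lexicon.
# The entries are illustrative only, not part of the official data.
LEXICON = {
    "you": "you",
    "cannot": "be_able_to not",  # negation expressed with a separate "not" term
    "know": "know",
}

def text_to_picto(text: str) -> str:
    """Map each source word to a pictogram term, falling back to the word."""
    return " ".join(LEXICON.get(w, w) for w in text.lower().split())

print(text_to_picto("you cannot know"))  # -> you be_able_to not know
```

Note that such a word-by-word mapping cannot reorder tokens: for this example the gold target is "you be_able_to know not", with the negation placed after the verb, which is one reason learned translation models are needed.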
Next-Pictogram Prediction
This subtask focuses on predicting the most probable next pictogram in a given sequence.
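A minimal sketch of this subtask, assuming the training data consists of sequences of pictogram terms: a bigram count model that predicts the most frequent continuation of the last term. The tiny training set below is invented for illustration.

```python
from collections import Counter, defaultdict

# Bigram count model over pictogram-term sequences: predict the most
# frequent continuation of the last term. The training sequences are
# invented for illustration; the real subtask uses the released data.
train = [
    ["you", "be_able_to", "know", "not"],
    ["you", "be_able_to", "come"],
    ["you", "know", "not"],
]

bigrams = defaultdict(Counter)
for seq in train:
    for prev, nxt in zip(seq, seq[1:]):
        bigrams[prev][nxt] += 1

def predict_next(prefix):
    """Return the most frequent next pictogram term, or None if unseen."""
    counts = bigrams.get(prefix[-1])
    if not counts:
        return None
    return counts.most_common(1)[0][0]

print(predict_next(["you"]))  # -> be_able_to (seen twice after "you")
```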
Data
The data for the task is sourced from the CommonVoice v.15 corpus [5]. Common Voice is a corpus of speech data recorded by users on the Common Voice platform, and is based on text from various public-domain sources, including blog posts, old books, movies, and other public speech corpora. This type of text mimics the interactions between individuals who rely on pictograms due to language impairments and conversation partners.
For ToPicto, we provide, for each oral transcription, a corresponding sequence of terms, each linked to an ARASAAC pictogram.
Text-to-Picto
- Input: a JSON file with the following information (training and validation data contain all fields; the test data contains only the id and src fields):

| Tag | Definition | Example |
| --- | --- | --- |
| id | unique identifier of each utterance | cefc-tcof-Acc_del_07-1 |
| src | source of the utterance (text from the oral transcription) | you cannot know |
| tgt | target of the utterance: the sequence of pictogram terms (tokens) | you be_able_to know not |
| pictos | list of pictogram identifiers, one per pictogram term (same length as the target sequence) | [6625, 35949, 16885, 5526] |
- Output: a JSON file with the following information:

| Tag | Definition | Example |
| --- | --- | --- |
| id | unique identifier of each utterance | cefc-tcof-Acc_del_07-1 |
| hyp | hypothesis produced by your system, i.e. the predicted sequence of pictogram terms | you know not |
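Assuming the input and output files are JSON lists of records with the field names described above (an assumption; follow the released data for the exact layout), producing a run file could look like this sketch:

```python
import json

# Sketch of the expected run format, assuming the field names from the
# tables above ("id" and "src" in the input, "id" and "hyp" in the output).
test_records = [
    {"id": "cefc-tcof-Acc_del_07-1", "src": "you cannot know"},
]

def dummy_translate(src: str) -> str:
    """Placeholder 'model': copies the source words as pictogram terms."""
    return src

run = [{"id": r["id"], "hyp": dummy_translate(r["src"])} for r in test_records]
print(json.dumps(run, ensure_ascii=False, indent=2))
```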
Test set
The statistics of the test set are given below.
The table provides information about the source texts (src) and the target pictogram sequences (tgt).
| Metric | Value |
| --- | --- |
| Number of utterances | 4,306 |
| Min length (words in 'src') | 2 |
| Max length (words in 'src') | 104 |
| Average length (words in 'src') | 7.94 |
| Unique tokens in 'tgt' | 2,913 |
| Unique pictos in 'picto_id' | 3,713 |
TBA
Evaluation Methodology
The evaluation is conducted using sacreBLEU [7], METEOR [8], and the Picto-term Error Rate (PictoER) [9]. For all three metrics, the evaluation involves comparing the hypothesis (hyp) with the target (tgt), i.e., the sequence of pictogram terms.
- SacreBLEU measures the number of common n-grams between the translation hypothesis (hyp) and the reference translation (tgt).
- METEOR performs an alignment between the translation hypothesis and the reference translations, going beyond simple word matching. It takes into account not only direct matches but also those based on synonyms, morphological variations (such as lemmas and word roots), and even paraphrases. The evaluation is more nuanced because it captures additional semantic information that is not encoded in the BLEU score.
- PictoER is a metric derived from the Word Error Rate (WER). Instead of counting errors at the word level, errors are counted over tokens, each linked to an ARASAAC pictogram.
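The list above can be made concrete for PictoER: a minimal sketch, implemented as token-level Levenshtein distance normalized by the reference length (the standard WER formulation).

```python
# PictoER follows the WER formula (S + D + I) / N: the minimal token-level
# edit distance between reference and hypothesis, divided by the reference
# length N, here scaled to a percentage.
def picto_er(tgt: str, hyp: str) -> float:
    ref_toks, hyp_toks = tgt.split(), hyp.split()
    # Levenshtein distance by dynamic programming over tokens.
    d = [[0] * (len(hyp_toks) + 1) for _ in range(len(ref_toks) + 1)]
    for i in range(len(ref_toks) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp_toks) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref_toks) + 1):
        for j in range(1, len(hyp_toks) + 1):
            sub = d[i - 1][j - 1] + (ref_toks[i - 1] != hyp_toks[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100 * d[-1][-1] / len(ref_toks)

# One substitution ("with" -> "on") over 7 reference tokens: 100/7 ~ 14.29
print(picto_er("he plays three seasons with the team",
               "he plays three seasons on the team"))
```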
How are the scores computed?
Let's take the following example with hyp the hypothesis given by your system:
- tgt: he plays three seasons with the team
- hyp: he plays three seasons on the team
| Metric | Details | Score |
| --- | --- | --- |
| sacreBLEU (N = 4, BP = 1.0) | unigram precision = 6/7 ≈ 0.857; bigram precision = 4/6 ≈ 0.667; trigram precision = 2/5 = 0.4; 4-gram precision = 1/4 = 0.25 | 48.89 |
| METEOR (gamma = 0.5, beta = 3, Nchunks = 2, Nunigrams = 6 matched unigrams) | precision P = 6/7 ≈ 0.857; recall R = 6/7 ≈ 0.857; Fmean = (10 * P * R) / (R + 9 * P) ≈ 0.857; Penalty = gamma * (Nchunks / Nunigrams)^beta ≈ 0.018; METEOR = Fmean * (1 - Penalty) ≈ 0.841 | 84.1 |
| PictoER (N = 7) | substitutions S = 1 ("with" -> "on"); deletions D = 0; insertions I = 0; PictoER = (S + D + I) / N = 1/7 ≈ 0.143 | 14.3 |
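The worked example above can be reproduced from the stated formulas; the n-gram and match counts come directly from the two example sentences.

```python
import math

# sacreBLEU (N = 4, brevity penalty BP = 1.0): geometric mean of the
# four n-gram precisions.
precisions = [6 / 7, 4 / 6, 2 / 5, 1 / 4]
bleu = 100 * math.exp(sum(math.log(p) for p in precisions) / 4)
print(round(bleu, 1))  # 48.9

# METEOR (gamma = 0.5, beta = 3): 6 matched unigrams grouped into 2 chunks,
# "he plays three seasons" and "the team".
P = R = 6 / 7
fmean = 10 * P * R / (R + 9 * P)
penalty = 0.5 * (2 / 6) ** 3
meteor = 100 * fmean * (1 - penalty)
print(round(meteor, 1))  # 84.1

# PictoER: 1 substitution ("with" -> "on"), 0 deletions, 0 insertions, N = 7.
picto_er = 100 * (1 + 0 + 0) / 7
print(round(picto_er, 1))  # 14.3
```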
How to interpret the results?
- sacreBLEU [0 - 100]: the higher the better. A translation is considered good when the BLEU score is above 30.
- METEOR [0 - 100]: the higher the better. A translation is considered good when the METEOR score is above 40.
- PictoER [0 - 100]: the lower the better. It gives a quick overview of the number of incorrectly predicted tokens.
Visualization of the output
The target (tgt) or the hypothesis (hyp) provided by your model for each utterance should be a sequence of pictogram terms (tokens). Each token should correspond to a pictogram from ARASAAC. To visualize the output with the pictogram images, we developed a platform available here: https://huggingface.co/spaces/ToPicto/Visualize-Pictograms
How to use it:
1. Write a sequence of pictogram terms.
2. The platform will display the corresponding pictogram images.
Example:
- id: common_voice_fr_24203862
- src: common_voice_fr_24203862.wav
- tgt/hyp: quatre concert coordonner à ville et à new_york
TBA
Participant registration
Please refer to the general ImageCLEF registration instructions.
Results
CEUR Working Notes
Citations
[1] Romski, M., & Sevcik, R. A. (2005). Augmentative communication and early intervention: Myths and realities. Infants & Young Children, 18(3), 174-185.
[2] Cataix-Nègre, É. (2017). Communiquer autrement: Accompagner les personnes avec des troubles de la parole ou du langage : les communications alternatives. De Boeck Supérieur.
[3] Beukelman, D.R. and Mirenda, P. (2013). Augmentative and Alternative Communication: Supporting Children and Adults with Complex Communication Needs. Paul H. Brookes Pub.
[4] Communication alternative améliorée (CAA) : la Croix-Rouge française dévoile sa première étude d’impact social ! (2021). Croix-Rouge. Retrieved June 28, 2023, from https://www.croix-rouge.fr/actualite/communication-alternative-amelioree-caa-la-croix-rouge-francaise-devoile-sa-premiere-etude-d-impact-social-2513
[5] Ardila et al. (2020). Common Voice: A Massively-Multilingual Speech Corpus. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4218–4222, Marseille, France. European Language Resources Association.
[6] C. Benzitoun, J.-M. Debaisieux, H.-J. Deulofeu (2016). Le projet ORFÉO : un corpus d'études pour le français contemporain. Corpus n°15, p. 91-114.
[7] Post, M. (2018). A Call for Clarity in Reporting BLEU Scores. In Proceedings of the Third Conference on Machine Translation: Research Papers. Belgium, Brussels : Association for Computational Linguistics, p. 186-191.
[8] Banerjee, S., & Lavie, A. (2005). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization (pp. 65-72).
[9] Woodard, J. P., & Nelson, J. T. (1982). An information theoretic measure of speech recognition performance. In Workshop on standardisation for speech I/O technology, Naval Air Development Center, Warminster, PA.
Contact
- Maja Hjuler — <maja-jonck.hjuler(at)univ-grenoble-alpes.fr>, Université Grenoble Alpes, LIG, France
- Diandra Fabre — <diandra.fabre(at)univ-grenoble-alpes.fr>, Université Grenoble Alpes, LIG, France
- Benjamin Lecouteux — <benjamin.lecouteux(at)univ-grenoble-alpes.fr>, Université Grenoble Alpes, LIG, France
- Didier Schwab — <didier.schwab(at)univ-grenoble-alpes.fr>, Université Grenoble Alpes, LIG, France
Acknowledgments