You are here


Image-based plant identification at global scale



It is estimated that there are more than 300,000 species of vascular plants in the world. Increasing our knowledge of these species is of paramount importance for the development of human civilization (agriculture, construction, pharmacopoeia, etc.), especially in the context of the biodiversity crisis. However, the burden of systematic plant identification by human experts strongly penalizes the aggregation of new data and knowledge. Since then, automatic identification has made considerable progress in recent years as highlighted during all previous editions of PlantCLEF. Deep learning techniques now seem mature enough to address the ultimate but realistic problem of global identification of plant biodiversity in spite of many problems that the data may present (a huge number of classes, very strongly unbalanced classes, partially erroneous identifications, duplications, variable visual quality, diversity of visual contents such as photos or herbarium sheets, etc). The PlantCLEF2022 challenge edition proposes to take a step in this direction by tackling a multi-image (and metadata) classification problem with a very large number of classes (80k plant species).

Data collection

The training dataset that will be used this year can be distinguished in 2 main categories: "trusted" and "web" (i.e. with or without species labels provided and checked by human experts), totaling 4M images on 80k classes.

The "trusted" training dataset is based on a selection of more than 2.9M images covering 80k plant species shared and collected mainly by GBIF (and EOL to a lesser extent). These images come mainly from academic sources (museums, universities, national institutions) and collaborative platforms such as inaturalist or Pl@ntNet, implying a fairly high certainty of determination quality. Nowadays, many more photographs are available on these platforms for a few thousand species, but the number of images has been globally limited to around 100 images per species, favouring types of views adapted to the identification of plants (close-ups of flowers, fruits, leaves, trunks, ...), in order to not unbalance the classes and to not explode the size of the training dataset.

In contrast, the second data set is based on a collection of web images provided by search engines Google and Bing. This initial collection of several million images suffers however from a significant rate of species identification errors and a massive presence of duplicates and images less adapted for visual identification of plants (herbariums, landscapes, microscopic views...), or even off-topic (portrait photos of botanists, maps, graphs, other kingdoms of the living, manufactured objects, ...). The initial collection has been then semi-automatically revised to drastically reduce the number of these irrelevant pictures and to maximise, as for the trusted dataset, close-ups of flowers, fruits, leaves, trunks, etc. The "web" dataset finally contains about 1.1 million images covering around 57k species.

Lastly, the test set will be a set of tens of thousands pictures verified by world class experts related to various regions of the world and taxonomic groups.

Task description

The task will be evaluated as a plant species retrieval task based on multi-image plant observations from the test set. The goal will be to retrieve the correct plant species among the top results of a ranked list of species returned by the evaluated system. The participants will first have access to the training set and a few months later, they will be provided with the whole test set.
The primary metrics used for the evaluation of the task will be the Macro Averaged Mean Reciprocal Rank (MA-MRR). The MRR is a statistic measure for evaluating any process that produces a list of possible responses to a sample of queries ordered by probability of correctness. The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer. The MRR is the average of the reciprocal ranks for the whole test set:
where |Q| is the total number of query occurrences in the test set. However, given the long tail of the data distribution, in order to compensate for species that would be underrepresented in the test set, we will use a Macro-Averaged version of the MRR (average MRR per species).

How to participate ?

Go to the AIcrowd PlantCLEF challenge page :

  1. Each participant has to register on AIcrowd ( with username, email and password. A representative team name should be used
    as username.
  2. In order to be compliant with the CLEF requirements, participants also have to fill in the following additional fields on their profile:
    • First name
    • Last name
    • Affiliation
    • Address
    • City
    • Country
  3. This information will not be publicly visible and will be exclusively used to contact you and to send the registration data to CLEF, which is the main organizer of all CLEF labs. Once set up, participants will have access to the dataset tab on the challenge's page. A LifeCLEF participant will be considered as registered for a task as soon as he/she has downloaded a file of the task's dataset via the dataset tab of the challenge.


A total of 8 participants submitted 45 runs. The results are encouraging despite the great difficulty of the challenge! Thanks again for all your efforts and your investment on this problem of great importance for a better knowledge of the biodiversity of plants.

Team run name Aicrowd name Filename MA-MRR
Mingle Xu Run 8 MingleXu submission_epoch80 0.62692
Mingle Xu Run 7 MingleXu submission_epoch77 0.62497
Mingle Xu Run 6 MingleXu submission_epoch67 0.61632
Neuon AI Run 7 neuon_ai 7_trusted_with_trusted_and_web_inceptionres_inception_ens 0.60781
Neuon AI Run 3 neuon_ai 3_trusted_and_web_inceptionres_inception_ens 0.60583
Neuon AI Run 4 neuon_ai 4_trusted_with_trusted_and_web_inceptionres_inception_ens 0.60381
Neuon AI Run 9 neuon_ai 9_trusted_with_trusted_and_web_inceptionres_inception_ens 0.60301
Mingle Xu Run 5 MingleXu submission_epoch49 0.60219
Neuon AI Run 8 neuon_ai 8_trusted_with_trusted_and_web_inceptionres_inception_ens_pretained 0.60113
Neuon AI Run 5 neuon_ai 5_trusted_and_web_inceptionres_inception_ens_triplet_dictionary 0.59892
Neuon AI Run 6 neuon_ai 6_trusted_with_trusted_and_web_inceptionres_ens 0.58874
Mingle Xu Run 4 MingleXu submission_epoch24 0.58110
Mingle Xu Run 3 MingleXu submission_epoch15 0.56772
Mingle Xu Run 2 MingleXu submission 0.55865
Neuon AI Run 2 neuon_ai 2_trusted_inceptionres_inception_ens 0.55358
Neuon AI Run 1 neuon_ai 1_trusted_inceptionres_5_labels 0.54613
Chans Temple Run 10 Chans_Temple_1 rs34n50e_attempt_lookup_ip_agg 0.51043
Chans Temple Run 9 Chans_Temple_1 rs34e_attempt_lookup_ip_agg 0.49994
Chans Temple Run 7 Chans_Temple_1 naive_attempt_lookup_i_agg 0.49075
Chans Temple Run 6 Chans_Temple_1 naive_attempt_lookup_agg 0.48804
Chans Temple Run 3 Chans_Temple_1 naive_attempt_lookup_agg 0.48804
Chans Temple Run 2 Chans_Temple_1 naive_attempt 0.48661
Chans Temple Run 8 Chans_Temple_1 hydra_attempt_lookup_i_agg 0.47447
Chans Temple Run 4 Chans_Temple_1 naive_attempt_logit_agg 0.47034
Bio Machina Run 5 BioMachina results-resnet50-webpretrained-trusted-epoch25 0.46010
Bio Machina Run 6 BioMachina resnet101-epoch=7-step=180424-web.trusted 0.45011
Bio Machina Run 7 BioMachina resnet101-epoch=10-step=248083-web-trusted 0.44910
Bio Machina Run 3 BioMachina results-3 0.43820
Bio Machina Run 2 BioMachina results 0.43813
Bio Machina Run 4 BioMachina results-resnet50-webpretrained-trusted-epoch10 0.43606
Bio Machina Run 1 BioMachina best_supr_hefficientnet_b4-ds=trusted-epoch=15-train_loss=1.81-train_acc=0.57--val_loss=2.33-val_acc=0.49 0.41950
Bio Machina Run 8 BioMachina results 0.41240
Chans Temple Run 1 Chans_Temple_1 sanity_baseline 0.00036
klssncse Run 1 klssncse output 0.00029
Chans Temple Run 5 Chans_Temple_1 naive_attempt_lookup_i_agg 0.00019
klssncse Run 3 klssncse submission-kl 0.00018
SVJ-SSN-CSE Run 3 SVJ-SSN-CSE res_final 0.00015
WeSeePlants Run 1 WeSeePlants FinalResult 0.00007
klssncse Run 2 klssncse FinalSubmission_File 0.00005
SVJ-SSN-CSE Run 1 SVJ-SSN-CSE res 0.00005
klssncse Run 4 klssncse submission-kl 0.00003
SSN Lekshmi Run 1 SSN_Lekshmi submission-kl 0.00003
Mingle Xu Run 1 MingleXu results 0.00003
SVJ-SSN-CSE Run 2 SVJ-SSN-CSE res 0.00000


CEUR Working Notes

For detailed instructions, please refer to>.
A summary of the most important points:

  • All participating teams with at least one graded submission, regardless of the score, should submit a CEUR working notes paper.
  • Submission of reports is done through EasyChair – please make absolutely sure that the author (names and order), title, and affiliation information you provide in EasyChair match the submitted PDF exactly
  • Strict deadline for Working Notes Papers: 27 May 2022
  • Strict deadline for CEUR-WS Camera Ready Working Notes Papers: 1 July 2022
  • Templates are available here
  • Working Notes Papers should cite both the LifeCLEF 2022 overview paper as well as the PlantCLEF task overview paper, citation information will be added in the Citations section below as soon as the titles have been finalized.

  • Schedule

    • Jan 2022: registration opens for all LifeCLEF challenges
    • 15 February 2022: training data release
    • 1 April 2022: test data release and opening of the submission system
    • 15 May 2022: closing the submission system and release of processed results by the task organizers (online)
    • 27 May 2022: deadline for submission of working notes papers by the participants
    • 13 June 2022: notification of acceptance of working note papers [CEUR-WS proceedings]
    • 1 July 022: camera ready working notes papers of participants and organizers
    • 5-9 Sept 2022: CLEF 2022 Università di Bologna
Image icon small256fuzzr.gif9.29 MB
Image icon MRR.png7.57 KB
Image icon PlantCLEF2022_results_1.png79.71 KB