

Image-based plant identification at global scale



It is estimated that there are more than 300,000 species of vascular plants in the world. Increasing our knowledge of these species is of paramount importance for the development of human civilization (agriculture, construction, pharmacopoeia, etc.), especially in the context of the biodiversity crisis. However, the burden of systematic plant identification by human experts severely hampers the aggregation of new data and knowledge. Meanwhile, automatic identification has made considerable progress in recent years, as highlighted in all previous editions of PlantCLEF. Deep learning techniques now seem mature enough to address the ultimate but realistic problem of identifying plant biodiversity at global scale, despite the many problems the data may present (a huge number of classes, strongly unbalanced classes, partially erroneous identifications, duplicates, variable visual quality, diverse visual content such as photos or herbarium sheets, etc.).

Data collection

The training data that will be used this year falls into two main categories: labeled and unlabeled (i.e. with or without species labels provided and checked by humans). The labeled training dataset will contain more than 5M images covering more than 290k plant species, collected through a web crawl with the Google and Bing search engines and from the Encyclopedia of Life web portal. All datasets provided in previous editions of PlantCLEF can also be used, and the use of external data will be encouraged, notably via the gbif-dl package, which facilitates downloading media data from GBIF, the world's largest biodiversity database, by wrapping its public API. The unlabeled training dataset will contain more than 9 million pictures from the Pl@ntNet platform (each associated with a pseudo-label but without human verification). Finally, the test set will consist of tens of thousands of pictures verified by world-class experts covering various regions of the world and taxonomic groups.
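As a rough illustration of what a wrapper around GBIF's public API does, the sketch below queries GBIF's v1 occurrence search endpoint directly using only the Python standard library and collects (species, image URL) pairs. It is not the gbif-dl API: the function names are hypothetical, and only the query parameters `taxonKey`, `mediaType`, `limit` and `offset` come from GBIF's occurrence search. Paging, rate limiting and image licensing are deliberately left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# GBIF v1 occurrence search endpoint (public, no authentication needed for search).
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_image_query(taxon_key: int, limit: int = 20, offset: int = 0) -> str:
    """Hypothetical helper: URL for occurrences of one taxon that carry still images."""
    params = {
        "taxonKey": taxon_key,       # GBIF backbone taxon identifier
        "mediaType": "StillImage",   # keep only occurrences with photos
        "limit": limit,              # page size
        "offset": offset,            # page start, for paging through results
    }
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


def extract_image_urls(response: dict) -> list[tuple[str, str]]:
    """Hypothetical helper: (species name, image URL) pairs from a parsed response."""
    pairs = []
    for occurrence in response.get("results", []):
        species = occurrence.get("species", "unknown")
        for medium in occurrence.get("media", []):
            # Each media entry describes one file; "identifier" holds its URL.
            if medium.get("type") == "StillImage" and "identifier" in medium:
                pairs.append((species, medium["identifier"]))
    return pairs


def fetch_image_urls(taxon_key: int, limit: int = 20) -> list[tuple[str, str]]:
    """Hypothetical helper: query GBIF over the network and return image URL pairs."""
    with urlopen(build_image_query(taxon_key, limit=limit), timeout=30) as resp:
        return extract_image_urls(json.load(resp))
```

Splitting URL construction and response parsing from the network call keeps the parsing testable offline. For large-scale downloads, the gbif-dl package itself is the more practical route.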

Task description

The task will be evaluated as a plant species retrieval task based on multi-image plant observations from the test set. The goal will be to retrieve the correct plant species among the top results of a ranked list of species returned by the evaluated system. Participants will first have access to the training set; a few months later, they will receive the whole test set. Semi-supervised and unsupervised approaches will be strongly encouraged, and a starter package will be provided with a pre-trained model based on this type of method that exploits the unlabeled training dataset.
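Because each test observation may contain several images, a system has to turn per-image predictions into a single ranked species list. The sketch below shows one simple way to do this, under assumptions not stated in the task description: per-image class probabilities are averaged, and the reciprocal rank of the correct species is used as an illustrative retrieval score. Neither the fusion rule nor the metric is the official PlantCLEF one, and all function names are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def rank_species(image_scores: Sequence[Sequence[float]],
                 species_ids: Sequence[str],
                 top_k: int = 10) -> list[str]:
    """Fuse one observation's per-image scores by averaging; return top-k species."""
    n_species = len(species_ids)
    # Assumption: every row is a probability vector over the same species order.
    fused = [fmean(row[j] for row in image_scores) for j in range(n_species)]
    # Sort species indices by descending fused score (stable on ties).
    order = sorted(range(n_species), key=lambda j: -fused[j])[:top_k]
    return [species_ids[j] for j in order]


def reciprocal_rank(ranked: Sequence[str], true_species: str) -> float:
    """1/rank of the correct species, or 0.0 if it is absent from the list."""
    if true_species not in ranked:
        return 0.0
    return 1.0 / (list(ranked).index(true_species) + 1)


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         truths: Sequence[str]) -> float:
    """Average reciprocal rank over observations (illustrative metric only)."""
    return fmean(reciprocal_rank(r, t) for r, t in zip(rankings, truths))
```

Averaging is a conservative default when images of an observation show different organs. Max-pooling or learned weighting are common alternatives.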

How to participate?

Go to the AIcrowd PlantCLEF challenge page (the link will be provided soon). In the meantime:

  1. Each participant has to register on AIcrowd with a username, email address and password. A representative team name should be
    used as the username.
  2. In order to be compliant with the CLEF requirements, participants also have to fill in the following additional fields on their profile:
    • First name
    • Last name
    • Affiliation
    • Address
    • City
    • Country
  3. This information will not be publicly visible and will be used exclusively to contact you and to send the registration data to CLEF, which is the main organizer of all CLEF labs. Once this is set up, participants will have access to the dataset tab on the challenge page. A LifeCLEF participant will be considered registered for a task as soon as they have downloaded a file of the task's dataset via the dataset tab of the challenge.
