You are here

PlantCLEF 2021

banniere

Motivation

For several centuries, botanists have collected, catalogued and systematically stored plant specimens in herbaria. These physical specimens are used to study the variability of species, their phylogenetic relationship, their evolution, or phenological trends. One of the key step in the workflow of botanists and taxonomists is to find the herbarium sheets that correspond to a new specimen observed in the field. This task requires a high level of expertise and can be very tedious. Developing automated tools to facilitate this work is thus of crucial importance. More generally, this will help to convert these invaluable centuries-old materials into FAIR data.

Data collection

The task will rely on a large collection of more than 60,000 herbarium sheets that were collected in French Guyana (i.e. from the Herbier IRD de Guyane ) and digitized in the context of the e-ReColNat project. iDigBio (the US National Resource for Advancing Digitization of Biodiversity Collections) hosts millions of images of herbarium specimens. Several tens of thousands of these images, illustrating the French Guyana flora, will be used for the PlantCLEF task this year. A valuable asset of this collection is that several herbarium sheets are accompanied by a few pictures of the same specimen in the field. For the test set, we will use in-the-field pictures coming different sources including Pl@ntNet and Encyclopedia of Life.

The training dataset of the LifeCLEF2021 Plant Identification challenge will be based on the same visual data used during the previous LifeCLEF 2020 Plant Identification challenge but will also introduce new data related to 5 "traits" covering exhaustively all the 1000 species of the challenge. Traits are a very valuable information that can potentially help improve prediction models. Indeed, it can be assumed that species which share the same traits also share to some extent common visual appearances. This information can then potentially be used to guide the learning of a model through auxiliary loss functions for instance.
These data were collected through the Encyclopedia of Life API. The 5 most exhaustive traits ("plant growth form", "habitat", "plant lifeform", "trophic guild" and "woodiness") were verified and completed by experts of the Guyanese flora, so that each of the 1000 species have a value for each trait.

Task description

The challenge will be evaluated as a cross-domain classification task. The training set will consist of herbarium sheets whereas the test set will be composed of field pictures. To enable learning a mapping between the herbarium sheets domain and the field pictures domain, we will provide both herbarium sheets and field pictures for a subset of species. The metrics used for the evaluation of the task will be the classification accuracy and the Mean Reciprocal Rank.

How to participate ?

A direct link to the related challenge on AIcrowd platform will be provided soon. In the mean time:

  1. Each participant has to register on AIcrowd (https://www.aicrowd.com/) with username, email and password. A representative team name should be used
    as username.
  2. In order to be compliant with the CLEF requirements, participants also have to fill in the following additional fields on their profile:
    • First name
    • Last name
    • Affiliation
    • Address
    • City
    • Country
  3. This information will not be publicly visible and will be exclusively used to contact you and to send the registration data to CLEF, which is the main organizer of all CLEF labs. Once set up, participants will have access to the dataset tab on the challenge's page. A LifeCLEF participant will be considered as registered for a task as soon as he/she has downloaded a file of the task's dataset via the dataset tab of the challenge.

Reward

The winner of each of the four LifeCLEF 2020 challenges will be offered a cloud credit grants of 5k USD as part of Microsoft's AI for earth program.
Pl@ntNet

Credits

Pl@ntNet      iDigBio