Crowdsourced initiatives such as iNaturalist, Tela Botanica, or iSpot produce big amounts of biodiversity data that are intended in the long term, to renew today’s ecological monitoring approaches with much more timely and cheaper raw input data. At the same time, with the recent advances in computer vision, we see the emergence of more and more effective mobile search tools allowing to set-up large scale data collection platforms such as the popular Pl@ntNet initiative. This platform is already being used by more than 500K people who produce tens of thousands of validated plant observations each year. This explicitly shared and validated data is only the tip of the iceberg. The real potential relies on the millions of raw image queries submitted by the users of the mobile application for which there is no human validation. People make such requests to get information on a plant along a hike or something they find in their garden but not know anything about. Allowing the exploitation of such contents in a fully automatic way could scale up the world-wide collection of plant observations by several orders of magnitude, and potentially bring a valuable resource for ecological monitoring studies.
Data collection and evaluated challenge
The test data to be analyzed is a large sample of the raw query images submitted by the users of the mobile application Pl@ntNet (iPhone & Androïd), covering a large number of wild plant species mostly coming from the Western Europe Flora and the North American Flora, but also plant species used all around the world as cultivated or ornamental plants, or even endangered species precisely because of their non-regulated commerce.
As training data, we will provide two main sets based both on the same list of 10 000 plant species:
- a “trusted” training set based on the online collaborative Encyclopedia Of Life (EoL)
- a “noisy” training set built through from web crawlers (more exactly from google and bing image search results)
The main idea of providing both datasets is to evaluate to what extent machine learning and computer vision techniques can learn from noisy data compared to trusted data (as usually done in supervised classification).
Pictures of EoL are themselves coming from several public databases (such as Wikimedia, iNaturalist, Flickr) or from some Institutions or less formal websites dedicated to botany. All the pictures can be potentially revised and rated on the EOL website.
On the other side, the noisy training set will contain more images for a lot of species, but with several type and level of noises which are basically impossible to automatically entirely control and clean: a picture can be associated to the wrong species but the correct genus or family, a picture can be a portrait of a botanist working on the species, the pictures can be associated to the correct species but be a drawing or an herbarium sheet of a dry specimen, etc.
The task will consist of automatically detecting in the Pl@ntNet query images, specimens of plants belonging to the provided training data. More practically, the run file to be submitted has to contain as much lines as the number of predictions, each prediction being composed of an ObservationId (the identifier of a specimen that can be itself composed of several images), a ClassId, a Probability and a Rank (used in case of equal probabilities). Each line should have the following format:
where Probability is a scalar in [0,1] representing the confidence of the system in that recognition (Probability=1 means that the system is very confident) and Rank is an integer between in [1:100] (i.e. one single test ObservationId might be associated to at most 100 species predictions).
Here is a short fake run example respecting this format for only 3 observations:
Each participating group is allowed to submit up to 4 runs built from different methods. Semi-supervised, interactive or crowdsourced approaches are allowed but will be compared independently from fully automatic methods. Any human assistance in the processing of the test queries has therefore to be signaled in the submitted runs.
We encourage participants to compare the use of the noisy and the trusted training setswithin their runs and it will be required to mention which training set were used in each run (EOL, WEB or EOL+WEB). Please note that the two training sets can have some common pictures (even if we excluded EOL domain from the web crawl).
Participants are allowed to use complementary training data (e.g. for pre-training purposes) but at the condition that (i) the experiment is entirely re-produceable, i.e. that the used external resource is clearly referenced and accessible to any other research group in the world, (ii) the use of external training data or not is mentioned for each run, and (iii) the additional resource does not contain any of the test observations.
The used metric will be the Mean Reciprocal Rank (MRR). The MRR is a statistic measure for evaluating any process that produces a list of possible responses to a sample of queries (spatial occurrences in our case), ordered by probability of correctness. The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer. The MRR is the average of the reciprocal ranks for the whole test set:
where |Q| is the total number of query occurrences in the test set.
A secondary metric will be the MRR computed at the genus level.