GeoLifeCLEF 2019

Location-Based Species Recommendation

Registration and data access

  • Each participant has to register on crowdAI (https://www.crowdai.org) with a username, an email address and a password. A representative team name should be used
    as the username.
  • To comply with the CLEF requirements, participants also have to fill in the following additional fields in their profile:
    • First name
    • Last name
    • Affiliation
    • Address
    • City
    • Country
  • Once the profile is set up, participants will have access to the crowdAI GeoLifeCLEF challenge page.

    Usage scenario

    Automatically predicting the list of species that are the most likely to be observed at a given location is useful for many scenarios in biodiversity informatics. First of all, it could improve species identification processes and tools (be they automated, semi-automated or based on classical field guides or floras) by reducing the list of candidate species to those observable at a given location. More generally, it could facilitate biodiversity inventories through the development of location-based recommendation services (typically on mobile phones) and through the involvement of non-expert nature observers. Last but not least, it might serve educational purposes through biodiversity discovery applications that provide features such as contextualized educational pathways.

    Challenge

    The aim of the challenge is to predict the list of species that are the most likely to be observed at a given location. To this end, we will provide a large training set of species occurrences, each occurrence being associated with a multi-channel image characterizing the local environment. Indeed, it is usually not possible to learn a species distribution model directly from spatial positions because of the limited number of occurrences and the sampling bias. What is usually done in ecology is to predict the distribution on the basis of a representation in the environmental space, typically a feature vector composed of climatic variables (average temperature at that location, precipitation, etc.) and other variables such as soil type, land cover, distance to water, etc. The originality of GeoLifeCLEF is to generalize this niche modeling approach to an image-based environmental representation space. Instead of learning a model from environmental feature vectors, the goal of the task is to learn a model from k-dimensional image patches, each of the k layers representing the values of one environmental variable in the neighborhood of the occurrence (see figure below for an illustration).

    Data

    The training data can be downloaded from http://otmedia.lirmm.fr/LifeCLEF/GeoLifeCLEF2019/.

    Python scripts that simplify formatting the dataset for the learning process are provided at https://github.com/maximiliense/GLC19.

    This year, the training dataset is larger than in the 2018 edition. In a nutshell, it first includes the train and test georeferenced occurrences of plant species from last year (file GLC_2018.csv). In addition, a large number of plant species occurrences with uncertain identifications have been added (file PL_complete.csv). They come from the automatic species identification of pictures taken in 2017-2018 with the smartphone application Pl@ntNet, whose users are mainly amateur botanists. A trusted extraction of this dataset is also provided (file PL_trusted.csv), ensuring a reasonable level of identification certainty. Finally, species occurrences from other kingdoms (such as mammals, birds, amphibians, insects, fungi, etc.) were selected from the GBIF database (file noPlant.csv).

    33 environmental rasters (directory rasters GLC19/) covering the French territory are made available this year, so that each occurrence can be linked to an environmental tensor via Python code that participants can customize. These environmental rasters were constructed from various open datasets, including CHELSA Climate [1], ESDB soil pedology data [2,3,4], Corine Land Cover 2012 data, CGIAR-CSI evapotranspiration data [5,6], USGS elevation data (available from the U.S. Geological Survey) and BD Carthage hydrologic data. The test occurrences come from independent datasets of the French National Botanical Conservatories. A detailed description of the protocol used to build the datasets will soon be available for download on this page.
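
    As a rough illustration of how an occurrence can be linked to an environmental tensor, the sketch below crops a square patch around a point from a stack of rasters. It is not the code from the GLC19 repository: the function name extract_patch, its parameters, the 64-pixel default size and the assumption that all layers are already loaded on one regular north-up grid are hypothetical choices made for this example.

```python
import numpy as np


def extract_patch(rasters, x, y, x_min, y_max, resolution, size=64):
    """Return a (n_layers, size, size) tensor around the point (x, y).

    All names and defaults here are illustrative assumptions.
    rasters: array of shape (n_layers, height, width), one layer per variable.
    x_min, y_max: coordinates of the outer corner of the top-left pixel.
    resolution: pixel size, in the same unit as x and y.
    """
    n_layers, height, width = rasters.shape
    # Row 0 is the northern edge, so the row index grows as y decreases.
    col = int(np.floor((x - x_min) / resolution))
    row = int(np.floor((y_max - y) / resolution))
    if not (0 <= row < height and 0 <= col < width):
        raise ValueError("the point falls outside the raster extent")
    half = size // 2
    # Pad with NaN so that patches near the border keep the full size.
    padded = np.pad(
        rasters.astype(float),
        ((0, 0), (half, half), (half, half)),
        constant_values=np.nan,
    )
    # In the padded array the occurrence pixel sits at (row + half, col + half),
    # so this slice places it at offset `half` inside the patch.
    return padded[:, row:row + size, col:col + size]
```

    Padding the whole stack on every call keeps the example short; for a full training set it would be cheaper to pad once and reuse the padded array.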

    External data
    Participants are allowed to use external training data, on the condition that (i) the experiment is entirely reproducible, i.e. the external resource used is clearly referenced and accessible to any other research group in the world, (ii) participants submit at least one run without external training data so that we can study the contribution of such resources, and (iii) the additional resource does not contain any of the test observations.

    Metric

    The main evaluation criterion will be the accuracy based on the first 30 answers, also called Top30. It is the mean, over all test set occurrences, of the function that scores 1 when the correct species is among the first 30 answers and 0 otherwise.
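
    The Top30 computation can be sketched as follows, assuming that a run gives, for every test occurrence, a list of species identifiers ranked from most to least likely. The function name top_k_accuracy and its argument layout are assumptions made for this example, not the official evaluation script.

```python
def top_k_accuracy(ranked_predictions, true_species, k=30):
    """Fraction of occurrences whose true species is among the first k answers.

    ranked_predictions: one list of species ids per test occurrence, best guess first.
    true_species: the correct species id of each occurrence, in the same order.
    """
    if len(ranked_predictions) != len(true_species):
        raise ValueError("one prediction list is needed per test occurrence")
    if not true_species:
        raise ValueError("the test set is empty")
    # Score 1 when the true species appears in the first k answers, 0 otherwise.
    hits = sum(
        1 if truth in ranked[:k] else 0
        for ranked, truth in zip(ranked_predictions, true_species)
    )
    return hits / len(true_species)
```

    With k=30 this is the Top30 score described above; answers beyond the 30th position are ignored, and the order within the first 30 does not change the score.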

    Registration and data access

    Please refer to the general LifeCLEF registration instructions.

    References

    [1] Karger, Dirk Nikolaus, Conrad, Olaf, Böhner, Jürgen, Kawohl, Tobias, Kreft, Holger, Soria-Auza,
    Rodrigo Wilber, Zimmermann, Niklaus, Linder, H Peter, & Kessler, Michael. 2016. Climatologies
    at high resolution for the earth’s land surface areas. arXiv preprint arXiv:1607.00217.
    [2] Panagos, Panos. 2006. The European soil database. GEO: connexion, 5(7), 32–33.
    [3] Panagos, Panos, Van Liedekerke, Marc, Jones, Arwyn, & Montanarella, Luca. 2012. European Soil
    Data Centre: Response to European policy support and public data requirements. Land Use Policy,
    29(2), 329–338.
    [4] Van Liedekerke, M, Jones, A, & Panagos, P. 2006. ESDBv2 Raster Library - a set of rasters derived
    from the European Soil Database distribution v2.0. European Commission and the European Soil
    Bureau Network, CDROM, EUR, 19945.
    [5] Zomer, Robert J, Bossio, Deborah A, Trabucco, Antonio, Yuanjie, Li, Gupta, Diwan C, & Singh,
    Virendra P. 2007. Trees and water: smallholder agroforestry on irrigated lands in Northern India.
    Vol. 122. IWMI.
    [6] Zomer, Robert J, Trabucco, Antonio, Bossio, Deborah A, & Verchot, Louis V. 2008. Climate change
    mitigation: A spatial analysis of global land suitability for clean development mechanism afforestation
    and reforestation. Agriculture, ecosystems & environment, 126(1), 67–80.