BirdCLEF 2018

Usage scenario

As in 2017, two scenarios will be evaluated: (i) the identification of a particular bird specimen in a recording of it, and (ii) the recognition of all specimens singing in a long sequence (up to one hour) of raw soundscapes that can contain tens of birds singing simultaneously. The first scenario is aimed at developing new interactive identification tools to help users and experts who, today, are equipped with directional microphones and spend too much time observing and listening to birds in the field to assess their populations. The soundscapes, on the other hand, correspond to a passive monitoring scenario in which any multi-directional audio recording device could be used with little or no user involvement, enabling efficient biodiversity assessment.

Subtask 1: monophone recordings

Task overview

The goal of the task is to identify the species of the most audible bird (i.e. the one that was intended to be recorded) in each of the provided test recordings. Therefore, the evaluated systems have to return a ranked list of possible species for each of the 8,596 test recordings. Each prediction item (i.e. each line of the file to be submitted) has to respect the following format:
<MediaId;ClassId;Rank>
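For illustration, here is a minimal Python sketch that writes a run file in this format; the media identifiers, class identifiers and prediction values are hypothetical placeholders, not actual BirdCLEF identifiers:

```python
# Minimal sketch (hypothetical data): write one ranked species list per test
# recording, one line per prediction, in the <MediaId;ClassId;Rank> format.
predictions = {
    # MediaId -> list of ClassIds sorted from most to least likely
    "media_0001": ["species_a", "species_b", "species_c"],
    "media_0002": ["species_b", "species_a"],
}

with open("run1.txt", "w") as run_file:
    for media_id, ranked_class_ids in predictions.items():
        for rank, class_id in enumerate(ranked_class_ids, start=1):
            run_file.write(f"{media_id};{class_id};{rank}\n")
```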

Each participating group is allowed to submit up to 4 runs built from different methods. Semi-supervised, interactive or crowdsourced approaches are allowed but will be compared independently from fully automatic methods. Any human assistance in the processing of the test queries must therefore be signaled in the submitted runs.

Participants are allowed to use any of the provided metadata complementary to the audio content (.wav format; 44.1, 48 or 96 kHz sampling rate). They are also allowed to use any external training data, on the condition that (i) the experiment is entirely reproducible, i.e. the external resource used is clearly referenced and accessible to any other research group in the world, (ii) they submit at least one run without external training data so that the contribution of such resources can be studied, and (iii) the additional resource does not contain any of the test observations. In particular, it is strictly forbidden to crawl training data from www.xeno-canto.org

Dataset

The data collection will be the same as the one used in BirdCLEF 2017, mostly based on contributions of the Xeno-Canto network. The training set contains 48,843 recordings covering 1,500 species of Central and South America (the largest bioacoustic dataset in the literature). It has a massive class imbalance, with a minimum of four recordings for Laniocera rufescens and a maximum of 160 recordings for Henicorhina leucophrys. Recordings are associated with various metadata such as the type of sound (call, song, alarm, flight, etc.), the date, the location, textual comments by the authors, multilingual common names and collaborative quality ratings. The test set contains 8,596 recordings of the same type (monophone recordings). More details about the data can be found in the overview working note of BirdCLEF 2017.

Metric

The metric used will be the Mean Reciprocal Rank (MRR). The MRR is a statistical measure for evaluating any process that produces a list of possible responses, ordered by probability of correctness, to a sample of queries. The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer. The MRR is the average of the reciprocal ranks over the whole test set:

$$\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}$$

where $|Q|$ is the total number of query occurrences in the test set and $\mathrm{rank}_i$ is the rank of the first correct answer for the $i$-th query.
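As a concrete illustration, the following Python sketch computes the MRR from the 1-based rank of the first correct species in each submitted list; the function name and input structure are assumptions for illustration, not part of the official evaluation code:

```python
# Minimal sketch of the MRR computation. Each entry of first_correct_ranks is
# the 1-based rank of the first correct species for one query, or None if the
# correct species does not appear in the returned list.
def mean_reciprocal_rank(first_correct_ranks):
    reciprocal_ranks = [
        1.0 / rank if rank is not None else 0.0
        for rank in first_correct_ranks
    ]
    return sum(reciprocal_ranks) / len(reciprocal_ranks)

# Example: correct species ranked 1st, 3rd, and missing -> (1 + 1/3 + 0) / 3
print(mean_reciprocal_rank([1, 3, None]))  # ≈ 0.444
```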

Subtask 2: soundscape recordings

Task overview

The goal of the task is to localize and identify all audible birds within the provided soundscape recordings. Each soundscape has to be divided into segments of 5 seconds, and a list of species with associated probability scores has to be returned for each segment. Each prediction item (i.e. each line of the file) has to respect the following format:
<MediaId;TC1-TC2;ClassId;Probability>
where Probability is a real value in [0;1] increasing with the confidence in the prediction, and where TC1-TC2 is a 5-second timecode interval in hh:mm:ss format (e.g. 00:00:00-00:00:05, then 00:00:05-00:00:10).
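For illustration, the following Python sketch writes per-segment predictions in this format and generates the 5-second timecode intervals; the media identifiers, class identifiers and probability values are hypothetical placeholders:

```python
# Minimal sketch (hypothetical data): write per-segment predictions in the
# <MediaId;TC1-TC2;ClassId;Probability> format with 5-second timecodes.
def timecode(seconds):
    # Format a number of seconds as hh:mm:ss.
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

# (media_id, segment_index) -> list of (class_id, probability) pairs
segment_predictions = {
    ("soundscape_0001", 0): [("species_a", 0.92), ("species_b", 0.15)],
    ("soundscape_0001", 1): [("species_a", 0.40)],
}

with open("run1_soundscapes.txt", "w") as run_file:
    for (media_id, idx), preds in segment_predictions.items():
        tc = f"{timecode(idx * 5)}-{timecode(idx * 5 + 5)}"
        for class_id, probability in preds:
            run_file.write(f"{media_id};{tc};{class_id};{probability}\n")
```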

Each participating group is allowed to submit up to 4 runs built from different methods. Semi-supervised, interactive or crowdsourced approaches are allowed but will be compared independently from fully automatic methods. Any human assistance in the processing of the test queries must therefore be signaled in the submitted runs.

Participants are allowed to use any of the provided metadata complementary to the audio content (.wav format; 44.1, 48 or 96 kHz sampling rate). They are also allowed to use any external training data, on the condition that (i) the experiment is entirely reproducible, i.e. the external resource used is clearly referenced and accessible to any other research group in the world, (ii) they submit at least one run without external training data so that the contribution of such resources can be studied, and (iii) the additional resource does not contain any of the test observations. In particular, it is strictly forbidden to crawl training data from www.xeno-canto.org

Dataset

The training set contains 48,843 monophone recordings from the Xeno-Canto network covering 1,500 species of Central and South America (the largest bioacoustic dataset in the literature). It has a massive class imbalance, with a minimum of four recordings for Laniocera rufescens and a maximum of 160 recordings for Henicorhina leucophrys. Recordings are associated with various metadata such as the type of sound (call, song, alarm, flight, etc.), the date, the location, textual comments by the authors, multilingual common names and collaborative quality ratings. Complementary to that data, a validation set of soundscapes with time-coded labels will be provided as training data. It will contain about 15 minutes of soundscapes split into 180 annotated segments of 5 seconds.
The test set itself will contain 6 hours of soundscapes split into 4,462 segments of 5 seconds (to be processed as separate queries).

Metric

The metric used will be the classification mean Average Precision (c-mAP), considering each class c of the ground truth as a query. This means that for each class c, we will extract from the run file all predictions with ClassId=c, rank them by decreasing probability and compute the average precision for that class. We will then take the mean across all classes. More formally:

$$\mathrm{cmAP} = \frac{\sum_{c=1}^{C} \mathrm{AveP}(c)}{C}$$

where $C$ is the number of species in the ground truth and $\mathrm{AveP}(c)$ is the average precision for a given species $c$, computed as:

$$\mathrm{AveP}(c) = \frac{\sum_{k=1}^{n} P(k) \times \mathrm{rel}(k)}{n_{\mathrm{rel}}}$$

where $k$ is the rank of an item in the list of predicted segments containing $c$, $n$ is the total number of predicted segments containing $c$, $P(k)$ is the precision at cut-off $k$ in the list, $\mathrm{rel}(k)$ is an indicator function equaling 1 if the segment at rank $k$ is relevant (i.e. labeled as containing $c$ in the ground truth), and $n_{\mathrm{rel}}$ is the total number of relevant segments for $c$.
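For illustration, here is a minimal Python sketch of the c-mAP computation, assuming that for each class the predicted probabilities, the corresponding ground-truth relevance flags and the total number of relevant segments are already available; the function names and input structure are assumptions, not part of the official evaluation code:

```python
# Minimal sketch of the c-mAP computation (assumed input structure).
def average_precision(probabilities, relevances, n_relevant):
    # Rank predictions by decreasing probability, then accumulate P(k)*rel(k).
    ranked = sorted(zip(probabilities, relevances), key=lambda p: -p[0])
    hits, ap_sum = 0, 0.0
    for k, (_, rel) in enumerate(ranked, start=1):
        if rel:
            hits += 1
            ap_sum += hits / k  # P(k) at a relevant rank
    return ap_sum / n_relevant if n_relevant else 0.0

def c_map(per_class_predictions):
    # per_class_predictions: {class_id: (probabilities, relevances, n_relevant)}
    aps = [average_precision(*values) for values in per_class_predictions.values()]
    return sum(aps) / len(aps)
```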

Registration and data access

Please refer to the general LifeCLEF registration instructions
