Robot Vision Task

Welcome to the website of the 3rd edition of the Robot Vision Challenge!

Mobile robot platform used for data acquisition.

The third edition of the Robot Vision challenge is a continuation of two previous successful events. The Robot Vision challenge was presented for the first time in 2009 and attracted considerable attention, with 7 participating groups and a total of 27 submitted runs. The second edition of the challenge was held in conjunction with ICPR 2010 and saw an increase in participation, with 9 participating groups and 34 submitted runs. As in the previous events, the challenge will address the problem of visual place classification, this time with a special focus on generalization.


News

  • 09.07.2010 - Final results of the challenge released!
  • 15.06.2010 - Test data released!
  • 03.05.2010 - Training and validation data released!


Organizers

  • Andrzej Pronobis, Royal Institute of Technology, Stockholm, Sweden
  • Barbara Caputo, IDIAP Research Institute, Martigny, Switzerland
  • Henrik I. Christensen, Georgia Institute of Technology, Atlanta, GA, USA
  • Marco Fornoni, IDIAP Research Institute, Martigny, Switzerland

Contact person

Should you have any questions regarding the contest, please contact Andrzej Pronobis.


The ability to represent knowledge about space and its own position therein is crucial for a mobile robot. To this end, topological and semantic descriptions are gaining popularity for augmenting purely metric space representations. Making the space representation more meaningful from the point of view of spatial reasoning and human-robot interaction has been at the forefront of the issues being addressed. Indeed, in the concrete case of indoor environments, the ability to understand the existing topological relations and to associate semantic terms such as "corridor" or "office" with places gives a much more intuitive idea of the position of the robot than global metric coordinates. In recent years, vision has become the preferred sensor for capturing this type of information, as a large share of the semantic description of a place is encoded in its visual appearance.

The third edition of the challenge will focus on the problem of visual place classification, with a special focus on generalization. Participants will be asked to classify rooms and functional areas on the basis of image sequences captured by a stereo camera mounted on a mobile robot within an office environment. The test sequence will be acquired within the same building but on a different floor from the training sequence. It will contain rooms of the same categorical type ("corridor", "office", "bathroom") and it will also contain room categories not seen in the training sequence ("meeting room", "library"). The system built by participants should be able to answer the question "where are you?" when presented with a test sequence showing a room category seen during training, and it should be able to answer "I do not know this category" when presented with a new room category.

Important dates

  • 22.02.2010 - Registration open for all CLEF tasks
  • 03.05.2010 - Training and validation data and task release
  • 15.06.2010 - Test data release
  • 30.06.2010 - Submission of runs
  • 08.07.2010 - Release of results
  • 15.08.2010 - Submission of working notes papers to the workshop
  • 22.10.2010 - IROS workshop (tentative)


Results

Task 1 (Obligatory)

Groups (best run of each group)
# Group Score
1 CVG 677.0
2 Idiap MULTI 662.0
3 NUDT 638.0
4 Centro Gustavo Stefanini 253.0
5 CAOR 62.0
6 DYNILSIS -20.0
7 UAIC2010 -77.0

Runs (all submitted runs)
# Group Score
1 CVG 677.0
2 Idiap MULTI 662.0
3 Idiap MULTI 657.0
4 Idiap MULTI 645.0
5 Idiap MULTI 644.0
6 CVG 643.0
7 CVG 639.0
8 NUDT 638.0
9 Idiap MULTI 637.0
10 Idiap MULTI 636.0
11 Idiap MULTI 629.0
12 Idiap MULTI 628.0
13 Idiap MULTI 620.0
14 Idiap MULTI 612.0
15 Idiap MULTI 605.0
16 CVG 599.0
17 Idiap MULTI 596.0
18 Centro Gustavo Stefanini 253.0
19 Centro Gustavo Stefanini 228.0
20 Centro Gustavo Stefanini 185.0
21 Centro Gustavo Stefanini 90.0
22 CAOR 62.0
23 CAOR 60.0
24 CAOR 54.0
25 Centro Gustavo Stefanini 52.0
26 Centro Gustavo Stefanini 48.0
27 Centro Gustavo Stefanini 9.0
28 Centro Gustavo Stefanini 5.0
29 DYNILSIS -20.0
30 UAIC2010 -77.0
31 Centro Gustavo Stefanini -131.0
32 Centro Gustavo Stefanini -172.0
33 Centro Gustavo Stefanini -342.0
34 Centro Gustavo Stefanini -391.0
35 Centro Gustavo Stefanini -560.0
36 Centro Gustavo Stefanini -618.0
37 Centro Gustavo Stefanini -624.0
38 Centro Gustavo Stefanini -926.0
39 Centro Gustavo Stefanini -971.0
40 Centro Gustavo Stefanini -1092.0
41 Centro Gustavo Stefanini -1101.0
42 Centro Gustavo Stefanini -1206.0

Task 2 (Optional)

Groups (best run of each group)
# Group Score
1 Idiap MULTI 2052.0
2 CAOR 62.0
3 DYNILSIS -67.0

Runs (all submitted runs)
# Group Score
1 Idiap MULTI 2052.0
2 Idiap MULTI 1770.0
3 Idiap MULTI 1361.0
4 Idiap MULTI 1284.0
5 Idiap MULTI 1262.0
6 Idiap MULTI 1190.0
7 Idiap MULTI 1028.0
8 Idiap MULTI 1019.0
9 Idiap MULTI 963.0
10 Idiap MULTI 916.0
11 Idiap MULTI 886.0
12 Idiap MULTI 682.0
13 CAOR 62.0
14 CAOR 57.0
15 CAOR 57.0
16 DYNILSIS -67.0

How to register for the task

Due to database restrictions, it is necessary to sign a user agreement. Please print the document, sign it and send it via fax to Henning Müller (for detailed instructions, see the explanation in the document).

Finally, to register, please use the registration system available here. Registration is free of charge. If you already have a login from the ImageCLEF 2009 or the ICPR competition you can migrate it to ImageCLEF 2010 here.

The Task

Participants are given training data consisting of a sequence of stereo images (please note that it is not required to use stereo information in the contest, and monocular vision systems relying on either the left or the right camera can be used). The training sequence was recorded using a mobile robot that was manually driven through several rooms of a typical indoor office environment (see the picture above illustrating the robot platform used). The acquisition was performed under fixed illumination conditions and at a given time. Each image in the training sequence is labeled with the ID and the semantic category of the area (usually a room) in which it was acquired.

The challenge is to build a system able to answer the question 'where are you?' (I'm in the kitchen, in the corridor, etc.) when presented with a test sequence containing images acquired in a different environment (a different floor of the same building). The test sequence contains areas belonging to the semantic categories observed previously (present in the training sequence) as well as to new semantic categories (not imaged in the training sequence). The test images were acquired under illumination settings similar to those of the training data, but in a different office environment. The system should assign each test image to one of the semantic categories of the areas that were present in the training sequence or indicate that the image belongs to an unknown semantic category not included during training. Moreover, the system can refrain from making a decision (e.g. in the case of lack of confidence).

We consider two separate tasks, task 1 (obligatory) and task 2 (optional). In task 1, the algorithm must be able to provide information about the location of the robot separately for each test image, without relying on information contained in any other image (e.g. when only some of the images from the test sequences are available or the sequences are scrambled). In task 2, the algorithm is allowed to exploit continuity of the sequences and rely on the test images acquired before the classified image (images acquired after the classified image cannot be used). The same training, validation and testing sequences are used for both tasks. The reported results will be compared separately and winners will be announced for both tasks.
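The constraint in task 2 (only the classified image and images acquired before it may be used) can be illustrated with a small sketch. The code below is not the method of any participant; it is one simple, hypothetical way to exploit continuity: a majority vote over the current per-frame label and a few preceding ones. The function name and the window size are assumptions made for illustration.

```python
from collections import Counter, deque


def smooth_causally(frame_labels, window=5):
    """Majority vote over the current frame and up to `window - 1`
    frames acquired before it. No later frame is ever consulted,
    which is the restriction imposed by task 2."""
    recent = deque(maxlen=window)  # current and earlier frames only
    smoothed = []
    for label in frame_labels:
        recent.append(label)
        # Most frequent label among the frames currently in the window.
        winner, _count = Counter(recent).most_common(1)[0]
        smoothed.append(winner)
    return smoothed


# Hypothetical per-frame decisions with one isolated outlier.
per_frame = ["Corridor", "Corridor", "Kitchen", "Corridor", "Corridor"]
print(smooth_causally(per_frame, window=3))
```

Note that in this sketch an empty label (no decision) would be counted as a vote like any other label; a real system might prefer to skip such frames.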

The competition starts with the release of annotated training and validation data. Moreover, the participants will be given a tool for evaluating the performance of their algorithms. The test image sequences will be released later (see the schedule). The test sequences were acquired in a different environment from the training and validation sequences (another floor of the same building), under similar conditions, and contain additional rooms belonging to semantic categories that were not imaged previously. The algorithms trained on the training sequence will be used to annotate each of the test images. The same tools and procedure as for the validation will be used to evaluate and compare the performance of each method during testing.

Detailed information about the data used for the competition, the experimental procedure as well as tools and criteria used to evaluate the performance of the algorithms can be found below.

Data Set

Characteristics of the Data

The image sequences used for the contest are taken from the previously unreleased COLD-Stockholm database. The sequences were acquired using the MobileRobots PowerBot robot platform equipped with a stereo camera system consisting of two Prosilica GC1380C cameras. Please note that either a monocular or a stereo vision system can be used in the contest. Download links for the sequences are available below. The acquisition was performed on three different floors of an office environment, consisting of 36 areas (usually corresponding to separate rooms) belonging to 12 different semantic and functional categories.

The robot was manually driven through the environment while continuously acquiring images at a rate of 5 fps. Each data sample was then labeled as belonging to one of the areas according to the position of the robot during acquisition (rather than the contents of the images). The video below presents the acquisition procedure as well as parts of image sequences showing the interiors of the rooms and variations captured in the images.

Acquisition of the COLD-Stockholm Database

Image Sequences

Each image sequence is stored as a set of JPEG files in a separate TAR archive. Windows users can use one of the free archive managers such as PeaZip to decompress the archives. Complete information about each image is encoded in its filename. The naming convention used to generate the image filenames is explained below:


  • {frame_number} - Number of the frame in the sequence
  • {camera} - Indicates the camera used to acquire the image ('Left' or 'Right')
  • {area_id} - Label indicating the ID of the area in which the image was acquired.
  • {area_category} - Label indicating the category of the area in which the image was acquired.

Three sequences were selected for the contest: one training sequence, one sequence that should be used for validation and one sequence for testing. The training and validation sequences are available for download; the test sequence will be released according to the schedule. Information about the sequences as well as download links are available below:

  • training - Sequence acquired in 11 areas, on the 6th floor of the office building, during the day, under cloudy weather. The robot was driven through the environment following a path similar to that of the test and validation sequences, and the environment was observed from many different viewpoints (the robot was positioned at multiple points and performed 360-degree turns).
  • validation - Sequence acquired in 11 areas, on the 5th floor of the office building, during the day, under cloudy weather. A path similar to that of the training sequence was followed, but without the 360-degree turns.
  • testing (available through the submission system) - Sequence acquired in 14 areas, on the 7th floor of the office building, during the day, under cloudy weather. The robot followed a path similar to that of the validation sequence.

Additional Resources

The camera calibration data are available for the stereo image sequences. The camera calibration has been performed for both cameras independently and within the stereo setup using the Camera Calibration Toolbox for Matlab. The calibration data are stored within Matlab .mat files as produced by the toolbox. The data are available for download compressed using ZIP or Tar/Gz.

Experimental Procedure

This section describes the experimental procedure that is suggested for the validation experiments. This procedure makes it possible to test the algorithms in a scenario very similar to the one considered for the final test run.

As mentioned in the description of the task, the algorithms must be trained on a single training data sequence and tested on another sequence. Each image in the sequence should be assigned to one of the area categories available during training or marked as an unknown category (it is also possible to refrain from a decision). In order to make the validation experiments similar to the final test experiment, the training and validation sequences contain areas belonging to different semantic categories:

  • Training sequence:
    • Corridor
    • Kitchen
    • LargeOffice
    • MeetingRoom
    • PrinterArea
    • RecycleArea
    • SmallOffice
    • Toilet
  • Validation sequence:
    • Corridor
    • Elevator
    • LargeMeetingRoom
    • LargeOffice
    • MeetingRoom
    • PrinterArea
    • RecycleArea
    • SmallOffice
    • Toilet

Moreover, it is possible to simulate a run in which some of the training categories are missing and the algorithm is trained only on a subset of the areas present in the training sequence. The performance evaluation script can simulate such a scenario.

The following can be an example of a single experiment:

  • Training on training, only the following rooms: Corridor, Kitchen, MeetingRoom, PrinterArea, SmallOffice, Toilet
  • Testing on validation, all rooms
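The bookkeeping for such an experiment can be sketched in a few lines of Python. The category sets are copied from the lists above; the helper names are hypothetical. The sketch assumes that the '-' separated value expected by the evaluation script's -u option only needs to list the training categories that were held out, and that categories absent from the training sequence altogether are already treated as unknown by the script.

```python
TRAINING_CATEGORIES = {
    "Corridor", "Kitchen", "LargeOffice", "MeetingRoom",
    "PrinterArea", "RecycleArea", "SmallOffice", "Toilet",
}
VALIDATION_CATEGORIES = {
    "Corridor", "Elevator", "LargeMeetingRoom", "LargeOffice", "MeetingRoom",
    "PrinterArea", "RecycleArea", "SmallOffice", "Toilet",
}


def held_out(trained_on):
    """Training categories deliberately left out of this experiment."""
    return sorted(TRAINING_CATEGORIES - set(trained_on))


def unknown_option(trained_on):
    """Value for the -u option: held-out room IDs joined by '-'."""
    return "-".join(held_out(trained_on))


def expected_unknown(trained_on):
    """Validation categories a system trained on `trained_on` has never seen."""
    return sorted(VALIDATION_CATEGORIES - set(trained_on))


subset = ["Corridor", "Kitchen", "MeetingRoom",
          "PrinterArea", "SmallOffice", "Toilet"]
print(unknown_option(subset))    # LargeOffice-RecycleArea
print(expected_unknown(subset))  # ['Elevator', 'LargeMeetingRoom', 'LargeOffice', 'RecycleArea']
```

For the example experiment above, images from four validation categories would therefore have to be reported as Unknown.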

Performance Evaluation

Performance Measure

The following rules are used when calculating the final score for a run:

  • +1.0 points for each correctly classified image belonging to one of the known categories.
  • -1.0 points for each misclassified image belonging to one of the known or unknown categories.
  • 0.0 points for each image that was not classified (the algorithm refrained from the decision).
  • +2.0 points for a correct detection of a sample belonging to an unknown category (true positive).
  • -2.0 points for an incorrect detection of a sample belonging to an unknown category (false positive).
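The rules above can be written down as a short scoring function. This is a sketch of one reading of the rules, not the organizers' evaluation script. It assumes that an image of a known category labeled 'Unknown' counts as a false positive (-2.0) and that an image of an unknown category labeled with any known category counts as a misclassification (-1.0). All names are chosen for illustration.

```python
UNKNOWN = "Unknown"
NO_DECISION = ""


def score_image(predicted, truth, known_categories):
    """Points for a single image under the rules listed above."""
    if predicted == NO_DECISION:
        return 0.0                      # the algorithm refrained from deciding
    truth_is_known = truth in known_categories
    if predicted == UNKNOWN:
        # False positive if the category was in fact known, true positive otherwise.
        return -2.0 if truth_is_known else 2.0
    if truth_is_known and predicted == truth:
        return 1.0                      # correct known category
    return -1.0                         # any other misclassification


def score_run(predicted_labels, true_labels, known_categories):
    """Final score of a run: the sum of the per-image points."""
    return sum(score_image(p, t, known_categories)
               for p, t in zip(predicted_labels, true_labels))


known = {"Corridor", "Kitchen"}
truths = ["Corridor", "Kitchen", "Elevator", "Elevator", "Corridor"]
preds = ["Corridor", "Corridor", "Unknown", "Kitchen", ""]
print(score_run(preds, truths, known))  # 1.0 - 1.0 + 2.0 - 1.0 + 0.0 = 1.0
```

Under this reading, refraining from a decision is always better than a wrong answer, and a wrong 'Unknown' costs twice as much as a wrong known category.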

Performance Evaluation Script

A Python module/script is provided for evaluating the performance of the algorithms on the test/validation sequence. The script and some examples are available:

Python is required in order to use the module or execute the script. Python is available for Unix/Linux, Windows, and Mac OS X and can be downloaded from the official Python website. Knowledge of Python is not required to simply run the script; however, basic knowledge might be useful, since the script can also be integrated with other scripts as a module. A good quick guide to Python can be found at

The archive contains three files:

  • - the main Python script/module
  • - small example illustrating how to use as a module
  • example.results - example of a file containing fake, fully correct results for the validation sequence

When using the script/module, the following codes should be used to represent a room category:

  • Corridor
  • Kitchen
  • LargeOffice
  • MeetingRoom
  • PrinterArea
  • RecycleArea
  • SmallOffice
  • Toilet
  • Unknown - room not available during training
  • empty string - no result provided

The script calculates the final score by comparing the results to the ground truth encoded as part of its contents. The score is calculated for one set of training/validation/testing sequences.

Using the Module as a Script

The module can simply be executed as a script. Given that Python is already installed, running the script without any parameters will produce the following usage note:

|                                |
| RobotVision@ImageCLEF'10 Performance Evaluation Script |
| Author: Andrzej Pronobis, Marco Fornoni                |

Error: Incorrect command line arguments.

Usage: ./ [Options] <results_file> <test_sequence>

  <results_file>  - Path to the results file. Each line in the file
                    represents a classification result for a single
                    image and should be formatted as follows:
                    <frame_number> <room_id>
  <test_sequence> - ID of the test sequence: 'training' or 'validation'

  -u, --unknown <room_ids> - Treat rooms <room_ids> as unknown.
                             <room_ids> should contain a list of IDs
                             of rooms that should be treated as unknown,
                             separated by '-' e.g. LargeOffice-RecycleArea

In Linux, it is sufficient to make the script executable with chmod +x and then run it from the console. In Windows, the .py extension is usually assigned to the Python interpreter, so typing the name of the script in the console (cmd) is sufficient to produce the note presented above.

In order to obtain the final score for a given test sequence, run the script with the parameters described above, e.g. with the following arguments: -u Corridor example.results validation

The command will produce the score for the results taken from the example.results file obtained for the validation sequence. The option -u specifies that the corridor (room ID 'Corridor') should be treated as unknown during training. The outcome should be as follows:

  Calculating the score... 
  Final score: 988.0

Each line in the results file should represent a classification result for a single image. Since each image can be uniquely identified by its frame number, each line should be formatted as follows:
<frame_number> <area_label>
As indicated above, <area_label> can be left empty, in which case the image does not contribute to the final score (0.0 points). The file example.results contains fake, fully correct results for the validation sequence. The score of 988.0 arises because most of the images are correctly classified, while all the images acquired in the corridor are labeled 'Corridor' although, with the corridor treated as unknown, they should be labeled 'Unknown'.
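A results file in this format can be produced with a few lines of Python. The helper below is a hypothetical sketch. It assumes that a frame without a decision is written as the frame number alone, with no trailing space; how the evaluation script treats trailing whitespace is not specified here.

```python
def format_results(predictions):
    """Turn (frame_number, area_label) pairs into results-file text,
    one '<frame_number> <area_label>' line per image. An empty label
    means that no decision was made for that image."""
    lines = []
    for frame_number, area_label in predictions:
        # rstrip() drops the trailing space left by an empty label.
        lines.append(f"{frame_number} {area_label}".rstrip())
    return "\n".join(lines) + "\n"


# Hypothetical decisions for three frames; frame 2 has no decision.
text = format_results([(1, "Corridor"), (2, ""), (3, "Unknown")])
print(text, end="")
```

The resulting text can be written to disk with an ordinary open(path, "w") call and passed to the script as <results_file>.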

Using the Module in Other Scripts

The script can also be used as a module within other Python scripts. This might be useful when the results are calculated using Python and stored as a list. In order to use the module, import it as shown in the example script and execute the evaluate function.

The function evaluate is defined as follows:

def evaluate(results, testSequence, unknownRooms = [])

The function returns the final score for the given results and test sequence ID. As with the script, it is possible to specify that some of the rooms in the test sequence should be treated as unknown, i.e. unavailable during training. Additionally, the function returns the number of images for which results were not provided.

The function should be executed as follows:
score, missing = robotvision.evaluate(results, testSequence, unknownRooms)
with the following parameters:

  • results - results table of the following format:
    results = [ ("<frame_number1>", "<area_label1>"), ..., ("<frame_numberN>", "<area_labelN>") ]
  • testSequence - ID of the test sequence, use either "validation" or "testing"
  • unknownRooms - a list of IDs of rooms that should be treated as unknown e.g. unknownRooms = ["LargeOffice", "RecycleArea"]
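As a sketch of how the results table might be assembled, the hypothetical helper below converts results-file text into the list of string pairs described above. The call to evaluate is shown only as a comment, since it requires the organizers' module to be on the import path.

```python
def parse_results(text):
    """Convert results-file text into a list of
    ("<frame_number>", "<area_label>") string pairs."""
    results = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split(None, 1)  # frame number, then the rest of the line
        frame_number = parts[0]
        area_label = parts[1].strip() if len(parts) > 1 else ""
        results.append((frame_number, area_label))
    return results


results = parse_results("1 Corridor\n2\n3 Unknown\n")
print(results)  # [('1', 'Corridor'), ('2', ''), ('3', 'Unknown')]

# With the module available, the call described above would then be:
#   import robotvision
#   score, missing = robotvision.evaluate(
#       results, "validation", ["LargeOffice", "RecycleArea"])
```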

Submission of Results

Each participant can submit several sets of results, e.g. for different algorithms. As mentioned above, it is obligatory to submit results for task 1 and optional for task 2. When submitting the results, it is important to indicate the correct task to which the results correspond. Results submitted for an incorrect task or submitted without indicating to which task they correspond will be disqualified. Up to 20 different runs may be submitted per task.

The format accepted by the performance evaluation script should be used for the submitted result files. Therefore, the format of the files should be as follows:

  • Each line in the file should report a result for a single frame (note that a pair of stereo images corresponds to one frame).
  • There are 2741 images in the test sequence, therefore there should be 2741 lines in the file.
  • Each line should have the following format: <frame_no> <label>
  • <frame_no> should be an integer between 1 and 2741.
  • <label> can either be empty or one of the following: Unknown, Corridor, Kitchen, LargeOffice, MeetingRoom, PrinterArea, RecycleArea, SmallOffice, Toilet
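Before uploading, a run file can be checked against these rules. The validator below is a hypothetical sketch, not part of the official tools. In addition to the rules listed above, it assumes that each frame number should appear exactly once.

```python
ALLOWED_LABELS = {
    "", "Unknown", "Corridor", "Kitchen", "LargeOffice", "MeetingRoom",
    "PrinterArea", "RecycleArea", "SmallOffice", "Toilet",
}
N_FRAMES = 2741


def validate_submission(text, n_frames=N_FRAMES):
    """Return a list of problems found in the run file (empty if none)."""
    errors = []
    lines = text.splitlines()
    if len(lines) != n_frames:
        errors.append(f"expected {n_frames} lines, found {len(lines)}")
    seen = set()
    for line_no, line in enumerate(lines, start=1):
        parts = line.split(None, 1)
        if not parts:
            errors.append(f"line {line_no}: empty line")
            continue
        frame, label = parts[0], (parts[1].strip() if len(parts) > 1 else "")
        if not (frame.isascii() and frame.isdigit()) or not 1 <= int(frame) <= n_frames:
            errors.append(f"line {line_no}: bad frame number {frame!r}")
        elif int(frame) in seen:
            # Assumption: a frame may be reported only once.
            errors.append(f"line {line_no}: duplicate frame {frame}")
        else:
            seen.add(int(frame))
        if label not in ALLOWED_LABELS:
            errors.append(f"line {line_no}: unknown label {label!r}")
    return errors


# Tiny illustration with a 3-frame "sequence" instead of the real 2741.
print(validate_submission("1 Corridor\n2\n3 Unknown", n_frames=3))      # []
print(validate_submission("1 Corridor\n1 Hallway\n9 Kitchen", n_frames=3))
```

The second call reports a duplicate frame, a label outside the allowed set and an out-of-range frame number.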

Please use the submission system to submit the results. Select Runs->Submit a run and fill in the form as follows:

  • Select the RobotVision track.
  • In the "method description" box, describe the approaches and algorithms used to generate the results. If you submit multiple runs, describe the differences. The description should be clear to the organizers, and notes such as "run 3, with gamma=1" will not be accepted.
  • Retrieval type: Not applicable
  • Language: Automatic annotation with visual information only
  • Run type: Not applicable
  • In the "other information" box, clearly specify the task to which the submitted run corresponds (task 1 - obligatory or task 2 - optional).

The submission gate will close on 30.06.2010 at 00:00 CET.