
Robot vision

Welcome to the website of the 6th edition of the Robot Vision challenge!

The 6th edition of the Robot Vision challenge follows five previous successful events. As in the previous editions, the challenge addresses the problem of semantic place classification using visual and depth information, and it also includes object recognition.


  • 01.11.2013: The task has been released
  • 01.04.2014: Test data has been released
  • 09.04.2014: Validation data has been released
  • 15.04.2014: Submission system is now open
  • 01.05.2014: Submission deadline has been extended --> new date: May 5th
  • 15.05.2014: Results have been released



RobotVision Challenge
#  Group             Score Rooms  Score Objects  Score Total
1  NUDT              1075.50      3357.75        4430.25
2  UFMS              219.00       1519.75        1738.75
3  Baseline Results  67.5         186.25         253.75
4  AEGEAN            -405         -995           -1400

#   Group             Score Total  Run name
1   NUDT              4430.25      1398246907602__resultfile_10.results
2   NUDT              4415.5       1397632548710__resultfile_2.results
3   NUDT              4412.25      1398157311400__resultfile_9.results
4   NUDT              4383.00      1398046098001__resultfile_6.results
5   NUDT              4352.50      1398046133341__resultfile_7.results
6   NUDT              4346.75      1397810034072__resultfile_5.results
7   NUDT              4157.25      1397631915396__resultfile_1.results
8   NUDT              4154.25      1398157207503__resultfile_8.results
9   NUDT              3995.75      1397787390352__resultfile_4.results
10  NUDT              3878.00      1397787357406__resultfile_3.results
11  UFMS              1738.75      1398810880439__cppp-ufms-k400-texton150.txt
12  UFMS              1666.00      1398810731689__cppp-ufms-k300-texton50.txt
13  UFMS              1652.50      1398810785755__cppp-ufms-k300-texton150.txt
14  UFMS              1633.00      1398810822410__cppp-ufms-k400-texton50.txt
15  Baseline Results  253.75       Use of the provided script (Color&Depth histogram + SVM)
16  AEGEAN            -1400.00     1398077739795__roomsResults.txt
17  AEGEAN            -1406.00     1398146557793__roomsResults.txt

Task overview

The Robot Vision 2014 task focuses on the use of multimodal information (visual and depth images) applied to semantic localization and object recognition. The main objective of this edition is to address robot localization in parallel with object recognition from a semantic point of view. The two problems are inherently related: the objects present in a scene can help to determine the room category, and vice versa. Solutions should be as general as possible; overly specific proposals are not desired. In addition to visual data, the task uses depth images acquired with a Microsoft Kinect device, which has become a de facto standard for depth imaging. This edition introduces strong variations between the training and test scenarios, which widens the range of application of participants' proposals. As a result, solutions are expected to solve not only the proposed challenge but also the object recognition and localization problems in any environment.


An increasing number of autonomous robots are expected to be built in the future to support ambient assisted living, in response to an aging population and a decreasing workforce. In this context, robots need to adapt to their environments: they must localize themselves using integrated cameras and recognize the objects that are suitable for manipulation. Place localization and object recognition can happen under very different lighting conditions (sunny and cloudy days, artificial lighting or darkness) and under changes in how the environment is arranged. That is, similar room categories and objects should be recognized in different buildings or environments, even when some of these environments have not been imaged previously. The detection of objects and the use of depth data from distance sensors underline the multimodal nature of the task.

What to do

A sequence of test frames has to be annotated. Each test frame consists of a visual image and a depth image (.pcd format). For each frame, the following information has to be provided: the semantic class of the room where the frame was acquired (corridor, bathroom, ...) and the list of pre-defined objects that can be identified in the scene (trash, computer, chair, ...).

Dataset description

There will be a single training sequence with 5000 visual images and 5000 depth images. These are all the rooms/categories that appear in the database:
  • Corridor
  • Hall
  • ProfessorOffice
  • StudentOffice
  • TechnicalRoom
  • Toilet
  • Secretary
  • VisioConference
  • Warehouse
  • ElevatorArea

Sample images for all the room categories listed in the dataset


These are all the objects that can appear in any image of the database:
  • Extinguisher
  • Phone
  • Chair
  • Printer
  • Urinal
  • Bookshelf
  • Trash
  • Fridge

Sample images for all the objects listed in the dataset

Registering for the task and accessing the data

To participate in this task, please register by following the instructions found on the main ImageCLEF 2014 webpage.

All training datasets can be downloaded from

All frames can be downloaded freely, without a user name or password. In addition to the dataset, we provide a Matlab script that participants can use as a template for their proposals. It performs all the steps needed to generate a challenge solution (feature generation, training, classification and performance evaluation) and can be downloaded from the following link. The script processes a tiny dataset with 50 frames for training and 20 for testing.

Validation data

To allow participants to evaluate their proposals on new sequences, we have created a validation sequence (1500 images). This sequence contains images of a previously unseen building that has room categories and objects similar to those in the training sequence. The test sequence also includes images from this new building, so participants' proposals are expected to cope with this situation.
This new dataset can be downloaded from

Test data

The test dataset can be downloaded from

All frames can be downloaded freely, without a user name or password.

Participants have to process the test dataset and, for each frame, classify the room category and the presence of each object. Results are submitted by uploading submission files to the ImageCLEF system (see the section below). Submission files can be generated directly with the MATLAB script; this version of the script is similar to the original one but includes the configuration for the test sequence.
Participants who prefer to generate submission files themselves must use the following format:
14 Unknown Extinguisher !Chair !Printer Bookshelf Urinal Trash Phone Fridge
15 StudentOffice Extinguisher !Chair !Printer Bookshelf Urinal Trash Phone Fridge
where Unknown is used when the room category is not classified, !Object indicates that the Object does not appear in the scene, and Object indicates that the Object appears in the scene.
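As an illustration of the format above, the following Python sketch writes and reads one submission line. It is not part of the official tooling. It assumes that the leading number is the frame index and that the objects are listed in the same order as in the two sample lines; neither assumption is confirmed by the task description. The names format_line and parse_line are hypothetical.

```python
# Object order copied from the sample lines above (assumed to be the expected order).
OBJECTS = ["Extinguisher", "Chair", "Printer", "Bookshelf",
           "Urinal", "Trash", "Phone", "Fridge"]


def format_line(frame_id, room, present):
    """Build one submission line.

    frame_id: frame index (assumption: the leading number identifies the frame)
    room:     room category name, or None when the room is not classified
    present:  set of object names detected in the frame
    """
    room_token = "Unknown" if room is None else room
    # Objects not in `present` are written with a leading "!".
    tokens = [name if name in present else "!" + name for name in OBJECTS]
    return " ".join([str(frame_id), room_token] + tokens)


def parse_line(line):
    """Inverse of format_line: return (frame_id, room or None, set of present objects)."""
    parts = line.split()
    frame_id = int(parts[0])
    room = None if parts[1] == "Unknown" else parts[1]
    present = {tok for tok in parts[2:] if not tok.startswith("!")}
    return frame_id, room, present


if __name__ == "__main__":
    detected = {"Extinguisher", "Bookshelf", "Urinal", "Trash", "Phone", "Fridge"}
    print(format_line(15, "StudentOffice", detected))
    # prints: 15 StudentOffice Extinguisher !Chair !Printer Bookshelf Urinal Trash Phone Fridge
```

Keeping the object order in a single list means every line of a submission file lists the objects consistently, whichever frames are processed.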

Submission instructions

Submissions are received through the ImageCLEF 2014 system: go to "Runs", then "Submit run", and select the track "ImageCLEF:RobotVision".

Evaluation methodology

For each frame in the test sequence, participants have to provide the class/room category as well as the presence of each of the objects listed in the dataset. The number of times a specific object appears in a frame is not relevant. The final score for a run is the sum of the scores obtained for all the frames in the test sequence.
The following rules are used when calculating the final score for a frame:

Class/Room Category

  • The class/room category has been correctly classified: +1.0 points
  • The class/room category has been wrongly classified: -0.5 points
  • The class/room category has not been classified: 0.0 points


Objects

  • For each object correctly detected (True Positive): +1.0 points
  • For each object incorrectly detected (False Positive): -0.25 points
  • For each object correctly detected as not present (True Negative): 0.0 points
  • For each object incorrectly detected as not present (False Negative): -0.25 points


Three examples of performance evaluation for a single test frame are given in the following lines. The row under each header gives the points per column that follow from the rules above.
Real values for the frame: A TechnicalRoom with two types of objects appearing in the scene: Phone and Printer. Maximum Score: 3.0
Class / Room Category  Extinguisher  Phone  Chair  Printer  Urinal  Bookshelf  Trash  Fridge
+1.0                   0.0           +1.0   0.0    +1.0     0.0     0.0        0.0    0.0
User decision a) A TechnicalRoom with two types of objects appearing in the scene: Phone and Trash. Total Score: 1.5
Class / Room Category  Extinguisher  Phone  Chair  Printer  Urinal  Bookshelf  Trash  Fridge
+1.0                   0.0           +1.0   0.0    -0.25    0.0     0.0        -0.25  0.0
User decision b) An Unknown room with two types of objects appearing in the scene: Phone and Printer. Total Score: 2.0
Class / Room Category  Extinguisher  Phone  Chair  Printer  Urinal  Bookshelf  Trash  Fridge
0.0                    0.0           +1.0   0.0    +1.0     0.0     0.0        0.0    0.0
User decision c) A Corridor with two types of objects appearing in the scene: Extinguisher and Printer. Total Score: 0.0
Class / Room Category  Extinguisher  Phone  Chair  Printer  Urinal  Bookshelf  Trash  Fridge
-0.5                   -0.25         -0.25  0.0    +1.0     0.0     0.0        0.0    0.0
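The scoring rules and the three examples above can be reproduced with a short Python sketch. This is an illustration written from the rules as stated on this page, not the organizers' evaluation code. The function names score_frame and score_run are hypothetical, and an unclassified room is represented here by None.

```python
# Objects that can appear in a frame, as listed in the dataset description.
OBJECTS = ("Extinguisher", "Phone", "Chair", "Printer",
           "Urinal", "Bookshelf", "Trash", "Fridge")


def score_frame(true_room, true_objects, pred_room, pred_objects):
    """Score one frame; pred_room is None when the room is left unclassified."""
    if pred_room is None:
        score = 0.0      # room not classified
    elif pred_room == true_room:
        score = 1.0      # room correctly classified
    else:
        score = -0.5     # room wrongly classified

    for name in OBJECTS:
        actual = name in true_objects
        predicted = name in pred_objects
        if actual and predicted:
            score += 1.0     # true positive
        elif actual != predicted:
            score -= 0.25    # false positive or false negative
        # true negative: 0.0 points, nothing to add
    return score


def score_run(frames):
    """Final score of a run: the sum of the per-frame scores.

    frames: iterable of (true_room, true_objects, pred_room, pred_objects)
    """
    return sum(score_frame(*frame) for frame in frames)


if __name__ == "__main__":
    truth = ("TechnicalRoom", {"Phone", "Printer"})
    print(score_frame(*truth, "TechnicalRoom", {"Phone", "Trash"}))    # 1.5 (decision a)
    print(score_frame(*truth, None, {"Phone", "Printer"}))             # 2.0 (decision b)
    print(score_frame(*truth, "Corridor", {"Extinguisher", "Printer"}))  # 0.0 (decision c)
```

Because a wrong room costs 0.5 points while an unclassified room costs nothing, answering Unknown is the better choice whenever a classifier is more likely to be wrong than the penalty justifies.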



ImageCLEF lab and all its tasks are part of the Cross Language Evaluation Forum: CLEF 2014