The fourth edition of the RobotVision challenge will focus on the problem of multi-modal place classification. Participants will be asked to classify functional areas on the basis of image sequences captured by a perspective camera and a Kinect mounted on a mobile robot within an office environment. Participants will therefore have access to both visual (RGB) images and depth images generated from 3D point clouds. The test sequence will be acquired within the same building and on the same floor, but there may be variations in the lighting conditions (sunny, cloudy, night) or in the acquisition order (clockwise and counter-clockwise). The best proposals will be awarded monetary (and academic) prizes.
Two different tasks will be considered in this edition: task 1 and task 2. For both tasks, participants should be able to answer the question "where are you?" when presented with a test sequence depicting a room category seen during training.
The main difference between the two tasks will be the presence (or absence) of kidnappings in the final test sequence. The importance of kidnappings is explained below.
For both tasks (task 1 and 2), participants have to process the test sequence frame by frame (mandatory task). They can also take advantage of the temporal continuity of the sequence (optional task).
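As an illustration of the optional use of temporal continuity, the sketch below smooths frame-by-frame predictions with a majority vote over a sliding window. It is only one possible approach, not the method prescribed by the challenge: the function name `smooth_labels`, the `window` parameter, and the room labels are hypothetical, and the frame-level classifier that produces the input labels is assumed to exist already.

```python
from collections import Counter, deque


def smooth_labels(frame_labels, window=5):
    """Smooth per-frame room labels with a sliding-window majority vote.

    frame_labels: per-frame predictions, in acquisition order.
    window: number of most recent frames that take part in each vote
            (hypothetical parameter, to be tuned by the participant).
    """
    recent = deque(maxlen=window)  # holds the last `window` predictions
    smoothed = []
    for label in frame_labels:
        recent.append(label)
        # The most frequent label among the recent frames wins the vote.
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


# Hypothetical labels: one isolated misclassification in a corridor sequence.
raw = ["Corridor", "Corridor", "Office", "Corridor", "Corridor"]
print(smooth_labels(raw, window=3))
```

A larger window removes more isolated errors but also reacts more slowly when the room really changes, so the window size is a trade-off to be chosen experimentally.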
The main novelty of this edition will be the availability of depth images, acquired with the Kinect device. These images will be provided in addition to the visual images acquired with a perspective camera. Depth images are stored as visual images using the OpenKinect library.
Participants are allowed to use additional tools to generate the 3D point cloud from these images.
| Visual Image | Depth Image |
| --- | --- |
| ![]() | ![]() |
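For participants who want to generate a 3D point cloud themselves, a minimal sketch of the standard pinhole back-projection is given below. It rests on several assumptions that are not specified here: the depth image has already been decoded into metres (the encoding used for the stored images is not covered by this sketch), and the camera intrinsics `fx`, `fy`, `cx`, `cy` are known from a calibration. The function name and all numeric values are placeholders.

```python
def depth_to_point_cloud(depth_m, fx, fy, cx, cy):
    """Back-project a depth map into a list of (x, y, z) points.

    depth_m: 2D list (rows of pixels) of depth values in metres;
             values <= 0 are treated as missing measurements.
    fx, fy:  focal lengths in pixels (assumed known from calibration).
    cx, cy:  principal point in pixels (assumed known from calibration).
    """
    points = []
    for v, row in enumerate(depth_m):      # v: pixel row index
        for u, z in enumerate(row):        # u: pixel column index
            if z <= 0:
                continue                   # skip pixels without valid depth
            x = (u - cx) * z / fx          # pinhole model, horizontal axis
            y = (v - cy) * z / fy          # pinhole model, vertical axis
            points.append((x, y, z))
    return points


# Toy 2x2 depth map with one missing value; intrinsics are placeholders.
depth = [[1.0, 0.0],
         [2.0, 2.0]]
cloud = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(cloud)
```

For full-resolution images a vectorised implementation or a dedicated point cloud library would be considerably faster; the loop form is used here only to make each step explicit.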