
Robot Vision 2012

Welcome to the website of the 4th edition of the Robot Vision Challenge!

The fourth edition of the Robot Vision challenge follows three previous successful events. As in the previous editions, the challenge addresses the problem of visual place classification, this time using images acquired with a Kinect depth sensor.



Mobile robot platform used for data acquisition.

News

  • 01/03/2012 - Registration for the task is now open.
  • 09/03/2012 - Best Submissions awards and travel scholarships
    • There will be awards for the highest ranked runs, as well as for the most innovative approach as described in the ImageCLEF working notes paper (for submission deadlines/details, please see below). We will also offer a limited number of travel scholarships for students who are the first author of a working notes paper describing a submitted run.
  • 02/04/2012 - Training sequences are now available.
  • 21/05/2012 - Test Data Release.
  • 22/06/2012 - Release of preliminary results.
    • The results for the task have been released. The winner of both the obligatory and the optional task is the CIII UTN FRC group.
  • 16/07/2012 - Journal Publications.
    • We are pleased to announce that Machine Vision and Applications has accepted our proposal for a special issue on the ImageCLEF 2012 Robot Vision Task! The call for papers is expected to be published by the end of this year or the first quarter of the next.
  • 06/09/2012 - All data is now available.
  • You can follow Robot Vision 2012 on Facebook.

Organisers

Contact

For any questions related to the task, please contact Jesus Martinez Gomez by email: jesus.martinez@uclm.es


Overview


The fourth edition of the RobotVision challenge will focus on the problem of multi-modal place classification. Participants will be asked to classify functional areas on the basis of image sequences captured by a perspective camera and a Kinect sensor mounted on a mobile robot within an office environment. Participants will therefore have access to visual (RGB) images and depth images generated from 3D point clouds.


The test sequence will be acquired within the same building and on the same floor, but there can be variations in the lighting conditions (sunny, cloudy, night) or in the acquisition procedure (clockwise and counter-clockwise). This edition will have several awards for the best submissions, judged in terms of performance and scientific contribution (see below for further details).


Citation


If you use the RobotVision 2012 sequences or information in your research, please consider citing the overview paper in your articles:

  • Martinez-Gomez, J., Garcia-Varea, I. and Caputo, B. Overview of the ImageCLEF 2012 Robot Vision Task. In CLEF 2012 Evaluation Labs and Workshop, Online Working Notes, Rome, Italy, September 17-20, 2012. ISBN: 978-88-904810-3-1.


Schedule

  • 01/03/2012 - Registration open for the task.
  • 02/04/2012 - Training data and task release.
  • 21/05/2012 - Test data release
  • 15/06/2012 - Submission of runs
  • 22/06/2012 - Release of results
  • 06/09/2012 - Release of all data (including annotations for test sequences)
  • 17-22/09/2012 - CLEF 2012 conference in Rome


Results

Task 1 (Obligatory)
Best run per group:

  #  Group                    Score
  1  CIII UTN FRC              2071
  2  NUDT                      1817
  3  UAIC2012                  1348
  4  USUroom409                1225
  5  SKB Kontur Labs           1028
  6  CBIRITU                    551
  7  SIMD - baseline results    462
  8  BuffaloVision              -70

All runs:

  #   Group                    Score
  1   CIII UTN FRC              2071
  2   NUDT                      1817
  3   NUDT                      1729
  4   UAIC2012                  1348
  5   USUroom409                1225
  6   USUroom409                1225
  7   USUroom409                1193
  8   UAIC2012                  1049
  9   UAIC2012                  1049
  10  SKB Kontur Labs           1028
  11  SKB Kontur Labs           1006
  12  SKB Kontur Labs            997
  13  CBIRITU                    551
  14  CBIRITU                    542
  15  SIMD - baseline results    462
  16  BuffaloVision              -70
  17  BuffaloVision             -110
  18  BuffaloVision             -234


Task 2 (Optional)
Best run per group:

  #  Group                    Score
  1  CIII UTN FRC              3930
  2  NUDT                      3859
  3  CBIRITU                   3169
  4  SIMD - baseline results   1041

All runs:

  #  Group                    Score
  1  CIII UTN FRC              3930
  2  CIII UTN FRC              3925
  3  NUDT                      3859
  4  CBIRITU                   3169
  5  SIMD - baseline results   1041

The Task


Two different tasks will be considered in this edition: Task 1 and Task 2. For both tasks, participants should be able to answer the question "Where are you?" when presented with a test sequence imaging a room category seen during training.

The main differences between the two tasks are the presence (or absence) of kidnappings in the final test sequence and whether the temporal continuity of the sequence may be exploited. The importance of kidnappings is explained below.


Task 1 (mandatory)

  • Test frames have to be classified without using the temporal continuity of the test sequence.
  • No kidnappings in the final test sequence.

Task 2 (optional)

  • Participants can take advantage of the temporal continuity of the test sequence.
  • Presence of kidnappings in the final test sequence.
  • Additional points for frames after a kidnapping if they are correctly classified.


The data


The main novelty of this edition will be the availability of depth images acquired with the Kinect device. These images will be provided in addition to the visual images acquired with a perspective visual camera. Depth images are stored in the same format as visual images, using the openkinect library.


Participants are allowed to use additional tools to generate 3D point clouds from these images. We provide a Python script, depth2cloud.zip, that generates .pcd 3D point cloud files from Kinect depth images. The generated files are an approximation of real point cloud files, and participants are allowed to modify or improve this script.

 
Example of RGB and Depth images


Depth to Cloud


As noted above, we provide a Python script, depth2cloud.zip, that generates .pcd 3D point cloud files from Kinect depth images. The generated files are an approximation of real point cloud files, and participants are allowed to modify or improve this script.

The script depth2cloud.py can be executed directly from the command line. Provided that Python is installed, running it without any parameters will produce the following usage note:

/======================================================================\
|                            depth2cloud.py                            |
|----------------------------------------------------------------------|
| RobotVision@ImageCLEF'12 Point Cloud Library Images Generator Script |
| Author: Jesus Martinez-Gomez                                         |
\======================================================================/

Error: Incorrect command line arguments.

Usage: depth2cloud.py input_depth_image output_cloud_image

Arguments:
  *input_depth_image  - Path to the input file. This file should be a .jpg depth image from the RobotVision'12 datasets
  *output_cloud_image - Path to the output file. This file should be a .pcd file that will be created or overwritten

On Linux, it is sufficient to make depth2cloud.py executable (chmod +x ./depth2cloud.py) and then type ./depth2cloud.py in the console. On Windows, the .py extension is usually associated with the Python interpreter, and typing depth2cloud.py in the console (cmd) is sufficient to produce the note shown above.

To see how it works, run the script with the parameters described above, e.g. as follows: depth2cloud.py depth_2060.jpg depth_2060.pcd. The command will generate a new file, and the output should be as follows:

Selected Arguments:
  input_depth_image  = depth_2060.jpg
  output_cloud_image = depth_2060.pcd

Reading colour image...

Input image information: 640 in width and 480 in height

Generating cloud file...

points 1 115261
points 2 115261
Done!

The newly generated depth_2060.pcd file can be visualized using the Point Cloud Library viewer. The following image shows the original visual image (rgb_2060.jpg), the Kinect depth image (depth_2060.jpg) and the result of visualizing the depth_2060.pcd file generated using the depth2cloud.py script.

   
Visual, depth and 3D point cloud files
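
The provided script performs this conversion for you. Purely for intuition, the sketch below shows a generic way of back-projecting a depth image into 3D points using a pinhole camera model. It is not the provided depth2cloud.py: the camera intrinsics (FX, FY, CX, CY) and the assumption that depth values are already expressed in metres are illustrative only.

# Generic back-projection sketch; NOT the provided depth2cloud.py.
# The intrinsics below are typical Kinect-like values assumed for illustration.
import numpy as np

FX, FY = 525.0, 525.0   # assumed focal lengths in pixels (640x480 image)
CX, CY = 319.5, 239.5   # assumed principal point

def depth_to_points(depth_m):
    """Convert an HxW depth map (metres, 0 = missing) into an Nx3 array of 3D points."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - CX) * z / FX   # pinhole model: X = (u - cx) * Z / fx
    y = (v - CY) * z / FY
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]   # drop pixels with no depth measurement

# Tiny synthetic example: a 2x2 depth map at 1.5 m everywhere yields 4 points
print(depth_to_points(np.full((2, 2), 1.5)).shape)   # (4, 3)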

Sequences


Three sequences will be provided for training and two additional sequences for the final experiment.

Rooms


These are all the rooms/categories that appear in the database:

  • Corridor
  • ElevatorArea
  • PrinterRoom
  • LoungeArea
  • ProfessorOffice
  • StudentOffice
  • VisioConference
  • TechnicalRoom
  • Toilet




Awards


There will be (at least) the following awards:

  • Best performance on the obligatory track
  • Best Student
  • Scientific Innovation




Scholarships


There will be a limited number of travel scholarships (around 500 CHF) for students who have submitted at least one run and are the first author of the corresponding working notes paper.

Details on how to apply for the scholarships will be given in the coming weeks.




Journal Publications


  • The Machine Vision and Applications journal has confirmed that there will be a special issue on the ImageCLEF 2012 Robot Vision task.
  • All groups that participated in the tasks are warmly invited to submit to the journal special issue, as are groups that registered for the data but did not submit any run, and other researchers interested in the topic.
  • The official call for papers will be published by the end of this year or the beginning of the next.
  • Publication will follow the standard peer-review process, meaning that all papers will go through review; there will be no invited papers.
  • For more information about the journal, please refer to http://www.springer.com/computer/image+processing/journal/138

Machine Vision and Applications: Special issue on Benchmark Evaluation of RGB-D based Visual Recognition Algorithms

Visual recognition is a critical component of machine intelligence. For a robot to behave autonomously, it must have the ability to recognize its surroundings (I am in the office; I am in the kitchen; On my right is a refrigerator). Natural human-computer interaction requires the computer to have the ability to recognize human gestures, body language, and intentions. Recently, the availability of cheap 3D sensors such as the Microsoft Kinect has made it possible to easily capture depth maps in real time, and therefore use them for various visual recognition tasks including indoor place recognition, object recognition, and human gesture and action recognition. This in turn poses interesting technical questions such as:

1. What are the most discriminative visual features from 3D depth maps? Even though one could treat depth maps as gray images, depth maps contain strong 3D shape information. How to encode this 3D shape information is an important issue for any visual recognition task.

2. How to combine depth maps and RGB images? An RGB-D sensor such as Microsoft Kinect provides a depth channel as well as a color channel. The depth map contains shape information while the color channel contains texture information. The two channels complement each other, and how to combine them in an effective way is an interesting problem.

3. What are the most suitable paradigms for recognition with RGB-D data? With depth maps, foreground-background separation is easier and, in general, better object segmentations can be obtained than with conventional RGB images. Therefore, conventional bag-of-features approaches may not be the most effective. New recognition paradigms that leverage depth information are worth exploring.

Scope



This special issue covers all aspects of RGB-D based visual recognition. It emphasizes evaluation on two benchmark tasks: the ImageCLEF Robotic Vision Challenge (http://www.imageclef.org/2012/robot) and the CHALEARN Gesture Challenge (http://gesture.chalearn.org/). The special issue is also open to researchers who did not submit runs to either of the two challenges, provided they test their methods on at least one of the two datasets. In addition to the two benchmark tasks, researchers are welcome to report experiments on other datasets to further validate their techniques.
Topics include but are not limited to:

  • new machine learning techniques that are successfully applied to either of the two benchmark tasks
  • novel visual representations that leverage the depth data
  • novel recognition paradigms
  • techniques that effectively combine RGB features and depth features
  • analysis of the results of the evaluation on either of the two benchmark tasks
  • theoretical and/or practical insights into the problems for the semantic spatial modeling task, and/or for the robot kidnapping task in ImageCLEF Robotic Vision Challenge
  • theoretical and/or practical insights into the one-shot recognition problem in the CHALEARN Gesture Challenge
  • computational constraints of methods in realistic settings
  • new metrics for performance evaluations

Information for Authors:


Authors should prepare their manuscripts according to the author guidelines available on the online submission page of Machine Vision and Applications (http://www.editorialmanager.com/mvap/).

Important Dates (tentative):


  • Manuscript submission deadline: January 30, 2013
  • First round review decision: May, 2013
  • Second round review decision: September, 2013
  • Final manuscript due: November, 2013
  • Expected publication date: January, 2014

Guest Editors:


  • Barbara Caputo, Idiap Research Institute, Switzerland
  • Markus Vincze, The Institute of Automation and Control Engineering, Austria
  • Vittorio Murino, Istituto Italiano di Tecnologia, Italy
  • Zicheng Liu, Microsoft Research, United States


Performance evaluation


The following rules are used when calculating the final score for a run:

Task 1:

  • For each correctly classified frame: +1 point
  • For each misclassified frame: -1 point
  • For each frame that was not classified: 0 points

Task 2:

  • For each correctly classified frame: +1 point
  • For each misclassified frame: -1 point
  • For each frame that was not classified: 0 points
  • Additional points for kidnappings:
    • All of the WindowSize frames after a kidnapping will obtain an additional point if they are correctly classified (see the scoring sketch below).
    • No additional penalization is applied for misclassified frames.
    • WindowSize = 4 for the final test.

Window Size
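
For illustration only, the following minimal Python sketch applies the scoring rules above to a list of predicted and ground-truth labels. It is not the official robotvision.py evaluator (whose output is authoritative); the function name, arguments and example data are invented for this sketch, and Task 1 corresponds to calling it with no kidnapping frames.

# Minimal sketch of the scoring rules above; NOT the official robotvision.py evaluator.
WINDOW_SIZE = 4  # frames after a kidnapping that can earn a bonus point

def score_run(predictions, ground_truth, kidnapping_frames=(), window_size=WINDOW_SIZE):
    """predictions/ground_truth: equal-length lists of labels, '' = not classified.
    kidnapping_frames: indices of the first frame after each kidnapping (Task 2 only)."""
    bonus_frames = set()
    for k in kidnapping_frames:
        bonus_frames.update(range(k, k + window_size))
    score = 0.0
    for i, (pred, truth) in enumerate(zip(predictions, ground_truth)):
        if pred == "":                # unclassified frame: 0 points
            continue
        if pred == truth:
            score += 1.0              # correct classification
            if i in bonus_frames:     # extra point right after a kidnapping
                score += 1.0
        else:
            score -= 1.0              # misclassification (no extra penalty inside the window)
    return score

# Example: the second frame follows a kidnapping and is correctly classified
print(score_run(["Corridor", "Toilet", ""],
                ["Corridor", "Toilet", "PrinterRoom"],
                kidnapping_frames=[1]))  # -> 3.0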

Performance Evaluation Script


A Python module/script is provided for evaluating the performance of the algorithms on the test/validation sequences. The script and some examples are available for download.

Python is required in order to use the module or execute the script. Python is available for Unix/Linux, Windows, and Mac OS X and can be downloaded from http://www.python.org/download/. Knowledge of Python is not required in order to simply run the script; however, basic knowledge might be useful since the script can also be integrated with other scripts as a module. A good quick guide to Python can be found at http://rgruet.free.fr/PQR26/PQR2.6.html.

The archive contains five files:

  • robotvision.py - the main Python script/module
  • example.py - small example illustrating how to use robotvision.py as a module
  • example1.results - example of a file containing fake results for the training 1 sequence
  • example2.results - example of a file containing fake results for the training 2 sequence
  • example3.results - example of a file containing fake results for the training 3 sequence

When using the script/module, the following codes should be used to represent a room category:

  • Corridor
  • ElevatorArea
  • PrinterRoom
  • LoungeArea
  • ProfessorOffice
  • StudentOffice
  • VisioConference
  • TechnicalRoom
  • Toilet
  • empty string - no result provided

The script calculates the final score by comparing the results to the ground truth encoded as part of its contents. The score is calculated for one set of training/validation/testing sequences.


Using robotvision.py as a script


robotvision.py can be executed directly from the command line. Provided that Python is installed, running it without any parameters will produce the following usage note:

/========================================================\
|                     robotvision.py                     |
|--------------------------------------------------------|
| RobotVision@ImageCLEF'12 Performance Evaluation Script |
| Author: Jesus Martinez-Gomez, Ismael Garcia-Varea      |
\========================================================/

Error: Incorrect command line arguments.

Usage: robotvision.py results_file test_sequence task_number

Arguments:
  *results_file  - Path to the results file. Each line in the file
                    represents a classification result for a single
                    image and should be formatted as follows:
                     <frame_number> <area_label>
  *test_sequence - ID of the test sequence: 'training1', 'training2' or 'training3'
  *task_number   - Number of the task: task1 (without temporal continuity) or task2 (temporal continuity and kidnappings)

On Linux, it is sufficient to make robotvision.py executable (chmod +x ./robotvision.py) and then type ./robotvision.py in the console. On Windows, the .py extension is usually associated with the Python interpreter, and typing robotvision.py in the console (cmd) is sufficient to produce the note shown above.

In order to obtain the final score for a given training sequence, run the script with the parameters described above, e.g. as follows:
robotvision.py example2.results training2 task2

The command will produce the score for the results taken from the example2.results file, obtained for the training2 sequence and task 2. The outcome should be as follows:

Selected Arguments:
  results_file  = example2.results
  test_sequence = training2
  task_number   = task2

Calculating the score...
Done!

===================
Final score: -1208.0
===================

Each line in the results file should represent a classification result for a single image. Since each image can be uniquely identified by its frame number, each line should be formatted as follows:
<frame_number> <area_label>
As indicated above, <area_label> can be left empty and the image will not contribute to the final score (+0.0 points).
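
For illustration, the first lines of a results file could look like the following (the frame numbers and labels below are invented for this example; the empty label on the third line means that frame is left unclassified):

0 Corridor
1 Corridor
2
3 ElevatorArea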


Using robotvision.py as a module in other scripts


robotvision.py can also be used as a module within other Python scripts. This might be useful when the results are calculated using Python and stored as a list. In order to use the module, import it as shown in the example.py script and execute the evaluate function.

The function evaluate is defined as follows:

def evaluate(results, testSequence, task)

The function returns the final score for the given results and test sequence ID.

The function should be executed as follows:
score = robotvision.evaluate(results, testSequence, task)
with the following parameters:

  • results - results table of the following format:
    results = [ ("<frame_number1>", "<area_label1>"), ..., ("<frame_numberN>", "<area_labelN>") ]
  • testSequence - ID of the test sequence, use "training1", "training2" or "training3"
  • task - ID of the task, use "task1" or "task2"
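
The example.py file in the archive shows the intended usage. As a rough sketch of what such a call could look like (the results list here is invented for illustration):

# Sketch of using robotvision.py as a module; see example.py in the archive for the reference usage.
import robotvision

# Results as (frame_number, area_label) string pairs; an empty label means "not classified"
results = [("0", "Corridor"), ("1", "Corridor"), ("2", ""), ("3", "ElevatorArea")]

score = robotvision.evaluate(results, "training1", "task1")
print("Final score:", score)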


Kidnappings


The main difference between sequences of frames with and without kidnappings lies in how room changes occur. Room changes in sequences without kidnappings are usually represented by a small number of images showing a transition, such as the one shown in the image below.

No Kidnapping

On the other hand, when kidnappings are present, room changes appear as a drastic change between consecutive frames. This situation is illustrated below.

Kidnapping


Useful information for participants

The organizers suggest several techniques for feature extraction and cue integration. Thanks to these well-documented techniques, with open-source implementations available, participants can focus on the development of visual features while using the proposed method for cue integration, or vice versa.

In addition to feature extraction and cue integration, the organizers also provide other useful information, such as the Point Cloud Library and a technique for taking advantage of the temporal continuity of the sequences.

________________________________________________________________________________________________________________

Features generation


Visual images:

  • Pyramid Histogram of Oriented Gradients (PHOG)
    • Web page: Phog Page
    • Article to refer: A. Bosch, A. Zisserman, and X. Munoz, “Representing shape with a spatial pyramid kernel,” in Proceedings of the 6th ACM international
      conference on Image and video retrieval. ACM, 2007, p. 408.
    • Source code for download (Matlab): Phog Code

Depth images:

  • Normal Aligned Radial Feature (NARF)
    • Article to refer: Steder, B.; Rusu, R.B.; Konolige, K.; Burgard, W.; , "Point feature extraction on 3D range scans taking into account object boundaries," Robotics and Automation (ICRA), 2011 IEEE International Conference on , vol., no., pp.2601-2608, 9-13 May 2011
    • Source code for download (C++): How to extract NARF Features from a range image

________________________________________________________________________________________________________________

Cue integration


Learning the classifier

  • Online-Batch Strongly Convex mUlti keRnel lEarning: OBSCURE
    • Article to refer: Orabona, F.; Luo Jie; Caputo, B.; , "Online-batch strongly convex Multi Kernel Learning," Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on , vol., no., pp.787-794, 13-18 June 2010
    • Source code for download (Matlab): Dogma

________________________________________________________________________________________________________________

Temporal Continuity


Selecting a prior class for low-confidence decisions

  • Idiap 2010 RobotVision Proposal
    • Technique: Once a frame has been identified as low confidence, the classification results obtained for the last n frames are used to resolve the ambiguity: if all of the last n frames have been assigned to the same class Ci, then we conclude that the current frame also comes from class Ci, and the label is assigned accordingly (a minimal sketch of this rule is given after this list).
    • Article to refer: Martinez-Gomez, J. and Caputo, B. (2011). Towards Semi-Supervised Learning of Semantic Spatial Concepts. In ICRA 2011. Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China. Pages 1936 - 1943
    • Source code for download : not necessary
    • Contact person: jesus.martinez@uclm.es
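
A minimal sketch of this rule, assuming a confidence score is available for each decision (the threshold, the window length n and the function name are illustrative choices, not the authors' implementation):

# Illustrative sketch of the temporal-continuity rule described above (not the authors' code).
def smooth_label(current_label, confidence, history, n=5, threshold=0.5):
    """If the current decision has low confidence and the last n frames all share one label,
    reuse that label; otherwise keep the classifier's decision."""
    if confidence < threshold and len(history) >= n:
        last_n = history[-n:]
        if len(set(last_n)) == 1:   # all of the last n frames agree on one class
            return last_n[0]
    return current_label

# Example: a low-confidence decision after five consecutive "Corridor" frames
print(smooth_label("Toilet", confidence=0.2, history=["Corridor"] * 5))  # -> "Corridor"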

________________________________________________________________________________________________________________

3D point cloud processing


The Point Cloud Library (PCL) is a framework with numerous state-of-the-art algorithms, including filtering, feature estimation, surface reconstruction, registration, model fitting and segmentation.

Sponsors:

SNSF (Swiss National Science Foundation), vision@home

Acknowledgments:

This work is partially supported by the Spanish MICINN under projects MIPRCV Consolider Ingenio 2010 (CSD2007-00018) and MD-PGMs CICYT (TIN2010-20900-C04-03)

Attachments:
  • depth_2060pcd.jpg (70.59 KB)
  • depth.txt (53.06 KB)
  • rgb.txt (49.32 KB)