Segmented and Annotated IAPR TC-12 dataset

1 Introduction and Overview

This site describes the segmented and annotated IAPR TC-12 benchmark (SAIAPR TC-12): an extension of the IAPR TC-12 collection for evaluating automatic image annotation methods and for studying their impact on multimedia information retrieval. It includes the pictures from the IAPR TC-12 collection plus:

  • Segmentation masks and segmented images for the 20,000 pictures;
  • Features extracted from the regions and labels assigned to them;
  • Region-level annotations according to an annotation hierarchy;
  • Spatial relationship information.

Each image has been manually segmented and the resultant regions have been annotated according to a predefined vocabulary of labels; the vocabulary is organized according to a hierarchy of concepts. Visual features have been extracted from each region.

The SAIAPR TC-12 Benchmark is now publicly available. Section 4 explains how to access and download the complete benchmark, and Section 5 lists related publications.

2 Collection Content

The following resources constitute the SAIAPR TC-12 benchmark:

  • Segmentation masks. One per region (99,535 files) and one per image (20,000 files). Each object of reasonable size was segmented using ISATOOL. On average, about 5 objects per image were segmented, and each object covers on average ~16% of the area of its image. The resulting segmented images are provided as well.
  • Annotations. One per region: 99,535 regions were manually annotated. Each segmented region is assigned a label from a carefully defined vocabulary (see [1]), which is organized as a conceptual hierarchy. To annotate an object, the annotator traversed the hierarchy from top to bottom, looking for the best label.
  • Spatial relationships. One per image: 20,000 files. The following relationships have been calculated for each pair of regions in every image: adjacent, disjoint, beside, X-aligned, above, below and Y-aligned.
  • Visual features. One feature vector per region: 99,535 vectors of attributes. The following features have been extracted from each region: area; boundary/area; width and height of the region; average and standard deviation in x and y; convexity; and average, standard deviation and skewness in both the RGB and CIE-Lab color spaces.
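The region-level features and pairwise relationships listed above can be illustrated with a small sketch. It is not the code used to build the benchmark: the mask representation (rows of 0/1 values), the 4-connected neighbourhood, the function names and the centroid-based definitions of "above" and "below" are all assumptions made for illustration. Convexity and the color statistics are omitted, as are the beside and X-/Y-aligned relationships.

```python
from statistics import mean, pstdev

# 4-connected neighbourhood for boundary and adjacency tests (an assumption).
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def region_pixels(mask):
    """Return the (x, y) coordinates of the non-zero pixels of a binary mask."""
    return [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]


def region_features(mask):
    """Shape and position features of one region given as rows of 0/1 values."""
    pixels = region_pixels(mask)
    if not pixels:
        raise ValueError("mask contains no region pixels")
    inside = set(pixels)
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    # A boundary pixel has at least one 4-neighbour outside the region.
    boundary = sum(
        1
        for x, y in pixels
        if any((x + dx, y + dy) not in inside for dx, dy in NEIGHBOURS)
    )
    area = len(pixels)
    return {
        "area": area,                                        # pixel count
        "area_fraction": area / (len(mask) * len(mask[0])),  # share of the image
        "boundary_area": boundary / area,
        "width": max(xs) - min(xs) + 1,
        "height": max(ys) - min(ys) + 1,
        "mean_x": mean(xs),
        "std_x": pstdev(xs),
        "mean_y": mean(ys),
        "std_y": pstdev(ys),
    }


def spatial_relations(mask_a, mask_b):
    """Hypothetical definitions of four relationships between two regions."""
    a, b = set(region_pixels(mask_a)), set(region_pixels(mask_b))
    if not a or not b:
        raise ValueError("both masks must contain region pixels")
    # Adjacent: some pixel of A has a 4-neighbour in B; otherwise disjoint.
    adjacent = any((x + dx, y + dy) in b for x, y in a for dx, dy in NEIGHBOURS)
    # Image rows grow downwards, so a smaller mean y means "higher up".
    cy_a = mean(y for _, y in a)
    cy_b = mean(y for _, y in b)
    return {
        "adjacent": adjacent,
        "disjoint": not adjacent,
        "above": cy_a < cy_b,
        "below": cy_a > cy_b,
    }


if __name__ == "__main__":
    example = [
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 0],
        [0, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    print(region_features(example))
```

In this sketch the example region covers 6 of 24 pixels, so its area fraction is 0.25. The files shipped with the benchmark may normalize or define these quantities differently.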

The benchmark includes the 20,000 segmented images.

3 On the Annotation Vocabulary

To annotate the segmented regions in SAIAPR TC-12, the annotation vocabulary was organized as a hierarchy. In this hierarchy an object falls into one of six main branches: Humans, Animals, Food, Landscape-Nature, Man-made and Other.

The path in the hierarchy for every region in the 20,000 images is provided with the collection.
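As an illustration of what a hierarchy path is, the following sketch stores a tiny hierarchy as a child-to-parent map and recovers the path from the root to a label. Only the six branch names come from the description above. The root node `entity`, the labels `person` and `child`, the lowercase spelling and the function name are hypothetical, and the format in which the collection stores its paths is not shown here.

```python
# Child -> parent map. The six branches are taken from the text above;
# "entity", "person" and "child" are hypothetical illustration labels.
PARENT = {
    "humans": "entity",
    "animals": "entity",
    "food": "entity",
    "landscape-nature": "entity",
    "man-made": "entity",
    "other": "entity",
    "person": "humans",
    "child": "person",
}


def hierarchy_path(label, parent=PARENT):
    """Return the list of labels from the root down to `label`."""
    known = set(parent) | set(parent.values())
    if label not in known:
        raise KeyError(f"unknown label: {label!r}")
    path = [label]
    seen = {label}
    # Walk upwards until a node without a parent (the root) is reached.
    while path[-1] in parent:
        nxt = parent[path[-1]]
        if nxt in seen:
            raise ValueError("cycle detected in hierarchy")
        path.append(nxt)
        seen.add(nxt)
    return path[::-1]


if __name__ == "__main__":
    print(" > ".join(hierarchy_path("child")))
```

Storing only the parent of each label keeps the structure compact, and a full path can be rebuilt on demand for any region label.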

4 Access and Download

The following archive contains the complete SAIAPR TC-12 Benchmark, which is now available free of charge and without any copyright restrictions:

Important: Because of its size, the benchmark has been divided and compressed into 3 files (using WinRAR®). Please follow these instructions:


In publications based on the SAIAPR TC-12 Benchmark and/or the use of its data or a subset thereof, please cite the following publication:

The Segmented and Annotated IAPR TC-12 Benchmark. Escalante, H. J., Hernández, C., Gonzalez, J., López, A., Montes, M., Morales, E., Sucar, E., Villaseñor, L., Grubinger, M. Computer Vision and Image Understanding, doi:10.1016/j.cviu.2009.03.008, 2009.

A preprint of that paper is included with the collection. Additional information on this data is available from the TIA - INAOE mirror page:

5 Related Publications

[1] Escalante, H. J., Hernández, C., Gonzalez, J., López, A., Montes, M., Morales, E., Sucar, E., Villaseñor, L., Grubinger, M.: The Segmented and Annotated IAPR TC-12 Benchmark. Computer Vision and Image Understanding, doi:10.1016/j.cviu.2009.03.008, 2009.
[2] Clement H.C. Leung, Horace Ip: Benchmarking for Content Based Visual Information Search. Proceedings of the Fourth International Conference on Visual Information Systems (VISUAL'2000), number 1929 in Lecture Notes in Computer Science, pages 442-456, Lyon, France. Springer Verlag.
[3] Michael Grubinger. "Analysis and Evaluation of Visual Information Systems Performance". PhD Thesis. School of Computer Science and Mathematics Faculty of Health, Engineering and Science Victoria University, Melbourne, Australia, 2007.
[4] Michael Grubinger, Paul D. Clough, Henning Müller, Thomas Deselaers: The IAPR TC-12 Benchmark - A New Evaluation Resource for Visual Information Systems. Proceedings of the International Workshop OntoImage'2006 Language Resources for Content-Based Image Retrieval, held in conjunction with LREC'06, pages 13-23, Genoa, Italy, May 2006.
[5] Hugo Jair Escalante, Manuel Montes, L. Enrique Sucar, Michael Grubinger: Towards a Region-Level Automatic Image Annotation Benchmark. Proceedings of the Third Workshop on Image and Video Retrieval Evaluation, pages 64-73, Budapest, Hungary, 2007.

6 Acknowledgements

We are very grateful to Thomas Deselaers for his support with the storage and management of the data.


Please send any feedback, comments & suggestions to:
Hugo Jair Escalante.
National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro # 1, Puebla, 72840, Mexico.
hugojair at ccc dot inaoep dot mx