Stereo features in a hierarchical feed-forward model
by S. Eberhardt, M. Fahle, C. Zetzsche
Abstract:
Depth information is an important auxiliary component to biological and artificial visual systems alike. In an object detection context, it is usually used for object segmentation, for 3D localization of visually matched objects, or for direct identification of objects without visual information. However, in all these approaches, depth annotations are handled independently of the visual object recognition process. Here, we present a novel, biologically motivated approach in which luminance and depth information are processed directly as compound features within a pattern matching hierarchy. We apply the model to train a feature dictionary from a dataset of 3D-rendered objects and show that the resulting features include both depth and shape information. We show that these features can be used to improve performance on object detection as well as localization tasks. We further show that the depth annotations in the feature dictionary can be used to produce a 3D structure estimate when only 2D shape information is present. We hypothesize that binding multiple submodalities into compound features may prove to be an important building block of how visual information is represented in the human brain, and knowledge of this structure might help us make artificial object recognition systems more robust.
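The core idea of compound features can be sketched in a few lines. The following is a hypothetical illustration only, not the paper's implementation: each dictionary entry pairs a luminance patch with a depth patch, and matching on the luminance channel alone lets the stored depth annotation serve as a 3D structure estimate. All names (`dictionary`, `match_depth`) and the random stand-in patches are assumptions for the sketch.

```python
import numpy as np

# Hypothetical sketch of compound luminance+depth features:
# matching by luminance alone retrieves a stored depth annotation.
rng = np.random.default_rng(0)
PATCH = 5  # patch side length (illustrative)

# "Dictionary" of compound features: each entry binds a luminance patch
# to a depth patch learned jointly (random stand-ins here).
dictionary = [
    {"lum": rng.random((PATCH, PATCH)), "depth": rng.random((PATCH, PATCH))}
    for _ in range(32)
]

def match_depth(lum_patch):
    """Return the depth annotation of the best-matching dictionary feature,
    comparing only the luminance channel (2D shape information)."""
    dists = [np.linalg.norm(f["lum"] - lum_patch) for f in dictionary]
    best = int(np.argmin(dists))
    return dictionary[best]["depth"]

# Query with a luminance-only patch; the returned depth patch is the
# model's structure estimate for that location.
query = dictionary[3]["lum"].copy()
est = match_depth(query)
```

In this toy setup a luminance patch that exactly matches a dictionary entry retrieves that entry's depth channel; with real learned features, the nearest compound feature would supply an approximate depth estimate instead.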
Reference:
Stereo features in a hierarchical feed-forward model (S. Eberhardt, M. Fahle, C. Zetzsche), In 16. Anwendungsbezogener Workshop zur Erfassung, Modellierung, Verarbeitung und Auswertung von 3D-Daten, 2013.
Bibtex Entry:
@InProceedings{Eberhardt2013a,
  author    = {S. Eberhardt and M. Fahle and C. Zetzsche},
  title     = {Stereo features in a hierarchical feed-forward model},
  booktitle = {16. Anwendungsbezogener Workshop zur Erfassung, Modellierung, Verarbeitung und Auswertung von 3D-Daten},
  year      = {2013},
  abstract  = {Depth information is an important auxiliary component to biological and artificial visual systems alike. In an object detection context, it is usually used for object segmentation, for 3D localization of visually matched objects, or for direct identification of objects without visual information. However, in all these approaches, depth annotations are handled independently of the visual object recognition process. Here, we present a novel, biologically motivated approach in which luminance and depth information are processed directly as compound features within a pattern matching hierarchy. We apply the model to train a feature dictionary from a dataset of 3D-rendered objects and show that the resulting features include both depth and shape information. We show that these features can be used to improve performance on object detection as well as localization tasks. We further show that the depth annotations in the feature dictionary can be used to produce a 3D structure estimate when only 2D shape information is present. We hypothesize that binding multiple submodalities into compound features may prove to be an important building block of how visual information is represented in the human brain, and knowledge of this structure might help us make artificial object recognition systems more robust.},
}