Viewer-centered object recognition with mobile computing


Institution: Pannon Egyetem
Field: information sciences (informatikai tudományok)
Doctoral school: Informatikai Tudományok Doktori Iskola

Supervisor: Czúni László
Co-supervisor: Kató Zoltán
Location (Hungarian site): Pannon Egyetem, Műszaki Informatikai Kar, Villamosmérnöki és Információs Rendszerek Tanszék
Location abbreviation: PE

Description of the research topic:

There is substantial psychophysical support for the two-dimensional view-interpolation theory of object recognition. [Bülthoff] suggests that the human visual system can be described as recognizing 3D objects by interpolating between 2D views. In [Fang], viewpoint aftereffects also indicate that object-selective neurons in the human visual system can be tuned to specific viewing angles.
Viewer-centered recognition methods can be considered early attempts at the recognition of 3D objects. The idea of storing only a limited number of views of a 3D object and then applying transformations to establish correspondence with other views already appears, for example, in [Basri], where novel views are generated as linear combinations of the stored ones. Rigid objects with smooth surfaces, as well as articulated objects, can be represented this way.
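The linear-combination idea behind [Basri] can be illustrated with a minimal sketch (all data here are synthetic, the rotation axis is chosen arbitrarily, and point correspondences are assumed known): under orthographic projection, the coordinates of a novel view of a rigid object lie in the linear span of the coordinates of a few stored views.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rigid object: 8 random 3D points (object-centred coordinates).
P = rng.standard_normal((8, 3))

def rot_y(a):
    """Rotation about the vertical axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def view(P, a):
    """Orthographic 2D projection of the rotated object."""
    return (P @ rot_y(a).T)[:, :2]      # keep x, y after rotation

v1, v2 = view(P, 0.0), view(P, 0.6)     # two stored views
novel  = view(P, 0.3)                   # an unseen intermediate view

# Stack the stored views' coordinates and solve for mixing coefficients.
A = np.hstack([v1, v2])                 # shape (8, 4)
coeffs, *_ = np.linalg.lstsq(A, novel, rcond=None)

# The novel view is reproduced (near-)exactly by the linear combination.
print(np.allclose(A @ coeffs, novel))   # True
```

The residual vanishes because each measured coordinate vector lies in the 3-dimensional subspace spanned by the object's X, Y, Z coordinate columns, and two generic stored views already span it.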
Recently popular multilayer deep learning approaches discover intricate structure in large data sets using the backpropagation algorithm, which indicates how a machine should change the internal parameters that compute the representation in each layer from the representation in the previous layer. While such techniques are highly successful for object recognition in large databases [Szegedy], [Krizhevsky], they demand tremendous processing power and memory.
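How backpropagation adjusts each layer's parameters from the error signal of the layer above can be sketched at toy scale (a hypothetical two-layer network trained on XOR; this is an illustration of the mechanism only, unrelated to the large-scale systems of [Szegedy], [Krizhevsky]):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: learn XOR with a two-layer network.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.standard_normal((2, 8)); b1 = np.zeros(8)
W2 = rng.standard_normal((8, 1)); b2 = np.zeros(1)
lr = 0.1

for _ in range(5000):
    # Forward: each layer's representation is computed from the previous one.
    h = np.tanh(X @ W1 + b1)                # hidden representation
    y = 1 / (1 + np.exp(-(h @ W2 + b2)))    # output probability

    # Backward: the error signal indicates how each parameter should change.
    g_y  = y - t                            # cross-entropy gradient w.r.t. logits
    g_W2 = h.T @ g_y;  g_b2 = g_y.sum(0)
    g_h  = g_y @ W2.T * (1 - h**2)          # propagate through tanh
    g_W1 = X.T @ g_h;  g_b1 = g_h.sum(0)

    W2 -= lr * g_W2;  b2 -= lr * g_b2
    W1 -= lr * g_W1;  b1 -= lr * g_b1

# After training, thresholded outputs should match the XOR targets.
print((y > 0.5).astype(int).ravel())
```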
Handheld 3D object recognition is a difficult task due to changing viewpoints, varying 3D-to-2D projections, various kinds of noise (e.g. motion blur, color distortion), and limited computational performance and memory. Local feature descriptors (such as SIFT or FAST) are often used for view-centered recognition. In [Noor], the underlying topological structure of an image dataset was captured as a neighborhood graph of features, and motion continuity in the query video was exploited to demonstrate that results obtained from a video sequence are much more robust than those from a single image.
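Why a sequence beats a single frame can be shown with a simulated matcher (the per-frame accuracy p and the simple voting scheme are assumptions for illustration only, not the method of [Noor]): motion continuity lets all frames of the query video be attributed to the same object, so per-frame decisions can be accumulated.

```python
import random

random.seed(2)

def frame_match(true_id, n_objects, p):
    """Simulated single-frame recognition: correct with probability p,
    otherwise a uniformly random wrong object id."""
    if random.random() < p:
        return true_id
    return random.choice([i for i in range(n_objects) if i != true_id])

def video_match(true_id, n_objects, p, n_frames):
    """Accumulate per-frame votes over the sequence and return the
    majority object id."""
    votes = [0] * n_objects
    for _ in range(n_frames):
        votes[frame_match(true_id, n_objects, p)] += 1
    return max(range(n_objects), key=votes.__getitem__)

trials = 1000
single = sum(frame_match(3, 10, 0.6) == 3 for _ in range(trials)) / trials
seq    = sum(video_match(3, 10, 0.6, 15) == 3 for _ in range(trials)) / trials
print(single, seq)   # the sequence-based decision is markedly more reliable
```

With 60% per-frame accuracy and 15 frames, the incorrect votes are split among many objects, so the majority decision is correct almost always.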

It is obvious that video provides much more information about 3D objects than single 2D projections: not only can different views of an object be recorded, but its 3D structure can be reconstructed with structure-from-motion techniques. However, these latter approaches require good-quality images and camera calibration, with computational demands still beyond most mobile computing platforms and intelligent sensor motes. Fortunately, mobile computing devices often contain inertial measurement units (IMUs), and camera calibration can be combined with IMU data [Hol]. It is still an open question, however, how to exploit IMUs in video-based recognition without going through the full structure-from-motion pipeline.

Our research focuses on a viewer-centered recognition model in which the relative position of the target object and the camera is utilized. Our preliminary experiments have already shown [Czuni] that IMUs can aid recognition at low computational cost. Fast object tracking and/or segmentation, however, remains a problem in this framework and is itself a subject of research. Most object recognition methods are "passive" on the model side; we propose to build model-driven interactive retrieval methods in which the search engine gives hints on how to move the camera around the object to obtain the fastest and most reliable recognition result.
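A minimal sketch of this direction, with entirely hypothetical objects and feature vectors (not the system of [Czuni]): stored views are indexed by azimuth per object, an IMU-derived relative angle prunes the candidate views before matching, and the model base itself suggests the most discriminative view to move towards.

```python
import math

# Hypothetical view-indexed model base: for each object, a plain feature
# vector stored at a set of azimuth angles (degrees).
MODELS = {
    "mug":    {0: [1.0, 0.2], 90: [0.4, 0.9], 180: [0.9, 0.3], 270: [0.5, 0.8]},
    "teapot": {0: [1.0, 0.2], 90: [0.1, 0.1], 180: [0.2, 1.0], 270: [0.7, 0.7]},
}

def recognise(query_vec, imu_azimuth, tol=45):
    """Match the query only against stored views whose azimuth is within
    `tol` degrees of the IMU-estimated relative angle (low-cost pruning)."""
    best = None
    for name, views in MODELS.items():
        for az, vec in views.items():
            diff = min(abs(az - imu_azimuth), 360 - abs(az - imu_azimuth))
            if diff <= tol:
                d = math.dist(query_vec, vec)
                if best is None or d < best[0]:
                    best = (d, name)
    return best[1]

def next_view_hint():
    """Model-driven hint: the stored azimuth at which the two candidate
    models differ the most, i.e. the fastest disambiguating viewpoint."""
    gaps = {az: math.dist(MODELS["mug"][az], MODELS["teapot"][az])
            for az in MODELS["mug"]}
    return max(gaps, key=gaps.get)

print(recognise([0.42, 0.88], imu_azimuth=85))  # 'mug'
print(next_view_hint())                         # 180
```

The pruning step is what keeps the computational demand low: only views consistent with the IMU reading are ever compared, and the hint tells the user where the remaining candidates disagree most.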

H. H. Bülthoff and S. Edelman, “Psychophysical support for a two-dimensional view interpolation theory of object recognition,” Proceedings of the National Academy of Sciences, vol. 89, no. 1, pp. 60–64, 1992.
F. Fang and S. He, “Viewer-centered object representation in the human visual system revealed by viewpoint aftereffects,” Neuron, vol. 45, no. 5, pp. 793–800, 2005.
R. Basri, “Viewer-centered representations in object recognition: A computational approach,” in Handbook of pattern recognition & computer vision. World Scientific Publishing Co., Inc., 1993, pp. 863–882.
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proc. CVPR, 2015.
A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25, P. Bartlett, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds., 2012, pp. 1106–1114.
H. Noor, S. H. Mirza, Y. Sheikh, A. Jain, and M. Shah, "Model generation for video-based object recognition," in Proceedings of the 14th ACM International Conference on Multimedia, ser. MM '06. New York, NY, USA: ACM, 2006, pp. 715–718.
J. D. Hol, T. B. Schön, and F. Gustafsson, "A new algorithm for calibrating a combined camera and IMU sensor unit," in Proc. 10th International Conference on Control, Automation, Robotics and Vision (ICARCV 2008). IEEE, 2008, pp. 1857–1862.
L. Czuni and M. Rashad, "Lightweight video object recognition based on sensor fusion," in 2015 International Workshop on Computational Intelligence for Multimedia Understanding (IWCIM), Oct 2015, pp. 1–5.

Number of students who can be accepted: 1

Application deadline: 2016-11-15
