876 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 17, NO. 7, JULY 2007

2-D Feature-Point Selection and Tracking Using 3-D Physics-Based Deformable Surfaces

Michail Krinidis, Nikos Nikolaidis, and Ioannis Pitas

Abstract—This paper presents a novel approach for selecting and tracking feature points in video sequences. In this approach, the image intensity is represented by a 3-D deformable surface model. The proposed approach relies on selecting and tracking feature points by exploiting the so-called generalized displacement vector that appears in the explicit surface deformation governing equations. This vector is proven to be a combination of the output of various line- and edge-detection masks, thus leading to distinct, robust features. The proposed method was compared, in terms of tracking accuracy and robustness, with a well-known tracking algorithm, Kanade–Lucas–Tomasi (KLT), and a tracking algorithm based on scale-invariant feature transform (SIFT) features. The proposed method was experimentally shown to be more precise and robust than both KLT and SIFT tracking. Moreover, the feature-point selection scheme was tested against the SIFT and Harris feature points, and it was demonstrated to provide superior results.

Index Terms—Feature point selection, tracking, intensity surface, video analysis, 3-D deformable models.


Tracking objects in video sequences is a frequently encountered task in video-based applications, such as surveillance, hand-gesture recognition, human–computer interaction, smart environments, motion capture for virtual reality and computer animation, video editing, medical and meteorological imaging, and 3-D scene reconstruction from uncalibrated video. Thus, in the last two decades, intensive research has been carried out in this area. Building a tracking system is far from simple because of varying lighting conditions, partial occlusions, clutter, unconstrained motion, and so on. So far, various systems for person, face, and object tracking have been presented in the literature. These systems can be broadly divided into four categories:

• color-based tracking;

• template-based tracking;

• contour tracking;

• feature-based tracking.

Additional information about the aforementioned tracking categories can be found in the excellent review publications that have appeared in the literature [1]–[5].

Color is a distinctive object feature and, therefore, is useful for object localization in static images and video. Color information produces satisfactory tracking results and allows fast processing, which is important for a tracking system that needs to run at a reasonable frame rate. Many approaches are based on color histograms, while others use global color reference models [6], [7].

This work was conducted in conjunction with the "SIMILAR" European Network of Excellence on Multimodal Interfaces of the IST Programme of the European Union.

The authors are with the Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece (e-mail: mkrinidi@aiia.csd.auth.gr; nikolaid@aiia.csd.auth.gr; pitas@aiia.csd.auth.gr).

Digital Object Identifier 10.1109/TCSVT.2007.897463

Template matching techniques are used by many researchers to perform object tracking, following the same principles as the template matching techniques used in object recognition [8], [9]. Template-based tracking involves the use of multiple templates or template warping to accommodate changes in object pose. The process of determining correspondences between image and template pixels is computationally expensive but provides robust tracking results.

Tracking using outline contour information is easier than modeling and tracking the entire object area, e.g., when using color. Moreover, contour tracking is more robust than simple corner or edge tracking, since it can be adapted to cope with partial occlusions. The active contour representation introduced by Kass et al. [10] is the most popular method for contour delineation and tracking.

Feature-based tracking is a frequently used approach, in which moving objects are represented by feature points detected prior to tracking or during tracking. Feature-based tracking, though prone to tracking errors, can be implemented very efficiently and is important in many time-critical applications. The selection of the feature points depends on the algorithm and is usually based on specific features (local image properties) of these points. A significant number of feature tracking algorithms have been introduced that try, in general, to correlate image features from frame to frame. In [11], the feature points are stochastically selected based on the energy of their Gabor wavelet transform coefficients. The global placement of the feature points is determined by a 2-D mesh, using the area of the triangles formed by the feature points. This method uses a local feature vector containing Gabor wavelet transform coefficients and a global feature vector containing triangle areas. In order to find the corresponding features in the next frame, the 2-D golden section algorithm is employed. Mid-level features (strokes) are used in [12] instead of low-level ones (edge points). Strokes are obtained by organizing edge points through an edge-linking operation. Two labels (valid/invalid) are considered for each stroke, and a probability is assigned to each of them. In this way, all of the strokes contribute to tracking the moving object, but with different weights. In [13], multiple features were used in order to improve the accuracy and robustness of a real-time tracker. More specifically, color histogram features combined with edge-gradient-based shape features were tracked over time under a Monte Carlo framework. A comparison of four feature-point tracking algorithms is given in [14]. Many researchers, instead of trying to improve the tracking performance through the selection of “good” features, exploited the knowledge of how a tracker works and tried to impose several constraints so as to improve the tracking of feature points [15]–[17].

In most cases, an initialization step that depends on the tracking algorithm is applied prior to tracking and defines the area of points that will be tracked. In feature-based algorithms, several feature-point selection strategies can be used. The goal is to obtain distinctive feature points on the image that are appropriate for tracking. Many of these feature points are also used for image matching applications, e.g., for finding the correspondences between two views of the same scene. Lowe proposed the scale-invariant feature transform (SIFT) feature points [18], which are scale-, rotation-, and partially illumination-invariant. The SIFT feature points were used for image matching and image retrieval. Harris et al. [19] proposed a combined edge and corner detector which provides feature points that exhibit high “cornerness” and thus are suitable for tracking. In [20], the feature points are extracted based on the eigenvalues of an image gradient matrix constructed over a window around the candidate feature point. If the minimum eigenvalue of this matrix is larger than a user-defined threshold, then the feature point is considered to be good for tracking. The extracted feature points are optimal for the tracking algorithm presented in the same paper. Moreover, a scheme for the selection of discriminative tracking features was proposed in [21]. Given a set of features, the log-likelihood ratios of class conditional sample densities from the objects of interest and the background were computed, to form a new set of candidate features tailored to the local object/background discrimination task. The two-class variance ratio is used to rank these new features according to how well they separate sample distributions of object and background pixels. This feature evaluation mechanism is embedded in a mean-shift tracking system that adaptively selects the top-ranked discriminative features for tracking.
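The minimum-eigenvalue criterion attributed to [20] above can be illustrated with a minimal sketch. It is not the implementation of [20]: the central-difference gradients, the square window, and the names `min_eigenvalue`, `is_good_feature`, and `half_window` are assumptions made only for illustration.

```python
import math


def min_eigenvalue(image, row, col, half_window=1):
    """Smaller eigenvalue of the 2x2 gradient matrix summed over a window.

    `image` is a list of rows of grey levels. Gradients are central
    differences (an assumption), so the window must stay at least one
    pixel inside the image border.
    """
    gxx = gxy = gyy = 0.0
    for r in range(row - half_window, row + half_window + 1):
        for c in range(col - half_window, col + half_window + 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0  # horizontal gradient
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0  # vertical gradient
            gxx += gx * gx
            gxy += gx * gy
            gyy += gy * gy
    # Closed-form smaller eigenvalue of the symmetric matrix
    # [[gxx, gxy], [gxy, gyy]].
    mean = (gxx + gyy) / 2.0
    spread = math.sqrt(((gxx - gyy) / 2.0) ** 2 + gxy ** 2)
    return mean - spread


def is_good_feature(image, row, col, threshold, half_window=1):
    """Accept the point when the minimum eigenvalue exceeds the threshold."""
    return min_eigenvalue(image, row, col, half_window) > threshold
```

Under these assumptions, a point on a straight edge has gradients in a single direction, so the smaller eigenvalue stays near zero and the point is rejected, whereas a corner has gradients in two directions and passes the threshold.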

A novel feature selection and tracking algorithm is proposed in this paper. The approach was motivated by the technique presented in [22]–[24], which aims at analyzing nonrigid object motion, with application to medical images. Nastar et al. [22] used deformable models to approximate the dynamic object surface deformations in time sequences of volume data (i.e., sequences of 3-D data) and applied modal analysis techniques (a standard engineering technique that allows more effective computations and provides a closed-form solution of the deformation process) in order to describe and analyze the deformations. The framework proposed in that work has also been exploited for the alignment of serially acquired slices [24], for multimodal brain image analysis [23], and for the segmentation of 2-D objects [25]. In our case, the deformable model formulation is used in a totally different and novel application, i.e., that of feature-point tracking. We assume that the image intensity in each video frame can be approximated by a deformable “intensity” surface, on which we select and track characteristic feature points. The proposed technique exploits a byproduct of the explicit surface deformation governing equations in order to select and subsequently track distinctive feature points. More specifically, the feature-point selection process utilizes the so-called generalized displacement vector [22], which is shown to be a novel combination of the output of various line and edge detection masks and, thus, produces feature points corresponding to local edges, lines, corners, or other characteristic image features that are suitable for tracking. The connection between the deformable surface model and the line/edge detector operators is an important outcome of this study. The tracking procedure that follows the feature selection is based on measuring and matching the generalized displacement vector of the feature points from frame to frame.

In summary, the novelty of this paper lies in the use of a deformable surface to approximate the image intensity surface and the consequent use of a term appearing in the deformation procedure to perform robust feature-point selection and tracking. With respect to the deformable model and its modal analysis as introduced in [26] and [22] and further used in [23] and [24], the novelty lies in the use of the model for a different application (i.e., that of feature selection and tracking), the use of different external forces that attract the model towards the image intensity (as will be described in Section II), and the use of an intermediate result (the generalized displacement vector) of the deformation procedure instead of the model per se.

Compared with existing feature selection and tracking algorithms, namely the Kanade–Lucas–Tomasi (KLT) [27] and the SIFT algorithm [28], the proposed method achieves better performance in terms of tracking accuracy and robustness. The results show that the proposed method is robust against rotations, zooming, varying lighting conditions, and hard shadows and can track the selected features for long time periods. Moreover, the feature-point selection part of the algorithm was compared with other feature selection algorithms, i.e., the SIFT [18] and Harris [19] feature-point detectors, and was shown to provide superior results in their subsequent tracking.

The remainder of this paper is organized as follows. In Section II, a brief description of the deformation procedure based on modal analysis is presented. The feature-point selection procedure is introduced in Section III. The tracking algorithm is described in Section IV. Section V presents the performance of the proposed technique, together with comparisons against the well-known KLT feature-based tracking algorithm [27], tracking using SIFT feature points [18], and feature selection using the SIFT and Harris [19] feature detectors. Final conclusions are drawn in Section VI.



The image intensity $I(x, y)$ can be assumed to define a surface over the image domain that will subsequently be called the intensity surface. The proposed tracking approach focuses on parameterizing the 3-D space defined by $(x, y, I(x, y))$, which is called the $XYI$ space [29]. A 3-D physics-based deformable surface model, introduced in [22], [23], and [26], is used for this purpose. In this section, the methodology described in these papers is briefly reviewed, so as to make this paper self-contained. For more details, including the assumptions involved, interested readers can consult the abovementioned papers.
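The idea of treating image intensity as a surface over the image domain can be made concrete by lifting each pixel to a 3-D point whose third coordinate is its grey level. The sketch below is only an illustration: the function name `intensity_surface` and the convention that x is the column index and y the row index are assumptions, and it builds the target point set only, not the deformable model of [22], [23], and [26].

```python
def intensity_surface(image):
    """Return the 3-D points (x, y, intensity) for every pixel.

    `image` is a list of rows of grey levels; x is taken as the column
    index and y as the row index (an assumed convention).
    """
    return [
        (x, y, float(value))
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    ]
```

A deformable surface model would then be attracted towards these points; how that is done is the subject of the methodology reviewed in this section.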


