CVPR 2013

Minimum Uncertainty Gap for Robust Visual Tracking

Junseok Kwon and Kyoung Mu Lee

Department of ECE, ASRI, Seoul National University

{paradis0,kyoungmu}@snu.ac.kr, http://cv.snu.ac.kr

Abstract

We propose a novel tracking algorithm that robustly tracks the target by finding the state which minimizes the uncertainty of the likelihood at the current state. The uncertainty of the likelihood is estimated by obtaining the gap between the lower and upper bounds of the likelihood. By minimizing the gap between the two bounds, our method finds the confident and reliable state of the target. In the paper, the state that gives the Minimum Uncertainty Gap (MUG) between likelihood bounds is shown to be more reliable than the state which gives the maximum likelihood only, especially when there are severe illumination changes, occlusions, and pose variations. A rigorous derivation of the lower and upper bounds of the likelihood for the visual tracking problem is provided to address this issue. Additionally, an efficient inference algorithm using Interacting Markov Chain Monte Carlo is presented to find the best state that maximizes the average of the lower and upper bounds of the likelihood and minimizes the gap between the two bounds. Experimental results demonstrate that our method successfully tracks the target in realistic videos and outperforms conventional tracking methods.

1. Introduction

The objective of the tracking problem is to track the target accurately in real environments [3,5,6,8,9,13,15,21,31,35,36]. For robust tracking, most conventional tracking methods formulate the tracking problem in the Bayesian framework [10,20,18,26,27,33,34]. In the Bayesian tracking approach, the goal is to find the best state, which maximizes the posterior probability $p(X_t \mid Y_{1:t})$. This is called the Maximum a Posteriori (MAP) estimation, $\hat{X}_t^{\mathrm{MAP}} = \arg\max_{X_t} p(X_t \mid Y_{1:t})$, where $\hat{X}_t^{\mathrm{MAP}}$ is the best (MAP) state at time t. To obtain the MAP state, the method searches for the state that maximizes the likelihood $p(Y_t \mid X_t)$, which is near the previous state as a prior. In this case, the likelihood is typically calculated by measuring the similarity between the observation $Y_t$ at the state $X_t$ and

Figure 1. Basic idea of the proposed method. (a) Likelihood bounds (uncertainty) are formed by likelihood variations due to the different target models and different updating strategies used during the tracking process. (b) The gap between the upper and lower bounds: the same state gives very different answers depending on the target model used (red and blue), so the likelihood estimation is uncertain and unreliable when the gap is large. Our method therefore seeks the minimum gap (uncertainty), regardless of the target models, together with the maximum average likelihood bound, to find the state that confidently maximizes the likelihood. (c) How our method efficiently employs target models: the proposed method only compares two likelihood bounds while implicitly utilizing the infinite number of target models that generate the lower and upper bounds of the likelihood, whereas other methods compare all of a finite set of target models to evaluate the likelihood.

the target model M t at time t.

In this case, the MAP estimation assumes that the true state produces the highest likelihood score near the previous state. However, in real environments, this assumption is not valid unless the target model $M_t$ is always correct. In practice, the target model is easily corrupted and distorted during the online update. To deal with severe appearance changes, many tracking algorithms evolve the target model with online updates. However, because of the

(a) MAP and MUG. (b) Likelihood bounds and their gap.

Figure 2. Example of our tracking results in the skating1 seq. Our method successfully tracks the target using the MUG estimation, whereas the conventional methods fail to track it using the MAP estimation. The MUG estimation finds the true state A of the target because the gap between the likelihood bounds in State A is smaller than that in State B. On the other hand, the MAP estimation finds the wrong state B because the posterior probability in State B is larger than that in State A.

tracking error, the target model includes more background and becomes erroneous as time goes on. Eventually, conventional trackers drift into the background and fail to track the target. This drift frequently occurs even when the methods find the optimal MAP state, because of the noisy target model. Owing to this fundamental inherent problem, the conventional MAP-based tracking approach needs to be reconsidered.

In this work, we therefore redefine the goal of the tracking problem as finding the state that maximizes the average of the likelihood bounds and, at the same time, minimizes the gap between the lower and upper bounds of the likelihood. We call this the Minimum Uncertainty Gap (MUG) estimation. Note that in the general tracking problem, the upper and lower bounds, or the uncertainties, of the likelihoods are naturally formed, since many different likelihoods are produced by different target models, which are the reference appearances of the target. The different target models are usually constructed by the different updating strategies used during the tracking process [23]. Especially when there exist severe occlusions, illumination changes, and so on, the likelihood uncertainty becomes larger, as empirically demonstrated in Fig. 1(a). This is because distractors such as occlusions and illumination changes usually cause the target models to differ greatly from each other.

Since the different likelihoods can be generated by different target models, obtaining the likelihood bounds is the same as considering all possible target models that could be used. Using the likelihood bounds, the proposed method can find a good target state because it implicitly covers all possible appearance changes of the target with all possible target models. Thus, as illustrated in Fig. 1(b), a large gap between the upper and lower bounds indicates that the corresponding state can have either a very good likelihood or a very bad likelihood depending on the employed target model. In this case, the likelihood estimation over the state is easily affected by noisy target models, and the estimated likelihood is uncertain and not reliable. Hence, by minimizing the gap between the two likelihood bounds, the proposed method can find the confident state of the target. MUG is also affected by the aforementioned distractors (outliers).

Nevertheless, MUG is more robust than MAP because MUG provides the confidence of the likelihood estimation, whereas MAP cannot. Outliers usually produce low confidence, so our method can easily identify and avoid states with low confidence values. To measure the uncertainty of the likelihood, our method estimates the lower and upper bounds of the likelihood, minimizes the gap between them, and thus accurately tracks the target. Among the methods that use multiple target models, the MIL tracker [2] described the target appearance using multiple instances in the model and robustly tracked the target with a multiple instance learning algorithm. Unlike previous works, which utilized a relatively small number of target models, our method implicitly considers an infinite number of target models through the likelihood bounds. In connection with the likelihood bound, an efficient L1 tracker [24] proposed the Bounded Particle Re-sampling (BPR) technique based on an upper bound of the likelihood. However, that work used the BPR technique to speed up the tracker without losing accuracy. Our method instead utilizes the likelihood bounds to measure uncertainty of the likelihood and enhances the accuracy of visual tracking. In sampling-based tracking methods, the particle filter [12] handled the non-Gaussianity of the target distribution in tracking problems. In [16,32], the Markov Chain Monte Carlo (MCMC) method coped with high-dimensional state spaces, whereas the joint particle filter was not able to. The Interacting Markov Chain Monte Carlo (IMCMC) method [19] required a relatively small number of samples by exchanging information about good states between Markov chains. Our method employs the IMCMC method. However, our method uses the samples to obtain the state that minimizes the uncertainty of the target distribution, whereas the samples in the other methods are used to obtain the maximum of the target distribution.

The first contribution of the paper is a novel tracking framework that utilizes the MUG instead of the MAP estimation. By finding the state that minimizes the gap between the likelihood bounds, our method can cope with the drift problem caused by a noisy target model more robustly than the conventional MAP-based approach, and successfully tracks the target when there are severe illumination changes, occlusions, and pose variations. The second contribution is a rigorous derivation of the lower and upper bounds of the likelihood in the visual tracking problem. Although the bounds of the likelihood are obtained based on [14], applying those bounds to the visual tracking problem directly is not trivial, since proper and careful

designs of the parameter $\gamma$ and the distribution q are needed for the visual tracking problem. In this work, $\gamma$ and q are designed as the hyperparameter of the likelihood and the prior distribution of a target model, respectively. The last contribution is an efficient strategy to obtain the state that has the Minimum Uncertainty Gap. Our method constructs two chains and infers the best state on the chains using the IMCMC method in [19]. In the first chain, the proposed method finds the state that maximizes the average bound (mean of the lower and upper bounds) of the likelihood. In the second chain, the method searches for the state that minimizes the gap between the two likelihood bounds. These chains communicate with each other to obtain the best state that maximizes the average bound and minimizes the gap between the bounds at the same time.

2. New Objective of the Bayesian Tracker

To find the best state $\hat{X}_t$, our method obtains the Minimum Uncertainty Gap (MUG) at each time t as follows:

$$\hat{X}_t = \arg\min_{X_t} \frac{p_u(Y_t \mid X_t) - p_l(Y_t \mid X_t)}{p_u(Y_t \mid X_t) + p_l(Y_t \mid X_t)}, \quad (1)$$

where $p_l(Y_t \mid X_t)$ and $p_u(Y_t \mid X_t)$ denote the lower and upper bounds of the likelihood, respectively. In (1), our method finds the state that maximizes the average bound, $[p_u(Y_t \mid X_t) + p_l(Y_t \mid X_t)]$, and minimizes the gap between the bounds, $[p_u(Y_t \mid X_t) - p_l(Y_t \mid X_t)]$, at the same time. The best state (MUG state) at time t is represented as a three-dimensional vector $\hat{X}_t = (\hat{X}_t^x, \hat{X}_t^y, \hat{X}_t^s)$, where $\hat{X}_t^x$, $\hat{X}_t^y$, and $\hat{X}_t^s$ indicate the x, y position and the scale of the target, respectively.
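As a minimal sketch of the selection rule in (1), assuming the per-candidate bound values are already available (the function name `mug_state` and the toy numbers below are illustrative, not from the paper), the MUG state is simply the argmin of the normalized gap:

```python
import numpy as np

def mug_state(states, p_l, p_u):
    """Pick the state with the Minimum Uncertainty Gap, as in (1).

    states : list of candidate states
    p_l, p_u : per-candidate lower/upper likelihood bounds
    """
    p_l = np.asarray(p_l, dtype=float)
    p_u = np.asarray(p_u, dtype=float)
    # Normalized gap: small when the bounds agree (confident likelihood)
    # relative to their sum (high likelihood score).
    ratio = (p_u - p_l) / (p_u + p_l)
    return states[int(np.argmin(ratio))]

# Toy example echoing Fig. 2: state B has the larger upper bound but a
# much wider gap, so MUG prefers state A.
states = ["A", "B"]
print(mug_state(states, p_l=[0.39, 0.20], p_u=[0.41, 0.51]))  # -> A
```

Note that a pure maximum-likelihood rule on `p_u` alone would pick B here, which is exactly the failure mode Fig. 2 illustrates.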

To obtain the MUG state, we need to estimate the lower and upper bounds of the likelihood. First, we define the likelihood as

$$p(Y_t, \theta \mid X_t) = \exp\!\big(-\lambda\, \mathrm{dist}\big(\theta, Y_t(X_t)\big)\big), \quad (2)$$

where $Y_t(X_t)$ denotes the observation at the state $X_t$, $\mathrm{dist}(\theta, Y_t(X_t))$ represents the dissimilarity measure between the target model $\theta$ and the observation $Y_t(X_t)$, and $\lambda$ is a weighting parameter. The observation and the target model are modeled by HSV histograms. The dissimilarity measure is designed with the Bhattacharyya similarity coefficient [28]. As aforementioned, the main cause of tracking failures is noisy target models. Therefore, our method integrates out the target model $\theta$ in (2) and estimates the log marginal likelihood, $\int_\Theta \ln p(Y_t, \theta \mid X_t)\, d\theta$, where $\Theta$ denotes the whole target model space. To approximate the integral numerically, we obtain the mathematical lower (Jensen's inequality) and upper (Gibbs' inequality) bounds of the marginal likelihood based on [14]:

$$\ln p_l(Y_t \mid X_t, \gamma) = \int_\Theta q(\theta \mid \gamma, X_t) \ln \frac{p(Y_t, \theta \mid X_t)}{q(\theta \mid \gamma, X_t)}\, d\theta, \quad (3)$$

$$\ln p_u(Y_t \mid X_t, \gamma) = \int_\Theta p(\theta \mid Y_t, X_t) \ln \frac{p(Y_t, \theta \mid X_t)}{q(\theta \mid \gamma, X_t)}\, d\theta, \quad (4)$$

where $q(\theta \mid \gamma, X_t)$ is the prior distribution of the target model $\theta$ and $\gamma$ is the hyperparameter of the distribution. Because $\theta$ is marginalized out in (3) and (4), the lower and upper bounds of the likelihood are functions of $X_t$ and $\gamma$, which are a state and a parameter, respectively. The goal of our method is then to find both the best state and the best parameter, which reduce the gap between the likelihood bounds. Thus, our method comprises two main parts:

• Parameter learning (Section 3): Using the MUG states $\{\hat{X}_i\}_{i=1}^t$, our method learns the parameter $\gamma$ for time t+1. In our method, the parameter is not set empirically but is obtained analytically, to maximize the lower bound in (3) and to minimize the upper bound in (4). Moreover, the parameter is not fixed to a constant but is adaptively varied at each time t by the process in Section 3.

• State inference (Section 4): Given the parameter $\gamma$ estimated at time t-1, our method finds the MUG state $\hat{X}_t$ at time t, which produces the minimum uncertainty gap. To achieve this goal, the method searches for states that maximize the average bound by increasing the denominator in (1). Thus, the method can obtain good-quality states with high likelihood scores. This advantage is similar to that of the MAP estimation. In addition, it prevents the best state with the minimum uncertainty gap from having a low likelihood score. Our method simultaneously searches for states that minimize the uncertainty of the likelihood estimation by decreasing the numerator in (1). The method can then avoid outliers which have a large uncertainty gap, even though they have high likelihood scores. This advantage cannot be achieved by the MAP estimation.
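The likelihood in (2), built from HSV histograms compared with the Bhattacharyya coefficient, can be sketched as follows. This is an illustrative implementation under my assumptions: the paper does not specify the histogram binning, and the helper names are hypothetical; $\lambda = 5$ follows the experimental setting in Section 5.

```python
import numpy as np

def bhattacharyya_dist(h1, h2):
    """Dissimilarity derived from the Bhattacharyya coefficient between
    two normalized histograms: d = sqrt(1 - sum_k sqrt(h1_k * h2_k))."""
    bc = float(np.sum(np.sqrt(h1 * h2)))
    return np.sqrt(max(0.0, 1.0 - bc))

def likelihood(theta, obs, lam=5.0):
    """p(Y_t, theta | X_t) = exp(-lam * dist(theta, Y_t(X_t))), eq. (2)."""
    return float(np.exp(-lam * bhattacharyya_dist(theta, obs)))

# Identical histograms give distance 0, hence likelihood 1.
h = np.array([0.25, 0.25, 0.5])
print(likelihood(h, h))  # -> 1.0
```

The likelihood decays exponentially as the candidate region's color distribution departs from the target model, which is what makes a corrupted model so damaging and motivates marginalizing over $\theta$.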

3. Parameter Learning

3.1. Learning $\gamma_u$ for the Upper Bound

We learn the best parameter $\gamma_u$, which minimizes the upper bound (4): $\gamma_u = \arg\min_\gamma \ln p_u(Y_t \mid X_t, \gamma)$. Then $\gamma_u$ also minimizes the KL divergence $D(p \,\|\, q)$, because $\ln p_u(Y_t \mid X_t, \gamma) = D(p \,\|\, q) + \ln p(Y_t \mid X_t)$, where

$$D(p \,\|\, q) = \int_\Theta p(\theta \mid Y_t, X_t) \ln \frac{p(\theta \mid Y_t, X_t)}{q(\theta \mid \gamma, X_t)}\, d\theta. \quad (5)$$

The parameter $\gamma_u$ that minimizes $D(p \,\|\, q)$ satisfies $\mathbb{E}_{q(\theta \mid \gamma_u, X_t)}[v(\theta)] = \mathbb{E}_{p(\theta \mid Y_t, X_t)}[v(\theta)]$, as derived in the Appendix (supplementary material). This means that the minimization of the KL divergence is equivalent to Moment Matching (MM) of $\theta$ [4]: the first and second moments of $\theta$ under the distribution $q(\theta \mid \gamma_u, X_t)$ are equal to those under the distribution $p(\theta \mid Y_t, X_t)$. In (5), the prior $q(\theta \mid \gamma, X_t)$ is designed as

$$q(\theta \mid \gamma, X_t) = q(\theta \mid \mu, \sigma, X_t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\left(\frac{\theta - \mu}{\sqrt{2}\,\sigma}\right)^2\right), \quad (6)$$

so the parameter $\gamma_u = (\mu_u, \sigma_u)$ is obtained for each bin in the HSV histogram:

• 1st MM: Since the first moments of $\theta$ under $q(\theta \mid \gamma_u, X_t)$ and under $p(\theta \mid Y_t, X_t) \approx p(Y_t, \theta \mid X_t)$ in (2) are $\mu_u$ and $\int_\Theta \theta\, p(\theta \mid Y_t, X_t)\, d\theta$, respectively, the following can be taken:

$$\mu_u = \int_\Theta \theta\, p(\theta \mid Y_t, X_t)\, d\theta. \quad (7)$$

In (7), the integration over $\theta$ is approximated using the Z samples of $\theta$, $\{\theta_i\}_{i=1}^Z$, where $\theta_i$ is designed as $Y_i(\hat{X}_i)$, the observation around the MUG state at the i-th recent frame. By substituting $X_t = \hat{X}_t$ and $p(\theta \mid Y_t, X_t) \approx p(Y_t, \theta \mid X_t)$ in (2) into (7), we get $\mu_u = \int_\Theta \theta\, p(\theta \mid Y_t, X_t)\, d\theta \approx \frac{1}{Z} \sum_{i=t-Z}^{t-1} \theta_i \exp\!\big(-\lambda\, \mathrm{dist}(\theta_i, Y_t(\hat{X}_t))\big)$.

• 2nd MM: Since the second moments of $\theta$ under $q(\theta \mid \gamma_u, X_t)$ and $p(\theta \mid Y_t, X_t)$ are $\sigma_u$ and $\int_\Theta (\theta - \mu_u)^2\, p(\theta \mid Y_t, X_t)\, d\theta$, respectively, the following can be taken:

$$\sigma_u = \int_\Theta (\theta - \mu_u)^2\, p(\theta \mid Y_t, X_t)\, d\theta, \quad (8)$$

where $\int_\Theta (\theta - \mu_u)^2\, p(\theta \mid Y_t, X_t)\, d\theta \approx \frac{1}{Z} \sum_{i=t-Z}^{t-1} (\theta_i - \mu_u)^2 \exp\!\big(-\lambda\, \mathrm{dist}(\theta_i, Y_t(\hat{X}_t))\big)$.
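The two moment-matching steps above can be sketched per histogram bin. This is a hedged sketch: `moment_match` is a hypothetical name, the Bhattacharyya helper stands in for the paper's `dist`, and the weighting follows the sample approximations of (7) and (8) as I read them (unnormalized likelihood weights averaged over the Z recent observations).

```python
import numpy as np

def moment_match(thetas, obs, lam=5.0):
    """Sample approximation of (7)-(8): mu_u and sigma_u per histogram
    bin, from the Z observations {theta_i} collected around the MUG
    states of the Z most recent frames."""
    def dist(a, b):  # Bhattacharyya dissimilarity, as in eq. (2)
        return np.sqrt(max(0.0, 1.0 - float(np.sum(np.sqrt(a * b)))))
    thetas = np.stack([np.asarray(t, float) for t in thetas])  # Z x bins
    obs = np.asarray(obs, float)
    w = np.exp([-lam * dist(t, obs) for t in thetas])   # likelihood weights
    mu_u = (w[:, None] * thetas).sum(0) / len(w)                   # eq. (7)
    sigma_u = (w[:, None] * (thetas - mu_u) ** 2).sum(0) / len(w)  # eq. (8)
    return mu_u, sigma_u
```

When the recent observations all agree with the current one, the weights are 1, `mu_u` is just their mean, and `sigma_u` collapses toward zero, i.e. the fitted prior q becomes sharply peaked.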

Finally, the global minimum of the upper bound of the likelihood at the state $X_t$ in (4) is estimated based on [14]:

$$\ln p_u(Y_t \mid X_t, \gamma_u) \approx \frac{1}{Z} \sum_{i=t-Z}^{t-1} \ln \frac{p(\theta_i, Y_t \mid X_t)}{q(\theta_i \mid \gamma_u, X_t)}, \quad (9)$$

where the integration in (4) is approximately obtained using the Z samples of $\theta$, $\{\theta_i\}_{i=1}^Z$, where $\theta_i$ indicates the observation around the MUG state at the i-th recent frame.

3.2. Learning $\gamma_l$ for the Lower Bound

We learn the best parameter $\gamma_l$, which maximizes the lower bound in (3): $\gamma_l = \arg\max_\gamma \ln p_l(Y_t \mid X_t, \gamma)$. For this purpose, the gradient of $\ln p_l(Y_t \mid X_t, \gamma)$ with respect to $\gamma$ is set to zero:

$$\frac{d}{d\gamma} \ln p_l(Y_t \mid X_t, \gamma) = -\int_\Theta h(\theta \mid \gamma, X_t)\, d\theta = 0, \quad (10)$$

where¹

$$h(\theta \mid \gamma, X_t) = h(\theta \mid \mu, \sigma, X_t) = \left[1 + \ln \frac{q(\theta \mid \mu, \sigma, X_t)}{p(Y_t, \theta \mid X_t)}\right] \begin{bmatrix} \frac{\partial}{\partial\mu} q(\theta \mid \mu, \sigma, X_t) \\[2pt] \frac{\partial}{\partial\sigma} q(\theta \mid \mu, \sigma, X_t) \end{bmatrix}. \quad (11)$$

To find the parameter $\gamma_l = (\mu_l, \sigma_l)$ that satisfies (10), the quasi-optimized lower bound is estimated by Stochastic Approximation Monte Carlo (SAMC) [22], which defines a recursive approximation of the solution of $\frac{d}{d\gamma} \ln p_l(Y_t \mid X_t, \gamma) = 0$. SAMC then iteratively updates a sequence of values via the recursion:

$$\gamma^{(n+1)} = \gamma^{(n)} + s^{(n+1)}\, \omega(\gamma^{(n)}), \qquad \omega(\gamma^{(n)}) = -\int_\Theta h(\theta \mid \gamma^{(n)}, X_t)\, d\theta, \quad (12)$$

where $\int_\Theta h(\theta \mid \gamma^{(n)}, X_t)\, d\theta \approx \frac{1}{Z} \sum_{i=t-Z}^{t-1} h(\theta_i \mid \gamma^{(n)}, \hat{X}_t)$. Here $\gamma^{(n+1)}$ denotes the approximation of the $\gamma$ value at the (n+1)-th iteration, and $s^{(n+1)}$ indicates the modification factor at the (n+1)-th iteration, which linearly decreases from 0.5 to 0.1 as time goes on. After the predefined number of iterations N, we get $\gamma^{(N)} = \gamma_l = (\mu_l, \sigma_l)$.

Then, the final estimate of the lower bound at the state $X_t$ in (3) is obtained based on [14]:

$$\ln p_l(Y_t \mid X_t, \gamma_l) \approx \frac{1}{Z} \sum_{i=t-Z}^{t-1} \ln \frac{p(\theta_i, Y_t \mid X_t)}{q(\theta_i \mid \gamma_l, X_t)}. \quad (13)$$
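Estimates (9) and (13) share the same Monte Carlo form and differ only in which learned parameter defines q. A minimal shared sketch (the function name and callback signatures are my own, not the paper's):

```python
import math
import numpy as np

def log_bound(log_joint, log_q, thetas):
    """Shared estimator for (9) and (13):
    ln p_bound(Y_t|X_t, gamma) ~= (1/Z) * sum_i [ln p(theta_i, Y_t|X_t)
                                                 - ln q(theta_i|gamma, X_t)].
    Pass a q built from gamma_u for the upper bound (9), and one built
    from gamma_l for the lower bound (13)."""
    return float(np.mean([log_joint(t) - log_q(t) for t in thetas]))

# Toy check: constant log-joint 0 against q = 0.5 everywhere gives ln 2.
val = log_bound(lambda t: 0.0, lambda t: math.log(0.5), [1, 2, 3])
print(val)  # -> ln(2) ~ 0.6931
```

In the tracker, `thetas` would again be the Z recent observations around past MUG states, so both bounds are evaluated on the same sample set.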

4. Inference of the MUG State

To find the best state that satisfies (1) with the fixed parameters $\gamma_l$ and $\gamma_u$, our method utilizes the IMCMC sampling method [19], in which two Markov chains are designed. The first chain frequently accepts the state that maximizes the average likelihood bound. The second frequently accepts the state that minimizes the gap between the bounds. The IMCMC sampling method consists of two modes, parallel and interacting. In the parallel mode, our method acts as the parallel Metropolis-Hastings algorithm and separately obtains samples over the chains via two main steps: the proposal step and the acceptance step. At the proposal step, a new state is proposed by the proposal density function.

$$Q(X_t^*; X_t) = G(X_t, \sigma_p^2), \quad (14)$$

where Q denotes the proposal density function, G represents the Gaussian distribution with mean $X_t$ and variance $\sigma_p^2$, and $X_t^*$ represents a new state at time t. Given the proposed state, each chain decides whether the state is accepted or not with the acceptance ratio in the acceptance step:

$$a_{p1} = \min\!\left(1, \frac{[p_u(Y_t \mid X_t^*, \gamma_u) + p_l(Y_t \mid X_t^*, \gamma_l)]\, Q(X_t; X_t^*)}{[p_u(Y_t \mid X_t, \gamma_u) + p_l(Y_t \mid X_t, \gamma_l)]\, Q(X_t^*; X_t)}\right),$$

$$a_{p2} = \min\!\left(1, \frac{[p_u(Y_t \mid X_t, \gamma_u) - p_l(Y_t \mid X_t, \gamma_l)]\, Q(X_t; X_t^*)}{[p_u(Y_t \mid X_t^*, \gamma_u) - p_l(Y_t \mid X_t^*, \gamma_l)]\, Q(X_t^*; X_t)}\right), \quad (15)$$
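The parallel-mode step of (14)-(15) can be sketched as follows. This assumes a symmetric Gaussian proposal, so the Q terms in (15) cancel; `propose` and `mh_accept` are hypothetical names, and the per-chain scores (average bound for chain 1, inverse gap for chain 2, matching the direction of the ratio in $a_{p2}$) are my reading of the equations.

```python
import numpy as np

rng = np.random.default_rng(0)

def propose(state, sigmas=(2.0, np.sqrt(2.0), 0.1)):
    """Gaussian proposal Q in (14): perturb (x, y, scale) independently.
    The default sigmas mirror sigma_p^x = sqrt(4), sigma_p^y = sqrt(2),
    sigma_p^s = sqrt(0.01) from Section 5."""
    return tuple(s + rng.normal(0.0, sd) for s, sd in zip(state, sigmas))

def mh_accept(score_new, score_old):
    """Metropolis-Hastings acceptance with a symmetric proposal, so the
    ratio in (15) reduces to score(X*)/score(X).  Chain 1 scores a state
    by p_u + p_l; chain 2 by 1 / (p_u - p_l), i.e. small gaps win."""
    a = min(1.0, score_new / score_old)
    return bool(rng.random() < a)
```

A proposed state that improves a chain's score is always accepted (ratio at least 1); a worse one is still accepted occasionally, which is what lets each chain escape shallow local optima.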

¹In (11), $\left[1 + \ln \frac{q(\theta \mid \mu, \sigma, X_t)}{p(Y_t, \theta \mid X_t)}\right] = \left[\lambda\, \mathrm{dist}\big(\theta, Y_t(X_t)\big) - \left(\frac{\theta - \mu}{\sqrt{2}\,\sigma}\right)^2 - \ln\sqrt{2\pi\sigma^2} + 1\right]$, $\frac{\partial}{\partial\mu} q(\theta \mid \mu, \sigma, X_t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\left(\frac{\theta - \mu}{\sqrt{2}\,\sigma}\right)^2\right) \frac{1}{\sigma^2}(\theta - \mu)$, and $\frac{\partial}{\partial\sigma} q(\theta \mid \mu, \sigma, X_t) = q(\theta \mid \mu, \sigma, X_t)\left[\frac{(\theta - \mu)^2}{\sigma^3} - \frac{1}{\sigma}\right]$.

Algorithm 1: Minimum Uncertainty Gap tracker

Input: $X_{t-1} = (X_{t-1}^x, X_{t-1}^y, X_{t-1}^s)$, $\alpha = 1$
Output: $\hat{X}_t = (\hat{X}_t^x, \hat{X}_t^y, \hat{X}_t^s)$

1: while all frames do
2:   for 1 to R do
3:     Choose mode. Sample $\rho \sim U[0, 1]$.
4:     if $\rho < \alpha$ then
5:       Interacting mode:
6:       The two chains replace their states with that of the other with probabilities $a_{i1}$ and $a_{i2}$ in (16), respectively.
7:     else
8:       Parallel mode:
9:       The two chains propose states using Q in (14) and accept the states with probabilities $a_{p1}$ and $a_{p2}$ in (15), respectively.
10:    end if
11:    Decrease the $\alpha$ value.
12:  end for
13:  Estimate the MUG state $\hat{X}_t$ using (1).
14:  Determine $\gamma_u$ using (7) and (8), and $\gamma_l$ using (12).
15: end while

where $p_l(Y_t \mid X_t^*, \gamma_l)$ and $p_u(Y_t \mid X_t^*, \gamma_u)$ denote the estimated lower and upper bounds of the likelihood at the state $X_t^*$, respectively. These steps proceed iteratively until the number of iterations reaches the predefined value R.

When the method is in the interacting mode, the trackers communicate with each other and make leaps to better states of the target. Owing to the interacting mode, our method can find the common state which maximizes the average likelihood bound and, at the same time, minimizes the gap between the bounds. A chain accepts the state of chain 1 as its own state with probability $a_{i1}$ if the state of chain 1 greatly increases the average likelihood bound. Similarly, a chain accepts the state of chain 2 as its own state with probability $a_{i2}$ if the state of chain 2 greatly reduces the gap between the bounds:

$$a_{i1} = \frac{[p_u(Y_t \mid X_t, \gamma_u) + p_l(Y_t \mid X_t, \gamma_l)] - \Delta_1}{2\, p_l(Y_t \mid X_t, \gamma_l) - \Delta_1 + \Delta_2},$$

$$a_{i2} = \frac{\Delta_2 - [p_u(Y_t \mid X_t, \gamma_u) - p_l(Y_t \mid X_t, \gamma_l)]}{2\, p_l(Y_t \mid X_t, \gamma_l) - \Delta_1 + \Delta_2},$$

$$\Delta_1 = \max\!\big([p_u(Y_t \mid \hat{X}_{t-1}) + p_l(Y_t \mid \hat{X}_{t-1})] - \tfrac{1}{4},\; 0\big), \qquad \Delta_2 = \min\!\big([p_u(Y_t \mid \hat{X}_{t-1}) - p_l(Y_t \mid \hat{X}_{t-1})] + \tfrac{1}{4},\; 1\big). \quad (16)$$

In (16), $[p_u(Y_t \mid X_t, \gamma_u) + p_l(Y_t \mid X_t, \gamma_l)] - \Delta_1$ indicates the increase of the average likelihood bound, and $\Delta_2 - [p_u(Y_t \mid X_t, \gamma_u) - p_l(Y_t \mid X_t, \gamma_l)]$ represents the decrease of the gap between the bounds. $p_l(Y_t \mid \hat{X}_{t-1})$ and $p_u(Y_t \mid \hat{X}_{t-1})$ are the lower and upper bounds of the likelihood at the best state of the previous frame with the best parameter, respectively. Our method operates in the interacting mode with probability $\alpha$, which linearly decreases from 1.0 to 0.0 as the simulation goes on. Notably, the IMCMC method [19] typically converges to the invariant distribution $\frac{p_u(Y_t \mid X_t) - p_l(Y_t \mid X_t)}{p_u(Y_t \mid X_t) + p_l(Y_t \mid X_t)}$ in (1). Algorithm 1 illustrates the whole process of our method.
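The interaction probabilities can be sketched numerically. This follows my reading of the extracted equation (16), which is garbled in this copy: $\Delta_1$ and $\Delta_2$ are thresholds built from the bounds at the previous best state, and both ratios share the denominator $2 p_l - \Delta_1 + \Delta_2$; the function name is hypothetical.

```python
def interacting_probs(p_u, p_l, p_u_prev, p_l_prev):
    """Interaction probabilities a_i1, a_i2 of (16), as reconstructed.
    (p_u, p_l): bounds at the current state; (p_u_prev, p_l_prev):
    bounds at the previous frame's best state."""
    d1 = max((p_u_prev + p_l_prev) - 0.25, 0.0)   # Delta_1
    d2 = min((p_u_prev - p_l_prev) + 0.25, 1.0)   # Delta_2
    denom = 2.0 * p_l - d1 + d2
    a_i1 = ((p_u + p_l) - d1) / denom   # leap to chain 1's state
    a_i2 = (d2 - (p_u - p_l)) / denom   # leap to chain 2's state
    return a_i1, a_i2
```

A consistency check on this reading: since $(p_u + p_l) - \Delta_1 + \Delta_2 - (p_u - p_l) = 2 p_l - \Delta_1 + \Delta_2$, the two probabilities sum to exactly 1, so a leap always goes to one chain's state or the other.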

Figure 3. Tracking environments in which the gaps between the likelihood bounds are maximized: occlusion, pose variation, and illumination change.

(a) high-jump (b) shaking (c) david (d) skater
Figure 4. States of the target which produce the maximum lower bound (blue rectangle) and the minimum upper bound (red rectangle) of the likelihood at a frame.

5. Experimental Results

For the experiments, publicly available video sequences obtained from [1,17,19,25,34,30] were used. Using these sequences, the proposed method (MUG) was analyzed and compared with 6 state-of-the-art tracking methods: MC [16,28], IVT [29], FRAGT [1], MIL [2], VTS [19], and MTT [37]. In all experiments, $\lambda$ in (2) is set to 5, and Z in (9) and (13) is set to 15. $\sigma_p$ in (14) is set to $\sigma_p^x = \sqrt{4}$, $\sigma_p^y = \sqrt{2}$, and $\sigma_p^s = \sqrt{0.01}$, where $\sigma_p^x$, $\sigma_p^y$, and $\sigma_p^s$ denote the variances of the x translation, the y translation, and the scale, respectively. Please note that our method always uses the same settings throughout all experiments, whereas the parameters of the other methods were adjusted to show their best tracking performance. The same initializations were used for all methods for a fair comparison. The software provided by the authors was used to obtain the tracking results of IVT, MIL, FRAGT, VTS, and MTT. The supplementary material contains result videos.

5.1. Analysis of the Proposed Method

Lower and Upper Bounds of the Likelihood: We examine the tracking environments in which the gaps between the likelihood bounds are maximized. As illustrated in Fig. 3, the gaps between the likelihood bounds were maximized when there were severe occlusions, pose variations, or illumination changes. These changes caused the target appearance to become noisy. Because of the noisy appearance, the estimated likelihood became very uncertain. This uncertainty of the likelihood produced the large gap between the lower and upper bounds of the likelihood in our method. In other words, the likelihood cannot be uniquely determined, especially when the tracking environments include the aforementioned appearance changes. Therefore, our method considers the uncertainty of the likelihood to track the target robustly in real-world situations.

The states of the target that produce the maximum lower bound or the minimum upper bound of the likelihood are

Table 1. Comparison of tracking results using MAP, ML, and MUG. The numbers indicate the average center location errors in pixels. The improvement score is calculated by dividing the tracking error of ML3 by that of MUG.

        bird1  bird2  lemming  woman  soccer  skating1  diving  high-jump  skater
MAP     199    45     11       127    51      115       26      70         30
ML1     208    47     12       137    56      110       43      65         47
ML2     201    51     16       138    61      107       46      71         51
ML3     210    42     11       101    49      150       27      71         37
MUG     13     11     16       14     32      17        14      30         17
Score   16     3.8    0.7      7.2    1.5     8.8       1.9     2.3        2.2

Table 2. Comparison of tracking results using MCMC and IMCMC. The improvement score is calculated by dividing the tracking error of MCMC by that of IMCMC.

        bird1  bird2  lemming  woman  soccer  skating1  diving  high-jump  skater
MCMC    24     31     52       40     47      89        32      65         51
IMCMC   13     11     16       14     32      17        14      30         17
Score   1.8    2.8    3.3      2.9    1.5     5.2       2.3     2.2        3.0

also checked. In (7) and (8), the target model for the minimum upper bound is constructed by averaging the target appearance over the recent frames, so the model is adequate for tracking a target whose appearance smoothly changes over time. The state that produces the minimum upper bound is then the best state when smooth changes in the target appearance are assumed, as shown in Fig. 4(a) and (c). In (10) and (11), the target model for the maximum lower bound is heavily updated if the current observation is vastly different from the model, so the model is robust for tracking a target whose appearance abruptly changes at a certain time. The state that produces the maximum lower bound is then the best state when abrupt changes in the target appearance are assumed, as shown in Fig. 4(b) and (d). Therefore, the state that reduces the gap between the two bounds is the best state for both smooth and abrupt changes in target appearance.

Performance of MUG and IMCMC: To evaluate the performance of MUG, we used the same likelihood function, which employs the Bhattacharyya coefficient as the similarity measure and the HSV color histogram as the feature; the only difference is how the best state is determined. The best state estimated by MAP is the one which maximizes the posterior probability. The best states estimated by ML1, ML2, and ML3 are the ones which maximize the lower likelihood bound, the upper likelihood bound, and the average likelihood bound, respectively. The best state obtained by MUG is the one which maximizes the average likelihood bound and simultaneously minimizes the gap between the two bounds. As shown in Table 1, the best state obtained by MUG gives the most accurate tracking results. These results demonstrate that the state which produces the maximum likelihood score or the maximum posterior probability does not always correspond to the true target state in real-world settings. Additionally, the results show that methods should consider the uncertainty of the estimated likelihood by measuring the gap between the likelihood bounds, as our method does.

To evaluate the performance of IMCMC, the same MUG strategy is used to determine the best state; the only difference is the procedure for finding it. The best state in MCMC is found by using a single chain, in which the chain finds the state that maximizes the average bound and minimizes the gap between the two bounds simultaneously. The best state in IMCMC is obtained by employing two chains, in which one chain only searches for the state that maximizes the average bound and the other only searches for the state that minimizes the gap between the two bounds; the two chains then exchange information about good states. As indicated in Table 2, using two chains shows better tracking performance, because tracking methods using a single chain get trapped in local optima more frequently as the target distribution becomes complex. The target distribution is complex here because two different types of likelihood distribution are mixed in a single distribution. Our method divides the complex distribution into two simple ones using IMCMC, where the two distributions describe the average bound and the gap between the two bounds, respectively.

Table 3. Comparison of tracking results. The numbers indicate the average center location errors in pixels. Red is the best result and blue is the second-best. The numbers in parentheses indicate the percentage of successfully tracked frames, where tracking succeeds when the overlap ratio between the predicted bounding box A_p and the ground-truth bounding box A_g exceeds 0.5: area(A_p ∩ A_g) / area(A_p ∪ A_g) > 0.5.

             MC        IVT       FRAGT     MIL       VTS       MTT       MUG
bird1        215 (16)  230 (13)  228 (13)  270 (11)  119 (13)  265 (12)  13 (43)
bird2        40 (18)   115 (11)  24 (67)   13 (81)   81 (23)   76 (21)   11 (86)
lemming      12 (85)   14 (79)   84 (26)   14 (51)   70 (45)   90 (25)   16 (71)
woman        138 (11)  133 (11)  112 (19)  120 (15)  111 (19)  120 (15)  14 (62)
soccer       53 (15)   116 (9)   82 (11)   41 (17)   15 (35)   17 (34)   32 (20)
skating1     172 (14)  213 (11)  93 (26)   85 (31)   8 (93)    150 (20)  17 (40)
diving       27 (24)   79 (20)   64 (20)   73 (20)   80 (20)   98 (19)   14 (24)
high-jump    73 (15)   79 (15)   69 (15)   91 (15)   143 (14)  94 (15)   30 (17)
skater       28 (47)   86 (41)   23 (61)   85 (41)   25 (66)   25 (66)   17 (66)
Speed (fps)  7         6         2         17        0.4       0.1       3

5.2. Comparison with Other Tracking Methods

As summarized in Table 3, our method (MUG) tracked the targets most accurately in terms of the center location error and the success rate, even though there are several types of appearance changes. VTS showed the second-best tracking performance. Our method was robust to the geometric appearance changes of the non-rigid targets in diving, high-jump, and skater; the occlusions in bird1 and woman; and the motion blur in bird2. In this paper, we wanted to demonstrate that our method can produce better tracking results by utilizing a very simple likelihood function and its lower and upper bounds. Using this simple likelihood function, the method was much faster and more accurate than VTS. The tracking performance of our method can be further enhanced if more advanced likelihood functions are employed. Additionally, with the simple likelihood function, our method produced more accurate tracking results than other state-of-the-art methods which are robust to pose variations, occlusions, and illumination changes. Note that, for the sampling-based methods, we used the same number of samples to track the target.

(a) bird1 seq. (b) bird2 seq. (c) soccer seq. (d) skating1 seq. (e) lemming seq.
Figure 5. Tracking results in several challenging sequences. Yellow, blue, white, purple, green, and red rectangles represent the tracking results of MUG, MTT, VTS, MIL, FRAGT, and MC, respectively.

Fig. 5 and Fig. 6 demonstrate how our method outperforms the state-of-the-art methods in several challenging sequences. In Fig. 5, the tracking performance under severe occlusions and background clutter was tested. When a sequence contained several simultaneous appearance changes of the target, our method robustly tracked the target over time, while the other tracking methods frequently missed the targets. The tracking results of MIL drifted into the background when the aforementioned changes transformed the target appearance into a different one. Our method overcame this problem and successfully tracked the target by evaluating the target configuration with several likelihoods. In Fig. 6, our method did not miss the target in any frame, although the sequences include severe geometric appearance changes of the target. On the other hand, MIL and VTS frequently failed to track the target when the target was severely deformed. Our method was also more efficient than VTS in terms of computational cost, because it evaluated each configuration with only two likelihoods, by estimating the lower and upper bounds of the likelihood.

6. Conclusion

In this paper, we propose a novel tracking framework that tracks the target robustly by finding the best state of the target, which minimizes the gap between the lower and upper bounds of the likelihood. Obtaining the likelihood bounds is the same as considering all possible target models during the tracking process. Therefore, our method finds a good state of the target by reflecting all possible appearance changes of the target. The experimental results demonstrate that our method outperforms the conventional tracking methods using the MAP and ML estimations. The method also shows better tracking performance than the state-of-the-art tracking methods when there are illumination changes, occlusions, and deformations.

References

[1] A. Adam, E. Rivlin, and I. Shimshoni. Robust fragments-based tracking using the integral histogram. CVPR, 2006.
[2] B. Babenko, M. Yang, and S. Belongie. Visual tracking with online multiple instance learning. CVPR, 2009.
[3] S. Birchfield. Elliptical head tracking using intensity gradients and color histograms. CVPR, 1998.
[4] C. I. Byrnes and A. Lindquist. A convex optimization approach to generalized moment problems. Control and Modeling of Complex Systems, Springer, 2003.
[5] R. T. Collins, Y. Liu, and M. Leordeanu. Online selection of discriminative tracking features. PAMI, 27(10):1631–1643, 2005.
[6] D. Comaniciu, V. Ramesh, and P. Meer. Real-time tracking of non-rigid objects using mean shift. CVPR, 2000.
[7] A. C. Courville, N. D. Daw, G. J. Gordon, and D. S. Touretzky. Model uncertainty in classical conditioning. NIPS, 2003.
[8] M. Godec, P. M. Roth, and H. Bischof. Hough-based tracking of non-rigid objects. ICCV, 2011.
[9] H. Grabner, M. Grabner, and H. Bischof. Real-time tracking via on-line boosting. BMVC, 2006.
[10] B. Han and L. Davis. On-line density-based appearance modeling for object tracking. ICCV, 2005.
[11] S. Hare, A. Saffari, and P. H. S. Torr. Struck: Structured output tracking with kernels. ICCV, 2011.

(a) high-jump seq. (b) diving seq. (c) skater seq. (d) woman seq.
Figure 6. Tracking results with the lower and upper bounds of the likelihood obtained by MUG. Yellow, white, and purple rectangles represent the tracking results of MUG, VTS, and MIL, respectively. Yellow and red curves represent the lower and upper bounds of the likelihood over time in MUG, respectively. The green curve represents the gap between the bounds over time in MUG.

[12] M. Isard and A. Blake. ICondensation: Unifying low-level and high-level tracking in a stochastic framework. ECCV, 1998.
[13] A. D. Jepson, D. J. Fleet, and T. F. El-Maraghi. Robust online appearance models for visual tracking. PAMI, 25(10):1296–1311, 2003.
[14] C. Jia, H. Shen, and M. West. Bounded approximations for marginal likelihoods. Technical report, 2010.
[15] Z. Kalal, K. Mikolajczyk, and J. Matas. Tracking-learning-detection. PAMI, 34(7):1409–1422, 2012.
[16] Z. Khan, T. Balch, and F. Dellaert. MCMC-based particle filtering for tracking a variable number of interacting targets. PAMI, 27(11):1805–1819, 2005.
[17] J. Kwon and K. M. Lee. Tracking of a non-rigid object via patch-based dynamic appearance modeling and adaptive basin hopping Monte Carlo sampling. CVPR, 2009.
[18] J. Kwon and K. M. Lee. Visual tracking decomposition. CVPR, 2010.
[19] J. Kwon and K. M. Lee. Tracking by sampling trackers. ICCV, 2011.
[20] J. Kwon and K. M. Lee. Wang-Landau Monte Carlo-based tracking methods for abrupt motions. PAMI, 35(4):1011–1024, 2013.
[21] B. Leibe, K. Schindler, N. Cornelis, and L. Van Gool. Coupled object detection and tracking from static cameras and moving vehicles. PAMI, 30(10):1683–1698, 2008.
[22] F. Liang, C. Liu, and R. J. Carroll. Stochastic approximation in Monte Carlo computation. J. Amer. Statist. Assoc., 102(477):305–320, 2007.
[23] L. Matthews, T. Ishikawa, and S. Baker. The template update problem. PAMI, 26(6):810–815, 2004.
[24] X. Mei, H. Ling, Y. Wu, E. Blasch, and L. Bai. Minimum error bounded efficient L1 tracker with occlusion detection. CVPR, 2011.
[25] S. M. S. Nejhum, J. Ho, and M.-H. Yang. Visual tracking with histograms and articulating blocks. CVPR, 2008.
[26] S. Oron, A. Bar-Hillel, D. Levi, and S. Avidan. Locally orderless tracking. CVPR, 2012.
[27] D. W. Park, J. Kwon, and K. M. Lee. Robust visual tracking using autoregressive hidden Markov model. CVPR, 2012.
[28] P. Perez, C. Hue, J. Vermaak, and M. Gangnet. Color-based probabilistic tracking. ECCV, 2002.
[29] D. A. Ross, J. Lim, R. Lin, and M. Yang. Incremental learning for robust visual tracking. IJCV, 77(1):125–141, 2008.
[30] J. Santner, C. Leistner, A. Saffari, T. Pock, and H. Bischof. PROST: Parallel robust online simple tracking. CVPR, 2010.
[31] L. Sevilla-Lara and E. Learned-Miller. Distribution fields for tracking. CVPR, 2012.
[32] K. Smith, D. Gatica-Perez, and J.-M. Odobez. Using particles to track varying numbers of interacting people. CVPR, 2005.
[33] S. Stalder, H. Grabner, and L. Van Gool. Cascaded confidence filtering for improved tracking-by-detection. ECCV, 2010.
[34] S. Wang, H. Lu, F. Yang, and M.-H. Yang. Superpixel tracking. ICCV, 2011.
[35] M. Yang and Y. Wu. Tracking non-stationary appearances and dynamic feature selection. CVPR, 2005.
[36] A. Yilmaz, O. Javed, and M. Shah. Object tracking: A survey. ACM Comput. Surv., 38(4), 2006.
[37] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja. Robust visual tracking via multi-task sparse learning. CVPR, 2012.
