
Robust estimates in generalized partially linear models


arXiv:0708.0165v1 [stat.ME] 1 Aug 2007

The Annals of Statistics 2006, Vol. 34, No. 6, 2856–2878
DOI: 10.1214/009053606000000858
© Institute of Mathematical Statistics, 2006

ROBUST ESTIMATES IN GENERALIZED PARTIALLY LINEAR MODELS

By Graciela Boente, Xuming He and Jianhui Zhou
Universidad de Buenos Aires and CONICET, University of Illinois at Urbana-Champaign, and University of Virginia

In this paper, we introduce a family of robust estimates for the parametric and nonparametric components under a generalized partially linear model, where the data are modeled by y_i | (x_i, t_i) ~ F(·, μ_i) with μ_i = H(η(t_i) + x_i^T β), for some known distribution function F and link function H. It is shown that the estimates of β are root-n consistent and asymptotically normal. Through a Monte Carlo study, the performance of these estimators is compared with that of the classical ones.

1. Introduction. Semiparametric models contain both a parametric and a nonparametric component. Sometimes, the nonparametric component plays the role of a nuisance parameter. Much research has been done on estimators of the parametric component in a general framework, aiming to obtain asymptotically efficient estimators. The aim of this paper is to consider semiparametric versions of the generalized linear models where the response y is to be predicted by covariates (x, t), where x ∈ R^p and t ∈ T ⊂ R. It will be assumed that the conditional distribution of y | (x, t) belongs to the canonical exponential family exp[yθ(x, t) − B(θ(x, t)) + C(y)] for known functions B and C. Then μ(x, t) = E(y | (x, t)) = B′(θ(x, t)), with B′ denoting the derivative of B. In generalized linear models [19], which constitute a popular approach for modeling a wide variety of data, it is often assumed that the mean is modeled linearly through a known inverse link function g, that is, g(μ(x, t)) = β_0 + x^T β + α t.


For instance, an ordinary logistic regression model assumes that the observations (y_i, x_i, t_i) are such that the response variables are independent binomial variables y_i | (x_i, t_i) ~ Bi(1, p_i), whose success probabilities depend on the explanatory variables through the relation g(p_i) = β_0 + x_i^T β + α t_i, with g(u) = ln(u/(1 − u)).
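For concreteness, the logit link and its inverse can be sketched numerically; this is a small illustration only and the function names are ours:

```python
import numpy as np

def g(u):
    """Logit link: g(u) = ln(u / (1 - u))."""
    return np.log(u / (1.0 - u))

def H(v):
    """Inverse link H = g^{-1} (the logistic function)."""
    return 1.0 / (1.0 + np.exp(-v))
```

By construction, H(g(p)) = p for p in (0, 1) and H(0) = 1/2.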

The influence function of the classical estimates based on the quasi-likelihood is unbounded: large deviations of the response from its mean, as measured by the Pearson residuals, or outlying points in the covariate space, can have a large influence on the estimators. Those outliers or potential outliers for the generalized linear regression model are to be detected and controlled by robust procedures such as those considered by Stefanski, Carroll and Ruppert [23], Künsch, Stefanski and Carroll [17], Bianco and Yohai [5] and Cantoni and Ronchetti [9].

In some applications, the linear model is insufficient to explain the relationship between the response variable and its associated covariates. To avoid the curse of dimensionality, we allow most predictors to be modeled linearly while a small number of predictors (possibly just one) enter the model nonparametrically. The relationship will be given by the semiparametric generalized partially linear model

(1)  μ(x, t) = H(η(t) + x^T β),

where H = g^{−1} is a known link function, β ∈ R^p is an unknown parameter and η is an unknown continuous function.

Severini and Wong [22] introduced the concept of generalized profile likelihood, which was later applied to this model by Severini and Staniswalis [21]. In this method, the nonparametric component is viewed as a function of the parametric component and, for each fixed value of that parameter, is estimated by maximizing a smoothed likelihood.


2. The proposal.

2.1. The estimators. Let (y_i, x_i, t_i) be independent observations such that y_i | (x_i, t_i) ~ F(·, μ_i), with μ_i = H(η(t_i) + x_i^T β) and Var(y_i | (x_i, t_i)) = V(μ_i). Let η_0(t) and β_0 denote the true parameter values and E_0 the expected value under the true model, so that E_0(y | (x, t)) = H(η_0(t) + x^T β_0). Letting ρ(y, u) be a loss function to be specified in the next subsection, we define

(2)  S_n(a, β, t) = Σ_{i=1}^n W_i(t) ρ(y_i, x_i^T β + a) w_1(x_i),

(3)  S(a, β, τ) = E_0[ρ(y, x^T β + a) w_1(x) | t = τ],

where the W_i(t) are the kernel (or nearest-neighbor with kernel) weights on t_i and w_1(·) is a function that downweights high leverage points in the x space. Note that S_n(a, β, τ) is an estimate of S(a, β, τ), which is a continuous function of (a, β, τ) if (y, x) | t = τ has a distribution function that is continuous with respect to τ.
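As an illustration, the kernel weights W_i(t) and the local objective S_n(a, β, t) in (2) can be sketched as follows; the triangular kernel and the ρ and w_1 passed in by the caller are placeholders chosen for the example, not the paper's definitive choices:

```python
import numpy as np

def kernel_weights(t, t_obs, h):
    """Normalized kernel weights W_i(t) = K((t - t_i)/h) / sum_j K((t - t_j)/h),
    here with the triangular kernel K(u) = max(0, 1 - |u|)."""
    k = np.maximum(0.0, 1.0 - np.abs((t - t_obs) / h))
    return k / k.sum()

def S_n(a, beta, t, y, x, t_obs, h, rho, w1):
    """Local objective (2): sum_i W_i(t) * rho(y_i, x_i^T beta + a) * w1(x_i)."""
    W = kernel_weights(t, t_obs, h)
    return np.sum(W * rho(y, x @ beta + a) * w1(x))
```

Because the weights are normalized, S_n(a, β, t) is a locally weighted average of the per-observation losses around t.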

Fisher consistency states that η_0(t) = argmin_a S(a, β_0, t). This is a key point in order to get asymptotically unbiased estimates for the nonparametric component. In many situations, a stronger condition holds; that is, under general conditions, it can be verified that

(4)  S(η_0(t), β_0, t) < S(a, β_0, t)  for all a ≠ η_0(t),

which entails Fisher consistency.

Following the ideas of Severini and Staniswalis [21], we define the function η_β(t) as the minimizer of S(a, β, t); it will be estimated by the minimizer η̂_β(t) of S_n(a, β, t).

To provide an estimate of β with a root-n convergence rate, we denote

(5)  F_n(β) = n^{−1} Σ_{i=1}^n ρ(y_i, x_i^T β + η̂_β(t_i)) w_2(x_i),

(6)  F(β) = E_0[ρ(y, x^T β + η_β(t)) w_2(x)],

where w_2(·) plays the same role (and can be taken to be the same) as w_1(·). We will assume that β_0 is the unique minimizer of F(β). This assumption is a standard condition in M-estimation in order to get consistent estimators of the parametric component and is analogous to condition (A-4) of [16], page 129.

A two-step robust proposal is now given as follows:

• Step 1: For each value of t and β, let

(7)  η̂_β(t) = argmin_{a∈R} S_n(a, β, t).

• Step 2: Define the estimate of β_0 as

(8)  β̂ = argmin_{β∈R^p} F_n(β)

and the estimate of η_0(t) as η̂_{β̂}(t).
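The two-step proposal above can be sketched with generic numerical optimizers. This is an illustrative implementation only: scipy's general-purpose minimizers stand in for the grid search used in the paper's simulations, the triangular kernel is one admissible choice, and ρ, w_1, w_2 are user-supplied placeholders:

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

def eta_hat(beta, t, y, x, t_obs, h, rho, w1):
    """Step 1: eta_hat_beta(t) = argmin_a S_n(a, beta, t)."""
    k = np.maximum(0.0, 1.0 - np.abs((t - t_obs) / h))  # triangular kernel
    W = k / k.sum()
    obj = lambda a: np.sum(W * rho(y, x @ beta + a) * w1(x))
    return minimize_scalar(obj, bounds=(-20.0, 20.0), method="bounded").x

def beta_hat(y, x, t_obs, h, rho, w1, w2, beta0):
    """Step 2: minimize F_n(beta) = mean_i rho(y_i, x_i^T beta + eta_hat_beta(t_i)) w2(x_i)."""
    def F_n(beta):
        eta = np.array([eta_hat(beta, ti, y, x, t_obs, h, rho, w1) for ti in t_obs])
        return np.mean(rho(y, x @ beta + eta) * w2(x))
    return minimize(F_n, beta0, method="Nelder-Mead", options={"xatol": 1e-3}).x
```

With a squared-error loss and unit weights this profiling scheme reduces to a classical partially linear fit; the robust versions plug in a bounded ρ and the leverage weights of Section 5.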

2.2. The loss function ρ. We propose two classes of loss functions. The first aims to bound the deviances, while the second, introduced by Cantoni and Ronchetti [9], bounds the Pearson residuals.

The first class of loss functions takes the form

(9)  ρ(y, u) = φ[−ln f(y, H(u)) + A(y)] + G(H(u)),

where φ is a bounded nondecreasing function with continuous derivative φ′ and f(·, s) is the density of the distribution function F(·, s) with y | (x, t) ~ F(·, H(η_0(t) + x^T β_0)). To avoid triviality, we also assume that φ is nonconstant on a set of positive probability. Typically, φ is a function which behaves like the identity function in a neighborhood of 0. The function A(y) is typically used to remove a term from the log-likelihood that is independent of the parameter and can be defined as A(y) = ln(f(y, y)) in order to obtain the deviance. The correction term G is used to guarantee Fisher consistency and satisfies

G′(s) = ∫ φ′[−ln f(y, s) + A(y)] f′(y, s) dμ(y)
      = E_s(φ′[−ln f(y, s) + A(y)] f′(y, s)/f(y, s)),

where E_s indicates expectation taken under y ~ F(·, s) and f′(y, s) is shorthand for ∂f(y, s)/∂s. With this class of ρ functions, we call the resulting estimator a modified likelihood estimator.

In a logistic regression setting, Bianco and Yohai [5] considered the score function

φ(t) = t − t²/(2c) if t ≤ c,  φ(t) = c/2 otherwise,

while Croux and Haesbroeck [12] proposed the score function

φ(t) = t e^{−√c} if t ≤ c,  φ(t) = −2 e^{−√t}(1 + √t) + e^{−√c}(2(1 + √c) + c) otherwise,

which is continuous at t = c and, beyond c, flattens exponentially rather than being constant.
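Both score functions can be written out numerically; the branch of the Croux–Haesbroeck score beyond c follows the reconstruction above and should be checked against [12]:

```python
import numpy as np

def phi_by(t, c=0.5):
    """Bianco-Yohai score: t - t^2/(2c) for t <= c, constant c/2 beyond."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= c, t - t ** 2 / (2.0 * c), c / 2.0)

def phi_ch(t, c=0.5):
    """Croux-Haesbroeck score (as reconstructed above): linear with slope
    exp(-sqrt(c)) up to c, then flattening exponentially; continuous at t = c."""
    t = np.asarray(t, dtype=float)
    sc = np.exp(-np.sqrt(c))
    tt = np.maximum(t, c)  # guard the sqrt on the t <= c branch
    big = -2.0 * np.exp(-np.sqrt(tt)) * (1.0 + np.sqrt(tt)) \
          + sc * (2.0 * (1.0 + np.sqrt(c)) + c)
    return np.where(t <= c, t * sc, big)
```

Both functions are nondecreasing and continuous at the truncation point c, which is what makes the corresponding ρ in (9) bounded and smooth.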

The second class of loss functions is based on [9], wherein the authors consider a general class of M-estimators of Mallows type by separately bounding the influence of deviations on y and (x, t). Their approach is based on robustifying the quasi-likelihood, which is an alternative to the generalizations given for generalized linear regression models by Stefanski, Carroll and Ruppert [23] and Künsch, Stefanski and Carroll [17]. Let r(y, μ) = (y − μ) V^{−1/2}(μ) be the Pearson residual with Var(y_i | (x_i, t_i)) = V(μ_i). Denote ν(y, μ) = V^{−1/2}(μ) ψ_c(r(y, μ)), where ψ_c is an odd nondecreasing score function with tuning constant c, such as the Huber function, and

(10)  ρ(y, u) = −[ ∫_{s_0}^{H(u)} ν(y, s) ds + G(H(u)) ],

where s_0 is such that ν(y, s_0) = 0 and the correction term (included to ensure Fisher consistency), also denoted G(s), satisfies G′(s) = −E_s(ν(y, s)). With such a ρ function, we call the resulting estimator a robust quasi-likelihood estimator. For the binomial and Poisson families, explicit forms of the correction term G(s) are given in [9].
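The building blocks of this class can be sketched as follows; the variance function V is supplied by the caller, and the Bernoulli choice used in the test is just an example:

```python
import numpy as np

def psi_huber(r, c=1.2):
    """Huber score psi_c(r) = max(-c, min(c, r)):
    identity for small residuals, clipped beyond +-c."""
    return np.clip(r, -c, c)

def pearson_residual(y, mu, V):
    """Pearson residual r(y, mu) = (y - mu) / V(mu)^{1/2}."""
    return (y - mu) / np.sqrt(V(mu))

def nu(y, mu, V, c=1.2):
    """nu(y, mu) = V(mu)^{-1/2} psi_c(r(y, mu))."""
    return psi_huber(pearson_residual(y, mu, V), c) / np.sqrt(V(mu))
```

For small residuals ν reduces to the quasi-likelihood score (y − μ)/V(μ); large Pearson residuals are clipped at the tuning constant c.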

2.3. General comments.

(a) Fisher consistency and uniqueness. Under a logistic partially linear regression model, if

(11)  P(x^T β = α | t = τ) < 1  for all (β, α) ≠ 0 and τ ∈ T,

and if we consider the loss function given by (9) with φ satisfying the regularity conditions given in [5], it is easy to see that (4) holds and that Fisher consistency for the nonparametric component is attained under this model. Moreover, it is easy to verify that β_0 is the unique minimizer of F(β) in this case. The same assertion can be verified for the robust quasi-likelihood proposal if ψ_c is bounded and increasing.

Under a generalized partially linear model with the response having a gamma distribution with a fixed shape parameter, Theorem 1 of Bianco, García Ben and Yohai [4] allows us to verify (4) and Fisher consistency for the nonparametric and parametric components if the score function φ is bounded and strictly increasing on the set where it is not constant and if (11) holds.

For any generalized partially linear model, conditions similar to those considered in [9] will lead to the desired uniqueness implied by (4). Note that this condition is quite similar to Condition (E) of [21], page 511. When considering the classical quasi-likelihood, the assumption β_0 = argmin_β F(β) is related to Condition (7.e) of [21], page 511; for the robust quasi-likelihood, this assumption is fulfilled, for instance, for a gamma family with a fixed shape parameter such that (11) holds and ψ_c is bounded and increasing.


(b) Differentiated equations. If the function ρ(y, u) is continuously differentiable and we denote Ψ(y, u) = ∂ρ(y, u)/∂u, the estimates will be solutions of the differentiated equations. More precisely, η_β(t) and η̂_β(t) will be solutions of S_1(a, β, t) = 0 and S_{1n}(a, β, t) = 0, respectively, with

(12)  S_1(a, β, τ) = E(Ψ(y, x^T β + a) w_1(x) | t = τ),

(13)  S_{1n}(a, β, t) = Σ_{i=1}^n W_i(t) Ψ(y_i, x_i^T β + a) w_1(x_i).

Furthermore, β̂ is a solution of F_{1n}(β) = 0, and Fisher consistency implies that F_1(β_0) = 0 and S_1(η_0(t), β_0, t) = 0, where

(14)  F_1(β) = E_0[ Ψ(y, x^T β + η_β(t)) w_2(x) ( x + ∂η_β(t)/∂β ) ],

(15)  F_{1n}(β) = n^{−1} Σ_{i=1}^n Ψ(y_i, x_i^T β + η̂_β(t_i)) w_2(x_i) ( x_i + ∂η̂_β(t_i)/∂β ).

Note that these first order equations may have multiple solutions and, therefore, we may need the values of the objective functions (2) and (5) to select the final estimator. For a family of distributions with positive and finite information number, Bianco and Boente [1] give conditions that entail the following: for each t, there exists a neighborhood of η_0(t) where S_1(η_0(t), β_0, t) = 0 and S_1(a, β_0, t) ≠ 0 for a ≠ η_0(t). Moreover, η_0(t) corresponds to a local minimum of S(a, β_0, t). The asymptotic results in this paper are derived by assuming the existence of a unique minimum; otherwise, one can only ensure that there exists a solution of the estimating equations that is consistent. In the modified likelihood approach, the derivative of (9) is given by Ψ(y, u) = H′(u)[Ψ_1(y, H(u)) + G′(H(u))], where

Ψ_1(y, s) = φ′[−ln f(y, s) + A(y)] [−f′(y, s)/f(y, s)].

On the other hand, for the proposal based on the robust quasi-likelihood, we have the following expression for the derivative of (10):

Ψ(y, u) = −[ν(y, H(u)) + G′(H(u))] H′(u)
        = −[ψ_c(r(y, H(u))) V^{−1/2}(H(u)) + G′(H(u))] H′(u)
        = −[ψ_c(r(y, H(u))) − E_{H(u)}{ψ_c(r(y, H(u)))}] H′(u) V^{−1/2}(H(u)).

One advantage of solving S_{1n}(a, β, t) = 0 and F_{1n}(β) = 0 is to avoid the numerical integration involved in the loss function (10), but the uniqueness of the solutions might be difficult to guarantee in general, except for those cases discussed in part (a) of this section. Also, note that when using the score function of Croux and Haesbroeck [12], the function G(s) in (9) has an explicit expression which does not require any numerical integration.


(c) Some robustness issues. It is clear that for unbounded response variables y, a bounded score function allows us to deal with large residuals. For models with a bounded response, for example, under a logistic model, the advantage of a bounded score function is mainly to guard against outliers with large Pearson residuals. If a binary response y is contaminated, the Pearson residuals are large only when the variances at the contaminated points are close to 0. These points are made more specific in the simulation study in Section 5.

It is also worth noting that our robust procedures are effective only if at least one nonconstant covariate x is present. To consider a case without any covariate, we may take y_i ~ Bi(1, p) as a random sample. Then easy calculations show that the minimizer â of S_n(a) = n^{−1} Σ_{i=1}^n ρ(y_i, a) equals the classical estimator, that is, â = H^{−1}(Σ_{i=1}^n y_i / n) with H(u) = 1/(1 + exp(−u)), when using either the score function proposed in [5] or that given by Cantoni and Ronchetti [9]. The same situation obtains if y_i | t_i ~ Bi(1, p(t_i)), where the resulting estimate of p(t) will be the local mean. In the present paper, with a semiparametric model where the covariate x plays a role, both downweighting the leverage points and controlling outlying responses work toward robustness.

3. Consistency. We will assume that t ∈ T and let T_0 ⊂ T be a compact set. For any continuous function v: T → R, we will denote ‖v‖_∞ = sup_{t∈T} |v(t)| and ‖v‖_{0,∞} = sup_{t∈T_0} |v(t)|.

In this section, we will show that the estimates defined by means of (7) and (8) are consistent under mild conditions when the smoother weights are the kernel weights W_i(t) = (Σ_{j=1}^n K((t − t_j)/h_n))^{−1} K((t − t_i)/h_n). Analogous results can be obtained for the weights based on nearest neighbors using arguments similar to those considered in [6]. In this paper, we will use the following set of assumptions:

C1. The function ρ(y, a) is continuous and bounded and the functions Ψ(y, a) = ∂ρ(y, a)/∂a, w_1(·) and w_2(·) are bounded.

C2. The kernel K: R → R is an even, nonnegative, continuous and bounded function satisfying ∫K(u) du = 1, ∫u² K(u) du < ∞ and |u| K(u) → 0 as |u| → ∞.

C3. The bandwidth sequence h_n is such that h_n → 0 and n h_n / log(n) → ∞.

C4. The marginal density f_T of t is a bounded function and, given any compact set T_0 ⊂ T, there exists a positive constant A_1(T_0) such that A_1(T_0) ≤ f_T(τ) for all τ ∈ T_0.

C5. The function S(a, β, t) satisfies the following equicontinuity condition: for any ε > 0, there exists some δ > 0 such that for any t_1, t_2 ∈ T_0 and β_1, β_2 ∈ K, a compact set in R^p,

|t_1 − t_2| < δ and ‖β_1 − β_2‖ < δ  ⟹  sup_{a∈R} |S(a, β_1, t_1) − S(a, β_2, t_2)| < ε.


C6. The function S(a, β, t) is continuous and η_β(t) is a continuous function of (β, t).

Remark 3.1. If the conditional distribution of x | t = τ is continuous with respect to τ, the continuity and boundedness of ρ stated in C1 entail that S(a, β, τ) is continuous.

Assumption C3 ensures that for each fixed a and β, we have convergence of the kernel estimates to their mean, while C5 guarantees that the bias term converges to 0.

Assumption C4 is a standard condition in semiparametric models. In the classical case, it corresponds to condition (D) of [21], page 511. It is also considered in nonparametric regression when uniform consistency results on the t-space are needed; it allows us to deal with the denominator in the definition of the kernel weights, which is, in fact, an estimate of the marginal density f_T.

Assumption C5 is fulfilled under C1 if the following equicontinuity condition holds: for any ε > 0, there exist compact sets K_1 ⊂ R and K_p ⊂ R^p such that for any τ ∈ T_0, P((y, x) ∈ K_1 × K_p | t = τ) > 1 − ε. This holds, for instance, if, for 1 ≤ i ≤ n and 1 ≤ j ≤ p, x_{ij} = φ_j(t_i) + u_{ij}, where the φ_j are continuous functions and the u_{ij} are i.i.d. and independent of t_i.

Theorem 3.1. Let K ⊂ R^p and T_0 ⊂ T be compact sets such that T_δ ⊂ T, where T_δ is the closure of a δ-neighborhood of T_0. Assume that C1–C6 and the following conditions hold:

(i) the kernel K is of bounded variation;

(ii) the family of functions F = {f(y, x) = ρ(y, x^T β + a) w_1(x), β ∈ K, a ∈ R} has covering number N(ε, F, L_1(Q)) ≤ A ε^{−W}, for any probability Q and 0 < ε < 1.

Then we have

(a) sup_{β∈K, a∈R} ‖S_n(a, β, ·) − S(a, β, ·)‖_{0,∞} → 0 a.s.;

(b) if inf_{β∈K, t∈T_0} [lim_{|a|→∞} S(a, β, t) − S(η_β(t), β, t)] > 0, then

sup_{β∈K} ‖η̂_β − η_β‖_{0,∞} → 0 a.s.

Theorem 3.2. Let β̂ be the minimizer of F_n(β), where F_n(β) is defined as in (5), with η̂_β satisfying

(16)  sup_{β∈K} ‖η̂_β − η_β‖_{0,∞} → 0 a.s.

for any compact set K in R^p. If C1 holds, then

(a) sup_{β∈K} |F_n(β) − F(β)| → 0 a.s.;

(b) if, in addition, there exists a compact set K_1 such that lim_{m→∞} P(∩_{n≥m} {β̂ ∈ K_1}) = 1 and F(β) has a unique minimum at β_0, then β̂ → β_0 a.s.

Remark 3.2. Theorems 3.1 and 3.2 entail that ‖η̂_{β̂} − η_0‖_{0,∞} → 0 a.s., since η_β(t) is continuous. For the covering number used in condition (ii) of Theorem 3.1, see [20].

4. Asymptotic normality. From now on, T is assumed to be a compact set. A set of assumptions denoted N1–N6, under which the resulting estimates are asymptotically normally distributed, is detailed in the Appendix.

Theorem 4.1. Assume that the t_i's are random variables with distribution on a compact set T and that N1–N6 hold. Then for any consistent solution β̂ of (15), we have

√n (β̂ − β_0) →_D N(0, A^{−1} Σ (A^{−1})^T).


5. Monte Carlo study. A small-scale simulation study was carried out to assess the performance of the robust estimators considered in this paper. A one-dimensional covariate x and a nonparametric function η(t) were considered. The modified likelihood estimator (MOD) used the score function of Croux and Haesbroeck [12] with c = 0.5. With this choice, the function G(s) has an explicit expression, so no numerical integration is necessary. The weight functions take the form

w_1²(x_i) = w_2²(x_i) = {1 + (x_i − M_n)²}^{−1},

where M_n = Median{x_j : j = 1, ..., n} is the sample median.

The two competitors considered in the study were the quasi-likelihood estimator (QAL) of Severini and Staniswalis [21] and the robust quasi-likelihood estimator (RQL) of Cantoni and Ronchetti [9]. For the latter, the Huber function ψ_c(x) = max{−1.2, min(1.2, x)} was used with the same weight functions as above. The QAL estimator corresponds to ψ_c(x) = x and w_1(x) = w_2(x) = 1. In all cases, the kernel K(t) = max{0, 1 − |t|} was used. In Studies 1 and 3 below, the search for β uses a grid of size 0.05, while in Study 2 the grid size is 0.01.
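These simulation-study choices can be written out directly (a sketch; note that the displayed expression gives the squared weight, so w itself is its square root):

```python
import numpy as np

def w_leverage(x):
    """Leverage downweighting used in the simulations:
    w(x_i) = {1 + (x_i - M_n)^2}^{-1/2}, so w^2(x_i) = {1 + (x_i - M_n)^2}^{-1}."""
    m = np.median(x)
    return 1.0 / np.sqrt(1.0 + (x - m) ** 2)

def K(t):
    """Triangular kernel K(t) = max(0, 1 - |t|)."""
    return np.maximum(0.0, 1.0 - np.abs(t))
```

Points near the sample median receive weight close to 1, while remote points in the x-space are smoothly downweighted.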

An important issue in any smoothing procedure is the choice of the smoothing parameter. Under a nonparametric regression model with β = 0 and H(t) = t, two commonly used approaches are cross-validation and plug-in. However, these procedures may not be robust; their sensitivity to anomalous data was discussed by several authors, including [7, 10, 18, 24]. Wang and Scott [24] note that in the presence of outliers, the least squares cross-validation function is nearly constant on its whole domain and, thus, essentially worthless for the purpose of choosing a bandwidth. The robustness issue remains for the estimators considered in this paper. With a small bandwidth, a small number of outliers with similar values of t_i could easily drive the estimate of η to dangerous levels. Therefore, we may consider a robust cross-validation approach as follows:

• Select at random a subset of size 100(1 − α)%. Let I_{1−α} denote the indexes of these observations and J_{1−α} the indexes of the remaining ones.

• For each given h, compute

η̂_β^{(−α)}(t, h) = argmin_{a∈R} Σ_{i∈I_{1−α}} W_i(t, h) ρ(y_i, x_i^T β + a) w_1(x_i),

β̂^{(−α)}(h) = argmin_{β∈R^p} Σ_{i∈I_{1−α}} ρ(y_i, x_i^T β + η̂_β^{(−α)}(t_i, h)) w_2(x_i),

where W_i(t, h) = {Σ_{j=1}^n K((t − t_j)/h)}^{−1} K((t − t_i)/h).

• Choose ĥ_n = argmin_h Σ_{i∈J_{1−α}} ρ(y_i, x_i^T β̂^{(−α)}(h) + η̂^{(−α)}_{β̂^{(−α)}}(t_i, h)) w_2(x_i).
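A simplified version of this hold-out scheme can be sketched as follows; `fit` and `loss` are hypothetical user-supplied callables standing in for the robust estimation step on I_{1−α} and the validation loss on J_{1−α}:

```python
import numpy as np

def robust_cv_bandwidth(y, x, t, fit, loss, grid, alpha=0.2, seed=0):
    """Hold-out bandwidth selection: for each candidate h, estimate on a random
    100(1-alpha)% subsample and pick the h minimizing the loss on the rest."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_hold = int(alpha * len(y))
    hold, train = idx[:n_hold], idx[n_hold:]   # J_{1-alpha} and I_{1-alpha}
    scores = [loss(fit(y[train], x[train], t[train], h),
                   y[hold], x[hold], t[hold]) for h in grid]
    return grid[int(np.argmin(scores))]
```

Using a bounded ρ inside `loss` is what keeps the criterion informative in the presence of outliers, in contrast to least squares cross-validation.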

When the sample size n is small, the leave-one-out cross-validation, which is similar to the approach considered here, is usually preferred. When n is modestly large, v-fold cross-validation is often used. However, both of them are computationally expensive. Based on our experience with a number of data sets, including some from Study 1 below, we found that the approach considered here is helpful. A full evaluation of this approach has not yet been completed.

To measure performance through simulation, we use the bias and standard deviation of the β estimate, as well as the mean square error of the function estimate

MSE(η̂) = n^{−1} Σ_{i=1}^n [η̂(t_i) − η(t_i)]².

We report the comparisons in three scenarios as follows.

Study 1. Random samples of size n = 100 were generated from the model x ~ U(−1, 1), t ~ U({0.1, 0.2, ..., 1.0}), y | (x, t) ~ Bi(10, p(x, t)), where log(p(x, t)/(1 − p(x, t))) = 3x + e^{2t} − 4. We summarized the results over 100 runs in Table 1, using three different bandwidths, h_n = 0.1, h_n = 0.2 and h_n = 0.3. The three estimates are labeled QAL(h_n), RQL(h_n) and MOD(h_n). Figure 1 gives the histograms of the estimates of β for each method and bandwidth. It is clear that the robust estimators RQL and MOD have similar performance and that the relative efficiencies of MOD(h_n) are between 0.69 and 0.80, as compared to QAL(h_n). The MOD method tends to have smaller bias than the RQL method and even than the QAL method. The normality of β̂ appeared to hold up quite well at this sample size.
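The Study 1 design can be reproduced schematically as follows (a sketch of the data-generating mechanism only; the function name is ours):

```python
import numpy as np

def generate_study1(n=100, seed=0):
    """One sample from the Study 1 model: x ~ U(-1, 1), t uniform on
    {0.1, ..., 1.0}, and y | (x, t) ~ Bi(10, p) with logit(p) = 3x + exp(2t) - 4."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n)
    t = rng.choice(np.arange(1, 11) / 10.0, size=n)
    p = 1.0 / (1.0 + np.exp(-(3.0 * x + np.exp(2.0 * t) - 4.0)))
    y = rng.binomial(10, p)
    return y, x, t
```

Note that t takes only ten distinct values, which is why the bandwidths 0.1–0.3 are adequate here but a smaller bandwidth is used in Study 3.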

Table 1
Summary results for Study 1

Estimator   Bias(β̂)   SD(β̂)   MSE(β̂)   MSE(η̂)
QAL(0.1)     0.059     0.219    0.051     0.111
QAL(0.2)     0.033     0.214    0.047     0.073
QAL(0.3)     0.004     0.220    0.048     0.152
RQL(0.1)    −0.051     0.242    0.061     0.114
RQL(0.2)    −0.054     0.254    0.067     0.089
RQL(0.3)    −0.105     0.262    0.080     0.154
MOD(0.1)     0.030     0.252    0.064     0.143
MOD(0.2)     0.018     0.251    0.063     0.088
MOD(0.3)    −0.001     0.252    0.064     0.135


Fig. 1. Histograms of β̂ for QAL, RQL and MOD using bandwidths h_n = 0.1, 0.2 and 0.3.

We also applied the data-adaptive method described in this section for choosing h_n, based on a split of the sample into a training set (80% of the data) and a validation set (20%). Over a total of ten random samples for Study 1, the resulting h_n were mostly between 0.1 and 0.2. From Table 1, we may observe that h_n = 0.2 is indeed a good choice, but the performance of β̂ is not very sensitive to the choice of h_n.

Study 2. To see how the robust estimators protect us from gross errors in the data, we generated a data set of size n = 100 from the model x ~ N(0, 1), t ~ N(1/2, 1/6), y | (x, t) ~ Bi(10, p(x, t)), where log(p(x, t)/(1 − p(x, t))) = 2x + 0.2. We then replaced the first one, two and three observations by gross outliers. Table 2 gives the parameter estimates under the contaminated data, with h_n = 0.1, where (x_i, y_i), 1 ≤ i ≤ 3, denote the outliers. It is clear that the QAL estimate of β was very sensitive to a single outlier, whereas the robust estimators remained stable.

Table 2
Estimates of β (true value 2) in Study 2. (x_i, y_i), 1 ≤ i ≤ 3, denote the three contaminating points which replace the first three observations one by one

Contamination            QAL    RQL    MOD
Original data            2.02   2.08   1.99
x_1 = 10,  y_1 = 0       0.90   2.07   2.00
x_2 = −10, y_2 = 10      0.31   2.06   1.97
x_3 = −10, y_3 = 10      0.12   2.05   1.95

Study 3. We considered data sets of size n = 200 generated from a bivariate normal distribution (x_i, t_i) ~ N((0, 1/2), Σ), truncated to t ∈ [1/4, 3/4], with

Σ = ( 1         1/(6√3)
      1/(6√3)   1/36    ).

The response variable was then generated as

y_i = 1 if β_0 x_i + η_0(t_i) + ε_i ≥ 0,  and y_i = 0 if β_0 x_i + η_0(t_i) + ε_i < 0,


where β_0 = 2, η_0(t) = 2 sin(4πt) and ε_i was a standard logistic variate. For each data set generated from this model, we also created three contaminated data sets, denoted C1, C2 and C3 in Table 3. The purpose of the first two contaminations is to see how the robust methods work when one has contamination in y only.

• Contamination 1. The contaminated data points were generated as follows: u_i ~ U(0, 1), the x_i are left unchanged, and y_i is kept if u_i ≤ 0.90 and replaced by a new observation from Bi(1, 0.5) if u_i > 0.90.

• Contamination 2. For each generated data set, we chose ten “design points” with H(β_0 x_i + η_0(t_i)) > 0.99, where H(u) = 1/(1 + exp(−u)), so at those points, the conditional mean of y given the covariates is not close to 0.5. We then contaminated y as in Contamination 1, but only at those ten points. Of those ten points, about half are expected to be outliers with large Pearson residuals.

• Contamination 3. Here, we considered a contamination with bad leverage points by using u_i ~ U(0, 1): x_i is kept if u_i ≤ 0.90 and replaced by a new observation from N(10, 1) if u_i > 0.90, while y_i is kept if u_i ≤ 0.90 and replaced by a new observation from Bi(1, 0.05) if u_i > 0.90.
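Contamination 3 can be sketched as follows (function and parameter names are ours):

```python
import numpy as np

def contaminate_c3(y, x, eps=0.10, seed=0):
    """Contamination 3: with probability eps, replace x_i by a draw from N(10, 1)
    (a bad leverage point) and y_i by a draw from Bi(1, 0.05)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=len(y))
    bad = u > 1.0 - eps            # u_i > 0.90 for the paper's eps = 0.10
    y_c, x_c = y.copy(), x.copy()
    x_c[bad] = rng.normal(10.0, 1.0, bad.sum())
    y_c[bad] = rng.binomial(1, 0.05, bad.sum())
    return y_c, x_c
```

The replaced points combine an extreme x with a response that contradicts the model at that x, which is precisely what drives the classical fit off course in Table 3.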

Both the original and the contaminated data sets were analyzed using the three competing methods. Using a bandwidth of h_n = 0.1, we summarized the results in Table 3 based on 100 Monte Carlo samples. The bandwidth was chosen to be smaller than that used in Study 1 because we have 200 distinct observed values of t here, as compared to ten in the earlier study. Table 3 shows the poor performance of the classical estimates of β, especially under contamination C3. Under C1, most contaminated y do not result in large Pearson residuals and the robust estimators RQL and MOD can improve on the nonrobust estimator somewhat, but not as significantly as under C2 and C3. With respect to the estimation of η, all procedures seem to be stable because the magnitude of the outlying y is very limited in this case.

Our studies show the good performance of the two families of robust estimators considered here in the presence of outliers. The MOD method often shows smaller bias for estimating β, but its mean square error is usually similar to that of RQL.


APPENDIX

A.1. Proof of the consistency results.

Proof of Theorem 3.1. (a) Let Z_i(a, β) = ρ(y_i, x_i^T β + a) w_1(x_i),

R_{1n}(a, β, t) = (n h_n)^{−1} Σ_{i=1}^n Z_i(a, β) K((t − t_i)/h_n),
R_{0n}(t) = (n h_n)^{−1} Σ_{i=1}^n K((t − t_i)/h_n).

Then S_n(a, β, t) = R_{1n}(a, β, t)/R_{0n}(t), which implies that

sup_{β∈K, a∈R} ‖S_n(a, β, ·) − S(a, β, ·)‖_{0,∞}
  ≤ [ sup_{β∈K, a∈R} ‖R_{1n}(a, β, ·) − E(R_{1n}(a, β, ·))‖_{0,∞}
    + sup_{β∈K, a∈R} ‖E(R_{1n}(a, β, ·)) − S(a, β, ·) E(R_{0n}(·))‖_{0,∞}
    + ‖ρ‖_∞ ‖w_1‖_∞ ‖R_{0n} − E(R_{0n})‖_{0,∞} ] ( inf_{t∈T_0} R_{0n}(t) )^{−1},

where ‖ρ‖_∞ = sup_{(y,a)} |ρ(y, a)| and ‖w_1‖_∞ = sup_x |w_1(x)|.

Since E(R_{0n}(t)) = ∫ K(u) f_T(t − u h_n) du > A_1(T_δ), it is enough to show that

(A.1)  sup_{β∈K, a∈R} ‖R_{1n}(a, β, ·) − E(R_{1n}(a, β, ·))‖_{0,∞} → 0 a.s.,

(A.2)  ‖R_{0n} − E(R_{0n})‖_{0,∞} → 0 a.s.,

(A.3)  sup_{β∈K, a∈R} ‖E(R_{1n}(a, β, ·)) − S(a, β, ·) E(R_{0n}(·))‖_{0,∞} → 0.

Assumptions C2–C4 imply (A.2); see [20], page 35. On the other hand, (A.3) follows easily from the boundedness of ρ, the integrability of the kernel, the equicontinuity condition C5 and the fact that h_n → 0. In order to prove (A.1), let us consider the class of functions

F_n = { f_{t,a,β,h_n}(y, x, v) = B^{−1} ρ(y, x^T β + a) w_1(x) K_{t,h_n}(v) },

with B = ‖ρ‖_∞ ‖w_1‖_∞ ‖K‖_∞ and K_{t,h_n}(v) = K((t − v)/h_n). Using the fact that the graphs of the translated kernels K_{t,h_n} have polynomial discrimination, the inequality 0 ≤ K_{t,h_n} ≤ ‖K‖_∞ and assumption (ii), we obtain that


N(ε, F_n, L_1(Q)) ≤ A_1 ε^{−W_1} for any probability Q and 0 < ε < 1, where A_1 and W_1 do not depend on n. Since for any f_{t,a,β,h_n} ∈ F_n we have |f_{t,a,β,h_n}| ≤ 1 and E(f²_{t,a,β,h_n}(y, x, v)) ≤ h_n ‖K‖_∞^{−1} ‖f_T‖_∞, Theorem 37 in [20] and C4 imply that

(h_n)^{−1} sup_{F_n} | n^{−1} Σ_{i=1}^n f_{t,a,β,h_n}(y_i, x_i, t_i) − E f_{t,a,β,h_n}(y_1, x_1, t_1) | → 0 a.s.,

which concludes the proof of (A.1).

(b) The continuity of η_β(t) implies that η_β(t) is bounded for t ∈ T_0 and β ∈ K and, thus, that there exists a compact set A(T_0, K) such that η_β(t) ∈ A(T_0, K) for any t ∈ T_0 and β ∈ K. Assume that sup_{β∈K} ‖η̂_β − η_β‖_{0,∞} does not converge to 0 on a set Ω_0 with P(Ω_0) > 0. Then for each ω ∈ Ω_0, there exists a sequence (β_k, t_k) such that t_k ∈ T_0, β_k ∈ K and η̂_{β_k}(t_k) − η_{β_k}(t_k) → c ≠ 0. Since T_0 and K are compact, without loss of generality we can assume that t_k → t_L ∈ T_0 and β_k → β_L ∈ K, and hence obtain that η_{β_k}(t_k) → η_{β_L}(t_L), implying that η̂_{β_k}(t_k) − η_{β_L}(t_L) → c. When c < ∞, the same steps as those used in Lemma A1 of [11] lead to a contradiction. If c = ∞, we have that η̂_{β_k}(t_k) → ∞. By assumption, we have that

0 < i = inf_{β∈K, t∈T_0} [ lim_{|a|→∞} S(a, β, t) − S(η_β(t), β, t) ],

and so lim_{|a|→∞} S(a, β_L, t_L) − S(η_{β_L}(t_L), β_L, t_L) ≥ i. Thus, for k sufficiently large, S(η̂_{β_k}(t_k), β_L, t_L) > S(η_{β_L}(t_L), β_L, t_L) + i/2. The equicontinuity condition implies that given ε > 0, for k sufficiently large, S(η_{β_L}(t_L), β_k, t_k) ≤ S(η_{β_L}(t_L), β_L, t_L) + ε/4 and S(η̂_{β_k}(t_k), β_L, t_L) ≤ S(η̂_{β_k}(t_k), β_k, t_k) + ε/4, which, from (a) and the definition of η̂_β, implies that S(η̂_{β_k}(t_k), β_L, t_L) ≤ S_n(η̂_{β_k}(t_k), β_k, t_k) + ε/2 ≤ S_n(η_{β_L}(t_L), β_k, t_k) + ε/2. Again using (a), we obtain S(η̂_{β_k}(t_k), β_L, t_L) ≤ S_n(η_{β_L}(t_L), β_k, t_k) + ε/2 ≤ S(η_{β_L}(t_L), β_k, t_k) + 3ε/4 ≤ S(η_{β_L}(t_L), β_L, t_L) + ε. Hence, for k sufficiently large, S(η_{β_L}(t_L), β_L, t_L) + i/2 < S(η_{β_L}(t_L), β_L, t_L) + ε, a contradiction for ε < i/2.

The next proposition states a general uniform convergence result which will be helpful in proving Theorems 3.2 and 4.1.

We will begin by fixing some notation. Denote by C¹(T) the set of continuously differentiable functions on T. Note that if S_1(a, β, τ) defined in (12) is continuously differentiable with respect to (a, τ), then η_β ∈ C¹(T). V(β) and H_δ(β) denote neighborhoods of β ∈ K and η_β such that V(β) ⊂ K and

H_δ(β) = { u ∈ C¹(T) : ‖u − η_β‖_∞ ≤ δ, ‖∂u/∂t − ∂η_β/∂t‖_∞ ≤ δ }.

Proposition A.1. Let (y_i, x_i, t_i) be independent observations such that y_i | (x_i, t_i) ~ F(·, μ_i), with μ_i = H(η_0(t_i) + x_i^T β_0) and Var(y_i | (x_i, t_i)) = V(μ_i). Assume that the t_i are random variables with distribution on T. Let g: R² → R be a continuous and bounded function, W(x, t): R^{p+1} → R be such that E(|W(x, t)|) < ∞ and η_β(t) = η(β, t): R^{p+1} → R be a continuous function of (β, t). Define L(y, x, t, β, v) = g(y, x^T β + v(t)) W(x, t) and E(β) = E_0(L(y, x, t, β, η_β)). Then

(a) E(n^{−1} Σ_{i=1}^n L(y_i, x_i, t_i, θ, v)) → E(β) when ‖θ − β‖ + ‖v − η_β‖_∞ → 0;

(b) sup_{θ∈K} |n^{−1} Σ_{i=1}^n L(y_i, x_i, t_i, θ, η_θ) − E(L(y_i, x_i, t_i, θ, η_θ))| → 0 a.s.;

(c) sup_{θ∈K, v∈H_1(β)} |n^{−1} Σ_{i=1}^n L(y_i, x_i, t_i, θ, v) − E(L(y_i, x_i, t_i, θ, v))| → 0 a.s. if, in addition, T is compact and η_β ∈ C¹(T).

Proof. (a) follows from the dominated convergence theorem. The proofs of (b) and (c) follow from the continuity of η_β and g, Theorem 3 in Chapter 2 of [20], the compactness of K and H_1(β), and arguments analogous to those considered in [2].

Remark A.1. Proposition A.1 implies that for any weakly consistent estimate η̂_β of η_β such that sup_{t∈T} |(∂/∂t)η̂_β(t) − (∂/∂t)η_β(t)| → 0 a.s. and sup_{t∈T} |η̂_β(t) − η_β(t)| → 0 a.s., we have (1/n) Σ_{i=1}^n L(y_i, x_i, t_i, β, η̂_β) → E(β) a.s. An analogous result can be obtained by replacing almost sure convergence with convergence in probability.

Proof of Theorem 3.2. (a) Define

F̃_n(β) = n^{−1} Σ_{i=1}^n ρ(y_i, x_i^T β + η_β(t_i)) w_2(x_i),

that is, the analogue of (5) with η̂_β replaced by the function η_β.


A.2. Proof of the asymptotic normality of the regression estimates. For the sake of simplicity, we denote

χ(y, a) = ∂Ψ(y, a)/∂a = ∂²ρ(y, a)/∂a²,

(A.5)  υ̂(β, t) = η̂_β(t) − η_β(t),  υ̂_0(t) = υ̂(β_0, t),  v̂_j(β, t) = ∂υ̂(β, t)/∂β_j.

N4. The matrix Σ is positive definite, with

Σ = E_0[ Ψ²(y, x^T β_0 + η_0(t)) w_2²(x) ( x + ∂η_β(t)/∂β |_{β=β_0} ) ( x + ∂η_β(t)/∂β |_{β=β_0} )^T ].

Implicit differentiation of S_{1n}(η̂_β(t), β, t) = 0 gives the derivatives of η̂_β(t),

∂η̂_β(t)/∂t = − [ (n h_n²)^{−1} Σ_{i=1}^n K′((t − t_i)/h_n) Ψ(y_i, x_i^T β + η̂_β(t)) w_1(x_i) ] / [ (n h_n)^{−1} Σ_{i=1}^n K((t − t_i)/h_n) χ(y_i, x_i^T β + η̂_β(t)) w_1(x_i) ],

∂η̂_β(t)/∂β_j = − [ (n h_n)^{−1} Σ_{i=1}^n K((t − t_i)/h_n) χ(y_i, x_i^T β + η̂_β(t)) x_{ij} w_1(x_i) ] / [ (n h_n)^{−1} Σ_{i=1}^n K((t − t_i)/h_n) χ(y_i, x_i^T β + η̂_β(t)) w_1(x_i) ].


rate of convergence for the regression estimates. More precisely, assumption N1(c) avoids the bias term and ensures that G_n(η̂_{β_0}) will behave asymptotically as G_n(η_{β_0}), where for any β ∈ R^p and any differentiable function υ_β(t) = υ(β, t): R^{p+1} → R,

G_n(υ_β) = (1/n) Σ_{i=1}^n Ψ(y_i, x_i^T β_0 + υ_{β_0}(t_i)) w_2(x_i) [ x_i + ∂υ_β(t_i)/∂β |_{β=β_0} ].

Recall also that, by Fisher consistency, S_1(η_0(τ), β_0, τ) = E_0(Ψ(y, x^T β_0 + η_0(τ)) w_1(x) | t = τ) = 0.

Moreover, if either w_2 ≡ w_1 or N5(a) holds, then

A = E_0[ χ(y, x^T β_0 + η_0(t)) ( x + ∂η_β(t)/∂β |_{β=β_0} ) ( x + ∂η_β(t)/∂β |_{β=β_0} )^T w_2(x) ].

Therefore, if Ψ(y, u) is strictly monotone in u and P(w_2(x) > 0) = 1, then N3 holds; that is, A will be nonsingular unless P(a^T [x + ∂η_β(t)/∂β |_{β=β_0}] = 0) = 1 for some a ∈ R^p (i.e., unless there is a linear combination of x which can be completely determined by t).

Assumption N6 is used to ensure the consistency of the estimates of A based on preliminary estimates of the regression parameter β and of the functions η_β.

Lemma A.1. Let (y_i, x_i, t_i) be independent observations such that y_i | (x_i, t_i) ~ F(·, μ_i) with μ_i = H(η_0(t_i) + x_i^T β_0) and Var(y_i | (x_i, t_i)) = V(μ_i). Assume that the t_i are random variables with distribution on a compact set T and that N1–N3 and N6 hold. Let β̂ be such that β̂ → β_0 in probability. Then A_n → A in probability, where A is given in N3, ẑ_i(β̂) = x_i + ∂η̂_β(t_i)/∂β |_{β=β̂} and

A_n = n^{−1} Σ_{i=1}^n χ(y_i, x_i^T β̂ + η̂_{β̂}(t_i)) ẑ_i(β̂) ẑ_i(β̂)^T w_2(x_i).
