
Depth Super Resolution by Rigid Body Self-Similarity in 3D

Michael Hornáček¹'*, Christoph Rhemann², Margrit Gelautz¹, and Carsten Rother²

¹Vienna University of Technology    ²Microsoft Research Cambridge

Abstract

We tackle the problem of jointly increasing the spatial resolution and apparent measurement accuracy of an input low-resolution, noisy, and perhaps heavily quantized depth map. In stark contrast to earlier work, we make no use of ancillary data like a color image at the target resolution, multiple aligned depth maps, or a database of high-resolution depth exemplars. Instead, we proceed by identifying and merging patch correspondences within the input depth map itself, exploiting patchwise scene self-similarity across depth such as repetition of geometric primitives or object symmetry. While the notion of 'single-image' super resolution has successfully been applied in the context of color and intensity images, we are to our knowledge the first to present a tailored analogue for depth images. Rather than reason in terms of patches of 2D pixels as others have before us, our key contribution is to proceed by reasoning in terms of patches of 3D points, with matched patch pairs related by a respective 6DoF rigid body motion in 3D. In support of obtaining a dense correspondence field in reasonable time, we introduce a new 3D variant of PatchMatch. A third contribution is a simple, yet effective patch upscaling and merging technique, which predicts sharp object boundaries at the target resolution. We show that our results are highly competitive with those of alternative techniques leveraging even a color image at the target resolution or a database of high-resolution depth exemplars.

1. Introduction

With the advent of inexpensive 3D cameras like the Microsoft Kinect, depth measurements are becoming increasingly available for low-cost applications. Acquisitions made by such consumer 3D cameras, however, remain afflicted by less than ideal attributes. Random errors are a common problem. Low spatial resolution is an issue particularly with time of flight (ToF) cameras, e.g., 200×200 for the PMD CamCube 2.0 or 176×144 for the SwissRanger SR3000. In depth maps recovered using stereo techniques, depth resolution decreases as a function of increasing depth from the camera. Common avenues to jointly increasing the spatial resolution and apparent measurement accuracy of a depth map—a problem referred to as depth super resolution (SR)—involve leveraging ancillary data such as a color or intensity image at the target resolution, multiple aligned depth maps, or a database of high-resolution depth exemplars (patches). Such ancillary data, however, is often unavailable or difficult to obtain.

* Michael Hornáček is funded by Microsoft Research through its European Ph.D. scholarship programme.

Figure 1. Shaded mesh of nearest neighbor upscaling (top) of a noiseless synthetic input depth map and the output of our algorithm (bottom), both by a factor of 3. In our approach, fine details such as the penguin's eyes, beak, and the subtle polygons across its body are mapped from corresponding patches at lesser depth, and boundaries appear more natural.

In this work, we consider the question of how far one can push depth SR using no ancillary data, proceeding instead by identifying and merging patch correspondences from within the input depth map itself. Our observation is that—even in the absence of object repetition of the sort exemplified in Figure 1—real-world scenes tend to exhibit patchwise 'self-similarity' such as repetition of geometric primitives (e.g., planar surfaces, edges) or object symmetry (consider a face, a vase). Man-made scenes or objects are often 'self-similar' by design; consider, for instance, the keys of a keyboard. It is primarily this observation that we exploit in this paper, coupled with the fact that under perspective projection, an object patch at lesser depth with respect to the camera is acquired with a higher spatial resolution than a corresponding patch situated at greater depth. The key contribution of our work is to proceed not by reasoning in terms of patches of 2D pixels, but rather in terms of patches of 3D points. It is reasoning in this manner that allows us to exploit scene self-similarity across depth. In addition, we introduce a new 3D variant of PatchMatch to obtain a dense correspondence field in reasonable time and a simple, yet effective patch upscaling and merging technique to generate the output SR depth map.

Figure 2. In the depth map (left), the three pairs of 2D patches depicted are dissimilar with respect to depth values. In the corresponding point cloud (right), the analogous 3D patch pairs are similar as point sets related by appropriate rigid body motions in 3D. The inset shows part of the vase, contrast-stretched for greater clarity; pixel noise is clearly visible.

The notion of 'single-image' SR has already successfully been applied in the context of color and intensity images in the work of Glasner et al. [8]. Their guiding observation is that within the same image there is often a large across-scale redundancy at the 2D pixel patch level; for instance, an image of a leafy forest is likely to contain a large number of small patches with various configurations of greens and browns that happen to recur across scales of the image. Their strategy is to search for corresponding 5×5 pixel patches across a discrete cascade of downscaled copies of the input image and to exploit sub-pixel shifts between correspondences. An SR framework reasoning in terms of small n×n pixel patches, however, faces serious problems in the context of depth SR. Figure 2 illustrates three fundamental problems of matching 3D points using n×n pixel patches: patch pairs (i) are situated at different depths or (ii) are subject to projective distortions owing to perspective projection, or (iii) they straddle object boundaries. The problem of projective distortions calls for a small patch size, which renders matching particularly sensitive to noise. We overcome these problems by reasoning in terms of 3D point patches, which we define as the respective inliers—from among the 3D points of the input depth map—within a fixed radius r of a center point and which we match with respect to 3D point similarity over 6DoF rigid body motions in 3D.

1.1. Related Work

A number of surveys of image SR techniques are available elsewhere, e.g., van Ouwerkerk [19] or Tian and Ma [18]. Glasner et al. [8], Yang et al. [20], and Freeman and Liu [7] are image SR techniques against which we compare our algorithm in Section 3, by treating input depth maps as intensity images. Freeman and Liu and Yang et al. both rely on an external patch database.

Previous work on depth SR can broadly be categorized into methods that (i) use a guiding color or intensity image at the target resolution, (ii) merge information contained in multiple aligned depth maps, or (iii) call on an external database of high-resolution depth exemplars. We devote the remainder of this section to a discussion of representative or seminal techniques from the depth SR literature.

Image at Target Resolution. The most common depth SR strategy involves using an ancillary color or intensity image at the target resolution to guide the reconstruction of the SR depth map. The underlying assumption is that changes in depth are colocated with edges in the guiding image. Yang et al. [21] apply joint bilateral upscaling on a cost volume constructed from the low resolution input depth map, followed by Kopf et al. [11] in a more general framework. Diebel and Thrun [5] propose an MRF-based approach with a pairwise smoothness term whose contribution is weighted according to the edges in the high-resolution color image. Park et al. [13] take this idea further and use a non-local, highly-connected smoothness term that better preserves thin structures in the SR output.

Multiple Depth Maps. The LidarBoost approach of Schuon et al. [17] combines several depth maps acquired from slightly different viewpoints. The KinectFusion approach of Izadi et al. [10] produces outstanding results by fusing a sequence of depth maps generated by a tracked Kinect camera into a single 3D representation in real-time.

Database of Depth Exemplars. Most closely akin to ours is the work of Mac Aodha et al. [12]. They propose to assemble the SR depth map from a collection of depth patches. Our approach likewise carries out depth SR by example, but with significant differences. One major difference is that we use patches only from within the input depth map itself, whereas Mac Aodha et al. use an external database of 5.2 million high-resolution synthetic, noise-free patches. Another difference is that they carry out their matching in image space over 3×3 pixel patches, while ours can have arbitrary size depending on the scale, density, and relative depth of point features one aims to capture. Accordingly, their approach is subject to the problems discussed in Section 1 that our reasoning in terms of 3D point patches overcomes. Note that enlarging the patches in their database would lead to an explosion of its size.

Figure 3. The rigid body motion g relating the 3D point patches S_x, S'_x ⊂ R³. The point P_x ∈ R³ is the pre-image of the pixel x of the input depth map. Note that the center point P'_x = g(P_x) of the closer patch is by design not required to be one of the 3D points of the input depth map, hence P'_x ∉ S'_x in general.

2. Algorithm

Owing to the perspective projection that underlies image formation, object patches situated at a lesser depth with respect to the camera are imaged with a higher spatial resolution (i.e., a greater point density) than corresponding object patches at greater depth. Our depth SR algorithm consists of two steps: (i) find, for each patch in the input depth map, a corresponding patch at lesser or equal depth with respect to the camera, and (ii) use the dense correspondence field to generate the SR output. We begin, in Section 2.1, by presenting our notion of '3D point patch' and the matching cost we propose to minimize. Next, we detail the first step of our algorithm in Section 2.2, and the second in Section 2.3.

2.1. 3D Point Patches

Let g = (R, t) ∈ SE(3) denote a 6DoF rigid body motion in 3D, where R ∈ SO(3) and t ∈ R³. Let x = (x, y)^T be a pixel of the input depth map. The goal of the dense correspondence search algorithm in Section 2.2 is to find an optimal rigid body motion g for each pixel x, mapping the patch corresponding to x to a valid matching patch at lesser or equal depth with respect to the camera. We shall understand the patch corresponding to x—the further¹ patch, for brevity—to be the set S_x ⊂ R³ of 3D points within a radius r of the pre-image P_x = Z_x · K^{-1}(x^T, 1)^T ∈ R³ of x, where Z_x is the depth encoded at x in the input depth map and K is the 3×3 camera calibration matrix (cf. Hartley and Zisserman [9]). We carry out radius queries using a kd-tree. The 3D points of the corresponding closer patch S'_x are those within the same radius r of the point P'_x = g(P_x). An illustration of these notions is provided in Figure 3.

¹We acknowledge that this is something of an abuse of terminology, since two points can be situated at equal depth with respect to the camera but be at different distances from it. Notwithstanding, it is in this sense that we shall mean 'closer' and 'further' in this paper.
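As a rough illustration of the geometry just described, the following Python sketch back-projects a depth map to a point cloud and gathers a 3D point patch with a kd-tree radius query. The function names and the use of SciPy's cKDTree are our own choices, not prescribed by the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def backproject(depth, K):
    """Pre-images P_x = Z_x * K^{-1} (x, y, 1)^T of all pixels of a depth map."""
    h, w = depth.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T      # K^{-1} (x, y, 1)^T per pixel, as rows
    return rays * depth.reshape(-1, 1)   # scale each ray by its depth Z_x

# Hypothetical usage; K and the patch radius r depend on sensor and scene scale.
# points = backproject(depth, K)
# tree = cKDTree(points)
# S_x = points[tree.query_ball_point(P_x, r)]   # 3D point patch around P_x
```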

Matching Cost. A common strategy for evaluating the similarity of two point sets is to compute the sum of squared differences (SSD) over each point in one point set with respect to its nearest neighbor (NN) point in the other (cf. Rusinkiewicz and Levoy [14]). We proceed in a similar manner, but normalize the result and allow for computing SSD in both directions in order to potentially obtain a stronger similarity measure, noting that we might be comparing point sets with significantly different point densities owing to relative differences in patch depth. Let NN_S(P) denote the function that returns the nearest neighbor to the point P in the set S. The function c_b(x; g) evaluates normalized SSD over the points of the further patch S_x subject to each point's respective nearest neighbor among the 'backward'-transformed points g^{-1}(S'_x) of the closer patch:

$$c_b(x; g) = \sum_{P \in S_x} \bigl\| P - \mathrm{NN}_{g^{-1}(S'_x)}(P) \bigr\|_2^2 \,\big/\, |S_x|. \qquad (1)$$

Analogously, the function c_f(x; g) evaluates normalized SSD over the points of the closer patch S'_x subject to their respective nearest neighbors among the 'forward'-transformed points g(S_x) of the further patch:

$$c_f(x; g) = \sum_{P' \in S'_x} \bigl\| P' - \mathrm{NN}_{g(S_x)}(P') \bigr\|_2^2 \,\big/\, |S'_x|. \qquad (2)$$

For g to be deemed valid at x, we require that the depth of the sphere center point of the matched patch be less than or equal to that of the pre-image of x. Moreover, we require that their relative distance be at least r in order to avoid minimizing cost trivially by matching to oneself, and that |S'_x| ≥ |S_x| ≥ 3 to benefit from greater point density or from sub-pixel point shifts at equal density, and for reasons discussed below. Given a pixel x and a rigid body motion g, we compute the matching cost c(x; g) according to

$$c(x; g) = \begin{cases} \alpha \cdot c_b(x; g) + \alpha' \cdot c_f(x; g) & \text{if valid} \\ \infty & \text{otherwise,} \end{cases} \qquad (3)$$

where α ∈ [0, 1] and α' = 1 − α.
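A compact sketch of how the cost of Eq. (3) might be computed with kd-trees follows, under the assumption that the validity checks (depth ordering, minimum center distance, |S'_x| ≥ |S_x| ≥ 3) have already passed. The names and the row-vector convention are ours.

```python
import numpy as np
from scipy.spatial import cKDTree

def matching_cost(S_x, S_xp, R, t, alpha=0.5):
    """Normalized bidirectional SSD of Eqs. (1)-(3) for g = (R, t); points are
    stored as rows, and validity of g at x is assumed to have been checked."""
    back = (S_xp - t) @ R                # g^{-1}(S'_x) = R^T (S'_x - t)
    fwd = S_x @ R.T + t                  # g(S_x) = R S_x + t
    d_b, _ = cKDTree(back).query(S_x)    # NN distances for Eq. (1)
    d_f, _ = cKDTree(fwd).query(S_xp)    # NN distances for Eq. (2)
    c_b = (d_b ** 2).sum() / len(S_x)
    c_f = (d_f ** 2).sum() / len(S_xp)
    return alpha * c_b + (1.0 - alpha) * c_f   # Eq. (3) in the valid case
```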

2.2. Dense Correspondence Search

We introduce a new 3D variant of the PatchMatch algorithm (cf. Barnes et al. [1]) with the aim of assigning to each pixel x of the input depth map a 6DoF rigid body motion in 3D, mapping S_x to a valid matching patch S'_x at equal or lesser depth with respect to the camera. PatchMatch was first introduced as a method for obtaining dense approximate nearest neighbor fields between pairs of n×n pixel patches in 2D, assigning to each pixel x in an image A a displacement vector mapping the patch centered at x to a matching patch in an image B with the objective of reconstructing one image in terms of patches from the other. Although PatchMatch has since been generalized and applied to a variety of other problems (cf. Barnes et al. [2] or Besse et al. [3]), a common thread between variants of PatchMatch—in which ours is no exception—is a random (or semi-random) initialization step followed by i iterations of propagation and refinement. We explain each step in greater detail in the remainder of this section. An example of a projected displacement field obtained using our 3D variant of PatchMatch is shown in Figure 4.

Figure 4. A filled variant of the disparity map of the Middlebury Cones data set (left) as input and a visualization of projected 3D displacements of the output of our dense correspondence search using conventional optical flow coloring, both overlaid sparsely with arrows for greater clarity (right). Note that cone tips map to one another and that the flow field is spatially coherent.

Initialization. In contrast to PatchMatch variants that carry out initialization using altogether random states, we adopt a semi-random initialization strategy. In our experiments, we found this led to faster convergence when dealing with our high-dimensional state space. Specifically, for each pixel x we randomly select another pixel x' of the input depth map such that the depth of P_x' is less than or equal to that of P_x, giving us a translation vector (3DoF). We then compute the rotation minimizing arc length between the patch normal vector at P_x and that at P_x' (2DoF), and choose a random angular perturbation around the normal of P_x' (1DoF). We pack these elements into a rigid body motion. A normal vector for each P_x is precomputed via RANSAC plane fitting over the 3D points in S_x (and is the reason why we require that |S_x| ≥ 3 in Section 2.1), which is made to point towards the camera.
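One plausible construction of this semi-random initialization is sketched below, assuming unit-length normals. The paper does not spell out the exact formulas, so the axis-angle alignment is our reading of the 'rotation minimizing arc length' between the two normals.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def init_motion(P_x, n_x, P_xp, n_xp, rng):
    """Build g = (R, t): minimal-arc rotation taking normal n_x to n_xp (2DoF),
    a random spin about n_xp (1DoF), and a translation sending P_x to P_xp."""
    axis = np.cross(n_x, n_xp)
    s, c = np.linalg.norm(axis), float(np.dot(n_x, n_xp))
    # Degenerate antiparallel case (s ~ 0, c = -1) omitted for brevity.
    R_align = (Rotation.from_rotvec(axis / s * np.arctan2(s, c)).as_matrix()
               if s > 1e-8 else np.eye(3))
    R_spin = Rotation.from_rotvec(n_xp * rng.uniform(0.0, 2.0 * np.pi)).as_matrix()
    R = R_spin @ R_align
    t = P_xp - R @ P_x                   # ensures g(P_x) = R P_x + t = P_xp
    return R, t
```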

Propagation. In keeping with classical PatchMatch (cf. Barnes et al. [1]), we traverse the pixels x of our input depth map in scanline order—upper left to lower right for even iterations, lower right to upper left for odd—and adopt the rigid body motion assigned to a neighboring pixel if doing so yields an equal or lower cost. Note that as a consequence, we propagate over pixels for which |S_x| < 3, which we treat as so-called flying pixels, since such pixels are always assigned infinite cost by c(x; g) in (3).

Refinement. Immediately following propagation at a given pixel x, we independently carry out k iterations of additional initialization and of perturbation of the translational and rotational components of g_x, adopting the initialization or perturbation if doing so yields an equal or lower cost. Translational perturbation (3DoF) consists of checking whether hopping from P'_x to one of its k-NN points—which we obtain by again making use of a kd-tree—yields an equal or lower cost. Rotational perturbation, which we carry out in a range that decreases with every iteration k, consists of random rotation around the normal at P_x (1DoF) and of random perturbation of the remaining two degrees of freedom of the rotation. We carry out and evaluate all three types of perturbations independently.
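The translational part of the refinement might look as follows; `cost_fn` stands in for the full matching cost of Eq. (3) at the current pixel, and the function is a sketch under our assumptions rather than the authors' code.

```python
import numpy as np

def perturb_translation(R, t, P_x, tree, points, k, best_cost, cost_fn):
    """Hop the matched center P'_x = R P_x + t to each of its k nearest input
    points (via the kd-tree), keeping any hop of equal or lower cost."""
    best = (R, t)
    _, idx = tree.query(R @ P_x + t, k=k)
    for P_new in points[np.atleast_1d(idx)]:
        cand_t = P_new - R @ P_x          # keep rotation, retarget the center
        c = cost_fn(R, cand_t)
        if c <= best_cost:
            best, best_cost = (R, cand_t), c
    return best, best_cost
```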

2.3. Patch Upscaling and Merging

Having assigned a motion g_x ∈ SE(3) to each pixel x of the input depth map, we generate an SR depth map by merging interpolated depth values of the 'backward'-transformed points g_x^{-1}(S'_x) of each valid matched patch. We begin, for each x, by (i) determining—with the help of contour polygonalization—the spatial extent of S_x at the target resolution, giving an 'overlay mask' over which we then (ii) generate an 'overlay patch' by interpolating depth values from the points g_x^{-1}(S'_x). Next, we (iii) populate the SR depth map by merging the interpolated depth values of overlapping overlay patches, with the influence of each valid overlay patch weighted as a function of patch similarity. Finally, we (iv) clean the SR depth map in a postprocessing step, removing small holes that might have arisen at object boundaries as a consequence of polygonalization.

Overlay Masks. The 2D pixels x of the input depth map to which the 3D points of S_x project define the spatial extent of S_x at the input resolution (cf. Figure 5). It is only these pixels, at the input resolution, that the 'backward'-transformed points g_x^{-1}(S'_x) of the matched patch are allowed to influence, since it is over these pixels that we compute the matching cost. Upscaling the mask by the SR factor using NN interpolation gives a mask at the target resolution, but introduces disturbing jagged edges. Accordingly, we carry out a polygon approximation (cf. Douglas and Peucker [6]) of this NN upscaled mask, constrained such that approximated contours be at a distance of at most the SR factor—corresponding to a single pixel at the input resolution—from the NN upscaled contours. We ignore recovered polygonalized contours whose area is less than or equal to the square of the SR factor, thereby removing flying pixels. This polygonalized mask—to which we refer as the overlay mask of x—consists of all SR pixels x̂ that fall into one of the remaining polygonalized contours but fall into no contour that is nested inside another, in order to handle holes like in the lamp in Figure 5.
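In OpenCV terms, the mask construction above might be sketched as follows. The two-pass fill order and the epsilon choice are our interpretation of the constraints described (contours within the SR factor, polygons larger than its square).

```python
import cv2
import numpy as np

def overlay_mask(mask, factor):
    """NN-upscale a binary patch mask, approximate its contours to within the
    SR factor (Douglas-Peucker), drop tiny (flying-pixel) polygons, and
    rasterize outer polygons before holes so that nested holes stay empty."""
    up = cv2.resize(mask.astype(np.uint8), None, fx=factor, fy=factor,
                    interpolation=cv2.INTER_NEAREST)
    contours, hier = cv2.findContours(up, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
    out = np.zeros_like(up)
    for fill_holes in (False, True):
        for i, cnt in enumerate(contours):
            is_hole = hier[0][i][3] >= 0           # contour has a parent
            if is_hole != fill_holes:
                continue
            poly = cv2.approxPolyDP(cnt, factor, True)
            if cv2.contourArea(poly) <= factor ** 2:
                continue                            # remove flying pixels
            cv2.drawContours(out, [poly], -1, 0 if is_hole else 1, cv2.FILLED)
    return out
```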

Figure 5. Overlay Masks. Left: A patch of points S_x in the input point cloud and its corresponding pixel mask in the raster of the input depth map (depicted in yellow). Center: NN upscaling of the depth map and mask by a factor of 2. Right: Corresponding polygon approximation of the NN upscaled mask, which we term the 'overlay mask' corresponding to x. In the merging step, it is only the SR pixels x̂ of the overlay mask of x that the 'backward'-transformed points g_x^{-1}(S'_x) of the matched patch are allowed to influence.

Overlay Patches. We interpolate, for the SR pixels x̂ of the overlay mask corresponding to x, depth values from the 'backward'-transformed points g_x^{-1}(S'_x). Since points transformed according to a rigid body motion in 3D are not guaranteed to project to a regular grid in general, we interpolate over the depth values of these transformed points using barycentric coordinates on a Delaunay triangulation of their projections to image space (cf. Sambridge et al. [15]).
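SciPy's LinearNDInterpolator performs exactly this Delaunay-plus-barycentric interpolation, so a sketch of the overlay patch computation might read as follows; the projection convention and names are ours.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def overlay_patch(back_points, K, sr_coords, factor):
    """Interpolate depth at SR pixel coordinates from the 'backward'-
    transformed points: project them to the image plane, scale to the target
    resolution, triangulate (Delaunay), and interpolate barycentrically."""
    proj = back_points @ K.T                    # pinhole projection K P
    uv = proj[:, :2] / proj[:, 2:3] * factor    # pixel coords at SR scale
    interp = LinearNDInterpolator(uv, back_points[:, 2])
    return interp(sr_coords)                    # NaN outside the triangulation
```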

Merging. The SR depth map is computed by working out, for each SR pixel x̂, a weighted average of the corresponding interpolated depth values from the overlapping overlay patches. The weight ω_x of the interpolated depth values of the overlay patch assigned to x is given by exp(−γ · c_b(x; g_x)), where γ ∈ R⁺ controls the falloff to 0. If c_b(x; g_x) > β, β ∈ R⁺, we instead use the overlay patch at x given by the identity motion, ensuring that patches for which no good match was found do not undergo heavy degradation. We check against c_b(x; g_x) from (1) since it gives an indication of how satisfied the input points are with the match without penalizing the addition of new detail from S'_x. As in Section 2.2, if |S_x| < 3 then we consider x a flying pixel, and set ω_x = 0.
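The merge itself reduces to a per-pixel weighted average. A minimal sketch follows, assuming each overlay patch has already been reduced to covered SR pixel indices, interpolated depths, and a scalar weight ω_x, with the β fallback and flying-pixel zeroing applied upstream.

```python
import numpy as np

def merge_patches(overlays, sr_shape):
    """Weighted average of overlapping overlay patches. `overlays` is a list
    of (rows, cols, depths, w) tuples with w = exp(-gamma * c_b(x; g_x))."""
    num = np.zeros(sr_shape)
    den = np.zeros(sr_shape)
    for rows, cols, depths, w in overlays:
        np.add.at(num, (rows, cols), w * depths)   # accumulate weighted depth
        np.add.at(den, (rows, cols), w)            # accumulate weights
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den       # NaN where no patch covered the pixel (a hole)
```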

Postprocessing. Since our polygon approximation guarantees only that the outlines of the polygon be within the SR factor of the outlines of the NN upscaled mask, it is possible that no overlay mask cover a given SR pixel. Such holes can be filled using morphological dilation carried out iteratively, with the dilation affecting only pixels identified as holes. Another possible cause for holes is if pixels within an overlay mask could not be interpolated owing to the spatial distribution of the projected points. In that event, we dilate within the overlay mask with highest weight, again only over pixels identified as holes. Note that no postprocessing was performed in the output in Figure 1.

3. Evaluation

We evaluate our method using depth data from stereo, ToF, laser scans, and structured light. We carry out a quantitative evaluation in Section 3.1, and provide a qualitative evaluation in the section thereafter. Unless otherwise stated, we performed no preprocessing. In all our experiments, we carried out 5 iterations of PatchMatch, with k = 3, and set α = 0.5. Setting appropriate parameters r, β, and γ is largely intuitive upon visualization of the input point cloud, and depends on the scale, density, and relative depth of point features one aims to capture. In Section 3.1, all algorithm parameters were kept identical across Middlebury and laser scan tests, respectively. We give additional information on parameters and show additional results on our website.

3.1. Quantitative Evaluation

Following the example of Mac Aodha et al. [12], we provide a quantitative evaluation of our technique on Cones, Teddy, Tsukuba, and Venus of the Middlebury stereo data set (cf. Scharstein and Szeliski [16]). For Middlebury tests, we ran our algorithm on filled ground truth data—the same used in Mac Aodha et al.—downscaled by NN interpolation by a factor of 2 and 4 and subsequently super resolved by the same factor, respectively, which we compare to ground truth. Table 1 shows root mean squared error (RMSE) scores. Among depth SR methods that leverage a color or intensity image at the target resolution, we compare against Diebel and Thrun [5] and Yang et al. [21]; among techniques that make use of an external database we compare against Mac Aodha et al., and against Yang et al. [20] and Freeman and Liu [7] from the image SR literature. We also compare against the approach of Glasner et al. [8]. We compare against NN upscaling to provide a rough baseline, although it introduces jagged edges and does nothing to improve the apparent depth measurement accuracy. Table 1 also gives RMSE scores for three depth maps obtained from laser scans detailed in Mac Aodha et al., which we downscaled and subsequently super resolved by a factor of 4. For the laser scans we compare to the original resolution since ground truth data was not available. In Table 2, we provide percent error scores—giving the percentage of pixels for which the absolute difference in disparity exceeds 1—for Middlebury. All RMSE and percent error scores were computed on 8 bit disparity maps. The data sets—with the exception of results on the algorithm of Glasner et al. [8]—and the code used in carrying out the quantitative evaluation are from Mac Aodha et al. [12] and were generously provided by the authors.²

Although popular in the depth SR literature, RMSE scores over depth or disparity maps are dominated by misassignments at the boundaries of objects separated by large depth differences; given two data sets with equal percent error, a data set where boundaries are gently blurred will have lower RMSE than one with boundaries that are sharp. Even so, our RMSE scores remain highly competitive with those of alternative techniques. In percent error, we are the top performer among example-based methods, and on a few occasions outperform the image-guided techniques.
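For concreteness, the two scores might be computed as below on 8 bit disparity maps; this is our reading of the protocol, with percent error counting pixels whose absolute disparity difference exceeds 1.

```python
import numpy as np

def rmse_and_percent_error(pred, gt):
    """RMSE and percent error between predicted and ground truth disparities."""
    diff = pred.astype(np.float64) - gt.astype(np.float64)
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    pct = 100.0 * float(np.mean(np.abs(diff) > 1.0))
    return rmse, pct
```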

3.2. Qualitative Evaluation

In Figure 6 we show results on a data set of two similar egg cartons situated at different depths, obtained using the stereo algorithm of Bleyer et al. [4]. Our result is visually superior to those of our competitors, and is the only one to succeed in removing noise. Note the patch artefacts for Mac Aodha et al. in the zoom. In Figure 7, we consider a noisy ToF data set from [12]. We see that although our depth map appears pleasing, it in fact remains gently noisy if shaded as a mesh, owing to the great deal of noise in the input. However, if we apply the same bilateral filtering as Mac Aodha et al. [12], our result when shaded—although not as smooth over the vase—preserves edges better (e.g., at the foot) without introducing square patch artefacts. Note that Glasner et al. do not succeed in removing visible noise in their depth map, and introduce halo artefacts at the boundaries. Figure 8 provides a comparison over the noiseless, yet quantized Cones data set. Note that although Glasner et al. [8] perform well in RMSE, their method produces poor object boundaries.

4. Conclusion

Inspired by the work of Glasner et al. [8] on single-image super resolution for color and intensity images, we presented a tailored depth super resolution algorithm that makes use of only the information contained in the input depth map. We introduced a new 3D variant of PatchMatch for recovering a dense matching between pairs of closer-further corresponding 3D point patches related by 6DoF rigid body motions in 3D, and presented a technique for upscaling and merging matched patches that predicts sharp object boundaries at the target resolution. In our evaluation, we showed our results to be highly competitive with methods leveraging ancillary data.

²The RMSE scores published in Mac Aodha et al. [12] were subject to a subtle image resizing issue. Details and updated numbers are available on the authors' project page.

References

[1] C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman. PatchMatch: A randomized correspondence algorithm for structural image editing. SIGGRAPH, 2009.
[2] C. Barnes, E. Shechtman, D. B. Goldman, and A. Finkelstein. The generalized PatchMatch correspondence algorithm. In ECCV, 2010.
[3] F. Besse, C. Rother, A. Fitzgibbon, and J. Kautz. PMBP: PatchMatch belief propagation for correspondence field estimation. In BMVC, 2012.
[4] M. Bleyer, C. Rhemann, and C. Rother. PatchMatch Stereo - Stereo matching with slanted support windows. In BMVC, 2011.
[5] J. Diebel and S. Thrun. An application of Markov random fields to range sensing. In NIPS, 2005.
[6] D. Douglas and T. Peucker. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica, 10(2):112–122, 1973.
[7] W. T. Freeman and C. Liu. Markov random fields for super-resolution and texture synthesis. In Advances in Markov Random Fields for Vision and Image Processing. MIT Press, 2011.
[8] D. Glasner, S. Bagon, and M. Irani. Super-resolution from a single image. In ICCV, 2009.
[9] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, second edition, 2004.
[10] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. Davison, and A. Fitzgibbon. KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera. In UIST, 2011.
[11] J. Kopf, M. F. Cohen, D. Lischinski, and M. Uyttendaele. Joint bilateral upsampling. SIGGRAPH, 2007.
[12] O. Mac Aodha, N. D. Campbell, A. Nair, and G. J. Brostow. Patch based synthesis for single depth image super-resolution. In ECCV, 2012.
[13] J. Park, H. Kim, Y.-W. Tai, M. Brown, and I. Kweon. High quality depth map upsampling for 3D-ToF cameras. In ICCV, 2011.
[14] S. Rusinkiewicz and M. Levoy. Efficient variants of the ICP algorithm. In 3DIM, 2001.
[15] M. Sambridge, J. Braun, and H. McQueen. Geophysical parametrization and interpolation of irregular data using natural neighbours. Geophysical Journal International, 122(3):837–857, 1995.
[16] D. Scharstein and R. Szeliski. High-accuracy stereo depth maps using structured light. In CVPR, 2003.
[17] S. Schuon, C. Theobalt, J. Davis, and S. Thrun. LidarBoost: Depth superresolution for ToF 3D shape scanning. In CVPR, 2009.
[18] J. Tian and K.-K. Ma. A survey on super-resolution imaging. Signal, Image and Video Processing, 5(3):329–342, 2011.
[19] J. van Ouwerkerk. Image super-resolution survey. Image and Vision Computing, 24(10):1039–1052, 2006.
[20] J. Yang, J. Wright, T. Huang, and Y. Ma. Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11):2861–2873, 2010.
[21] Q. Yang, R. Yang, J. Davis, and D. Nistér. Spatial-depth super resolution for range images. In CVPR, 2007.

                         2x                              4x                              4x (laser scans)
                         Cones   Teddy   Tsukuba Venus   Cones   Teddy   Tsukuba Venus   Scan21  Scan30  Scan42
Nearest Neighbor         1.094   0.815   0.612   0.268   1.531   1.129   0.833   0.368   0.018   0.016   0.040
Diebel and Thrun [5]     0.740   0.527   0.401   0.170   1.141   0.801   0.549   0.243   N/A     N/A     N/A
Yang et al. [21]         0.756   0.510   0.393   0.167   0.993   0.690   0.514   0.216   N/A     N/A     N/A
Yang et al. [20]         2.027   1.420   0.705   0.992   2.214   1.572   0.840   1.012   0.030   0.035   0.054
Freeman and Liu [7]      1.447   0.969   0.617   0.332   1.536   1.110   0.869   0.367   0.019   0.017   0.075
Glasner et al. [8]       0.867   0.596   0.482   0.209   1.483   1.065   0.832   0.394   1.851   1.865   1.764
Mac Aodha et al. [12]    1.127   0.825   0.601   0.276   1.504   1.026   0.833   0.337   0.017   0.017   0.045
Our Method               0.994   0.791   0.580   0.257   1.399   1.196   0.727   0.450   0.021   0.018   0.030

Table 1. Root mean squared error (RMSE) scores. Yang et al. [20] and Freeman and Liu [7] are image SR methods and Mac Aodha et al. [12] a depth SR method, all of which require an external database. Diebel and Thrun [5] and Yang et al. [21] are depth SR methods that use an image at the target resolution. Glasner et al. [8] is an image SR technique that uses patches from within the input image. For most data sets, our method is competitive with the top performers. Laser scan tests on the image-guided techniques were not possible for want of images at the target resolution. Best score is indicated in bold for the example-based methods, which we consider our main competitors.

                         2x                                4x
                         Cones    Teddy    Tsukuba  Venus    Cones    Teddy    Tsukuba  Venus
Nearest Neighbor          1.713    1.548    1.240   0.328     3.121    3.358    2.197   0.609
Diebel and Thrun [5]      3.800    2.786    2.745   0.574     7.452    6.865    5.118   1.236
Yang et al. [21]          2.346    1.918    1.161   0.250     4.582    4.079    2.565   0.421
Yang et al. [20]         61.617   54.194    5.566  46.985    63.742   55.080    7.649  47.053
Freeman and Liu [7]       6.266    4.660    3.240   0.790    15.077   12.122   10.030   3.348
Glasner et al. [8]        4.697    3.137    3.234   0.940     8.790    6.806    6.454   1.770
Mac Aodha et al. [12]     2.935    2.311    2.235   0.536     6.541    5.309    4.780   0.856
Our Method                2.018    1.862    1.644   0.377     3.271    4.234    2.932   3.245

Table 2. Percent error scores. Our method is the top performer among example-based methods and on a few occasions outperforms Diebel and Thrun [5] and Yang et al. [21]. Results provided for Yang et al. [20] suffer from incorrect absolute intensities.

Figure 6. 2x nearest neighbor upscaling (b) and SR (c-e) on a stereo data set of two similar egg cartons obtained using the method of Bleyer et al. [4]: (a) color image, (b) nearest neighbor (32 bit), (c) our result (32 bit), (d) Glasner et al. [8] (8 bit), (e) Mac Aodha et al. [12] (preprocessed, 32 bit), (f) zooms. Note that (e) was preprocessed using a bilateral filter (window size 5, spatial deviation 0.5, range deviation 0.001).

Figure 7. Above, we provide zooms on a region of interest of the noisy PMD CamCube 2.0 ToF data set shown in Figure 2 for 4x nearest neighbor upscaling in (a) and 4x SR otherwise: (a) nearest neighbor, (b) our result, (c) Mac Aodha et al. [12] (preprocessed), (d) Glasner et al. [8], (e) Yang et al. [20], (f) Freeman and Liu [7]. A depth map zoom for Mac Aodha et al. was available only with bilateral preprocessing (window size 5, spatial deviation 3, range deviation 0.1). Below, we show shaded meshes for the preprocessed result of Mac Aodha et al. [12] and for our method with and without the same preprocessing: (g) nearest neighbor (preprocessed, 32 bit), (h) Mac Aodha et al. [12] (preprocessed, 32 bit), (i) our result (preprocessed, 32 bit), (j) our result (no preprocessing, 32 bit), (k) zooms. Note that (h) is not aligned with the other meshes because we obtained the rendering from the authors, and that although we in (i) perform worse than (h) on the vase, we preserve fine detail better and do not introduce square patch artefacts.

Figure 8. Above, zooms on a region of interest of the noiseless, though quantized Middlebury Cones data set: (a) nearest neighbor, (b) our result, (c) Glasner et al. [8], (d) Mac Aodha et al. [12]. 2x SR was carried out (in our case, using the parameters from the quantitative evaluation) on the 2x nearest neighbor downscaling of the original, depicted in (a). Our method produces the sharpest object boundaries. Below, the corresponding shaded meshes: (e) nearest neighbor (32 bit), (f) our result (32 bit), (g) our result (8 bit), (h) Glasner et al. [8] (8 bit), (i) Mac Aodha et al. [12] (8 bit). We show our 8 bit quantized mesh in (g) for comparison. Our method performs the best smoothing even after quantization (particularly over the cones), although it lightly smooths away the nose for the parameters used, which were kept the same for all Middlebury tests. We provide additional results on our website.
