Chuo-Ling Chang, Sangeun Han and Bernd Girod

Information Systems Laboratory, Department of Electrical Engineering, Stanford University



We propose a sender-based rate-distortion optimized framework to stream scalable bitstreams of 3-D wavelet video stored at the sender to a remote receiver. Based on the requests and feedback from the receiver, the source rate-distortion profiles, the desired playout latency and transmission rate, and the network characteristics, the sender optimizes the responses sent to the receiver throughout the video playout session in order to minimize the distortion in the reconstructed frames. The rate-distortion optimized response is formulated as a convex optimization problem, and an offline-computation approach is proposed to further reduce the complexity at the sender. Experimental results show that the proposed sender-based approach outperforms the receiver-based approach, and that the offline-computation approach very closely approximates the fully optimized approach.


Many attempts have been made to incorporate motion compensation into the 3-D wavelet video coding framework [1][2]. Earlier works are somewhat unsatisfactory in terms of rate-distortion coding performance because the motion vector field is severely restricted and the temporal transform is usually limited to the two-tap Haar wavelet. Recently, motion-compensated lifting has been proposed [3]-[5]; it successfully incorporates unrestricted motion compensation into 3-D wavelet coding and achieves compression efficiency approaching that of state-of-the-art predictive video coding schemes. However, despite the increasing interest in 3-D wavelet video, efficient streaming of such data sets that exploits both their rate-distortion performance and their inherent support for scalability is seldom addressed.

In [6], Chou and Miao proposed a framework for rate-distortion optimized packet scheduling of video and audio data. In general, it can be applied to streaming of 3-D wavelet video data sets. However, in their framework, the data set has to be assembled into packets before the optimization for packet scheduling takes place. Therefore, the packet content cannot dynamically adapt to the network characteristics or to the state of the data previously transmitted to the receiver. Additionally, the optimization is formulated as a combinatorial problem, which is computationally expensive to solve.

We have previously proposed a receiver-based rate-distortion optimized framework for streaming 3-D wavelet video with low latency [7]. The source rate-distortion profiles, the desired playout latency and transmission rate, and the network characteristics are all taken into account to optimize the streaming strategy. Additionally, owing to the fine scalability of embedded wavelet coefficient coding, the optimization problem can be approximated as a convex optimization problem, which can be solved efficiently by standard optimization techniques.

¹This work was supported by Grant No. ECS-0225315 of the National Science Foundation, Philips Corporation, and the Max Planck Institute.
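To illustrate why convexity makes the allocation tractable, the following Python sketch (not the formulation of [7]) spends a rate budget across subbands under a hypothetical convex model in which each additional rate unit quarters a subband's mean-squared error, D_i(k) = σ_i² · 4^(−k). For separable, convex, decreasing distortion curves under a total-rate constraint, greedily assigning each unit to the subband with the largest remaining distortion reduction is optimal; this is the discrete counterpart of the equal-slope (Lagrangian) condition. The distortion model and the names `distortion` and `allocate` are illustrative assumptions.

```python
import heapq


def distortion(var, units):
    # Hypothetical convex model: each extra rate unit quarters the MSE.
    return var * 4.0 ** (-units)


def allocate(variances, budget):
    """Greedy marginal-return rate allocation (optimal for separable convex D)."""
    alloc = [0] * len(variances)
    # Max-heap (via negated keys) of the distortion reduction bought by the next unit.
    heap = [(-(distortion(v, 0) - distortion(v, 1)), i) for i, v in enumerate(variances)]
    heapq.heapify(heap)
    for _ in range(budget):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        k = alloc[i]
        # Convexity guarantees this next gain is no larger than the previous one.
        gain = distortion(variances[i], k) - distortion(variances[i], k + 1)
        heapq.heappush(heap, (-gain, i))
    return alloc


if __name__ == "__main__":
    var = [64.0, 16.0, 4.0]
    a = allocate(var, 4)
    print(a, sum(distortion(v, k) for v, k in zip(var, a)))
```

With variances 64, 16 and 4 and a budget of 4 units, the greedy allocation reaches a total distortion of 9, which matches an exhaustive search for this small case.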

In the receiver-based framework, the receiver is responsible for efficiently selecting the data to be retrieved from the sender. Therefore, the rate-distortion profile of the coded data has to be available at the receiver before the video playout session begins. In addition, losses and excess delay experienced by the requests in the backward channel could block the transmission. In this paper, we describe a sender-based framework in which the sender selects the data to be transmitted. To reduce the computational complexity at the sender, an offline-computation approach is proposed.

The remainder of the paper is organized as follows. In Section 2, we briefly describe the structure of the 3-D wavelet video coding scheme adopted in this work. In Section 3, we present the proposed sender-based framework for streaming 3-D wavelet video over networks. The offline-computation approach is discussed in Section 4. Finally, experimental results are presented in Section 5.



In this work, a 3-D wavelet coder using motion-compensated lifting is adopted to encode the video sequence [3]-[5]. A multi-level temporal discrete wavelet transform (DWT), implemented using motion-compensated lifting, is first applied across the video frames to decompose them into temporal subbands, followed by a multi-level 2-D spatial DWT that decomposes the temporal subbands into wavelet coefficients. The SPIHT (Set Partitioning in Hierarchical Trees) algorithm [8] is finally applied to encode the wavelet coefficients of each subband into a scalable bitstream. SPIHT provides a scalable representation, so that different reconstruction qualities of the video frames can be obtained by truncating the coded bitstreams at different lengths.
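The lifting structure behind the temporal DWT can be sketched in Python. This is a simplified illustration, not the coder of [3]-[5]: it uses the two-tap Haar lifting steps (predict, then update) with hypothetical motion-compensation hooks `mc` and `imc` that default to the identity, i.e., zero motion. Because the inverse simply undoes each lifting step in reverse order, reconstruction is perfect for any choice of these operators, which is what allows unrestricted motion compensation inside the transform. Frames may be any objects supporting `+`, `-` and scalar `*` (floats here, or NumPy arrays); the low-pass output is the pairwise average rather than an orthonormally scaled coefficient.

```python
def identity(frame):
    return frame


def haar_lift_forward(frames, mc=identity, imc=identity):
    """One level of temporal Haar lifting on an even-length list of frames.

    mc / imc are hypothetical motion-compensation hooks (identity = zero motion).
    """
    assert len(frames) % 2 == 0, "need an even number of frames"
    even, odd = frames[0::2], frames[1::2]
    # Predict step: the high-pass frame is the residual of the odd frame.
    high = [o - mc(e) for e, o in zip(even, odd)]
    # Update step: the low-pass frame is the even frame plus half the residual.
    low = [e + 0.5 * imc(h) for e, h in zip(even, high)]
    return low, high


def haar_lift_inverse(low, high, mc=identity, imc=identity):
    """Undo the lifting steps in reverse order; exact for any mc / imc."""
    even = [lo - 0.5 * imc(h) for lo, h in zip(low, high)]
    odd = [h + mc(e) for e, h in zip(even, high)]
    frames = []
    for e, o in zip(even, odd):
        frames.extend([e, o])  # re-interleave even and odd frames
    return frames


def temporal_dwt(frames, levels):
    """Multi-level decomposition returning [H1, H2, ..., H_levels, L_levels]."""
    subbands, low = [], list(frames)
    for _ in range(levels):
        low, high = haar_lift_forward(low)
        subbands.append(high)
    subbands.append(low)
    return subbands


def inverse_temporal_dwt(subbands):
    """Synthesize frames from the coarsest low-pass band outward."""
    low = subbands[-1]
    for high in reversed(subbands[:-1]):
        low = haar_lift_inverse(low, high)
    return low


if __name__ == "__main__":
    bands = temporal_dwt([1, 3, 5, 7], 2)
    print(bands, inverse_temporal_dwt(bands))
```

For example, `temporal_dwt([1, 3, 5, 7], 2)` yields H1 = [2, 2], H2 = [4] and L2 = [4] (the mean of the four frames), and `inverse_temporal_dwt` recovers the input.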

To decode a particular video frame, only the few subbands relevant to synthesizing that frame need to be reconstructed. The truncated bitstreams of these subbands available at the decoder are decoded into reconstructed wavelet coefficients by the inverse SPIHT algorithm. Then the inverse 2-D spatial DWT is applied to reconstruct the temporal subbands. Finally, the inverse motion-compensated lifting procedure carries out the inverse temporal DWT to reconstruct the video frame.
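Which temporal subband frames a given frame depends on follows from the lifting structure. Assuming a dyadic multi-level Haar temporal decomposition, frame t depends on exactly one high-pass frame per level plus one frame of the coarsest low-pass subband, because each synthesis step combines one low-pass and one high-pass frame into a pair of frames. The Python sketch below lists them; the subband labels `H1`…`H<levels>`, `L<levels>` and the helper `subbands_for_frame` are hypothetical. With longer temporal filters such as the 5/3 wavelet, neighboring subband frames are also needed and the dependency set grows.

```python
def subbands_for_frame(t, levels):
    """Return (subband, index) pairs needed to synthesize frame t.

    Assumes dyadic Haar temporal lifting; labels H1..H<levels>, L<levels>
    are a hypothetical naming convention.
    """
    needed = []
    k = t
    for level in range(1, levels + 1):
        k //= 2  # index of the pair containing frame t's ancestor at this level
        needed.append((f"H{level}", k))
    needed.append((f"L{levels}", k))  # one frame of the coarsest low-pass band
    return needed


if __name__ == "__main__":
    print(subbands_for_frame(5, 3))
```

For an 8-frame group decomposed over three levels, frame 5 needs H1[2], H2[1], H3[0] and L3[0], i.e., 4 of the 8 temporal subband frames.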