JPEG2000-based scalable interactive video (JSIV)

Authors :: Naman, Aous Thabit
Publication Year :: 2010
Publisher :: UNSW Sydney, 2010.
Abstract: Video is considered one of the main applications of the modern Internet. Despite its importance, the interactivity available from current implementations is limited to pause and random access to a set of predetermined access points. In this work, we propose a novel approach that provides considerably better interactivity, and we coin the term JPEG2000-Based Scalable Interactive Video (JSIV) for it. JSIV relies on three main concepts: storing the video sequence as independent JPEG2000 frames to provide quality and spatial resolution scalability, as well as temporal and spatial accessibility; prediction and conditional replenishment of precincts to exploit inter-frame redundancy; and loosely-coupled server and client policies. The concept of loosely-coupled client and server policies is central to JSIV. With these policies, the server optimally selects the number of quality layers for each precinct it transmits and decides on any side-information that needs to be transmitted, while the client attempts to make the most of the received (distorted) frames. In particular, the client decides which precincts are predicted and which are decoded from received data (or possibly filled with zeros in the absence of received data). Thus, in JSIV, a predicted frame typically has some of its precincts predicted from nearby frames while others are decoded from received intra-coded precincts; JSIV never uses frame differences or prediction residues. The philosophy behind these policies is that neither the server nor the client drives the video streaming interaction; rather, the server dynamically selects and sends the pieces that it thinks best serve the client's needs and, in turn, the client makes the most of the pieces of information it has.
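As an illustration of the client-side decision described above, the following is a minimal sketch only, not the policy defined in the thesis. It assumes the client holds a distortion estimate for each available option; the names `PrecinctOptions` and `choose_source`, and the rule of picking the lowest estimated distortion, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrecinctOptions:
    # Hypothetical estimate of distortion if the precinct is predicted
    # from nearby frames; None when no usable reference is available.
    predicted_distortion: Optional[float]
    # Hypothetical estimate of distortion if the precinct is decoded from
    # the quality layers received so far; None when nothing was received.
    received_distortion: Optional[float]


def choose_source(p: PrecinctOptions) -> str:
    """Pick where a precinct's samples come from (assumed rule: lowest
    estimated distortion among the options that are available)."""
    candidates = {}
    if p.predicted_distortion is not None:
        candidates["predicted"] = p.predicted_distortion
    if p.received_distortion is not None:
        candidates["received"] = p.received_distortion
    if not candidates:
        # Neither a prediction nor received data: fill with zeros.
        return "zeros"
    return min(candidates, key=candidates.get)
```

In this sketch the choice is made independently for each precinct, so a single frame can mix predicted and directly decoded precincts, as the abstract describes.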
The JSIV paradigm postulates that if both the client and the server policies are intelligent enough and make reasonable decisions, then the decisions made by the server are likely to have the expected impact on the client's decisions. We solve the general JSIV optimization problem by employing Lagrange-style rate-distortion optimization in a two-pass iterative approach. We show that this approach converges under workable conditions, and we also show that the optimal solution for a given rate is not necessarily embedded in the optimal solution for a higher rate. The flexibility of the JSIV paradigm enables us to use it in a variety of frame prediction arrangements. In this work, we focus only on JSIV with a sequential prediction arrangement (similar to IPPP...) and a hierarchical B-frames prediction arrangement. We show that JSIV can provide the sought-after quality and spatial scalability in addition to temporal and spatial accessibility. We also demonstrate a novel way in which a JSIV client can use its cache to improve the quality of reconstructed video. In general, JSIV can serve a wide range of usage scenarios, but we expect that real-time and interactive applications, such as teleconferencing and surveillance, would benefit most from it. Experimental results show that JSIV's performance is slightly inferior to that of existing predictive coding standards in conventional streaming applications; however, JSIV produces significant improvements when its scalability and accessibility features, such as the region of interest, are employed.
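To make the idea of Lagrange-style rate-distortion optimization concrete, here is a generic textbook sketch, not the two-pass iterative procedure of the thesis. It assumes each precinct has a known list of (rate, distortion) points indexed by the number of quality layers, with index 0 meaning "send nothing" at zero rate, and it ignores inter-frame prediction entirely. The names `pick_layers` and `allocate` are hypothetical.

```python
from typing import List, Sequence, Tuple

RDPoint = Tuple[float, float]  # (rate, distortion) after q quality layers


def pick_layers(points: Sequence[RDPoint], lam: float) -> int:
    """Return the layer count q that minimizes D_q + lam * R_q."""
    return min(range(len(points)),
               key=lambda q: points[q][1] + lam * points[q][0])


def allocate(precincts: Sequence[Sequence[RDPoint]],
             budget: float, iters: int = 50) -> List[int]:
    """Bisect on the Lagrange multiplier until the total rate fits the budget.

    Assumes points[0] of every precinct has rate 0, so a very large
    multiplier always yields a feasible (all-zero-layer) allocation.
    """
    lo, hi = 0.0, 1e9
    best = [pick_layers(p, hi) for p in precincts]
    for _ in range(iters):
        lam = (lo + hi) / 2
        choice = [pick_layers(p, lam) for p in precincts]
        rate = sum(p[q][0] for p, q in zip(precincts, choice))
        if rate <= budget:
            best, hi = choice, lam   # feasible: try a smaller multiplier
        else:
            lo = lam                 # too much rate: penalize rate more
    return best
```

For example, with two precincts whose points are `[(0, 100), (10, 40), (20, 10)]` and `[(0, 50), (10, 45), (20, 44)]` and a budget of 20, the sketch spends the whole budget on the first precinct, where each extra layer removes far more distortion.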