A multimedia broker for ubiquitous and accessible rich media content transcoding

Paola Salomoni, Silvia Mirri
Dipartimento di Scienze dell'Informazione, University of Bologna, Bologna, Italy
Conference Paper · January 2004 · DOI: 10.1109/GLOCOMW.2004.1417571 · Source: IEEE Xplore

Abstract—Entertainment multimedia applications based on synchronized continuous media are widely used on standard platforms and are becoming ubiquitous. To integrate rich, interactive multimedia into context-aware applications, content has to be produced and integrated by using mechanisms that choose and provide the richest version that meets user and device capabilities. In this context, we discuss the design and implementation of an Internet application designed to support the development of multimedia applications for non-standard devices by providing a media-brokering service. The media broker uses a SMIL-based description of the rich media and a CC/PP profile to adaptively compute the best version of the media content that can be delivered to a specific user and device.

Keywords—content transcoding, context-aware applications, multimedia entertainment
I. INTRODUCTION

Entertainment services are emerging as a leading portion of the mobile market, and the diffusion of mobile devices (such as PDAs, smart phones and portable game consoles), together with the wide availability of wireless connectivity, is greatly increasing the opportunity to provide new multimedia services. A new wave of mobile entertainment applications is about to gain a leading position in the mobile service market: mobile games, mobile TV and mobile music sharing are the most promising applications, and new multimedia services, traditionally provided by the Web, will soon be available on mobile devices as well [1]. At present these services [2] are typically implemented as network- and platform-dependent applications, and interoperability with standard Internet services (Web and e-mail) is frequently disregarded. One of the next challenges for the market is to move from mobile applications to services offered over different platforms, supporting ubiquitous and nomadic computing and providing context-aware multimedia experiences.

In this context, a wide range of work is currently being carried out by the research community in the area of context-aware applications [3, 4, 5], work that is frequently related to that done in the Web accessibility area [6, 7, 8]. Two main remarks drive this analogy. First, a limited device obviously restricts user capabilities, so a set of alternative strategies is needed to overcome these constraints. Second, context awareness is strictly related to device profiling, but it is not limited to it: in a specific situation the user may be limited by the context itself. For example, a user needs a different rendering of a newspaper while he/she is driving a car, with sight and hands busy [6]. Voice interaction is a clear example of a technology used both to implement ubiquitous services and to enhance accessibility [9].
Considering the Web as a model for rich media delivery that takes accessibility and universality issues into account, we can point out two different W3C standards that can be coupled to provide the user with rich media ubiquitous applications. The first is CC/PP (Composite Capability/Preference Profiles) [10], which is used to define profiles, i.e. descriptions of device capabilities and user preferences that can direct the adaptation of content. A CC/PP profile includes CC/PP attribute names and related values that are used to find the most suitable form of a resource deliverable to a specific client. The second W3C technology that can support rich multimedia ubiquity is SMIL (Synchronized Multimedia Integration Language) [11], a markup language that describes rich media presentations integrating streaming audio and video with images, text or any other media type, with support for synchronization. SMIL allows the author to offer different media under a set of defined conditions by using the <switch> element, which identifies, for example, the language preferred by the user. The SMIL player is responsible for collecting the user preferences (set through application options) and for playing the multimedia element selected by the <switch> element.

This paper presents RMob (Rich Media on-line Broker), an on-line service that provides rich media content to users equipped with non-standard hardware and software, including both mobile terminals (such as PDAs and phones) and platforms used by people with disabilities to improve computer accessibility. RMob uses an augmented SMIL representation of synchronized multimedia content to describe rich media, together with a description of user and device capabilities realized as a CC/PP profile.
By matching requirements and constraints derived from these two definitions, RMob plans and starts a transcoding activity that transforms the original description of the rich media content into the best one deliverable to that specific user. We have used RMob to develop an entertainment application that offers Latin dance lectures, considering a nomadic user who starts learning Salsa in a public place (a beach or a discotheque) and wants to learn more from home. The same user can also access the Salsa lecture by using a mobile terminal (a PDA, in our experimental scenario) connected to the Internet through a 2.5G mobile technology. Finally, the same set of lectures has been used to verify the accessibility of the RMob service, by accessing the multimedia content with a PC equipped with a screen reader and a Braille display, as a blind user would do.

The remainder of the paper is organized as follows. Section 2 outlines the RMob architecture and introduces the main RMob components. In Section 3, we provide details on how RMob documents and profiles are described by using a set of standard markup languages augmented with the aim of facilitating media transcoding. In Section 4, we introduce three different experimental scenarios used to verify the effectiveness of our approach in terms of context awareness and accessibility. Finally, Section 5 concludes the paper and outlines plans for future work.

Globecom 2004 Workshops · 0-7803-8798-8/04/$20.00 ©2004 IEEE · IEEE Communications Society

II. ARCHITECTURE

RMob offers clients a brokering service: rich media content produced in an augmented SMIL (aSMIL) is transcoded to meet the user preferences, expressed by the RMob client by using a CC/PP profile. The user profile is stored by RMob in the Profile DB, which contains both basic configuration profiles and personal profiles defined by users.
During the first connection to RMob, the user specifies both device and personal settings, starting from a set of pre-configured standard profiles stored in the Profile DB. The system offers the user the possibility of storing the new profile in the Profile DB together with a unique user ID. The RMob Media Broker (MB) compares the obtained set of settings with the rich media description and the available media formats, and defines a transcoding strategy. By calling a sequence of transcoding components, the MB computes a new version of the rich media and sends it back to the RMob client.

Figure 1. RMob Architecture

The general architecture of RMob can be thought of as constructed out of the following software components, as shown in Figure 1 above:

•  The Profile Manager (PM), which manages the Profile DB and maintains the user preferences. The PM is accessed by clients through the Media Broker.

•  The Media Broker (MB), which is responsible for agreeing on the profile definition at the start-up of each connection and then for defining and controlling the transcoding activity. Choices about what and how to transcode are taken on the basis of profile and content descriptions, as reported in the following Section 3.

•  The Transcoding Unit (TU), which uses the original media sources to compute a transcoded media version, according to the requirements specified by the MB. Different sub-components of the Transcoding Unit are used to compute different translation and re-coding activities, ranging from simple resizing of video to fit the display dimensions, or re-coding of audio to enhance compression, to more complex mechanisms that transform continuous media (video or animation) into sequences of frames. A specific transcoding is needed every time device or user capabilities suggest converting the synchronous media composition (SMIL-based) into an HTML-based one. Every transcoding step is directed by the MB.
The TU also works as a proxy system and maintains a cache of least recently used transcoded media files.

III. PROFILE AND CONTENT DESCRIPTIONS

The MB uses the information provided by the user as a CC/PP profile to transcode the rich media sources and compose a new document according to the original content description. The following subsection 3.1 describes the RMob content description model, based on an extended version of SMIL, while subsection 3.2 presents the RMob profile description, based on a CC/PP vocabulary. Finally, subsection 3.3 introduces the main mechanisms used by the MB to produce the best version of a given media content for a specific profile.

A. RMob content description

SMIL [11] is a set of XML modules which are used to describe the temporal, positional, and interactive behavior of a multimedia presentation. The SMIL markup language can be used to manage timing, layout and animation in compositions of synchronous multimedia. The SMIL language makes it possible to define different versions of a given multimedia content by exploiting the <switch> element. <switch> permits the author to enclose different, mutually exclusive elements, selected by testing some specified attributes that describe a very limited set of user preferences and system capabilities. The last unconditioned element in the switch sequence is used as a default alternative every time all the other options fail. Some of the above-mentioned conditions are frequently used to enhance internationalization (for example systemLanguage) and accessibility (such as systemCaptions, which is used for audio captioning, or systemAudioDesc, which activates an enhanced audio track with the description of a video component). Others improve portability, as for example systemScreenSize, which specifies the dimensions of the device screen in pixels.
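As a concrete illustration of the <switch> mechanism just described, the following SMIL fragment (our own sketch, not taken from the paper; the file names are hypothetical) selects an audio explanation by language and shows captions only when the user has enabled them:

```xml
<!-- Illustrative SMIL fragment: the player evaluates the test
     attributes top-down and renders the first matching child;
     the last, unconditioned child is the default fallback. -->
<switch>
  <audio src="explanation-it.mp3" systemLanguage="it"/>
  <audio src="explanation-en.mp3" systemLanguage="en"/>
  <audio src="explanation-en.mp3"/>
</switch>
<!-- Shown only when the user has turned captioning on
     in the player options. -->
<textstream src="captions-en.rt" systemCaptions="on"/>
```

Note that the whole selection happens client-side in the player, which is precisely the limitation that the aSMIL extensions below address.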
To augment the set of conditions that can be tested by using the <switch> element, SMIL provides two elements (customAttributes and customTest) which are used to define custom test attribute variables with a Boolean (true/false) value. This simple control mechanism is typically managed by the SMIL player at the client side. Suppose the user has turned on captioning by setting the corresponding player option: the player will show the captioning text, possibly downloading the appropriate file. The SMIL <switch> element is not sufficient to fully support the need to enhance rich media portability and extend accessibility features by applying (server-side) transcoding strategies. The main improvement is based on the idea that <switch> must be used to define alternative media sources. Two main mechanisms are added to SMIL to obtain aSMIL:

1. A new element, <a_switch>, is needed to specify alternatives under multiple conditions. <a_switch> works as follows: each time a condition contained in the internal elements fails, the default value for the <a_switch> is added to the set of media selected by all the conditions that succeed. An example of <a_switch> use is shown in the following Section 4.

2. An extensible set of condition attributes is defined to specify constraints on the use of a specific media resource: for example, a bandwidth range for a video stream, or a minimal screen dimension suitable for text readability. Within these constraints, a transcoding mechanism can produce the right version for a specific bandwidth or for a certain screen size, without the need to produce and store files matching all the possible requirements. The set of constraints covers the capabilities defined by the RMob Profile and described in the following Section 3.2.
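To make the two mechanisms more tangible, here is a hypothetical aSMIL fragment of our own; the element name <a_switch> comes from the text above, but the condition attribute names (minBandwidth, minScreenWidth) and file names are purely illustrative, since the concrete aSMIL syntax is not fixed in this section:

```xml
<!-- Hypothetical aSMIL sketch: if the conditions on the video
     fail (too little bandwidth, too small a screen), the
     unconditioned default is added to the set of media selected
     by the conditions that succeed. -->
<a_switch>
  <video src="teacher.mpg" minBandwidth="128000" minScreenWidth="320"/>
  <img src="teacher-keyframes.jpg"/>
</a_switch>
```

Because the constraints are ranges rather than fixed alternatives, the server-side transcoder can generate an intermediate version (e.g. a down-scaled video) instead of merely picking one of the pre-stored files.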
B. RMob profile description

CC/PP is based on RDF (Resource Description Framework) [12], which was designed by the W3C as a metadata and machine-understandable property description language. A CC/PP vocabulary is defined by using RDF and specifies components, and attributes of these components, used by the application to describe a certain context. The RMob CC/PP vocabulary includes four components; three of these are standard CC/PP components [10], hardware platform, software platform and browser, which are used to detail the platform hardware and software characteristics. In particular:

•  Hardware Platform: This component defines the device (mobile device, personal computer, palmtop, tablet PC, etc.) in terms of hardware capabilities, such as displaywidth and displayheight (which specify the display width and height resolution), audio (audio board presence), imagecapable (image support), brailledisplay (Braille display presence) and keyboard (keyboard type).

•  Software Platform: This component specifies the device software capabilities, such as name (operating system name), version (operating system version), tool (assistive tools present), audio (audio types supported), video (video types supported) and SMILplayer (SMIL players present).

•  Browser: This component describes the browser user agent capabilities, such as name (user agent name), version (user agent version), javascriptversion (JavaScript versions supported), CSS (CSS versions supported), htmlsupported (HTML versions supported), mimesupported (MIME types supported) and language (languages supported).
The last (fourth) component of the profile contains user information and is used by the MB for different purposes related to accessibility, internationalization of content and user preference management. This last component is described in the following:

•  User Profile: This component describes user capabilities, preferences and characteristics. Some examples of profile elements are age (user age), video (whether the user wants/is able to enjoy video resources), audio (whether the user wants/is able to enjoy audio resources) and language (the user's native language and other languages understood by the user).

C. Media brokering and transcoding

In RMob, multimedia information is processed and delivered on the fly on the basis of the data provided to the MB by the RMob profile. The rich media content delivered to the user is computed by the combination of different transcoding processes, guided by the RMob Profile and defined explicitly or implicitly by the aSMIL code that describes the content structure. Transcoding of single media is basically founded on two different types of transformations:

•  Translation, which involves the conversion from one modality to another. Some translations can be automatically computed by using a transcoding function, such as text to speech (TTS) or animation to image. Other translations need an explicit declaration of content equivalence, as for example image to text.

•  Scaling, which concerns operations of transcoding, re-coding and compression whose effect is a reduction in size, quality and data rate. Examples of scaling include image and video resizing, and audio re-coding and compression.
Inline conditions specified in aSMIL can be used to define the media to be used under specific constraints (for example, a specific video under a bandwidth constraint) or to decide that a medium cannot be delivered (for example, a video to a blind user), both to activate translation and scaling functions. A third mechanism of transcoding is applied directly to the aSMIL source to compute the rich media structure according to the RMob profile. The structure transformation produces a synchronous rich media framework written in SMIL or, when the device or user needs prevent the use of SMIL, one or more HTML pages containing a partially synchronous media composition. The adaptation of the structure is based on templates that define the use of screen space and are implemented by selecting a SMIL layout or an HTML CSS from a set of predefined elements.

IV. SCENARIOS

We have tested the above-mentioned architecture on a multimedia application that delivers dance lectures, realized by using different synchronized multimedia contents. In particular, to verify the effectiveness of our approach, we have developed a Salsa lecture composed of an audio track (the salsa) synchronized with two elements: a video clip showing the teacher dancing and an animation reproducing the steps. With the aim of verifying context awareness together with accessibility, we have considered three different scenarios:

A. A group of users learning salsa dance on the beach. A standard PC platform is used, connected to a wide screen working as the video output device. This scenario is technically equivalent to standard access through a fully equipped PC, connected to the Internet and supporting a SMIL-compliant player.

B. A mobile user, accessing the Salsa lecture by using a PDA. The device has a small screen and the platform does not support a SMIL player.
The user is connected through a GPRS [13] interface that does not guarantee fluent transmission of the video clip showing the teacher.

C. A blind user who accesses the Internet with a PC equipped with a screen reader and a Braille display. The platform supports a SMIL player, but the interaction with the screen reader introduces some accessibility constraints, such as difficulties in reading text that changes dynamically.

Considering the goal (learning to dance salsa), all the devices are equipped with a sound card and are able to play the background music. We can assume that in all three scenarios users speak English and prefer the English version of the explanation. On the contrary, devices and users have different capabilities to reproduce and understand the visual component of the lecture. A portion of the three different CC/PP profiles is shown in Figure 2 below, in which the most important differing characteristics considered by the MB service are highlighted. In particular, in scenario A users have full capabilities to reproduce and understand the lecture in its richest version. In scenario B, two elements have to be considered by the MB: the absence of a SMIL player and the connection bandwidth.

Figure 2. Profiles defined in the three different scenarios
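To suggest what the scenario B profile might contain, the RDF/XML sketch below is our own reconstruction: only the component and attribute names (displaywidth, displayheight, SMILplayer) come from Section 3.2, while the rmob vocabulary URI, the profile URI and the literal values are assumptions made for illustration:

```xml
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ccpp="http://www.w3.org/2002/11/08-ccpp-schema#"
         xmlns:rmob="http://example.org/rmob-schema#">
  <rdf:Description rdf:about="http://example.org/profiles/scenarioB">
    <ccpp:component>
      <rdf:Description rdf:about="#HardwarePlatform">
        <!-- small PDA screen: triggers resizing/template adaptation -->
        <rmob:displaywidth>240</rmob:displaywidth>
        <rmob:displayheight>320</rmob:displayheight>
      </rdf:Description>
    </ccpp:component>
    <ccpp:component>
      <rdf:Description rdf:about="#SoftwarePlatform">
        <!-- no SMIL player: the MB falls back to HTML output -->
        <rmob:SMILplayer>None</rmob:SMILplayer>
      </rdf:Description>
    </ccpp:component>
  </rdf:Description>
</rdf:RDF>
```

Under these assumptions, the MB would select the HTML-based structure template and a frame-sequence translation of the video, matching the scenario B behavior described above.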