Audio compression, history, MPEG format.

Discussion in 'Computer Science & Culture' started by Rick, Feb 10, 2002.

Thread Status:
Not open for further replies.
  1. Rick Valued Senior Member

    Messages:
    3,336
    The following content was taken from MIT's web servers.
    About Structured Audio

    Structured Audio means transmitting sound by describing it rather than compressing it. That's the whole idea, and it's a very simple one, but as you will see if you keep reading, it leads to a wealth of new directions for sound research and low bit-rate coding.

    The Machine Listening Group invented the use of the term Structured Audio for this kind of sound transmission. We didn't invent the idea of Structured Audio; description methods for sound, and the idea of using them for coding, have been kicking around for years (Julius Smith, Andy Moorer, and probably others have written about it). But the NetSound and MPEG-4 Structured Audio projects were the first to really try to make it a practical reality.

    The Problem

    The problem we started out trying to solve is easy to understand: it takes too long to download sound over the World Wide Web. To avoid this, many researchers have developed audio compression techniques, which allow sound files to be "squished" for more rapid transmission. RealAudio, MP3, Liquid Audio, and many other technologies allow this compression. However, to compress audio to a point where it can be streamed over modems, you have to start squeezing the sound quality out of it. That's why, on one hand, MP3 files can't be streamed -- you have to download them (about 15-20 minutes for a 5-minute song) before listening -- but they sound good; and on the other, RealAudio files can be streamed, but they sound like AM radio, not like a CD. So we wanted to build technology that would allow high-quality sound to be streamed over a modem.
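    The download-versus-streaming numbers above can be checked with some back-of-the-envelope arithmetic. The rates below (a 128 kbps MP3 and a 33.6 kbps modem) are assumed typical values for the period, not figures from the original text:

```python
# Assumed, era-typical rates -- not figures from the article itself.
MP3_RATE_BPS = 128_000   # a common MP3 encoding bit rate
MODEM_BPS = 33_600       # a common late-1990s modem speed
SONG_SECONDS = 5 * 60    # the 5-minute song from the text

mp3_bits = MP3_RATE_BPS * SONG_SECONDS
download_minutes = mp3_bits / MODEM_BPS / 60
print(f"MP3 size: {mp3_bits / 8 / 1e6:.1f} MB")          # ~4.8 MB
print(f"Download time: {download_minutes:.0f} minutes")   # ~19 minutes

# Streaming requires the encoding rate to fit in the channel rate,
# which fails here -- hence the "download, then listen" workflow.
print("Streamable over this modem?", MP3_RATE_BPS <= MODEM_BPS)
```

    At these assumed rates the download takes about 19 minutes, squarely in the 15-20 minute range the text quotes.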

    Csound

    Csound is a special computer language invented and refined over the last twenty years by Prof. Barry Vercoe, who leads the Machine Listening Group. It's a language that's used to describe sound synthesizers. Each program in Csound is the description of the "internal workings" of a particular kind of synthesizer. That is, one Csound program (or instrument in Csound language) might use FM synthesis like the Yamaha DX-7, while another might use wavetable techniques like the E-Mu Proteus, and another might model a classic analog synthesizer from the 1970s. Part of the power of Csound and languages like it is that any synthesizer can be described this way.
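    As a rough illustration of what "describing a synthesizer" means, here is a minimal two-operator FM voice in Python, in the spirit of the DX-7-style FM instrument mentioned above. This is an illustrative sketch, not Csound syntax; the function and parameter names are invented:

```python
import math

SR = 44_100  # sample rate in Hz

def fm_tone(freq, ratio, index, dur):
    """A toy FM 'instrument': a carrier sine modulated by one modulator.

    freq  -- carrier frequency in Hz
    ratio -- modulator frequency as a multiple of the carrier
    index -- modulation depth (larger = brighter, more sidebands)
    dur   -- duration in seconds
    (All names are illustrative, not from Csound or the DX-7.)
    """
    samples = []
    for n in range(int(SR * dur)):
        t = n / SR
        mod = index * math.sin(2 * math.pi * freq * ratio * t)
        samples.append(math.sin(2 * math.pi * freq * t + mod))
    return samples

tone = fm_tone(freq=440.0, ratio=2.0, index=3.0, dur=0.5)
print(len(tone))  # 22050 samples: half a second at 44.1 kHz
```

    The point of the description approach is that this handful of parameters, rather than half a second of sample data, is what gets transmitted.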

    When musicians write music using Csound, they write two parts: the orchestra, which describes what all of their instruments should sound like, and the score, which describes how to use those instruments to create music. If you are familiar with MIDI files, the score of a Csound composition is something like a MIDI file. But the orchestra has no direct analog in the MIDI world, unless it's the electronics on the inside of the synthesizer. In a MIDI composition, the composer has no direct control over the sound; s/he must "trust" that the synthesizer will do the right thing. In Csound, though, the composer specifies exactly what the instruments sound like.
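    The orchestra/score split can be sketched in a few lines of Python. The instrument function and score tuples below are stand-ins invented for illustration, loosely analogous to Csound instr definitions and score lines:

```python
import math

SR = 8_000  # a low sample rate keeps the sketch fast

# "Orchestra": each instrument is a function from (freq, dur) to samples.
# A stand-in for a Csound instrument, not real Csound code.
def sine_instr(freq, dur):
    return [math.sin(2 * math.pi * freq * n / SR)
            for n in range(int(SR * dur))]

orchestra = {"instr1": sine_instr}

# "Score": which instrument plays what, when -- (start, instr, freq, dur),
# loosely analogous to Csound score lines like "i1 0 0.25 440".
score = [
    (0.00, "instr1", 440.0, 0.25),
    (0.25, "instr1", 660.0, 0.25),
]

# Render: mix every score event into one output buffer.
total = max(start + dur for start, _, _, dur in score)
out = [0.0] * int(SR * total)
for start, name, freq, dur in score:
    offset = int(SR * start)
    for i, s in enumerate(orchestra[name](freq, dur)):
        out[offset + i] += s

print(len(out))  # 4000 samples: half a second at 8 kHz
```

    Swapping in a different orchestra changes the timbre of every note without touching the score, which is exactly the control a MIDI composer lacks.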

    The Csound program itself turns the orchestra and score into sound, either in real time if you have a fast machine (that is, you can listen to the sound while Csound is producing it) or written into a file for later listening.

    NetSound

    We (especially group alumnus Michael Casey) made the observation in the spring of 1996 that using Csound would be a good way to put high-quality audio on a WWW page. The combination of a Csound orchestra and score is usually many hundreds of times smaller (more compressed) than the sound it turns into. If Csound were present on your home computer, I could transmit an orchestra and a score to you, and Csound would turn it into sound. So Mike, Adam Lindsay, Paris Smaragdis, and Eric Scheirer wrote a simple set of scripts that "wrap up" a Csound orchestra, score, MIDI files, and maybe some sound samples for delivery on the WWW, and a client-side program that separates them and dispatches Csound to create sound.
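    A hypothetical sketch of the server-side "wrap up" step, assuming a simple zip-style bundle; the file names and bundle layout are invented for illustration, and the actual NetSound scripts are not reproduced here:

```python
import io
import zipfile

# Invented file names standing in for a real orchestra, score, and sample.
parts = {
    "piece.orc": b"; orchestra: instrument definitions\n",
    "piece.sco": b"; score: note events\n",
    "bell.wav": b"RIFF....",  # placeholder sample data
}

# Server side: bundle everything into one compact file for delivery.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as bundle:
    for name, data in parts.items():
        bundle.writestr(name, data)

# Client side: unpack the bundle; a real client would then hand the
# orchestra and score to Csound to render the audio.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as bundle:
    print(sorted(bundle.namelist()))  # ['bell.wav', 'piece.orc', 'piece.sco']
```

    The bundle stays tiny because it carries descriptions and a few samples rather than the rendered audio itself.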

    We called this idea NetSound, and it's proven to be a very successful platform for demonstrating Structured Audio concepts (we do a lot of NetSound demos at the Media Lab) and getting the message out about this concept in sound coding (Mike and Paris wrote a paper about it, and it was a finalist for a 1997 Discover Magazine Innovation Award).

    NetSound isn't a perfect system; it's hard to code voice or very expressive natural instruments this way, and the Csound model isn't right for streaming of data. (In many cases, though, you don't have to stream data, because the NetSound file is so compact.) Also, many people find Csound difficult to use (though many people love it, too!).

    The move to MPEG

    In the fall of 1996, visitors from Media Lab sponsor Hughes Electronics saw a demo of NetSound. Hughes is a big player in MPEG; they make a lot of money from the MPEG-2 video standard, which is used to transmit the data in their direct-satellite broadcast system, called DirecTV in the USA. Hughes realized that the concepts we were demonstrating were a good solution to an outstanding MPEG-4 call for proposals for "SNHC Audio" (which stands for "Synthetic/Natural Hybrid Coding", a concept explored elsewhere).

    We wrote up a brief submission to MPEG based on Mike and Paris's paper, but it turned out that due to copyright hangups and some other problems, NetSound wasn't exactly the right solution for MPEG-4. So we (especially Eric) decided to use the opportunity to revisit some of the synthesis-language issues represented in Csound, and dive deeper into the Structured Audio concepts than NetSound did. The result was the language SAOL, which we designed over the winter of 1996-1997 and submitted to MPEG soon after, meeting with general enthusiasm.

    MPEG is a working group of ISO/IEC in charge of the development of standards for coded representation of digital audio and video. Established in 1988, the group has produced MPEG-1, the standard on which such products as Video CD and MP3 are based; MPEG-2, the standard on which such products as digital television set-top boxes and DVD are based; MPEG-4, the standard for multimedia for the fixed and mobile web; and MPEG-7, the standard for description and search of audio and visual content. Work on the new standard, MPEG-21 "Multimedia Framework", started in June 2000. So far, a Technical Report has been produced, and the formal approval process has begun for two more parts of the standard. Several Calls for Proposals have already been issued, and two working drafts are being developed.


    The following tutorial about the MPEG standards was copied from the MPEG home page.
    ==============================================
    Meantime, the Moving Picture Experts Group has not sat still after getting MPEG-4 ready for prime time. Recently, it finalized the first version of the International MPEG‑7 Standard for Content Description, to be published by ISO within the next few months. MPEG-7 will complement MPEG-4, not replace it. MPEG-4 defines how to represent content; MPEG-7 specifies how to describe it. And on the horizon there is yet another ISO MPEG standard, MPEG-21, which aims to provide a truly interoperable multimedia framework. The essence of all MPEG efforts is interoperability – interoperability for the consumer. Interoperability means that consumers can be sure to be able to use the content and not be bugged by incompatible formats, codecs, metadata, and so forth.

    MPEG-1 and MPEG-2 provide interoperable ways of representing audiovisual content, commonly used on digital media and on the air. MPEG-4 extends this to many more application areas through features like its extended bitrate range, its scalability, its error resilience, its seamless integration of different types of ‘objects’ in the same scene, its interfaces to digital rights management systems and its powerful ways to build interactivity into content. MPEG-7 defines an interoperable framework for content descriptions way beyond the traditional ‘metadata’. MPEG-7 has descriptive elements that range from very ‘low-level’ signal features like colors, shapes and sound characteristics, to high level structural information about content collections. MPEG-7 is also unique in its tools for structuring information about content. MPEG-7 and MPEG-4 form a great couple, especially when MPEG-4 objects are used. With MPEG-7, it is now possible to exchange information about multimedia content in interoperable ways, making it easier to find content and identify just what you wanted to use. MPEG-7 information will be added to broadcasts; personal video recorders and search engines can use it, and it greatly facilitates managing multimedia content in often large content repositories. Audiovisual archives are currently hard to search from outside the organizations that own them, because they all employ their own metadata schemes. MPEG-7 will lift that barrier.

    All this interoperability sounds promising, but what is achieved may be almost completely undone by efforts to protect the digital assets. As digital multimedia spreads to many different platforms, transmission speeds increase and storage costs fall, digital rights management (DRM) becomes a necessity to protect the value of content. In its current form, DRM could unfortunately go against the very goal of interoperability, as it locks up the ‘standardized content’ using non-standardized protection mechanisms. This development is actually not even recent: many proprietary conditional access (CA) systems make standard MPEG-2 TV content inaccessible to people who happen to own the wrong set top box — even when they can receive the signal and are willing to pay the associated content fees.



    Tackling the Hard Problems
    Very early on, MPEG understood that more interoperability in DRM is crucial to an open multimedia infrastructure. Although the topic has only recently become popular in almost every digital forum, representatives from rights owners, technology providers and CA/DRM providers sat down together five years ago to discuss the issue in the context of MPEG-4. This gave us the ‘hooks’: a set of standard interfaces to proprietary Intellectual Property Management and Protection (IPMP) systems, deeply embedded in MPEG-4 Systems. These were a step in the right direction: if you want to play content you ‘only’ need to plug in the right IPMP system, and where to obtain it can be signaled in the bitstream.

    But it was not enough. A portable music player cannot download an IPMP system – interoperability lost. In the summer of 2000, MPEG started to work on more interoperable IPMP for MPEG-4. (Note that IPMP is MPEG-speak for digital rights management.) This work is now at ‘Committee Draft’ stage (the first public draft) and will be finalized in May 2002. Interoperability in DRM is a very hard question. It requires standardized trust. Content owners must, for instance, be able to trust all the players that consume the content. This type of trust is very difficult to standardize; the necessary trust infrastructure is not readily available. But, building on its 5-year experience with often difficult IPMP discussions, MPEG is perhaps the only forum where the issue can be technically addressed.



    6 Billion Content Producers
    This brings us to the MPEG-21 Multimedia Framework. To achieve true end-to-end interoperability, more is needed than the interoperable IPMP terminal architecture mentioned above. According to its Technical Report (an almost final version is publicly available), MPEG-21’s goal is to describe a ‘big picture’ of how the different elements needed to build an infrastructure for the delivery and consumption of multimedia content – existing or under development – relate to each other. In setting the vision and starting the work, MPEG‑21 has drawn much new blood to MPEG, including representatives from major music labels, the film industry and technology providers.

    The MPEG-21 world consists of Users that interact with Digital Items. A Digital Item can be anything from an elemental piece of content (a single picture, a sound track) to a complete collection of audiovisual works. A User can be anyone who deals with a Digital Item, from producers to vendors to end-users. Interestingly, all Users are ‘equal’ in MPEG-21, in the sense that they all have their rights and interests in Digital Items, and they all need to be able to express those. For example: usage information is valuable content in itself; an end-user will want control over its utilization. A driving force behind MPEG-21 is the notion that the digital revolution gives every consumer the chance to play new roles in the multimedia food chain. There are 6 billion potential MPEG‑21 Users out there: producers, packagers, resellers, distributors, ...

    MPEG-21 seeks to use existing standards where possible, to facilitate their integration and to fill in gaps. It does so together with appropriate other standards fora. This sounds highly laudable and rather abstract, but MPEG is currently drafting a number of very interesting, concrete and commercially relevant parts of the MPEG-21 standard. Counting the MPEG-21 Technical Report as part number one, the second part of MPEG-21 will be ready in summer 2002. This is the Digital Item Declaration, a concise and powerful XML-based schema for declaring Digital Items. Arguably more ambitious is MPEG-21’s third part: the Digital Item Identification and Description. This work solves the problem of uniquely identifying digital content in a global way, and provides a resolution mechanism along with the unique identification. Imagine you found a piece of content — got it from a friend, stumbled across it on the Web, received it on a CD — and you want to ‘consume’ it. The content is protected, but the Digital Item Identification tells you where to go to find information about its rights.
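    As a hypothetical sketch of what declaring a Digital Item in XML might look like, the snippet below builds a tiny declaration with Python's standard XML library. The element names and identifier scheme are invented for illustration and do not follow the actual MPEG-21 schema, which was still being drafted at the time:

```python
import xml.etree.ElementTree as ET

# Invented element names -- not the real MPEG-21 Digital Item Declaration
# schema, just an illustration of the identify-and-resolve idea.
item = ET.Element("DigitalItem")
ident = ET.SubElement(item, "Identifier")
ident.text = "urn:example:track:0001"  # assumed identifier scheme
ET.SubElement(item, "Resource", {"mimeType": "audio/mpeg",
                                 "ref": "track01.mp3"})

xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

    The key idea is the separation: the declaration carries a globally unique identifier, and a resolution service keyed on that identifier tells you where the rights information lives.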

    The rights information is coded using two further MPEG-21 parts, the Rights Expression Language, REL (part 5), and the Rights Data Dictionary, RDD (part 6). These two parts together allow the expression of rights in an interchangeable form, using a standardized syntax (REL) and standardized terms (RDD). The Call for Proposals for these parts is out; proposals are due at the end of November, and the standards will be ready in early 2003. The Rights Expression Language will probably be based on XML, but it is equally likely to also have a compact, binary representation to be used under bandwidth-constrained, real-time conditions. Between parts 3 and 5 above, the work on more interoperable IPMP in MPEG-4 was recently added to MPEG-21 as its fourth element, because it applies to MPEG-7, -2 and -1 just as well.



    Content that Adapts to the Environment
    The 7th framework element will be a unified description of environments in which content is being used. This covers networks, terminals and access conditions. The goal is to achieve ‘Universal Multimedia Access’, where content can adapt itself seamlessly to dynamic consumption circumstances.

    MPEG-21 is developed using a staggered approach, the various parts following each other in time. Future MPEG-21 work items will likely include: Content Representation (how the media resources are represented beyond the existing MPEG standards), Content Handling and Usage (interfaces for managing content), and Event Reporting.



    All Users Benefit
    MPEG-4 is now proving its viability in the market as an open standard for multimedia. The ecosystem is coming to life: players, servers, hardware and software, testing systems, IP cores, and authoring tools are all being readied. It will mean a major step towards more interoperability in multimedia. MPEG-7 will help manage the growing abundance of content, and MPEG-21 will make trusted interaction with content much more transparent, creating a level playing field for all participants in the multimedia food chain. Users will only stand to benefit.





    --------------------------------------------------------------------------------

    Rob Koenen is Senior Director of Technology Initiatives at InterTrust Technologies Corporation. He chairs MPEG’s Requirements Group and is the President of the MPEG-4 Industry Forum.
    (By: Robert Koenen)
     
  3. Porfiry Nomad Staff Member

    Messages:
    4,127
    I thought MP3s *could* be streamed?
     
  5. Rick Valued Senior Member

    Messages:
    3,336
    Confused...

    Streamed MP3s, Porf? Well, I don't know. Is that so?




    bye!


     