Frequently Asked Questions about MPEG Audio Layer-3, Fraunhofer-IIS, and
all the rest...

Version 2.83

   This text will be continuously updated: step by step, more answers and
   more information will be included. Yes, we definitely know that there
   are a lot more questions to answer! But we cannot do that all at once.
   So, some parts may remain "under construction" for a while, and other
   parts may be modified due to new results of our research work or new
   applications. You can find the latest release at
   
              http://www.iis.fhg.de/departs/amm/layer3/sw/ or
                   ftp://ftp.fhg.de/pub/layer3/l3faq.html
                                      
  Table of Contents
  
     * Introduction - or: What is "MPEG Audio Layer-3"?
     * Applications - or: Layer-3, what is it good for?
     * Overview about the ISO-MPEG Standard - or: What is MPEG all about?
     * Some Basics about MPEG Audio - or: What about Layer-1, Layer-2,
       Layer-3?
     * Advanced Features of Layer-3 - or: Why does Layer-3 perform so
       well?
     * Basics of Perceptual Audio Coding - or: What is the trick?
     * References - or: Where to find more information?
     * About us - or: What is going on at our Fraunhofer Institute?
       
  Introduction - or: What is "MPEG Audio Layer-3"?
  
   Today, efficient coding techniques are a must for cost-effective
   processing of digital audio and video data by computers. Data
   reduction of moving pictures and sound is a key technology for any
   application with limited transmission or storage capacity. In recent
   years, a lot of progress has been achieved. While there (still)
   exist several proprietary formats for audio and video coding, the
   ISO/IEC standardisation body has released an international standard
   ("MPEG") for powerful audio and video coding tools (see: Overview
   about the ISO-MPEG Standard - or: What is MPEG all about?).
   
   Without data reduction, digital audio signals typically consist of 16
   bit samples recorded at a sampling rate more than twice the actual
   audio bandwidth (e.g. 44.1 kHz for Compact Discs). So you end up with
   more than 1400 kbit to represent just one second of stereo music in CD
   quality. By using MPEG audio coding, you may shrink down the original
   sound data from a CD by a factor of 12, without losing sound quality.
   Factors of 24 and even more still maintain a sound quality that is
   significantly better than what you get by just reducing the sampling
   rate and the resolution of your samples. Basically, this is realized
   by "perceptual coding" techniques addressing the perception of sound
   waves by the human ear (see: Basics of Perceptual Audio Coding - or:
   What is the trick?).
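
   As a rough check of the figures above, the uncompressed bitrate follows
   directly from sampling rate, sample size and channel count. The sketch
   below is only illustrative; the function name is made up for this
   example and is not part of any MPEG software.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kbit/s (1 kbit = 1000 bit)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


cd_kbps = pcm_bitrate_kbps(44100, 16, 2)  # stereo CD audio
print(f"CD audio: {cd_kbps:.1f} kbit/s")
for factor in (12, 24):
    # Bitrate that remains after reducing the CD data by the given factor
    print(f"reduced by {factor}: {cd_kbps / factor:.1f} kbit/s")
```

   A reduction by a factor of 12 leaves a bitrate of a little under
   120 kbit/s.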
   
   Using MPEG audio, one may achieve a typical data reduction of

   1:4          by Layer 1 (corresponds with 384 kbps for a stereo signal),
   1:6...1:8    by Layer 2 (corresponds with 256..192 kbps for a stereo signal),
   1:10...1:12  by Layer 3 (corresponds with 128..112 kbps for a stereo signal),

   still maintaining the original CD sound quality.
   
   By exploiting stereo effects and by limiting the audio bandwidth, the
   coding schemes may achieve an acceptable sound quality at even lower
   bitrates. Layer-3 is the most powerful member of the MPEG audio coding
   family. For a given sound quality level, it requires the lowest
   bitrate - or for a given bitrate, it achieves the highest sound
   quality (see: Advanced Features of Layer-3 - or: Why does Layer-3
   perform so well?).
   
   Some typical performance data of Layer-3 are:
   
   sound quality            bandwidth  mode        bitrate     reduction ratio
   ------------------------ --------- ------ ----------------- ---------------
   "telephone sound"         2.5 kHz   mono         8 kbps (*)            96:1
   "better than shortwave"   4.5 kHz   mono        16 kbps                48:1
   "better than AM radio"    7.5 kHz   mono        32 kbps                24:1
   "similar to FM radio"      11 kHz  stereo   56..64 kbps            26..24:1
   "near-CD"                  15 kHz  stereo       96 kbps                16:1
   "CD"                      >15 kHz  stereo 112..128 kbps            14..12:1

   *: Fraunhofer uses a non-ISO extension of Layer-3 for enhanced
      performance ("MPEG 2.5")

   All in all, Layer-3 is the key for numerous low-bitrate, high-quality
   sound applications (see: Applications - or: Layer-3, what is it good
   for?).
   
  Applications - or: Layer-3, what is it good for?
  
   A key technology like Layer-3 is useful for a wide spectrum of
   applications - almost any system with a limited channel capacity
   may benefit from it. The following sections identify some
   main areas and list some companies that are actively exploiting the
   Layer-3 technology. For product-related information, please contact
   these companies directly.
   
    Music Links via ISDN
    
   Digital telephone networks (ISDN = Integrated Services Digital
   Network) offer reliable dial-up links with two 64 kbps data channels
   per basic rate adapter; other regional networks (in North America) use
   56 kbps data links. Transmission fees are often similar or identical
   to those of traditional analog phone lines, which allow transmission
   of up to 28.8 kbps (V.34 modem) or even 32 kbps ("V.34+").
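
   Put together with the Layer-3 performance table from the introduction,
   these link capacities indicate which sound quality class fits a given
   channel. The following is a minimal sketch, assuming that the whole
   channel capacity is available for audio (no protocol overhead); the
   names are hypothetical.

```python
# Rows of the Layer-3 performance table: (minimum bitrate in kbps, quality)
LAYER3_QUALITY = [
    (8, "telephone sound"),
    (16, "better than shortwave"),
    (32, "better than AM radio"),
    (56, "similar to FM radio"),
    (96, "near-CD"),
    (112, "CD"),
]


def best_quality(channel_kbps):
    """Return the best quality class whose bitrate fits, or None if none fits."""
    best = None
    for min_kbps, quality in LAYER3_QUALITY:  # rows are sorted by bitrate
        if min_kbps <= channel_kbps:
            best = quality
    return best


for name, kbps in [("V.34 modem", 28.8), ("one ISDN channel", 64),
                   ("two ISDN channels", 128)]:
    print(f"{name} ({kbps} kbps): {best_quality(kbps)}")
```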
   
   Using Layer-3, a low-cost narrowband ISDN connection allows the
   transmission of CD-quality sound. Audio professionals, like broadcasting
   stations and sound studios, benefit from the "music-by-phone"
   application in various ways. They save money, as they only pay
   transmission fees for the actual time of usage (not 24 h a day in case
   of a leased phone line) and for a rather small data channel (one ISDN
   phone connector for a stereo music link). Radio stations increase the
   attractiveness of their programs, as reporters transmit high-quality
   takes (e.g. an interview) or live news without annoying "telephone
   sound". And new applications become possible, e.g. a "virtual studio",
   where remote artists may play along with some pre-produced material,
   without actually travelling to the studio.
   
   Examples:
   
     * In 1992, Radio FFN, a private broadcasting station in
       Niedersachsen, Germany, replaced its leased phone lines with ISDN
       and Layer-3 codecs, to transmit 8 local programs 20 min per day to
       the central broadcasting studio. This move saved them transmission
       fees of more than 300,000 US$ per year.
     * As one of the first real-world trials, all private radio stations
       of Germany very successfully used Layer-3 codecs during the Winter
       Olympic Games in Albertville (France) as reporter links between
       the various sporting events and their central studio in Meribel.
     * At the International Music Festival 92 in Bergen, Arne Nordheim
       composed a piece of music, where an organ in the church of
       Trondheim played along with the symphony orchestra in Bergen; the
       sound of the organ was transmitted via ISDN and a Layer-3 codec.
       
   Since 1992, various manufacturers have been providing equipment ("codecs")
   for professional audio applications: AVT, Broadcast Electronics, CCS,
   Dialog 4, Telos.
   
    Digital Satellite Broadcasting
    
   Pioneered by WorldSpace, a worldwide satellite digital audio
   broadcasting system is under construction. Its name is "WorldStar",
   and it will use three geostationary orbit satellites called "AfriStar
   1" (21 East), "CaribStar 1" (95 West), and "AsiaStar 1" (105 East),
   with AfriStar 1 to be launched in mid-1998. The other satellites will
   follow by mid-1999. Each satellite is equipped with three downlink
   spot beams that are pointed so as to cover populations that provide
   the greatest radio listener base. Each downlink uses TDM (time
   division multiplexing) to carry 96 prime rate channels (16 kbps each).
   The prime rate channels are combined to carry broadcast channels
   ranging from 16 kbps to 128 kbps; the broadcast channels are coded
   using MPEG Layer-3. The prime rate channels may even be dynamically
   allocated to meet the demands of the broadcast service (e.g. 4
   channels combined for 1 hour to allow FM quality stereo (64 kbps) for
   the transmission of a classical music concert, followed by 1 hour
   with 4 separate news channels (16 kbps each) in 4 different
   languages).
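
   The channel arithmetic described above can be sketched as follows.
   This is a minimal illustration, assuming that a broadcast channel
   always occupies a whole number of prime rate channels; the function
   and constant names are invented for this example.

```python
PRIME_RATE_KBPS = 16   # bitrate of one prime rate channel
PRIME_CHANNELS = 96    # prime rate channels per downlink


def prime_channels_needed(broadcast_kbps):
    """Number of prime rate channels combined into one broadcast channel."""
    if not 16 <= broadcast_kbps <= 128 or broadcast_kbps % PRIME_RATE_KBPS:
        raise ValueError("expected a multiple of 16 between 16 and 128 kbps")
    return broadcast_kbps // PRIME_RATE_KBPS


print(PRIME_CHANNELS * PRIME_RATE_KBPS, "kbps per downlink in total")
print(prime_channels_needed(64), "prime rate channels for a 64 kbps concert")
print(4 * prime_channels_needed(16), "prime rate channels for 4 news channels")
```

   In the concert example, the same four prime rate channels carry one
   64 kbps stereo programme in the first hour and four separate 16 kbps
   programmes in the second.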
   
   WorldSpace is offering channels on its three satellites for lease to
   international and national broadcasters. Channel reservation
   agreements have already been signed with a number of major
   broadcasters, including Voice of America, Radio Nederland, the Kenya
   Broadcasting Corporation, the national broadcasting authority of
   Ghana, the national broadcasting authority of Zimbabwe, New Sky Media
   of Korea, and RCN of Colombia. Nearly 1 billion $ in private financing
   has been raised to cover acquisition of the satellites and most of
   the operational costs through full system implementation in 1999.
   France's Alcatel Espace is the spacecraft prime contractor and
   supplies the telecommunications payload.
   
   The radio receivers will be designed for maximum convenience of use at
   a minimum cost. Low-cost receivers will use a small patch antenna,
   will require practically no pointing, and will tune automatically to
   selected channels. Higher-end receivers are also envisioned. In a
   press release dated 5 June 1996 (Montreux, Switzerland), WorldSpace
   declared that it had awarded production contracts for two million
   receiver chips; the contracts were issued to SGS-Thomson and ITT
   Intermetall, authorizing each company for an initial production of
   one million receiver chip-sets.
   
   ITT Intermetall has already gained Layer-3 know-how by using its
   mask-programmed DSP technology to develop a single-chip Layer-3
   decoder named "MAS 3503 C". This chip supports only MPEG-1 Layer-3.
   
    Audio-on-Demand
    
   The Internet is a world-wide packet-switched network of computers
   linked together by various types of data communications systems.
   Professional Internet providers usually access the network through
   rather high bit-rate links (e.g., primary rate ISDN with 2 Mbps or ATM
   with up to 2 Gbps). However, the average consumer uses low cost, low
   bit-rate connections (e.g., basic rate ISDN with 64 kbps or phone line
   modems with 28.8 or 14.4 kbps). The actual transmission rate depends
   on the current user load and the infrastructure of the part of the
   Internet in use. From a client's point of view, it may unpredictably
   vary between zero and the maximum bit-rate of its network modem, with
   an average bit-rate somewhere in between.
   
   Without audio coding, downloading high-quality audio files from a
   remote Internet server would result in unfavourably long
   transmission times. For example, with an average transmission rate of
   28.8 kbps (optimistic guess), a single 3-min stereo track from a CD
   (31.7 Mbyte) would require a download time of more than 2 hours.
   Therefore, audio on the Internet calls for an audio coding scheme that
   maintains sound quality as far as possible and allows real-time
   decoding on a large number of computer platforms without special
   add-on hardware. Layer-3 fits very well into this scenario - real-time
   players (like WinPlay3) are available. Intranets present an
   interesting special case, as they usually provide sufficient bitrate
   to allow a number of real-time audio links. Furthermore, our
   experiments indicate that, using HTTP, a real-time connection with
   56 (112) kbps is possible with one (two) ISDN phone line(s).
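
   The download example above can be reproduced with a few lines of
   arithmetic. This is a sketch only: it ignores protocol overhead, and
   the 128 kbps Layer-3 bitrate is an assumption taken from the "CD" row
   of the performance table, not a figure given for this example.

```python
CD_BYTES_PER_SECOND = 44100 * 2 * 2  # 44.1 kHz, 16-bit (2 bytes), stereo


def download_seconds(size_bytes, link_bps):
    """Transfer time in seconds, ignoring protocol overhead."""
    return size_bytes * 8 / link_bps


track_seconds = 3 * 60
uncompressed_bytes = track_seconds * CD_BYTES_PER_SECOND
layer3_bytes = track_seconds * 128_000 // 8   # assumed: Layer-3 at 128 kbps

for label, size in [("uncompressed", uncompressed_bytes),
                    ("Layer-3", layer3_bytes)]:
    minutes = download_seconds(size, 28_800) / 60
    print(f"{label}: {size / 1e6:.1f} MByte, {minutes:.0f} min at 28.8 kbps")
```

   Under these assumptions the uncompressed track needs about 147
   minutes, while the Layer-3 file needs about 13 minutes.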
   
   If content providers are willing to add audio data onto their Internet
   servers, they have to consider carefully the copyright aspects of the
   music industry (e.g., artists, producers, record companies). They must
   not violate these rights by their actions! In the framework of a
   European project called MODE (for "Music-on-Demand"), we developed a
   flexible protection scheme called MMP (for "MultiMedia protection
   protocol") that effectively addresses this issue. Furthermore, MMP
   allows real-time players to be distributed "virtually free".
   
   Audio servers may be used plainly for promotional purposes. E.g.,
   museums may increase the attractiveness of their WWW pages by adding
   some sound files, or mail-order services may add sound excerpts to
   their server to increase their CD sales numbers. Opticom, a spin-off
   from Fraunhofer, offers system solutions for this type of application.
   In spring 1996 (CeBIT Hannover), they successfully demonstrated an
   "audio-on-demand" application via T-Online together with Deutsche
   Telekom and a broadcasting station, the Suedwestfunk Baden-Baden.
   
   Another music sales system has been developed by Cerberus Sound &
   Vision. The company uses a personalized real-time Layer-3 player and a
   proprietary encryption scheme to sell sound files via the Internet on
   a "per song" basis. Music servers and mirror sites are currently
   located in London, New York, Tokyo and Rio; Melbourne and Berlin will
   follow soon.
   
   "Audio-on-the-Internet" is currently a very popular topic. It does not
   only comprise audio file transfers with download times as low as
   possible, but also streaming audio applications, like "Internet
   Radio". As Layer-3 offers a sound quality "better than shortwave" at a
   bitrate of 16 kbps (and, with some modifications, may even be useful
   at 8 kbps), various companies are currently working on this Internet
   subject, e.g. Opticom or Telos.
   
   In September 1996, in a partnership with Apple, Telos introduced the
   Audioactive technology to support "Internet Radio" applications with a
   live audio input processed by Layer-3 NetCoder hardware.
   
   NEW ! In December 1996, Microsoft announced support for MPEG Layer-3
   as part of their NetShow multimedia server technology.
   
   As the first multimedia authoring tools, "Director Multimedia Studio 2"
   and "SoundEdit 16" (from Macromedia) use Layer-3 to generate
   compressed sound files for the "shockwave" format.
   
   Layer-3 encoders and decoders are not only available as studio
   equipment, but also as ISA-bus PC boards from Dialog 4, along with
   application software, or as low-cost (decoder only) PC boards from
   NSM; recording and playback tools are also available from Proton Data,
   along with a special decoder module (called "CenLay3") that allows
   playback of Layer-3 files via the parallel printer port. Proton Data
   has also developed a "cutting tool" that allows audio data to be
   manipulated at Layer-3 level.
   
   In addition, a file-oriented Layer-3 encoder and decoder (called
   "L3ENC" and "L3DEC") are available as shareware for various platforms.
   Registration is processed by Opticom. Please note that even for
   registered users, the use of the shareware is limited to "personal
   edition" purposes.
   
    Real-time Layer-3 players
    
      WinPlay3
      
   "WinPlay3" allows the decoding simply by software on any Pentium PC in
   real time. An 80486-class CPU with a built-in floating-point unit will
   also allow some limited operation. For the availability of supported
   modes, please refer to the following performance matrix:
   
                    Pentium 486DX4-133 486DX2-66 486DX-50 486DX-33
                    ------- ---------- --------- -------- --------
   MPEG-1 stereo      ok        ok         -         -        -
   MPEG-1 downmix*    ok        ok        ok         -        -
   MPEG-1 mono        ok        ok        ok        ok        -

   MPEG-2 stereo      ok        ok        ok        ok        -
   MPEG-2 downmix     ok        ok        ok        ok       ok
   MPEG-2 mono        ok        ok        ok        ok       ok
                                                                         
*downmix: the original stereo signal will be played back as a mono signal
"MPEG-1" = "MPEG-1 Layer-3", i.e. sample rates 32, 44.1 or 48 kHz
"MPEG-2" = "MPEG-2 Layer-3", i.e. sample rates 16, 22.05 or 24 kHz

     * On a Pentium-90, WinPlay3 consumes less than 30 % of the CPU power
       to decode Layer-3 stereo @ 44.1 kHz, or around 5 % of the CPU power
       to decode Layer-3 mono @ 16 kHz.
     * At least an 8-bit stereo sound card is required. For full quality
       audio, a 16-bit card is recommended. The card's MCI driver should
       support sampling frequencies from 8 kHz to 48 kHz.
     * A standard VGA graphics card is required.
     * As WinPlay3 buffers up to 4 seconds of sound data due to the
       limitations of the Microsoft Windows multitasking architecture,
       around 1 MByte of free physical memory must be available.
     * WinPlay3 runs with the following operating systems: Microsoft
       Windows 3.1/3.11 (in 386 enhanced mode), Windows 95 and Windows NT
       (long file names not yet supported).
     * WinPlay3 supports playback of *.mp3 files and direct play from a
       URL via HTTP. WinPlay3 can simply be integrated as a helper
       application in common browsers, for example Netscape or Mosaic.
     * WinPlay3 is available at
       http://www.iis.fhg.de/departs/amm/layer3/winplay3/. The
       unregistered player is limited to a reproduction time of 20 sec,
       i.e. it will play back each plain Layer-3 file only for this time.
       If you want to use your player without limitation, you have to
       register your player with Opticom.
   
      MMP
      
   As many applications require a player that is "free" for the user, the
   latest versions of WinPlay3 (starting with version 2.0) also support
   the new "MMP" ("MultiMedia protection protocol") format.
   
   MMP is a very flexible data format that may support the following
   functions:
   
     * "unlocking" of the 20 sec playback time limitation
     * "copyright protection" by applying encryption methods to (part of)
       the data
     * "title associated data" (e.g. ISRC code, user data)
     * "expiry date" to allow only a limited use
       
   More detailed information is available at
   http://www.iis.fhg.de/departs/amm/layer3/mmp/.
   
   In a typical "audio-on-demand" application, the content provider may
   convert its plain Layer-3 data into MMP data "on-the-fly", by using
   "MMP tagger" software (available from Opticom). The client may use its
   unregistered player to play back these files without limitation - the
   player is "virtually free". The client need not pay fees - this issue
   may now be covered at the server side.
   
      MPEG Layer 3 Player
      
   For Mac OS users, a real-time player called "MPEG Layer 3 Player" with
   a look and feel (and features) similar to "WinPlay3" will be
   released very soon. This new player will (finally!) replace the much
   simpler (and somewhat buggy) pre-version 0.99 beta that has been
   available from http://www.iis.fhg.de/departs/amm/layer3/macplay3/.
   
    Layer-3 Sound on CD-ROMs
    
   CD-ROMs (and hard disks) have become very popular for storing
   "multimedia" data. Even with the advent of the new DVD standard,
   memory capacity will remain a precious resource for many applications.
   For uncompressed stereo signals from a CD, more than 10 MByte are
   necessary to store one minute of music. Using Layer-3, less than 1
   MByte is enough for the same playing time. And significantly less
   memory is necessary, if some limitations in performance are
   acceptable. As CD-ROM readers (and pretty soon, writers too) have
   already gained a significant market share, typical applications focus
   today on storing compressed sound files on CD-ROMs, introducing more
   or better sound tracks into the product. Real application examples are
   video games, music catalogues or encyclopedias with sound excerpts
   (e.g., "MusicFinder" by Sygna), or talking books for blind people.
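
   The storage figures above follow from the bitrate alone. A minimal
   sketch, assuming 1 MByte = 10**6 bytes and using a hypothetical helper
   name; the 128 and 64 kbps values are taken from the Layer-3
   performance table in the introduction:

```python
def megabytes_per_minute(bitrate_kbps):
    """Storage needed for one minute of audio, in MByte (10**6 bytes)."""
    bits_per_minute = bitrate_kbps * 1000 * 60
    return bits_per_minute / 8 / 1e6


for label, kbps in [("CD, uncompressed", 1411.2),
                    ("Layer-3, 128 kbps", 128),
                    ("Layer-3, 64 kbps", 64)]:
    print(f"{label}: {megabytes_per_minute(kbps):.2f} MByte per minute")
```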
   
   NEW !!! Since fall 1996, Bertelsmann has been selling their new CD-ROM
   encyclopedia "Discovery 97", providing information on around 100,000
   keywords, with rich multimedia information (e.g. more than 2400
   coloured photos and images, 41 interactive maps, more than 30 minutes
   of movie clips, 27 slide shows) including 150 minutes of sound tracks
   coded with MPEG Layer-3.
   
    Layer-3 Sound on Silicon
    
   Up to now, solid-state memories (RAMs, Flash-ROMs) have only been used
   as audio storage devices in special (niche) applications, as the costs
   per byte are much higher than with other types of media
   (magneto-optical disks or magnetic tapes). Speech announcement systems
   for mass transit vehicles (e.g., buses, subways or trains) are an
   example of such special applications, as the rough environment
   requires the use of ROM-based memories. Since 1993, Meister Electronic
   has been manufacturing speech announcement systems with Layer-3,
   significantly reducing the memory capacity required and, at the same
   time, significantly improving the sound quality (compared with their
   older 64 kbps PCM "phone sound").
   
   Today, PC-Cards with Flash-ROMs are available, offering a memory
   capacity up to 100 MByte and more, but at prohibitively high costs for
   a consumer application. Here, further advances in memory and card
   technology may trigger a new interesting market segment of
   "audio-chip-card"-applications. At a press conference in August 95 in
   Munich, Siemens Germany announced the advent of a new cost-effective
   ROM technology called the "ROS chip" (ROS = Record-on-Silicon). The
   first generation of ROS chips will be in production in 1997, with a
   storage capacity of 64 Mbit; a next generation with 256 Mbit as well
   as a one-time user programmable version will follow. The ROS chips
   will be embedded in the new "MultiMedia-Card" from Siemens, a
   cost-effective card media that will store data, text, graphics, images
   and sound. Siemens has already demonstrated a battery-powered audio
   player using a prototype "Audio-Card" containing sound tracks coded
   with MPEG-Layer-3.
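
   To get a feeling for what the announced chip capacities would mean for
   an "Audio-Card", the playing time can be estimated from capacity and
   bitrate. This is a rough sketch under assumptions not stated in the
   text: 1 Mbit is taken as 10**6 bit, the whole chip holds audio data,
   and the bitrates are those of the "CD" and "FM radio" rows of the
   performance table. The function name is hypothetical.

```python
def playing_time_minutes(capacity_mbit, bitrate_kbps):
    """Playing time in minutes, assuming 1 Mbit = 10**6 bit."""
    seconds = capacity_mbit * 1_000_000 / (bitrate_kbps * 1000)
    return seconds / 60


for capacity_mbit in (64, 256):      # chip generations named in the text
    for bitrate_kbps in (128, 64):   # assumed Layer-3 bitrates
        minutes = playing_time_minutes(capacity_mbit, bitrate_kbps)
        print(f"{capacity_mbit} Mbit at {bitrate_kbps} kbps: "
              f"{minutes:.1f} min")
```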
   
  General Questions and Answers
  
     * Q: O.K., Layer-3 is obviously a key to many applications. Where
       are its limitations?
     * A: Well, Layer-3 is a perceptual audio coding scheme, exploiting
       the properties of the human ear, and trying to maintain the
       original sound quality as far as possible.
        In contrast, a dedicated speech codec exploits the properties of
        the human vocal tract, trying to maintain the intelligibility of
        the voice signals as far as possible. Advanced speech coding
        schemes (e.g., ACELP [LD-CELP] as standardised by ITU as
        G.723.1 [G.728]) achieve a useful voice reproduction at bitrates
        as low as 5.3 [16] kbps, with a codec delay below 40 [1] ms. At
        such very low bitrates, they outperform Layer-3 for pure
        voice signals, and they offer the low delay that is necessary for
        full-duplex voice communications.
       In the framework of MPEG-4, scalable audio coding schemes are
       devised that combine speech coding and perceptual audio coding.
     * Q: You mentioned the codec delay. May I have some figures?
     * A: Well, the standard gives some figures of the theoretical
       minimum delay:
       Layer-1: 19 ms (<50 ms)
       Layer-2: 35 ms (100 ms)
       Layer-3: 59 ms (150 ms)
       Practical values are significantly above that. As they depend on
       the implementation, precise figures are hard to give. So the
        numbers in brackets are just rough estimates - real codecs may
       show even higher values. So yes, there are certain applications
       that may suffer from such a delay (like feedback links for remote
       reporter units). For many other applications (like the ones
       mentioned above), delay is of minor interest.
       
  Overview about the ISO-MPEG Standard - or: What is MPEG all about?
  
     * Q: What is "MPEG"?
     * A: MPEG is the "Moving Picture Experts Group", working under the
        joint direction of the International Organization for
        Standardization (ISO) and the International Electrotechnical
        Commission (IEC). This group works on standards for the coding of
        moving pictures and audio. MPEG has created its own homepage,
        providing information on the what, where, when and how of the
        standards.
     * Q: What is MPEG-1, -2, and so on?
     * A: MPEG approaches the growing need for multimedia standards
       step-by-step. Today, three main "steps" are defined (MPEG-1,
       MPEG-2, MPEG-4).
          + MPEG-1: "Coding of Moving Pictures and Associated Audio for
            Digital Storage Media at up to about 1.5 Mbit/s"
          + MPEG-2: "Generic Coding of Moving Pictures and Associated
            Audio Information"
          + MPEG-3: originally planned mainly for HDTV applications;
            later on, it was merged into MPEG-2
          + MPEG-4: "Coding of Audio-Visual Objects"
     * Q: Are MPEG-3 and Layer-3 the same thing?
     * A: No! Layer-3 is a powerful audio coding scheme which certainly
       is part of the MPEG standard. Layer-3 is defined within the audio
       part of both existing international standards, MPEG-1 and MPEG-2.
       So please do not mix audio layers and MPEG standards!
     * Q: What is the status of MPEG-1?
     * A: Work on MPEG-1 is finished. The first three parts have been
       standardized since 1992. MPEG-1 consists of five parts:
          + IS-11172-1 ("System") describes synchronization and
            multiplexing of video and audio signals.
          + IS-11172-2 ("Video") describes compression of video signals,
            focussing on progressive scan video (and mainly aiming at
            "Video-on-CD" applications).
          + IS-11172-3 ("Audio") describes a generic audio coding family,
            with three hierarchically compatible members (called
            "Layer-1", "Layer-2" and "Layer-3").
          + IS-11172-4 ("Compliance Testing") describes procedures for
            determining the characteristics of coded bitstreams and the
            decoding process and for testing compliance with the
            requirements stated in the other parts.
          + DTR-11172-5 ("Software Simulation") is a technical report
            about a full software implementation of the first three parts
            of MPEG-1.
     * Q: What is the status of MPEG-2?
     * A: MPEG-2 currently consists of nine parts. The first three parts
       have been standardized since 1994, with some amendments included
       later on. Other parts are at different levels of completion.
          + IS-13818-1 ("System") describes synchronization and
            multiplexing of video and audio signals; it is also
            standardised by ITU-T as H.222.0.
          + IS-13818-2 ("Video") describes a generic video coding tool
            set, supporting interlaced scan; it is also standardised by
            ITU-T as H.262.
          + IS-13818-3 ("Audio") describes a backward compatible
            extension of MPEG-1 for multichannel audio coding ("surround
            sound", "multilingual sound") and a non-backward compatible
            extension to lower sample rates, to support sound
            applications with limited audio bandwidth requirements.
          + IS-13818-4 ("Conformance Testing") describes procedures for
            determining the characteristics of coded bitstreams and the
            decoding process and for testing compliance with the
            requirements stated in the other parts.
          + DTR-13818-5 ("Software Simulation") is a technical report
            about a full software implementation of the first three parts
            of MPEG-2.
          + IS-13818-6 ("System Extensions - Digital Storage Media
            Command and Control (DSM-CC)") describes a set of protocols
            for client-server applications.
          + CD-13818-7 ("Audio, Non-Backwards-Compatible (NBC) - Coding")
            describes an improved audio coding scheme for mono- and
            stereophonic signals as well as for multichannel sound.
          + 13818-8 ("Video, extension to 10-bit input samples") has been
            withdrawn, due to insufficient interest.
          + IS-13818-9 ("Real-Time Interface Specification for Low-Jitter
            Applications") defines timing constraints on the real-time
            delivery of MPEG-2 transport bitstreams.
          + WD-13818-10 ("Conformance Extensions - DSM-CC") describes the
            addendum to IS 13818-4 for DSM-CC
     * Q: "NBC audio"? What is the motivation for this working group?
       What are the results?
     * A: Well, during the work on multichannel audio coding
       (IS-13818-3), it turned out that backwards compatible (BC) schemes
       suffer from the matrixing process. Matrixing is required to allow
       an MPEG-1 decoder to play back all surround channels via its two
       stereophonic channels. Unfortunately, some of the introduced
       quantisation noise may become audible after dematrixing. All in
       all, during an ISO listening test in spring 1994, BC multichannel
       coding performed worse than non-ISO coding schemes (e.g.,
       Dolby's AC-3). So the NBC working group is currently developing a
       new audio coding scheme. NBC audio achieves a significantly better
       performance, not only for multichannel surround sound, but even
       for monophonic signals (here targeting "true transparency" at 64
       kbps). In spring 1996, ISO performed a listening test for
       5-channel surround sound, and NBC audio using a total bit-rate of
       320 kbps scored better than Layer-2 BC at a bit-rate of 640 kbps.
       NBC audio will also become one of the MPEG-4 audio coding
       algorithms.
     * Q: How do I get the MPEG documents?
     * A: Well, you may contact ISO, or you may order them from your
       national standards body. E.g., in Germany, please contact DIN.
     * Q: Is some public C source available?
     * A: Well, there is "public C source" available on various sites,
       e.g. at ftp://ftp.fhg.de/pub/layer3/ or at
       ftp://ftp.tnt.uni-hannover.de/pub/MPEG/audio/mpeg2/public_software/ .
       This code has been written mainly for explanation purposes, so
       do not expect too much performance.
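
   The distinction made above between MPEG phases (MPEG-1, MPEG-2) and
   audio Layers is also visible in the bitstream: every audio frame
   starts with a 32-bit header that carries the version and the Layer in
   separate fields. The sketch below decodes just these fields. It is an
   illustration, not the reference decoder: the bit positions are taken
   from the commonly documented frame header layout rather than from this
   FAQ, the "MPEG 2.5" entries belong to the non-ISO extension, and the
   function name is made up for this example.

```python
VERSIONS = {0b11: "MPEG-1", 0b10: "MPEG-2", 0b00: "MPEG 2.5"}  # 0b01 reserved
LAYERS = {0b11: "Layer-1", 0b10: "Layer-2", 0b01: "Layer-3"}   # 0b00 reserved
SAMPLE_RATES_HZ = {
    "MPEG-1": (44100, 48000, 32000),
    "MPEG-2": (22050, 24000, 16000),
    "MPEG 2.5": (11025, 12000, 8000),   # non-ISO extension
}


def describe_header(header):
    """Return (version, layer, sampling rate in Hz) for a 32-bit header."""
    if ((header >> 21) & 0x7FF) != 0x7FF:      # 11 sync bits, all set
        raise ValueError("no frame sync found")
    version_bits = (header >> 19) & 0b11
    layer_bits = (header >> 17) & 0b11
    rate_index = (header >> 10) & 0b11
    if (version_bits not in VERSIONS or layer_bits not in LAYERS
            or rate_index == 3):
        raise ValueError("reserved value in header")
    version = VERSIONS[version_bits]
    return version, LAYERS[layer_bits], SAMPLE_RATES_HZ[version][rate_index]


print(describe_header(0xFFFB9064))
print(describe_header(0xFFF39064))
```

   Both example headers describe Layer-3 frames; only the version field,
   and with it the sampling rate, differs.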
       
  Some Basics about MPEG Audio - or: What about Layer-1, Layer-2, Layer-3?
  
     * Q: Talking about MPEG audio, I always hear "Layer 1, 2 and 3".
       What does it mean?
     * A: MPEG describes the compression of audio signals using high
       performance perceptual coding schemes. It specifies a family of
       three audio coding schemes, simply called Layer-1, Layer-2, and
       Layer-3. From Layer-1 to Layer-3, encoder complexity and
       performance (sound quality per bitrate) are increasing.
       The three codecs are compatible in a hierarchical way, i.e. a
       Layer-N decoder may be able to decode bitstream data encoded in
       Layer-N and all Layers below N (e.g., a Layer-3 decoder may accept
       Layer-1,-2,-3, whereas a Layer-2 decoder may accept only Layer-1
       and -2.)
     * Q: So we have a family of three audio coding schemes. What does
       the MPEG standard define, exactly?
     * A: For each Layer, the standard specifies the bitstream format and
       the decoder. To allow for future improvements, it does not specify
       the encoder, but an informative chapter gives an example for an
       encoder for each Layer.
     * Q: What do the three audio Layers have in common?
     * A: All Layers use the same basic structure. The coding scheme can
       be described as "perceptual noise shaping" or "perceptual subband
       / transform coding". The encoder analyzes the spectral components
       of the audio signal by calculating a filterbank (transform) and
       applies a psychoacoustic model to estimate the just noticeable
       noise-level. In its quantization and coding stage, the encoder
       tries to allocate the available number of data bits in a way to
       meet both the bitrate and masking requirements.
       The decoder is much less complex. Its only task is to synthesize
       an audio signal out of the coded spectral components.
       All Layers use the same analysis filterbank (polyphase with 32
       subbands). Layer-3 adds an MDCT transform to increase the
       frequency resolution.
       All Layers use the same "header information" in their bitstream,
       to support the hierarchical structure of the standard.
       All Layers have a similar sensitivity to biterrors. They use a
       bitstream structure that contains parts that are more sensitive to
       biterrors ("header", "bit allocation", "scalefactors", "side
       information") and parts that are less sensitive ("data of spectral
       components").
       All Layers support the insertion of program-associated
       information ("ancillary data") into their audio data bitstream.
       All Layers may use 32, 44.1 or 48 kHz sampling frequency.
       All Layers are allowed to work with similar bitrates:
       Layer-1: from 32 kbps to 448 kbps
       Layer-2: from 32 kbps to 384 kbps
       Layer-3: from 32 kbps to 320 kbps
       The last two statements refer to MPEG-1; with MPEG-2, there is an
       extension for the sampling frequencies and bitrates (see below).
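       As a small illustration, the MPEG-1 limits quoted above can be
       checked like this. Note the assumption: the sketch only tests the
       lower and upper bitrate limits given in this answer, whereas the
       standard itself defines a fixed table of allowed bitrates for each
       Layer. All names are hypothetical.

```python
# MPEG-1 limits as quoted in the answer above (not the full bitrate tables).
MPEG1_SAMPLE_RATES_HZ = (32000, 44100, 48000)
MPEG1_BITRATE_RANGE_KBPS = {
    1: (32, 448),
    2: (32, 384),
    3: (32, 320),
}


def is_within_mpeg1_limits(layer: int, sample_rate_hz: int, bitrate_kbps: int) -> bool:
    """Check a setting against the sample rates and bitrate limits quoted above."""
    if layer not in MPEG1_BITRATE_RANGE_KBPS:
        return False
    lowest, highest = MPEG1_BITRATE_RANGE_KBPS[layer]
    return sample_rate_hz in MPEG1_SAMPLE_RATES_HZ and lowest <= bitrate_kbps <= highest


if __name__ == "__main__":
    print(is_within_mpeg1_limits(3, 44100, 128))  # within the Layer-3 limits
    print(is_within_mpeg1_limits(3, 44100, 384))  # above the Layer-3 maximum
```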
     * Q: What are the main differences between the three Layers, from a
       global view?
     * A: From Layer-1 to Layer-3, complexity increases (mainly true for
       the encoder), overall codec delay increases, and performance
       increases (sound quality per bitrate).
     * Q: What are the main differences between MPEG-1 and MPEG-2 in the
       audio part?
     * A: MPEG-1 and MPEG-2 use the same family of audio codecs, Layer-1,
       -2 and -3. The new audio features of MPEG-2 are a "low sample rate
       extension" to address very low bitrate applications with limited
       bandwidth requirements (the new sampling frequencies are 16, 22.05
       or 24 kHz, the bitrates extend down to 8 kbps), and a
       "multichannel extension" to address surround sound applications
       with up to 5 main audio channels (left, center, right, left
       surround, right surround) and optionally 1 extra "low frequency
       enhancement (LFE)" channel for subwoofer signals; in addition, a
       "multilingual extension" allows the inclusion of up to 7 more
       audio channels.
     * Q: Is this all compatible with each other?
     * A: Well, more or less, yes - with the exception of the low sample
       rate extension. Obviously, a pure MPEG-1 decoder is not able to
       handle the new "half" sample rates.
     * Q: You mean: compatible!? With all these extra audio channels?
       Please explain!
     * A: Compatibility has been a major topic during the MPEG-2
       definition phase. The main idea is to use the same basic bitstream
       format as defined in MPEG-1, with the main data field carrying two
       audio signals (called L0 and R0) as before, and the ancillary data
       field carrying the multichannel extension information. Without
       going further into details, two terms should be explained here:
       "forwards compatible": the MPEG-2 decoder has to accept any MPEG-1
       audio bitstream (that represents one or two audio channels).
       "backwards compatible": the MPEG-1 decoder should be able to
       decode the audio signals in the main data field (L0 and R0) of the
       MPEG-2 bitstream.
       "Matrixing" may be used to get the surround information into L0
       and R0:
       L0 = left signal + a * center signal + b * left surround signal
       R0 = right signal + a * center signal + b * right surround signal
       Therefore, an MPEG-1 decoder can reproduce a comprehensive downmix
       of the full 5-channel information. An MPEG-2 decoder uses the
       multichannel extension information (3 more audio signals) to
       reconstruct the five surround channels.
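       A minimal sketch of the matrixing step for one set of samples is
       shown here. The weights a and b are not given in this FAQ; the
       default of 1/sqrt(2) (about -3 dB) is purely an assumption for
       illustration, and the function name is hypothetical.

```python
import math

# Assumed default weight (about -3 dB); the FAQ does not specify a or b.
ASSUMED_WEIGHT = 1 / math.sqrt(2)


def compatible_downmix(left, center, right, left_surround, right_surround,
                       a=ASSUMED_WEIGHT, b=ASSUMED_WEIGHT):
    """Return (L0, R0) for one sample of each of the five main channels.

    L0 = left  + a * center + b * left surround
    R0 = right + a * center + b * right surround
    """
    l0 = left + a * center + b * left_surround
    r0 = right + a * center + b * right_surround
    return l0, r0


if __name__ == "__main__":
    l0, r0 = compatible_downmix(0.2, 0.4, 0.1, 0.8, 0.0, a=0.5, b=0.25)
    print(f"L0 = {l0:.3f}, R0 = {r0:.3f}")
```

       An MPEG-1 decoder would simply play L0 and R0 as an ordinary
       stereo pair.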
     * Q: In your footnotes, you indicate the use of some "non-ISO"
       extension inside your Fraunhofer codec, called "MPEG 2.5", to
       further improve the performance at very low bitrates (e.g. 8 kbps
       mono). What do you mean by this?
     * A: Oh, yes. Well, the MPEG-2 standard allows bitrates as low as 8
       kbps, for the low sample rate extension. At such a low bitrate,
       the useful audio bandwidth has to be limited anyway, e.g. to 3
       kHz. Therefore, the actual sample rate could be reduced, e.g. to 8
       kHz. The lower the sample rate, the better the frequency
       resolution, the worse the time resolution, and the better the
       ratio between control information and audio payload inside the
       bitstream format. As the MPEG-2 standard defines 16 kHz as lowest
       sample rate, we introduced a further extension, again dividing the
       low sample rates of MPEG-2 by 2, i.e. we introduced 8, 11.025, and
       12 kHz - and we named this extension to the extension "MPEG 2.5".
       "Layer-3" performs significantly better with 8 kbps @ 8 kHz or 16
       kbps @ 11 kHz than with 8 or 16 kbps @ 16 kHz.
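       The trade-off can be made concrete with some rough arithmetic.
       The sketch below assumes 576 spectral lines per Layer-3 block (32
       subbands with 18 lines each, following the "18 times higher"
       frequency resolution mentioned in this FAQ) and ignores header and
       side information; the helper names are hypothetical.

```python
# Assumption: 32 polyphase subbands x 18 MDCT lines = 576 lines per block.
LAYER3_LINES_PER_BLOCK = 576


def bits_per_sample(bitrate_kbps: float, sample_rate_hz: float) -> float:
    """Average number of coded bits available per audio sample."""
    return bitrate_kbps * 1000 / sample_rate_hz


def line_spacing_hz(sample_rate_hz: float) -> float:
    """Spacing of the spectral lines (smaller means finer frequency resolution)."""
    return (sample_rate_hz / 2) / LAYER3_LINES_PER_BLOCK


def block_duration_ms(sample_rate_hz: float) -> float:
    """Time span covered by one block (larger means coarser time resolution)."""
    return LAYER3_LINES_PER_BLOCK * 1000 / sample_rate_hz


if __name__ == "__main__":
    for rate_hz in (16000, 8000):
        print(f"8 kbps @ {rate_hz} Hz: "
              f"{bits_per_sample(8, rate_hz):.2f} bit/sample, "
              f"{line_spacing_hz(rate_hz):.2f} Hz per line, "
              f"{block_duration_ms(rate_hz):.0f} ms per block")
```

       Halving the sample rate doubles the bits available per sample and
       halves the line spacing, at the price of a block that is twice as
       long.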
       
  Advanced Features of Layer-3 - or: Why does Layer-3 perform so well?
  
     * Q: Well, I read your statement about "CD-like" performance,
       achieved at a data reduction of 4:1 (or 384 kbps total bitrate)
       with Layer-1, 6..8:1 (or 256..192 kbps total bitrate) with
       Layer-2, and 12..14:1 (or 128..112 kbps total bitrate) with
       Layer-3. Can you explain a little further?
     * A: Well, each audio Layer extends the features of the Layer with
       the lower number. The simplest form is Layer-1. It has been
       designed mainly for the DCC (Digital Compact Cassette), where it
       is used at 384 kbps (called "PASC"). Layer-2 has been designed as
       a trade-off between complexity and performance. It achieves a good
       sound quality at bitrates down to 192 kbps. Below, sound quality
       suffers. Layer-3 has been designed for low bitrates right from the
       start. It adds a number of "advanced features" to Layer-2:
       the frequency resolution is 18 times higher, which allows a
       Layer-3 encoder to adapt the quantisation noise much better to the
       masking threshold;
       only Layer-3 uses entropy coding (like MPEG video) to further
       reduce redundancy;
       only Layer-3 uses a bit reservoir (like MPEG video) to suppress
       artefacts in critical moments;
       and Layer-3 may use more advanced joint-stereo coding methods.
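       The data reduction figures quoted in the question can be checked
       with simple arithmetic. The sketch assumes that the uncompressed
       reference is 16-bit stereo PCM at 48 kHz (1536 kbps), which
       reproduces the quoted ratios; with the CD sampling rate of 44.1
       kHz (1411.2 kbps) the ratios come out slightly lower. The function
       names are hypothetical.

```python
def pcm_bitrate_kbps(sample_rate_hz: int = 48000, bits_per_sample: int = 16,
                     channels: int = 2) -> float:
    """Bitrate of uncompressed PCM audio in kbps (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def reduction_ratio(coded_bitrate_kbps: float, **pcm_format) -> float:
    """Ratio between the uncompressed PCM bitrate and the coded bitrate."""
    return pcm_bitrate_kbps(**pcm_format) / coded_bitrate_kbps


if __name__ == "__main__":
    for layer, kbps in (("Layer-1", 384), ("Layer-2", 256), ("Layer-2", 192),
                        ("Layer-3", 128), ("Layer-3", 112)):
        print(f"{layer} at {kbps} kbps: about {reduction_ratio(kbps):.1f}:1")
```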
     * Q: I see. Sounds to me as if Layer-3 is something like a
       "Layer-2++". Now, tell me more about sound quality. How do you
       assess that?
     * A: Today, there is no alternative to expensive listening tests.
       During the ISO-MPEG process, a number of international listening
       tests have been performed, with a lot of trained listeners. All
       these tests used the "triple stimulus, hidden reference" method
       and the "CCIR impairment scale" to assess the sound quality. The
       listening sequence is "ABC", with A = original, BC = pair of
       original / coded signal in random order, and the listener has
       to evaluate both B and C with a number between 1.0 and 5.0. The
       meaning of these values is:
       5.0 = transparent (this should be the original signal)
       4.0 = perceptible, but not annoying (first differences noticeable)
       3.0 = slightly annoying
       2.0 = annoying
       1.0 = very annoying
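       To illustrate how the grades of such a test could be collected,
       here is a small sketch. The data layout (which position held the
       hidden reference, plus the two grades) and the simple averaging
       are assumptions for illustration only; this FAQ does not describe
       the statistical evaluation actually used in the MPEG tests.

```python
from statistics import mean


def summarize_trials(trials):
    """Average the grades of the coded signal and of the hidden reference.

    Each trial is a tuple (reference_position, grade_b, grade_c), where
    reference_position is "B" or "C" and tells which of the two stimuli
    was the hidden original. Grades use the 1.0 to 5.0 scale.
    """
    coded_grades, reference_grades = [], []
    for reference_position, grade_b, grade_c in trials:
        if reference_position == "B":
            reference_grades.append(grade_b)
            coded_grades.append(grade_c)
        elif reference_position == "C":
            reference_grades.append(grade_c)
            coded_grades.append(grade_b)
        else:
            raise ValueError("reference_position must be 'B' or 'C'")
    return mean(coded_grades), mean(reference_grades)


if __name__ == "__main__":
    example_trials = [("B", 5.0, 4.0), ("C", 3.0, 5.0)]  # made-up grades
    coded, reference = summarize_trials(example_trials)
    print(f"coded signal: {coded:.2f}, hidden reference: {reference:.2f}")
```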
     * Q: Listening tests are certainly an expensive task. Is there
       really no alternative?
     * A: Well, at least not today. Tomorrow may be different. To assess
       sound quality with perceptual codecs, all traditional "quality"
       parameters (like signal-to-noise ratio, total harmonic distortion,
       bandwidth) are rather useless, as any codec may introduce noise
       and distortions as long as these do not affect the perceived sound
       quality. So, listening tests are necessary, and, if carefully
       prepared and performed, they lead to rather reliable results.
       Nevertheless, Fraunhofer-IIS works on the development and
       standardisation of objective sound quality assessment tools, too.
       And there is already a first product available (contact Opticom),
       a real-time measurement tool that nicely supports the analysis of
       perceptual audio codecs. If you need more information about the
       Noise-to-Mask-Ratio (NMR) technology, feel free to contact
       nmr@iis.fhg.de.
     * Q: O.K., back to these listening tests and the performance
       evaluation. Come on, tell me some results.
     * A: Well, for more details you should study one of these AES papers
       or the MPEG documents. For Layer-3, the main result is that its
       performance was always superior at low bitrates (64 kbps per audio
       channel or below). Well, this is not completely surprising, as
       Layer-3 uses the same tool set as Layer-2, but with some
       additional advanced coding features that all address the demands
       of very low bitrate coding. One impressive example is the ISO-MPEG
       listening test carried out in September 94 at NTT Japan (doc.
       ISO/IEC JTC1/SC29/WG11 N0848, 11.Nov. 94). Another interesting
       result is the conclusion of the task group TG 10/2 within the
       ITU-R, which recommends the use of low bit-rate audio coding
       schemes for digital sound-broadcasting applications (ITU-R doc.
       BS.1115).
     * Q: Very interesting! Tell me more about this recommendation!
     * A: The task group TG 10/2 finished its work in 10/93. The
       recommendation defines three fields of broadcast applications and
       recommends Layer-2 with 180 kbps per channel for distribution and
       contribution links (20 kHz bandwidth, no audible impairments with
       up to 5 cascaded codecs), Layer-2 with 128 kbps per channel for
       emission (20 kHz bandwidth), and Layer-3 with 60 (120) kbps for
       mono (stereo) signals for commentary links (15 kHz bandwidth).
       
  Basics of Perceptual Audio Coding - or: What is the trick?
  
   Sorry - under construction...
   
  References - or: Where to find more information?
  
    For around 10 years, perceptual audio coding has been a permanent
    topic at various scientific conferences; e.g., the AES (Audio
    Engineering Society) organizes two conventions per year. You may find
    the following papers helpful:
   
    1. Brandenburg, Stoll, et al.: "The ISO/MPEG-Audio Codec: A Generic
       Standard for Coding of High Quality Digital Audio", 92nd AES,
       Vienna Mar. 92, pp. 3336; revised version ("ISO-MPEG-1 Audio: A
       Generic Standard...") published in the Journal of AES, Vol.42, No.
       10, Oct. 94
     2. Eberlein, Popp, et al.: "Layer-3, a Flexible Coding Standard",
        94th AES, Berlin Mar. 93, pp. 3493
     3. Church, Grill, et al.: "ISDN and ISO/MPEG Layer-3 Audio Coding:
        Powerful New Tools for Broadcast and Audio Production", 95th AES,
        New York Oct. 93, pp. 3743
     4. Grill, Herre, et al.: "Improved MPEG-2 Audio Multi-Channel
        Encoding", 96th AES, Amsterdam Feb. 94, pp. 3865
     5. Witte, Dietz, et al.: "Single Chip Implementation of an ISO/MPEG
        Layer-3 Decoder", 96th AES, Amsterdam Feb. 94, pp. 3805
     6. Herre, Brandenburg, et al.: "Second Generation ISO/MPEG Audio
        Layer-3 Coding", 98th AES, Paris Feb. 95
     7. Dietz, Popp, et al.: "Audio Compression for Network Transmission",
        99th AES, New York Oct. 95, pp. 4129
     8. Brandenburg, Bosi: "Overview of MPEG-Audio: Current and Future
        Standards for Low Bit-Rate Audio Coding", 99th AES, New York Oct.
        95, pp. 4130
     9. Buchta, Meltzer, et al.: "The WorldStar Sound Format", 101st AES,
        Los Angeles Nov. 96, pp. 4385
    10. Bosi, Brandenburg, et al.: "ISO/IEC MPEG-2 Advanced Audio Coding",
        101st AES, Los Angeles Nov. 96, pp. 4382
       
   Please note that these papers are not available electronically. You
   have to order the preprints ("pp. xxxx") directly from the AES.
   
  Addresses
  
     * AES, 60 East 42nd Street, Suite 2520 New York, NY 10165-2520, USA
       fax: +1 212 682 0477
       email: hq@aes.org
       http://www.aes.org/
     * AudioActive
       http://www.audioactive.com/
     * AVT Audio Video Technologies GmbH, Rathsbergstraße 17
       D-90411 Nürnberg, Germany
       fax: +49 911 5271 100
       contact: Wolfgang Peters
       email: WPeters@avt-nbg.de
       http://www.avt-nbg.de
     * Bertelsmann Publishing, Neumarkter Straße 18
       D-81664 München, Germany
       fax: +49 89 43189 737
       email: 72662.3126@compuserve.com
       http://www.bep.de/
     * Broadcast Electronics Inc, 4100 N 24th St.
       Quincy, IL 62305-3606, USA
       fax: +1 217 224 9607
       email: bdcast@bdcast.com
       http://www.marti.bdcast.com/
     * CCS Corporate Computer Systems Europe GmbH, Ludwigstr. 45
       D-85396 Hallbergmoos, Germany
       fax: +49 811 55 16 55
       email: info@ccs-europe.com
       http://www.ccs-europe.com/
     * Cerberus Central Ltd, 84 Marylebone High Street
       London W1M 3DE, UK
       fax: +44 171 637 3842
       email: mail@cdj.co.uk
       http://www.cdj.co.uk/
     * Deutsche Telekom AG, Technologiezentrum Darmstadt
       Aussenstelle Berlin, Abteilung EK 21
       Oranienburger Str. 70, D-10117 Berlin, Germany
       fax: +49 30 2845 4146
     * Dialog 4 System Engineering GmbH, Monreposstr. 55
       D-71634 Ludwigsburg, Germany
       fax: +49 7141 22667
       email: info@dialog4.com
       http://www.dialog4.com/
     * DIN Beuth Verlag, Auslandsnormen
       D-10772 Berlin, Germany
       fax: +49 30 2601 1231
       email: postmaster@din.de
     * Fraunhofer-IIS, Am Weichselgarten 3
       D-91058 Erlangen, Germany
       contact: Harald Popp
       fax: +49 9131 776 399
       email: layer3@iis.fhg.de
       http://www.iis.fhg.de/departs/amm/layer3/
     * ISO Central Secretariat, Case postale 56,
       CH-1211 Geneva 20, Switzerland
       fax: +41 22 733 3430
       email: central@isocs.iso.ch
       http://www.iso.ch/
     * ITT Intermetall GmbH, Hans-Bunte-Str. 19
       D-79108 Freiburg, Germany
       fax: +49 761 517 2395
       email: info@itt-sc.de
     * Macromedia Inc., 600 Townsend
       San Francisco, CA 94103, USA
       fax: +1 415 626 0554
       http://www.macromedia.com/
     * Meister Electronic GmbH, Kölner Str. 37
       D-51149 Köln, Germany
       fax: +49 2203 1701 30
     * Microsoft Inc., One Microsoft Way
       Redmond, WA 98052-6399, USA
       http://www.microsoft.com/corpinfo/PRESS/1996/Dec96/ntshw2pr.htm
     * MODE
       http://www.mode.net/
     * MPEG
       http://www.cselt.stet.it/mpeg/
     * NSM, Im Tiergarten 20 - 30
       D-55411 Bingen am Rhein, Germany
       contact: Mr. Ballhorn
       fax: +49 6721 407 519
       http://www.nsm.de/nsm_it/
     * Opticom, Am Weichselgarten 7
       D-91058 Erlangen, Germany
       fax: +49 9131 691325
       email: info@opticom.de
       http://www.opticom.de
     * Proton Data, Marrensdamm 12 b
       D-24944 Flensburg, Germany
       fax: +49 461 3816948
       email: proton.data@t-online.de
     * Siemens AG Halbleiter, P.O. Box 80 17 09
       D-81617 Muenchen, Germany
       fax: +49 89 4144 4697
       email: Christine.Born@hl.siemens.de
     * Sygna A/S, P.O.Box 191
       N-5801 Sogndal, Norway
       fax: +47 5767 6190
       email: bach@sygna.no
       http://www.mode.net/partners/sygna.html
     * Telos Systems, 2101 Superior Avenue
       Cleveland, OH 44114, USA
       fax: +1 216 241 4103
       email: info@zephyr.com
       http://www.zephyr.com/
     * WorldSpace, 11 Dupont Circle, N.W., 9th Floor
       Washington, DC 20036, USA
       fax: +1 202 884 7900
       email: gene@mail.worldspace.com
       http://www.worldspace.com
       
  About us - or: What is going on at our Fraunhofer Institute?
  
     * Q: Who is or was Fraunhofer? And what does your institute do?
     * A: As researcher, inventor and entrepreneur, Joseph von Fraunhofer
       (1787 - 1826) won high acclaim for his scientific and commercial
       achievements. When the Fraunhofer-Gesellschaft was founded in
       Munich in 1949, his name was chosen as the "guiding light" of the
       association.
       Today, the Fraunhofer-Gesellschaft employs a staff of around 8,000
       persons and operates 46 research institutes in Germany and one
       resource centre in the United States, with a research volume of
       around 1 billion DM. 70 % of its income is obtained by contract
       research for public authorities as well as for industrial clients.
       The Fraunhofer Institut Integrierte Schaltungen (IIS) was founded
       in Erlangen in 1985. It is headed by Prof. Dr.-Ing. Dieter Seitzer
       and Dr. Heinz Gerhäuser. Today, a staff of 160 persons works on
       projects in the field of information electronics, developing
       microelectronic solutions at chip-, board- and system level. In
       its department "Audio & Multimedia", headed by Dr. Karlheinz
       Brandenburg, around 40 engineers concentrate on the development
       and real-time implementation of signal processing algorithms in
       the field of audiovisual communications.
     * Q: So you focus on "contract research". What does this mean
       exactly?
     * A: Simply put: we have to earn our money. In the case of our
       institute, less than 20 % of our funding is public money - the
       rest of our budget has to be financed by research & development
       projects. You may call this work "applied research", i.e. in
       contrast to a university, we focus on real-world applications, and
       in contrast to an engineering office, we focus on
       state-of-the-art applications that bear some technical risks (and
       therefore need some further research). In other words, we are
       always trying to stay at the leading edge of technology. Take
       audio coding as an example. We started in 1987, in close
       cooperation with the University of Erlangen, to develop an
       advanced audio coding scheme for future broadcast services (Eureka
       147, DAB radio). In 1991, our algorithm ("Layer-3") became the
       most powerful member of the family of audio coding schemes in the
       international ISO-MPEG standard. Since then, we have been working
       on industrial applications as well as on further audiovisual
       research projects, e.g. MPEG-4 scalable audio coding, MPEG-2 NBC
       audio coding, or MPEG-4 audiovisual terminals.
     * Q: I am interested in your Layer-3 technology. What can you do for
       me?
     * A: Well - basically, you may use our know-how as a cost-effective
       road to your application. We expect a certain remuneration for the
       development work that we carried out in advance. We call this a
       "know-how share". In addition, you may want us to work on some
       special R&D tasks for you, so you have to pay for this extra
       effort, too. This is the principle. In the case of Layer-3, we
       have advanced simulation sources (C) for encoder and decoder as
       well as DSP source and assembler code for decoders on DSP 5600x
       (Motorola), DSP 32C (AT&T), TMS320C30 (TI), and MAS 3503 C (ITT),
       and for encoders on a hybrid solution (32C + 5600x) as well as on
       a pure 5600x (2 DSPs) solution. We expect a single 5630x Layer-3
       encoder by the end of 1996. Depending on your specific technical
       needs, the know-how share sum may range from several tens of
       thousands of dollars to more than 100,000 dollars. In any case, we
       expect significantly more money for the encoder, as this is the
       part that is responsible for the performance of a Layer-3 system
       (and so it is the part where most of our know-how is
       concentrated). So now you know the framework. We are open to any
       discussion and any new ideas - so feel free to contact us.
       Oh - by the way, in case you are interested in some rough ASIC
       estimates for a Layer-3 stereo decoder: you will need a
       computation power of around 12 MIPS, a Data ROM of around 2.5
       Kwords, a Data RAM of around 4.5 Kwords, and a Program ROM of
       around 2 to 4 Kwords (depending on the instruction set). The word
       length should be at least 20 bit.
     * Q: What else do I have to keep in mind, if I want to use Layer-3
       in my application? Are there patents involved? How may I address
       this topic?
     * A: You are right. For all MPEG audio coding schemes, patent rights
       exist. Using MPEG audio, you use these rights - and in order not
       to violate them, you should establish a license contract with the
       patent holders. This is true for all MPEG audio Layers. In the
       case of Layer-3, there are currently two entities that may give
       licenses: Thomson Multimedia, Paris, and Fraunhofer-IIS, Erlangen.
       Due to an agreement between them, Thomson is in charge of
       consumer-oriented applications, and Fraunhofer-IIS is in charge of
       professional-oriented applications. License contracts typically
       address only the patent issue. Due to the rules of ISO-MPEG, the
       license has to be given non-exclusively on fair and reasonable
       terms. Of course, details depend on the specific business model.
       So there are four steps for a Layer-3 application. First, defining
       the technical requirements and finding the most cost-effective
       road to meet them. Second, following that road to the final
       solution. Third, defining the license rules depending on the
       business model. Fourth, signing the resulting license contract.
       
   Fraunhofer Institut Integrierte Schaltungen IIS, Am Weichselgarten 3,
   D-91058 Erlangen, Germany, Fax: +49-9131-776-399
   
    FAQ, 29 February 1997, by Harald Popp
   
     _________________________________________________________________
                                      
   
     ©Copyright by Fraunhofer Gesellschaft, Fraunhofer Institut für
    Integrierte Schaltungen-A
    Am Weichselgarten 3; D-91058 Erlangen; Germany
    Tel.: +49 (0) 9131/776-0; Fax: +49 (0) 9131/776-999
    
     _________________________________________________________________
