TUCoPS :: Crypto :: csslect.txt

TUCoPS :: Crypto :: csslect.txt
Lecture Notes from a lecture given at CMU on CSS

  ------------------------------------------------------------------------

                  Lecture 33 (Wednesday, December 6, 2000)

        Course: 15-412 Operating Systems: Design and Implementation
                          Lecturer: Gregory Kesden

Slides Used In Today's Lecture

     CSS (ppt)

Content Scrambling System (CSS): Introduction

     You may recently have heard about the Content Scrambling System
     (CSS) or CSS-compatible open-source software known as DeCSS. CSS,
     which includes both player-host mutual authentication and data
     encryption, is used to protect the content of DVDs from piracy
     and to enforce region-based viewing restrictions. It may also
     have other purposes and certainly has other side-effects. DeCSS,
     which is open source, allows Linux-based systems to access the
     content of DVDs by emulating a licensed player and performing the
     authentication and decryption.

     The national attention is the result of a recent federal circuit
     court ruling, in which the court held, among other things, that
     the distribution of the DeCSS source code violates the Digital
     Millenium Copyright Act (DMCA). It is my understanding that this
     ruling holds that it is illegal to distribute CSS compatible
     source code, including by URL reference ("link") to the same.

     Others, including myself, believe that there exists a very clear
     distinction between human ideas expressed unambiguously and
     concisely in source code and the machine that exists after this
     source code is compiled or interpreted, and made runnable within
     a machine's memory. I believe that ideas expressed in this source
     code are Constitutionally protected speech, and that only the
     executible machine may be considered a "circumvention
     technology". As we discussed on the first day of class, "A
     program is a specification, whereas a process is an instance of a
     program in execution."

     Furthermore, I believe that reverse-engineering for the purposes
     of ensuring compatibility, as a matter of public policy, and as a
     matter of the DMCA, should be and is protected. I believe that
     DeCSS, the product of reverse-engineering, is nothing more than a
     tool which allows Linux boxes to run .VOB programs on equipment
     other than that licensed by the monopoly DVD Copy Control
     Association. But my opinion on this issue is uneducated and
     unreliable -- I am not an attorney, and at least one court seems
     to have taken a different view.

     Although I may disagree with the law, I cannot stand in front of
     this classroom and violate it or suggest that you violate it.
     Instead, I encourage you to obey it -- and to act to change it if
     you disagree with it.

     As we'll discuss shortly, CSS is based on two Linear Feedback
     Shift Registers (LFSRs). Although very efficient in hardware,
     LFSRs are somewhat inefficient to implement in software, and
     there are some interesting programming techniques that are used
     to speed up the process. Unfortunately, I do not believe it is
     lawful for me to show that code, or equivalent code, to you today
     -- for that reason, I will not. Unfortunately, the same concern
     leads me to suggest that you refrain from reviewing the DeCSS
     source code after this lecture. Unlike my suggestion to review an
     implementation of DES, obtaining source code for DeCSS may well
     violate federal law. So today, you will literally get, "As much
     of an education as the law allows -- and no more."

     But don't take anything I've said as legal advice -- I am a
     teacher, not a lawyer. Ask an experienced and licensed attorney,
     if you have any concerns.

     So, without further delay -- let's dig in and take a look at CSS
     as an example of a stream cipher and an authentication protocol.

System Overview

     In our discussion of CSS, we are going to look at the system used
     to play DVDs in terms of three components: the DVD itself, the
     DVD player that reads the disk and delivers the content, and the
     host (computer, host board, &c).

     The DVD disk itself contains the encrypted content, as well as a
     hidden area. It is my understanding that commerciallyt writeable
     DVDs already have this area marked, so that they cannot write to
     it. The contents of this hidden area cannot be delivered, except
     to an authenticated device. Presumedly, any device which can
     authenticate has been licensed by the DVD Copy Control
     Association, and as a consequence is trusted to receive the
     information. This hidden area contains the several pieces of
     information that we will soon discuss: a table of encrypted disk
     keys, and an encrypted disk key (disk key hash).

     The player itself stores the player keys that are used to decrypt
     the disk key (more later), the region code that identifies the
     region in which the player should be used, and another secret
     that is used for authentication with the host.

     The host seems to contain a secret that is used for
     authentication. It seems that this isn't a public key encryption
     scheme, but rather a private key scheme. We'll discuss
     authentication more shortly.

     [Image]

Region Code

     One other detail:

        * Each DVD contains a region code that indicates the region of
          the world in which it is intended to be viewed.
        * Each player knows the region in which it was to be sold.
        * If the region code of the player doesn't match the region
          code on the DVD, the player won't deliver the data.
        * This is to help the MPAA ensure that DVDs don't leak out
          into parts of the world ahead of the "first showing", &c.

     Overview of Keys

          Authentication Key

             * This "secret" is used as part of the mutual
               authentication process.

          Session Key (Bus Key)

             * This key is negotiated during authentication and
               is used to encrypt the title and disk keys before
               sending them over the unprotected bus. The
               encryption is necessary to prevent eavesdropping.

          Player Key

             * This key is Licensed by the DVD Copy Control
               Association to the manufacturer of a DVD player.
               It is stored within the player. It is used to
               establish the trustworthiness of the player. It is
               used to decrypt the disk key.

          Disk Key

             * This key is used to encrypt title key. It is
               decrypted using the player key.

          Sector Key

             * Each sector has a 128-byte plain-text header.
               Bytes 80 - 84 of each sector's header contain an
               additional key used to encode the data within the
               sector.

          Title Key

             * This key is XORed with a per-sector key to encrypt
               the data within a sector

     Overview of the Process

          Step 1: Mutual Authentication

             * The host and the drive use a challenge-response
               system to establish their trustworthiness to each
               other. In the process, they negotiate a session
               key.

          Step 2: Decoding disk

             * The DVD player tries each of several player keys
               until it can decode the disk key. The disk key is
               a disk-wide secret.

          Step 3: Send disk and title keys

             * The title and bus keys are sent from the player to
               the host. The session key is used to encrypt the
               title and disk keys in transit to prevent a
               man-in-the-middle attack.

          Step 4:

             * The DVD player sends a sector to the host.

          Step 5:

             * The host decodes the title key using the disk key.

          Step 6:

             * The host decodes the sector using the title key,
               and a the sector key in the sector's header.

     Disk and Player Keys

          As we discussed, each player has a small number of
          licensed player keys. These keys can be used to decrpyt
          the disk key on a particular DVD. This disk key is used
          to decrypt title keys on the disk. Each work on the
          disk is encrypted with a title key. So in order to
          decrpyt the work, we must begin by decrypting the disk
          key.

          This disk key is stored on the hidden sector of the DVD
          along with a a table containing the disk key encrypted
          will each of the 409 possible player keys. It also
          holds the disk key encrypted with the disk key.

          The player decrypts the appropriate entry in the table
          and then verifies that it has correctly decoding the
          disk key, by decoding the encrypted disk key. The
          result, should be the disk key. That is to say that the
          decryption of the disk key, using the disk key, should
          prove to be the identity function. Players have more
          than one player key, so if the operation fails, they
          try again with an alternate.

     Linear Feedback Shift Registers (LFSRs) and Encrpytion

          So, let's begin to talk about how the encryption works.
          We're going to begin with a little bit of background,
          then look at how data is encrypted, and then look at
          how keys, such as the title key above, are encrypted.

          One technique used to encode a stream is to XOR it with
          a pseudo-random bit stream. If this random-looking bit
          stream can be regenerated by the receiver of the
          message, the receiver will be able to decode the
          message by repeating the XOR operation.

          The LFSR is one popular technique for generating a
          pseudo-random bit stream. After the LFSR is seeded with
          a value, it can be clocked to generate a stream of
          bits. Unfortunately, LFSRs aren't truly random. They
          are periodic and will eventually repeat. In general,
          the larger the LFSR, the greater its period. The period
          also depends on the particular configuration of the
          LFSR. If the initial value of an LFSR is 0, it will
          produce only 0s, this is sometimes called null cycling.

          LFSRs are often combined through addition,
          multiplexers, or logic gates, to generate less
          predictable bit streams.

          An LFSR is seeded with an initial value. With each
          clock tick, certain tapped bits of the LFRS are
          evaluated by a feedback function. The output of this
          feedback function is then shifted into the register.
          The output of the register is the bit that is shifted
          out. This process is illustrated in the slide below:

          [Image]

     CSS's LFSRs

          The CSS algorithm makes use of two LFSRs. The first is
          a 17-bit LFSR. Initially, it contains a two byte seed,
          with a 1 injected into the fourth bit, for a total of
          17 bits. The is placed into the register to prevent
          null cycling. The second LFSR operates the same way,
          except it holds 25 bits.

          Unlike typical LFSR-based stream ciphers, CSS throws
          away the bit that is shifted out of each LFSR. Instead,
          it considers the output of the feedback function to be
          both the input to the LFSR and the output.

          CSS uses a 40-bit, or 5 byte key. This is explains the
          size of the two registers: one is seeded with the first
          two bytes of the key, and the other the remaining three
          bytes of the key.

          The two LFSRs are shown in the slides below:

          [Image]

          [Image]

     LFSR Addition

          The output from the two LFSRs is combined using 8-bit
          addition. After each LFSR clocks out 8 bits of output,
          this output is added to form an output byte. The carry
          out from this addition is used as the carry in for the
          addition yielding the next output byte. It is worth
          noting that this is a pretty week way of using the
          LFSRs. Other approaches use more LFSRs, and do more
          complicated things with them, including clocking them
          at different rates, or combining them using
          multiplexers -- but not here.

          CSS actually has four different modes. Depeding on the
          mode, the output of either or both LFSRs may be
          bit-wise inverted before the addition. The table below
          shows the inverter settings for each mode:

              Invert Output of LFSR?
                Mode     LFSR-17 LFSR-25
           AuthenticationYes     No
            Session Key  No      No
             Title Key   No      Yes
                Data     Yes     No

          The slide below shows the process, including the LFSRs,
          inverters, and addition. Please remember that each
          inverter is only enabled in those modes noted as "yes"
          in the table above:

          [Image]

     Data Encryption/Decryption

          To encrypt or decrypt data, each LFSR is seeded with a
          portion of the title key. LFSR-17 is seeded with bytes
          0 and 1 and LFSR-25 is seeded with bytes 2, 3, and 4.
          These bytes are seeded with a nonce that I called the
          sector key that is read from each sector.

          The sector key is stored in bytes 80-84 of the sector.
          The first 128 bytes of each sector, the sector header,
          which includes the sector key, is plain text. The first
          two bytes (0 and 1) of the title key are XORed with the
          first two bytes of the sector (80 and 81), before
          seeding LFSR-17. Similarly, bytes 2-4 of the title key
          are XORed with bytes 82-84 of the sector, before
          seeding LFSR-25. Please also remember that a "1" is
          injected into each seed at bit 4, to make the seeds 17
          and 25 bits, respectively.

          Once the LFSRs are seeded, their output can be added
          together as described above, to form the pseudo-random
          bit stream. This bit stream is XORed with the
          plaintext, to generate the ciphertext. Much as was the
          case with DES, bytes of the plaintext are run through a
          table-based S-box prior to the XOR operation. Upon
          decoding, this operation is reversed. Although the
          initial permutation substition in DES was performed to
          improve the runtime of DES on 8-bit machines, the
          reason for this substitution is unclear to me. It
          doesn't appear to me to improve either the runtime or
          the strength of CSS -- but I could be wrong.

          The whole process is shown below and can be used for
          encoding or decoding:

          [Image]

     Key Encryption

          In addition to the encryption described above, CSS goes
          through an additional two-step mangling operation in
          the case of keys, including the title key and session
          key. This process is shown in the slide below.

          If we view this slide in terms of columns, each column
          represents one byte of the key. Lk is the output of the
          encryption step for the byte represented by the column.
          The output of the first stage of the mangling function
          feeds the input of the second stage. The first and
          second stage are otherwise identical, except for the
          fact that the output of the 4th byte of the second
          stage does not wrap around and feed the XOR in the
          first column.

          [Image]

     Mutual Authentication

          Before the DVD player will begin to send data over the
          bus to the host, it first goth through a form of weak
          mutual authentication with the host. In the process, it
          negotiates a key for use in encrypting the data in
          transit over the bus. This encrpytion is necessary
          because it would otherwise be possible to snoop the
          plaintext data right off of the bus, rendering the
          prior encryption virtually useless. The key that is
          negotiated is known as the session key or bus key.

          The negotiation begins when the host requests an
          Authentication Grant ID (AGID) from the drive. This ID
          is much like a session ID or a thread ID. It gives a
          name to this particular negotiation.

          The next thing that happens is the host generates an
          arbitrary stream of bytes called a nonce or challenge
          and sends it to the drive. The drive then encrpyts this
          stream of bytes and sends them back to the host. The
          host then decrypts the byte stream and ensures that it
          is correct. It assumes that the drive is authentic,
          because it knew the correct secret and algorithm to
          encode the nonce.

          The host performs exactly ther same operation. It
          generates a nonce, encrpyts it, and sends it to the
          host. The host in turn encrypts the nonce and sends it
          back to the drive. The drive then decrypts the nonce
          and makes sure that it is in fact correct. At this
          point, both the host and the drive trust each other.
          This seems to be a fairly weak authentication scheme,
          because it is based on a secret private key. But this
          key really can't be all that secret, since it is
          presumedly in the firmware inside of every DVD player
          and drive.

          Regardless, both the host and the drive now know each
          other's nonces. Each then takes the two nonces,
          combines them, and encrypts them as described earlier.
          The result is the bus key, a.k.a. session key. This key
          is used to encrpyt all data sent between the drive and
          the player. Since only the player and the host know the
          nonces which change every session, only the player and
          the host can generate the key needed to decrpyt the
          data.

          The only other important note is that, during
          encrpytion and decryption, a different substitution
          table is used to perform the initial substition for
          each of the keys.

          [Image]

     Weakness #1: Brute Force

          Now that we've discussed the CSS algorithm, let's see
          what we can learn from its weaknesses.

          The first thing to note is that the key is only 40 bits
          long. This isn't a terrbily long key -- it is 16 bits
          narrower than the DES key. As we discussed last class,
          even 56 bits can fall somewhat quickly to a prute force
          attack. The 40 bit length was likely selected to
          satisfy U.S. export regulations -- but that came at a
          price.

     Weakness #2: 6 Bytes of LFSR Output

          The second attack that we are going to talk about
          requires 6 bytes of LFSR output. It isn't a terribly
          useful attack, since we don't usually happen to have
          six bytes hanging around, but it is interesting to talk
          about, since it provides a 216 attack on the encryption
          algorithm. In other words, it allows us to crack the
          whole 40-bit key, if we have 6 bytes of output and
          crack the 16-bit (plus 1) register by brute force.

          Here's how it works:

            1. Take a guess at the initial state of LFSR-17.
            2. Clock out 4 bytes.
            3. Use those 4 bytes to determine the corresponding 4
               bytes of output from LFSR-25. This isn't hard,
               since the two are added -- just subtract.
            4. Use the LFSR-25 output to determine LFSR-25's
               state.
            5. Clock out 2 bytes on both LFSRs.
            6. Verify these two bytes. Celebrate or guess again.

     Weakness #2: 5 Bytes of LFSR Output

          Another attack is possible in 2 time, if we know only 5
          bytes of output. As you'll see soon, this is a much
          more practical weakness, because there is yet another
          weakness that can give us 5 bytes.

            1. Guess the initial state of LFSR-17
            2. Clock out 3 bytes
            3. Determine the corresponding output bytes from
               LFSR-25
            4. This reveals all but the highest-order bit of
               LFSR-25
            5. Try both possibilities:
                 1. Clock back 3 bytes
                 2. Select the setting where bit 4 is 1 (remember
                    this is the initial case).
                 3. It is possible that both satisfy this -- try
                    both.
            6. Verify as before

     Weakness #3: Mangled Output

          This attack can recover 5 bytes of the output of the
          LFSRs, given both the ciphertext and the plaintext.
          This 5 bytes can then be used as the 5 output bytes
          needed for the attack above. Recall the mangling
          function we talked about earlier. This attack is based
          on taking a guess and reversing that function.

            1. Guess Lk4
            2. Work backward and verify input byte
            3. This is a 28 attack.
            4. Repeat for all 5 bytes -- this gives you the 5
               bytes of known output for prior weakness.

     Weakness #5: Attacking the Disk Key Hash

          There is also a known attack that can recover the disk
          key from the disk key has in 225 time. This attack is a
          bit complex, so we won't discuss it. The important
          observation is that the existence of a 225 attack
          demonstrates a weakness in the algorithm -- that is a
          long way from 2.

     References

             * Axboe, Jens, dvd-2.2.13-5 Linux patch, 1999.
             * Fawcus, D. and Roberts, Mark, css-auth package,
               December, 1999.
             * Schneider, Bruce, Applied Cryptography, 2ed,
               Wiley, 1996, p. 372-379.
             * Stevenson, Frank A., "Cryptanalysis of Content
               Scrambling System", 8 Nov. 1999, as updated 13
               Nov. 1999.

          Please note:

          You should be aware that, in light of a recent federal
          circuit court decision, it is probably unlawful for you
          to obtain the the first two sources. To the best of my
          non-expert and incomplete knowledge, the fourth source
          has not yet been subject to judicial review in the
          United States.

          These works are cited to "give credit where credit is
          due". This citation should be viewed as proper
          attribution not suggested reading.

          It is my understanding that the recent decision did not
          incriminate presentations of CSS, such as this one, in
          detail and form insufficient to constitute a working
          implementation. But, case law in this area is
          underdeveloped. As the meaning of the law is further
          exposed, we (you and I) may find ourselves unable to
          lawfully distribute or communicate this presentation or
          its content.

          Another note: Take legal advice from a licensed
          attorney, not from me.