------------------------------------------------------------------------
Lecture 33 (Wednesday, December 6, 2000)
Course: 15-412 Operating Systems: Design and Implementation
Lecturer: Gregory Kesden
Slides Used In Today's Lecture
CSS (ppt)
Content Scrambling System (CSS): Introduction
You may recently have heard about the Content Scrambling System
(CSS) or CSS-compatible open-source software known as DeCSS. CSS,
which includes both player-host mutual authentication and data
encryption, is used to protect the content of DVDs from piracy
and to enforce region-based viewing restrictions. It may also
have other purposes and certainly has other side-effects. DeCSS,
which is open source, allows Linux-based systems to access the
content of DVDs by emulating a licensed player and performing the
authentication and decryption.
The national attention is the result of a recent federal circuit
court ruling, in which the court held, among other things, that
the distribution of the DeCSS source code violates the Digital
Millenium Copyright Act (DMCA). It is my understanding that this
ruling holds that it is illegal to distribute CSS compatible
source code, including by URL reference ("link") to the same.
Others, including myself, believe that there exists a very clear
distinction between human ideas expressed unambiguously and
concisely in source code and the machine that exists after this
source code is compiled or interpreted, and made runnable within
a machine's memory. I believe that ideas expressed in this source
code are Constitutionally protected speech, and that only the
executible machine may be considered a "circumvention
technology". As we discussed on the first day of class, "A
program is a specification, whereas a process is an instance of a
program in execution."
Furthermore, I believe that reverse-engineering for the purposes
of ensuring compatibility, as a matter of public policy, and as a
matter of the DMCA, should be and is protected. I believe that
DeCSS, the product of reverse-engineering, is nothing more than a
tool which allows Linux boxes to run .VOB programs on equipment
other than that licensed by the monopoly DVD Copy Control
Association. But my opinion on this issue is uneducated and
unreliable -- I am not an attorney, and at least one court seems
to have taken a different view.
Although I may disagree with the law, I cannot stand in front of
this classroom and violate it or suggest that you violate it.
Instead, I encourage you to obey it -- and to act to change it if
you disagree with it.
As we'll discuss shortly, CSS is based on two Linear Feedback
Shift Registers (LFSRs). Although very efficient in hardware,
LFSRs are somewhat inefficient to implement in software, and
there are some interesting programming techniques that are used
to speed up the process. Unfortunately, I do not believe it is
lawful for me to show that code, or equivalent code, to you today
-- for that reason, I will not. Unfortunately, the same concern
leads me to suggest that you refrain from reviewing the DeCSS
source code after this lecture. Unlike my suggestion to review an
implementation of DES, obtaining source code for DeCSS may well
violate federal law. So today, you will literally get, "As much
of an education as the law allows -- and no more."
But don't take anything I've said as legal advice -- I am a
teacher, not a lawyer. Ask an experienced and licensed attorney,
if you have any concerns.
So, without further delay -- let's dig in and take a look at CSS
as an example of a stream cipher and an authentication protocol.
System Overview
In our discussion of CSS, we are going to look at the system used
to play DVDs in terms of three components: the DVD itself, the
DVD player that reads the disk and delivers the content, and the
host (computer, host board, &c).
The DVD disk itself contains the encrypted content, as well as a
hidden area. It is my understanding that commerciallyt writeable
DVDs already have this area marked, so that they cannot write to
it. The contents of this hidden area cannot be delivered, except
to an authenticated device. Presumedly, any device which can
authenticate has been licensed by the DVD Copy Control
Association, and as a consequence is trusted to receive the
information. This hidden area contains the several pieces of
information that we will soon discuss: a table of encrypted disk
keys, and an encrypted disk key (disk key hash).
The player itself stores the player keys that are used to decrypt
the disk key (more later), the region code that identifies the
region in which the player should be used, and another secret
that is used for authentication with the host.
The host seems to contain a secret that is used for
authentication. It seems that this isn't a public key encryption
scheme, but rather a private key scheme. We'll discuss
authentication more shortly.
[Image]
Region Code
One other detail:
* Each DVD contains a region code that indicates the region of
the world in which it is intended to be viewed.
* Each player knows the region in which it was to be sold.
* If the region code of the player doesn't match the region
code on the DVD, the player won't deliver the data.
* This is to help the MPAA ensure that DVDs don't leak out
into parts of the world ahead of the "first showing", &c.
Overview of Keys
Authentication Key
* This "secret" is used as part of the mutual
authentication process.
Session Key (Bus Key)
* This key is negotiated during authentication and
is used to encrypt the title and disk keys before
sending them over the unprotected bus. The
encryption is necessary to prevent eavesdropping.
Player Key
* This key is Licensed by the DVD Copy Control
Association to the manufacturer of a DVD player.
It is stored within the player. It is used to
establish the trustworthiness of the player. It is
used to decrypt the disk key.
Disk Key
* This key is used to encrypt title key. It is
decrypted using the player key.
Sector Key
* Each sector has a 128-byte plain-text header.
Bytes 80 - 84 of each sector's header contain an
additional key used to encode the data within the
sector.
Title Key
* This key is XORed with a per-sector key to encrypt
the data within a sector
Overview of the Process
Step 1: Mutual Authentication
* The host and the drive use a challenge-response
system to establish their trustworthiness to each
other. In the process, they negotiate a session
key.
Step 2: Decoding disk
* The DVD player tries each of several player keys
until it can decode the disk key. The disk key is
a disk-wide secret.
Step 3: Send disk and title keys
* The title and bus keys are sent from the player to
the host. The session key is used to encrypt the
title and disk keys in transit to prevent a
man-in-the-middle attack.
Step 4:
* The DVD player sends a sector to the host.
Step 5:
* The host decodes the title key using the disk key.
Step 6:
* The host decodes the sector using the title key,
and a the sector key in the sector's header.
Disk and Player Keys
As we discussed, each player has a small number of
licensed player keys. These keys can be used to decrpyt
the disk key on a particular DVD. This disk key is used
to decrypt title keys on the disk. Each work on the
disk is encrypted with a title key. So in order to
decrpyt the work, we must begin by decrypting the disk
key.
This disk key is stored on the hidden sector of the DVD
along with a a table containing the disk key encrypted
will each of the 409 possible player keys. It also
holds the disk key encrypted with the disk key.
The player decrypts the appropriate entry in the table
and then verifies that it has correctly decoding the
disk key, by decoding the encrypted disk key. The
result, should be the disk key. That is to say that the
decryption of the disk key, using the disk key, should
prove to be the identity function. Players have more
than one player key, so if the operation fails, they
try again with an alternate.
Linear Feedback Shift Registers (LFSRs) and Encrpytion
So, let's begin to talk about how the encryption works.
We're going to begin with a little bit of background,
then look at how data is encrypted, and then look at
how keys, such as the title key above, are encrypted.
One technique used to encode a stream is to XOR it with
a pseudo-random bit stream. If this random-looking bit
stream can be regenerated by the receiver of the
message, the receiver will be able to decode the
message by repeating the XOR operation.
The LFSR is one popular technique for generating a
pseudo-random bit stream. After the LFSR is seeded with
a value, it can be clocked to generate a stream of
bits. Unfortunately, LFSRs aren't truly random. They
are periodic and will eventually repeat. In general,
the larger the LFSR, the greater its period. The period
also depends on the particular configuration of the
LFSR. If the initial value of an LFSR is 0, it will
produce only 0s, this is sometimes called null cycling.
LFSRs are often combined through addition,
multiplexers, or logic gates, to generate less
predictable bit streams.
An LFSR is seeded with an initial value. With each
clock tick, certain tapped bits of the LFRS are
evaluated by a feedback function. The output of this
feedback function is then shifted into the register.
The output of the register is the bit that is shifted
out. This process is illustrated in the slide below:
[Image]
CSS's LFSRs
The CSS algorithm makes use of two LFSRs. The first is
a 17-bit LFSR. Initially, it contains a two byte seed,
with a 1 injected into the fourth bit, for a total of
17 bits. The is placed into the register to prevent
null cycling. The second LFSR operates the same way,
except it holds 25 bits.
Unlike typical LFSR-based stream ciphers, CSS throws
away the bit that is shifted out of each LFSR. Instead,
it considers the output of the feedback function to be
both the input to the LFSR and the output.
CSS uses a 40-bit, or 5 byte key. This is explains the
size of the two registers: one is seeded with the first
two bytes of the key, and the other the remaining three
bytes of the key.
The two LFSRs are shown in the slides below:
[Image]
[Image]
LFSR Addition
The output from the two LFSRs is combined using 8-bit
addition. After each LFSR clocks out 8 bits of output,
this output is added to form an output byte. The carry
out from this addition is used as the carry in for the
addition yielding the next output byte. It is worth
noting that this is a pretty week way of using the
LFSRs. Other approaches use more LFSRs, and do more
complicated things with them, including clocking them
at different rates, or combining them using
multiplexers -- but not here.
CSS actually has four different modes. Depeding on the
mode, the output of either or both LFSRs may be
bit-wise inverted before the addition. The table below
shows the inverter settings for each mode:
Invert Output of LFSR?
Mode LFSR-17 LFSR-25
AuthenticationYes No
Session Key No No
Title Key No Yes
Data Yes No
The slide below shows the process, including the LFSRs,
inverters, and addition. Please remember that each
inverter is only enabled in those modes noted as "yes"
in the table above:
[Image]
Data Encryption/Decryption
To encrypt or decrypt data, each LFSR is seeded with a
portion of the title key. LFSR-17 is seeded with bytes
0 and 1 and LFSR-25 is seeded with bytes 2, 3, and 4.
These bytes are seeded with a nonce that I called the
sector key that is read from each sector.
The sector key is stored in bytes 80-84 of the sector.
The first 128 bytes of each sector, the sector header,
which includes the sector key, is plain text. The first
two bytes (0 and 1) of the title key are XORed with the
first two bytes of the sector (80 and 81), before
seeding LFSR-17. Similarly, bytes 2-4 of the title key
are XORed with bytes 82-84 of the sector, before
seeding LFSR-25. Please also remember that a "1" is
injected into each seed at bit 4, to make the seeds 17
and 25 bits, respectively.
Once the LFSRs are seeded, their output can be added
together as described above, to form the pseudo-random
bit stream. This bit stream is XORed with the
plaintext, to generate the ciphertext. Much as was the
case with DES, bytes of the plaintext are run through a
table-based S-box prior to the XOR operation. Upon
decoding, this operation is reversed. Although the
initial permutation substition in DES was performed to
improve the runtime of DES on 8-bit machines, the
reason for this substitution is unclear to me. It
doesn't appear to me to improve either the runtime or
the strength of CSS -- but I could be wrong.
The whole process is shown below and can be used for
encoding or decoding:
[Image]
Key Encryption
In addition to the encryption described above, CSS goes
through an additional two-step mangling operation in
the case of keys, including the title key and session
key. This process is shown in the slide below.
If we view this slide in terms of columns, each column
represents one byte of the key. Lk is the output of the
encryption step for the byte represented by the column.
The output of the first stage of the mangling function
feeds the input of the second stage. The first and
second stage are otherwise identical, except for the
fact that the output of the 4th byte of the second
stage does not wrap around and feed the XOR in the
first column.
[Image]
Mutual Authentication
Before the DVD player will begin to send data over the
bus to the host, it first goth through a form of weak
mutual authentication with the host. In the process, it
negotiates a key for use in encrypting the data in
transit over the bus. This encrpytion is necessary
because it would otherwise be possible to snoop the
plaintext data right off of the bus, rendering the
prior encryption virtually useless. The key that is
negotiated is known as the session key or bus key.
The negotiation begins when the host requests an
Authentication Grant ID (AGID) from the drive. This ID
is much like a session ID or a thread ID. It gives a
name to this particular negotiation.
The next thing that happens is the host generates an
arbitrary stream of bytes called a nonce or challenge
and sends it to the drive. The drive then encrpyts this
stream of bytes and sends them back to the host. The
host then decrypts the byte stream and ensures that it
is correct. It assumes that the drive is authentic,
because it knew the correct secret and algorithm to
encode the nonce.
The host performs exactly ther same operation. It
generates a nonce, encrpyts it, and sends it to the
host. The host in turn encrypts the nonce and sends it
back to the drive. The drive then decrypts the nonce
and makes sure that it is in fact correct. At this
point, both the host and the drive trust each other.
This seems to be a fairly weak authentication scheme,
because it is based on a secret private key. But this
key really can't be all that secret, since it is
presumedly in the firmware inside of every DVD player
and drive.
Regardless, both the host and the drive now know each
other's nonces. Each then takes the two nonces,
combines them, and encrypts them as described earlier.
The result is the bus key, a.k.a. session key. This key
is used to encrpyt all data sent between the drive and
the player. Since only the player and the host know the
nonces which change every session, only the player and
the host can generate the key needed to decrpyt the
data.
The only other important note is that, during
encrpytion and decryption, a different substitution
table is used to perform the initial substition for
each of the keys.
[Image]
Weakness #1: Brute Force
Now that we've discussed the CSS algorithm, let's see
what we can learn from its weaknesses.
The first thing to note is that the key is only 40 bits
long. This isn't a terrbily long key -- it is 16 bits
narrower than the DES key. As we discussed last class,
even 56 bits can fall somewhat quickly to a prute force
attack. The 40 bit length was likely selected to
satisfy U.S. export regulations -- but that came at a
price.
Weakness #2: 6 Bytes of LFSR Output
The second attack that we are going to talk about
requires 6 bytes of LFSR output. It isn't a terribly
useful attack, since we don't usually happen to have
six bytes hanging around, but it is interesting to talk
about, since it provides a 216 attack on the encryption
algorithm. In other words, it allows us to crack the
whole 40-bit key, if we have 6 bytes of output and
crack the 16-bit (plus 1) register by brute force.
Here's how it works:
1. Take a guess at the initial state of LFSR-17.
2. Clock out 4 bytes.
3. Use those 4 bytes to determine the corresponding 4
bytes of output from LFSR-25. This isn't hard,
since the two are added -- just subtract.
4. Use the LFSR-25 output to determine LFSR-25's
state.
5. Clock out 2 bytes on both LFSRs.
6. Verify these two bytes. Celebrate or guess again.
Weakness #2: 5 Bytes of LFSR Output
Another attack is possible in 2 time, if we know only 5
bytes of output. As you'll see soon, this is a much
more practical weakness, because there is yet another
weakness that can give us 5 bytes.
1. Guess the initial state of LFSR-17
2. Clock out 3 bytes
3. Determine the corresponding output bytes from
LFSR-25
4. This reveals all but the highest-order bit of
LFSR-25
5. Try both possibilities:
1. Clock back 3 bytes
2. Select the setting where bit 4 is 1 (remember
this is the initial case).
3. It is possible that both satisfy this -- try
both.
6. Verify as before
Weakness #3: Mangled Output
This attack can recover 5 bytes of the output of the
LFSRs, given both the ciphertext and the plaintext.
This 5 bytes can then be used as the 5 output bytes
needed for the attack above. Recall the mangling
function we talked about earlier. This attack is based
on taking a guess and reversing that function.
1. Guess Lk4
2. Work backward and verify input byte
3. This is a 28 attack.
4. Repeat for all 5 bytes -- this gives you the 5
bytes of known output for prior weakness.
Weakness #5: Attacking the Disk Key Hash
There is also a known attack that can recover the disk
key from the disk key has in 225 time. This attack is a
bit complex, so we won't discuss it. The important
observation is that the existence of a 225 attack
demonstrates a weakness in the algorithm -- that is a
long way from 2.
References
* Axboe, Jens, dvd-2.2.13-5 Linux patch, 1999.
* Fawcus, D. and Roberts, Mark, css-auth package,
December, 1999.
* Schneider, Bruce, Applied Cryptography, 2ed,
Wiley, 1996, p. 372-379.
* Stevenson, Frank A., "Cryptanalysis of Content
Scrambling System", 8 Nov. 1999, as updated 13
Nov. 1999.
Please note:
You should be aware that, in light of a recent federal
circuit court decision, it is probably unlawful for you
to obtain the the first two sources. To the best of my
non-expert and incomplete knowledge, the fourth source
has not yet been subject to judicial review in the
United States.
These works are cited to "give credit where credit is
due". This citation should be viewed as proper
attribution not suggested reading.
It is my understanding that the recent decision did not
incriminate presentations of CSS, such as this one, in
detail and form insufficient to constitute a working
implementation. But, case law in this area is
underdeveloped. As the meaning of the law is further
exposed, we (you and I) may find ourselves unable to
lawfully distribute or communicate this presentation or
its content.
Another note: Take legal advice from a licensed
attorney, not from me.
TUCoPS is optimized to look best in Firefox® on a widescreen monitor (1440x900 or better).
Site design & layout copyright © 1986-2025 AOH