The Linux Hacker's Intro to Assembly Language


The Linux hacker's intro to assembly language (Pt. 1) from L33tdawg

By: argc

Relevance

Assembly language has its many opponents, who argue that in this day and
age of ultra-efficient, high-level compilers, understanding and coding
in assembly is a bit antiquated. While it is true that one can produce
fast, efficient code without assembly language, knowledge of assembly is
absolutely essential in understanding deeper computer architecture.
Knowledge of assembly is also vital in reverse engineering (cracking,
for all you kiddies). And of course, debugging high-level languages is
difficult without assembly. But before you can get to all that stuff,
you've got to learn the basics.

The Main Course

Before we dig into some source, a little introduction is in order. When
you program in assembly language, you have to tell your microprocessor
what to do, usually one machine instruction at a time. You are also
without a lot of the "helping hand" capabilities of many high-level
languages. Also, if you're writing executables in a non-secure operating
environment (say, DOS), you can wreak havoc on your hardware if you
make mistakes. Luckily, Linux does a decent job of shielding itself from
the mistakes of beginners.

The CPU has extra-fast memory spaces inside of it called registers. CPU
registers are really the main workspaces of the processor. The CPU
addresses (locates) the registers by name. On the 386 and later 32-bit
x86 CPUs, the general purpose registers are 32 bits in size. There are
a lot of registers inside of a CPU, but most of them are used for
special purposes. The only ones we will be using in these exercises are
EAX, EBX, ECX, and EDX. These registers are for general use and thus
are called general purpose registers.

On the Linux platform, you don't get unrestricted access to your
processor and hardware. For security reasons, your requests are relayed
to the kernel, which then performs the requested operations. Programs
make these requests through system calls, and C library functions such
as write() are thin wrappers around them. To put it more simply, when
assembly programming in Linux, you set up the arguments much as you
would for a C function, and then call the kernel to perform that
operation. To make this clearer, let's write a simple C program which
calls one function and then translate it into assembly code to further
illustrate the point.

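/* wrote.c--sample c prog */

#include <unistd.h>   /* declares write() */

int main(void) {
    char *buf = "This is your string\n";

    write(1, buf, 20);   /* 1 = stdout, 20 = bytes in the string, newline included */
    return 0;
}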

All this prog does is display the string "This is your string" on the
screen. It uses the write() C function. In order to write equivalent
assembly code, a little more research is in order. To execute a system
call in assembly, you must set up your registers correctly and tell the
kernel what system call you would like to execute. System calls are
given specific numbers, so you load a register (EAX) with the
corresponding system call number. Then you call the kernel, and the
kernel will know which call to execute. The system call numbers are
listed in the file "/usr/include/asm/unistd.h" (on newer systems the
32-bit numbers may live in a file it includes, such as unistd_32.h). If
you look in that file you will see that the write() call has the number
4 next to it. With this information we can begin to construct an
equivalent assembly program. The CPU's general purpose registers must be
loaded with the parameters of the function before calling the kernel,
and each parameter has its own register: the first goes in EBX, the
second in ECX, and the third in EDX. The order in which you load them
does not matter to the kernel; in this article they are loaded in
reverse order, last parameter first. Investigating the write() syscall
by doing a "man 2 write" on the command line yields the following info
regarding the parameters.

ssize_t write(int fd, const void *buf, size_t count);

This syscall writes count bytes from the buffer buf to the file
descriptor fd. Here is the equivalent assembly code to the C program
above.

/* wrote.s */

.data

msg:
    .string "This is your string\n"

.text

.globl _start

_start:
    movl $20, %edx      # move the byte count into edx
    movl $msg, %ecx     # move the pointer to your string into ecx
    movl $1, %ebx       # move the file descriptor into ebx; in this case it's 1 (stdout)
    movl $4, %eax       # move 4 into eax; this is the system call number for write
    int $0x80           # call the kernel with this instruction

    movl $0, %ebx       # load ebx with zero, the exit status for the exit() syscall
    movl $1, %eax       # 1 is the system call number for exit
    int $0x80           # call the kernel again; the program ends here

Exact language syntax differs with whatever assembler you use. In these
exercises we use the GNU assembler (gas). The reasoning is simple:
everyone that has gcc has this assembler. Save the above source as
"wrote.s". To assemble this prog, type "as -o wrote.o wrote.s" without
the quotes. Then it must be linked. I'm not going to cover what that is
right now; just think of it as an extra step. To do that, type
"ld -o wrote wrote.o". That will leave you with a running program. On a
64-bit x86 system, the same 32-bit code should build with
"as --32 -o wrote.o wrote.s" and "ld -m elf_i386 -o wrote wrote.o".

To sum up the program flow: you take the function parameters, load them
into their registers (here in backwards order), call the kernel, load
the registers for the "exit" system call, and call the kernel one last
time.

When using gas, immediate (constant) values begin with a dollar sign.
Registers begin with a percent sign. The words that begin with a "." as
in ".data" are assembler directives; ".data" and ".text" mark sections
of the program. Those will be discussed further in the next article. For
now, just put 'em in. The "movl" instruction copies data from one place
to another. The "l" at the end of it stands for "long", which means the
item being moved is a 32-bit quantity. The first operand is the data to
be moved and the second operand is the destination. The "int"
instruction stands for "interrupt" and generally is used to interrupt
program flow for some reason which is specified by the number next to
it. In this instance, "$0x80" is the number for a kernel call. As
stated earlier, the function parameters are loaded in reverse order
(from right to left) here, though any order works as long as each
parameter ends up in its own register.

And there you have a simple prog which writes your specified string on
the screen. This can obviously be done in C, but you've learned a
little more about your system internals, and that's the important part.
In my next article I will walk you through an assembly program which
does something more useful and elaborate on more aspects of assembly
language.