The Linux hacker's intro to assembly language (Pt. 1) from L33tdawg
By: argc

Relevance

Assembly language has its share of opponents. They argue that in this age of highly optimizing high-level compilers, understanding and writing assembly is a bit antiquated. It is true that you can produce fast, efficient code without assembly language. Even so, knowledge of assembly is essential for understanding computer architecture at a deeper level. It is also vital in reverse engineering (cracking, for all you kiddies). And of course, debugging programs written in high-level languages is difficult without it. But before you can get to all that stuff, you've got to learn the basics.

The Main Course

Before we dig in to some source, a little introduction is in order. When you program in assembly language, you have to tell your microprocessor what to do, usually one machine instruction at a time. You also go without a lot of the "helping hand" features that many high-level languages provide. And if you're writing executables for an unprotected operating environment (say, DOS), your mistakes can wreak havoc on your hardware. Luckily, Linux does a decent job of shielding itself from the mistakes of beginners.

The CPU has extra-fast storage locations inside it called registers. Registers are the main workspace of the processor, and instructions refer to them by name. On the 386 and later CPUs, the general registers are 32 bits wide. There are a lot of registers inside a CPU, but most of them are used for special purposes. The only ones we will use in these exercises are EAX, EBX, ECX and EDX. These are called general-purpose registers because they are not tied to one special job.

On the Linux platform, you don't get unrestricted access to your processor and hardware. For security reasons, your requests are relayed to the kernel, which then performs the requested operations on your behalf.
Ordinary C programs reach the kernel through the C library: library functions such as write() are thin wrappers around the kernel's system calls. When programming in assembly on Linux, you skip the wrapper. You set up the arguments much as you would pass them to a C function, and then call the kernel directly to perform the operation. To make this clearer, let's write a simple C program that calls one function and then translate it into assembly.

    /* wrote.c: sample C program */
    #include <unistd.h>

    int main(void)
    {
        const char *buf = "This is your string\n";
        write(1, buf, 20);    /* fd 1 is stdout; 20 bytes including the newline */
        return 0;
    }

All this program does is display the string "This is your string" on the screen, using the write() function.

To write equivalent assembly code, a little more research is in order. To execute a system call in assembly, you must set up your registers correctly and tell the kernel which system call you want. Every system call has a number. You load that number into a register and then call the kernel, which uses it to decide which call to execute. The system call numbers are listed in the file "/usr/include/asm/unistd.h". If you look in that file you will see that write() has the number 4 next to it. These numbers are for 32-bit x86 and differ on other architectures. With this information we can begin to construct an equivalent assembly program.

Before calling the kernel, the general-purpose registers must be loaded with the system call's arguments, and each argument goes in a fixed register. On 32-bit x86 Linux, EAX holds the system call number, EBX the first argument, ECX the second and EDX the third. To find out what write()'s arguments are, run "man 2 write" on the command line. It gives this prototype:

    ssize_t write(int fd, const void *buf, size_t count);

This system call writes count bytes from the buffer buf to the file descriptor fd. Here is the equivalent assembly code for the C program above.
    /* wrote.s */
    .data
    msg:
        .string "This is your string\n"   # .string also appends a NUL byte

    .text
    .globl _start
    _start:
        movl $20, %edx     # byte count -> edx (third argument)
        movl $msg, %ecx    # address of the string -> ecx (second argument)
        movl $1, %ebx      # file descriptor 1 (stdout) -> ebx (first argument)
        movl $4, %eax      # system call number 4 (write) -> eax
        int  $0x80         # call the kernel

        movl $0, %ebx      # exit status 0 -> ebx (first argument of exit)
        movl $1, %eax      # system call number 1 (exit) -> eax
        int  $0x80         # call the kernel; exit does not return

Exact syntax differs from one assembler to another. In these exercises we use the GNU assembler, gas. The reasoning is simple: everyone who has gcc has this assembler. In gas on x86, comments start with # (C-style /* */ comments also work). A semicolon separates statements rather than starting a comment.

Save the above source as "wrote.s". To assemble it, type "as -o wrote.o wrote.s" (without the quotes). Then the object file must be linked. I'm not going to cover what that is right now; just think of it as an extra step. To do that, type "ld -o wrote wrote.o". That leaves you with an executable you can run with "./wrote". On a 64-bit system, build it as a 32-bit program instead with "as --32 -o wrote.o wrote.s" and "ld -m elf_i386 -o wrote wrote.o".

To sum up the program flow:
1. Load the system call number and its arguments into the registers.
2. Call the kernel.
3. Load the registers for the "exit" system call.
4. Call the kernel one last time.

When using gas, a dollar sign marks an immediate operand, that is, a constant value rather than a memory access. So $msg means "the address of msg". Registers begin with a percent sign. Words that begin with a "." are assembler directives; ".data" and ".text" select the sections of the program. Those will be discussed further in the next article. For now, just put 'em in.

The "movl" instruction copies data from one place to another. The "l" suffix stands for "long", meaning the item being moved is a 32-bit quantity. The first operand is the source and the second operand is the destination.

The "int" instruction stands for "interrupt". It triggers the software interrupt whose number is given as its operand. On Linux, interrupt 0x80 is the entry point for system calls, so "int $0x80" is a call into the kernel.

As noted earlier, what matters is which register each argument goes into, not the order of the movl instructions. This program simply loads them from right to left.
And there you have a simple program that writes your string to the screen. This could obviously be done in C, but you've learned a little more about your system's internals, and that's the important part. In my next article I will walk you through an assembly program that does something more useful, and elaborate on more aspects of assembly language.