This approach is favored by security researchers for two reasons. Firstly, it eliminates the need to dig into the documentation, understand the API offered by the underlying library, and then write custom code to stress-test the parser in a more direct way. Secondly, it makes the fuzzing process repeatable and robust: the program is running in a separate process and is restarted with every input file, so you do not have to worry about a random memory corruption bug in the library clobbering the state of the fuzzer itself, or having weird side effects on subsequent runs of the tested tool.
Unfortunately, there is also a problem: especially for simple libraries, you may end up spending most of the time waiting for execve(), the linker, and all the library initialization routines to do their job. I’ve been thinking of ways to minimize this overhead in american fuzzy lop, but most of the ideas I had were annoyingly complicated. For example, it is possible to write a custom ELF loader and execute the program in-process while using mprotect() to temporarily lock down the memory used by the fuzzer itself – but things such as signal handling would be a mess. Another option would be to execute in a single child process, make a snapshot of the child’s process memory and then “rewind” to that image later on via /proc/pid/mem – but likewise, dealing with signals or file descriptors would require a ton of fragile hacks.
Luckily, Jann Horn figured out a different, much simpler approach, and sent me a patch for afl out of the blue 🙂 It boils down to injecting a small piece of code into the fuzzed binary – a feat that can be achieved via LD_PRELOAD, via PTRACE_POKETEXT, via compile-time instrumentation, or simply by rewriting the ELF binary ahead of time. The purpose of the injected shim is to let execve() happen, get past the linker (ideally with LD_BIND_NOW=1, so that all the hard work is done beforehand), and then stop early on in the actual program, before it gets to processing any inputs generated by the fuzzer or doing anything else of interest. In fact, in the simplest variant, we can just stop at main().
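As a rough illustration of the LD_PRELOAD route, the sketch below uses the GCC/Clang constructor attribute. This attribute makes a function run after the dynamic linker has done its relocations but before main(). All names are hypothetical, and the flag exists only so the effect can be observed. A real shim would enter its fork-server loop at that point. A preloaded library's constructor may also run before some of the program's own initializers, so this is "early on" rather than exactly at main().

```c
/* Hypothetical flag set by the injected code, for demonstration only. */
static volatile int shim_ran_before_main = 0;

/* GCC/Clang extension: the loader calls this after relocation and before
   main(). Compiled into a shared object and loaded via LD_PRELOAD, the same
   hook would run inside an unmodified target binary. */
__attribute__((constructor))
static void injected_shim(void) {
    shim_ran_before_main = 1;
    /* A real shim would wait for fuzzer commands here, before main()
       gets a chance to read any fuzzer-generated input. */
}
```

Built as a shared object (for example, cc -shared -fPIC -o shim.so shim.c), it could be loaded into a hypothetical target with LD_PRELOAD=./shim.so LD_BIND_NOW=1 ./target input.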
Once the designated point in the program is reached, our shim simply waits for commands from the fuzzer. When it receives a “go” message, it calls fork() to create an identical clone of the already-loaded program. Thanks to the powers of copy-on-write, the clone is created very quickly yet enjoys a robust level of isolation from its older twin. Within the child process, the injected code returns control to the original binary, letting it process the fuzzer-supplied input data (and suffer any consequences of doing so). Within the parent, the shim relays the PID of the newly created process to the fuzzer and goes back to the command-wait loop.
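To make the copy-on-write argument concrete, here is a minimal sketch of the per-input fork. The parent holds state built during (simulated) startup. Each input is processed in a fresh child that may scribble over memory or crash, and the parent only collects the wait status. The names loaded_state, process_input and run_one_input are hypothetical stand-ins. This shows the general pattern and is not afl's code.

```c
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for state the target builds during startup
   (linking, library init) before it ever touches fuzzer input. */
static char loaded_state[64] = "initialized";

/* Hypothetical stand-in for the target's input-processing code. */
static void process_input(const char *input) {
    /* Overwrite the state; thanks to copy-on-write this only changes the
       child's private copy of the page. */
    strcpy(loaded_state, "clobbered");
    if (strcmp(input, "crash") == 0)
        abort();                      /* simulate a parser crash */
}

/* Clone the already-initialized process for a single input and return
   the child's raw wait status, or -1 on error. */
static int run_one_input(const char *input) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                   /* child: do the risky work */
        process_input(input);
        _exit(0);
    }
    int status;                       /* parent: collect the outcome */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

After one clean run and one crashing run, the parent still sees loaded_state as "initialized". The second status reports SIGABRT via WIFSIGNALED() and WTERMSIG().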
Of course, when you start dealing with process semantics on Unix, nothing is as easy as it appears at first sight; here are some of the gotchas we had to work around in the code:
- File descriptor offsets are shared between processes created with fork(). This means that any descriptors that are open at the time that our shim is executed may need to be rewound to their original position; not a significant concern if we are stopping at main() – we can just as well rewind stdin by doing lseek() in the fuzzer itself, since that’s where the descriptor originates – but it can become a hurdle if we ever aim at locations further down the line.
- In the same vein, there are some types of file descriptors we can’t fix up. The shim needs to be executed before any access to pipes, character devices, sockets, and similar non-resettable I/O. Again, not a big concern for main().
- Duplicating threads is more complicated: fork() clones only the calling thread, so preserving the others would require the shim to keep track of them all. In simple implementations, therefore, the shim needs to be injected before any additional threads are spawned in the binary. (Of course, threads are rare in file parser libraries, but may be more common in more heavyweight tools.)
- The fuzzer is no longer an immediate parent of the fuzzed process. As a grandparent, it can’t directly use waitpid(), and there is no other simple, portable API to get notified about the process’s exit status. We fix that simply by having the shim do the waiting, then send the status code to the fuzzer. In theory, we could instead call the clone() syscall with the CLONE_PARENT flag, which would make the new process “inherit” the original PPID. Unfortunately, calling the syscall directly confuses glibc, because the library caches the result of getpid() when initializing. Without a way to make it reconsider, PID-dependent calls such as abort() or raise() will go astray. There is also a library wrapper for the clone() call that does update the cached PID, but the wrapper is unwieldy and insists on messing with the process’ stack.
(To be fair, PTRACE_ATTACH offers a way to temporarily adopt a process and be notified of its exit status, but it also changes process semantics in a couple of ways that need a fair amount of code to fully undo.)
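Two of the gotchas above can be observed directly. The sketch below uses hypothetical function names and demonstrates two things:

- A read() in a forked child moves the parent's file offset, which is why stdin may need rewinding with lseek() before each run.
- waitpid() on a grandchild fails with ECHILD, which is why the shim has to do the waiting and relay the status.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Gotcha 1: after fork(), parent and child share one open file description,
   so a read() in the child moves the parent's offset too. Returns the
   parent's offset after the child reads 4 bytes (4 rather than 0). */
static off_t shared_offset_after_fork(void) {
    char path[] = "/tmp/fdoffXXXXXX";     /* scratch file for the demo */
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                         /* deleted once fd is closed */
    if (write(fd, "0123456789", 10) != 10 || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }
    pid_t pid = fork();
    if (pid == 0) {                       /* child: consume some input */
        char buf[4];
        _exit(read(fd, buf, sizeof buf) == (ssize_t)sizeof buf ? 0 : 1);
    }
    if (pid > 0) waitpid(pid, NULL, 0);
    off_t off = lseek(fd, 0, SEEK_CUR);   /* parent's view of the offset */
    close(fd);
    return off;
}

/* Gotcha 2: only the direct parent may wait for a process. Returns the
   errno a grandparent gets from waitpid() on its grandchild (0 if it
   somehow succeeded, -1 on setup failure). */
static int grandparent_waitpid_errno(void) {
    int up[2], down[2];                   /* shim->fuzzer, fuzzer->shim */
    if (pipe(up) != 0) return -1;
    if (pipe(down) != 0) { close(up[0]); close(up[1]); return -1; }

    pid_t shim = fork();
    if (shim == 0) {                      /* middle process: the "shim" */
        close(up[0]); close(down[1]);
        pid_t target = fork();
        if (target == 0) _exit(0);        /* grandchild: the "target" */
        char go;
        if (target < 0 ||
            write(up[1], &target, sizeof target) != (ssize_t)sizeof target)
            _exit(1);
        if (read(down[0], &go, 1) < 0) _exit(1);  /* stay alive until EOF */
        waitpid(target, NULL, 0);         /* only the shim can reap it */
        _exit(0);
    }
    close(up[1]); close(down[0]);

    int err = -1;
    pid_t target;
    if (shim > 0 && read(up[0], &target, sizeof target) == (ssize_t)sizeof target)
        err = waitpid(target, NULL, WNOHANG) < 0 ? errno : 0;  /* ECHILD */
    close(down[1]);                       /* EOF releases the shim */
    close(up[0]);
    if (shim > 0) waitpid(shim, NULL, 0);
    return err;
}
```

Keeping the middle process alive until the grandparent has tried waitpid() ensures the grandchild cannot be reparented mid-test.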
Even with the gotchas taken into account, the shim isn’t complicated and has very few moving parts – a welcome relief compared to the solutions I had in mind earlier on. It reads commands via a pipe at file descriptor 198, uses fd 199 to send messages back to the parent, and does just the bare minimum to get things sorted out. A slightly abridged version of the code is:
```
__afl_forkserver:

  /* Phone home and tell the parent that we're OK. */

  pushl $4          /* length    */
  pushl $__afl_temp /* data      */
  pushl $199        /* file desc */
  call  write
  addl  $12, %esp

__afl_fork_wait_loop:

  /* Wait for parent by reading from the pipe. This will block until
     the parent sends us something. Abort if read fails. */

  pushl $4          /* length    */
  pushl $__afl_temp /* data      */
  pushl $198        /* file desc */
  call  read
  addl  $12, %esp

  cmpl  $4, %eax
  jne   __afl_die

  /* Once woken up, create a clone of our process. */

  call  fork

  cmpl  $0, %eax
  jl    __afl_die
  je    __afl_fork_resume

  /* In parent process: write PID to pipe, then wait for child.
     Parent will handle timeouts and SIGKILL the child as needed. */

  movl  %eax, __afl_fork_pid

  pushl $4              /* length    */
  pushl $__afl_fork_pid /* data      */
  pushl $199            /* file desc */
  call  write
  addl  $12, %esp

  pushl $2              /* WUNTRACED */
  pushl $__afl_temp     /* status    */
  pushl __afl_fork_pid  /* PID       */
  call  waitpid
  addl  $12, %esp

  cmpl  $0, %eax
  jle   __afl_die

  /* Relay wait status to pipe, then loop back. */

  pushl $4          /* length    */
  pushl $__afl_temp /* data      */
  pushl $199        /* file desc */
  call  write
  addl  $12, %esp

  jmp   __afl_fork_wait_loop

__afl_fork_resume:

  /* In child process: close fds, resume execution. */

  pushl $198
  call  close

  pushl $199
  call  close

  addl  $8, %esp
  ret
```
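For readers who find the assembly hard to follow, here is a rough C rendering of the same loop. It comes with a hypothetical fuzzer-side driver so the protocol can be exercised end to end. The sketch makes two assumptions the listing does not show:

- __afl_die simply exits, since its body is not in the abridged listing.
- The shim returns early if the initial hello cannot be written, which the listing does not do.

FORKSRV_CTL_FD, FORKSRV_ST_FD, afl_forkserver_sketch, drive_forkserver and example_target are illustrative names, not afl's own.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define FORKSRV_CTL_FD 198   /* fuzzer -> shim: "go" messages */
#define FORKSRV_ST_FD  199   /* shim -> fuzzer: hello, PIDs, wait statuses */

/* Shim side. Returns only in a freshly forked child, which then carries on
   with the original program; the server process itself never returns
   after the hello. All messages are 4 bytes (int is 32-bit on Linux). */
static void afl_forkserver_sketch(void) {
    uint32_t tmp = 0;
    /* Phone home. Assumption: if nobody is listening, just run normally. */
    if (write(FORKSRV_ST_FD, &tmp, 4) != 4) return;
    for (;;) {
        /* Block until the fuzzer asks for a run; die if the read fails. */
        if (read(FORKSRV_CTL_FD, &tmp, 4) != 4) _exit(1);
        pid_t child = fork();
        if (child < 0) _exit(1);
        if (child == 0) {
            /* In the child: drop the control pipes, resume the program. */
            close(FORKSRV_CTL_FD);
            close(FORKSRV_ST_FD);
            return;
        }
        /* In the parent: report the PID, wait, relay the raw status. */
        uint32_t pid32 = (uint32_t)child;
        int status;
        if (write(FORKSRV_ST_FD, &pid32, 4) != 4) _exit(1);
        if (waitpid(child, &status, WUNTRACED) <= 0) _exit(1);
        if (write(FORKSRV_ST_FD, &status, 4) != 4) _exit(1);
    }
}

/* Hypothetical stand-in for the rest of the instrumented program. */
static void example_target(void) { _exit(7); }

/* Hypothetical fuzzer side: start the target once, request `runs`
   executions, and store each child's wait status. Returns 0 on success. */
static int drive_forkserver(void (*target_main)(void), int runs, int *statuses) {
    int ctl[2], st[2];
    if (pipe(ctl) != 0 || pipe(st) != 0) return -1;
    pid_t srv = fork();
    if (srv < 0) return -1;
    if (srv == 0) {
        /* Target side: move the pipe ends to the agreed descriptors. */
        if (dup2(ctl[0], FORKSRV_CTL_FD) < 0 || dup2(st[1], FORKSRV_ST_FD) < 0)
            _exit(1);
        close(ctl[0]); close(ctl[1]); close(st[0]); close(st[1]);
        afl_forkserver_sketch();          /* returns only in a forked child */
        target_main();
        _exit(0);
    }
    close(ctl[0]); close(st[1]);
    uint32_t msg;
    int ok = read(st[0], &msg, 4) == 4;         /* wait for the hello */
    for (int i = 0; ok && i < runs; i++) {
        uint32_t pid32;
        ok = write(ctl[1], &msg, 4) == 4 &&     /* "go" */
             read(st[0], &pid32, 4) == 4 &&     /* PID of the new child */
             read(st[0], &statuses[i], 4) == 4; /* its wait status */
    }
    close(ctl[1]);                  /* EOF makes the server's read fail */
    close(st[0]);
    waitpid(srv, NULL, 0);
    return ok ? 0 : -1;
}
```

Each run costs the fuzzer one 4-byte write and two 4-byte reads. The status it receives is a raw waitpid() status, so WIFSIGNALED() and friends can decode it to spot crashes. Timeouts would be handled on the fuzzer side by sending SIGKILL to the reported PID.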
But was it worth it? The answer is a resounding “yes”: the stop-at-main() logic, already shipping with afl 0.36b, can speed up the fuzzing of many common image libraries by a factor of two or more. That is almost unexpected, given that we still call fork() for every input, and fork() has a lingering reputation for being very slow.
The next challenge is devising a way to move the shim further downstream. That would let us also skip any common program initialization steps, such as reading config files, and stop just a few instructions shy of the point where the application tries to read the mutated data we are messing with. Jann’s original patch has a solution that relies on ptrace() to detect file access, but we’ve been brainstorming several other ways.
PS. On a related note, some readers might enjoy this.