OLD | NEW |
(Empty) | |
| 1 ================== |
| 2 american fuzzy lop |
| 3 ================== |
| 4 |
| 5 Written and maintained by Michal Zalewski <lcamtuf@google.com> |
| 6 |
| 7 Copyright 2013, 2014, 2015, 2016 Google Inc. All rights reserved. |
| 8 Released under terms and conditions of Apache License, Version 2.0. |
| 9 |
| 10 For new versions and additional information, check out: |
| 11 http://lcamtuf.coredump.cx/afl/ |
| 12 |
| 13 To compare notes with other users or get notified about major new features, |
| 14 send a mail to <afl-users+subscribe@googlegroups.com>. |
| 15 |
| 16 ** See QuickStartGuide.txt if you don't have time to read this file. ** |
| 17 |
| 18 1) Challenges of guided fuzzing |
| 19 ------------------------------- |
| 20 |
| 21 Fuzzing is one of the most powerful and proven strategies for identifying |
| 22 security issues in real-world software; it is responsible for the vast |
| 23 majority of remote code execution and privilege escalation bugs found to date |
| 24 in security-critical software. |
| 25 |
| 26 Unfortunately, fuzzing is also relatively shallow; blind, random mutations |
| 27 make it very unlikely to reach certain code paths in the tested code, leaving |
| 28 some vulnerabilities firmly outside the reach of this technique. |
| 29 |
| 30 There have been numerous attempts to solve this problem. One of the early |
| 31 approaches - pioneered by Tavis Ormandy - is corpus distillation. The method |
| 32 relies on coverage signals to select a subset of interesting seeds from a |
| 33 massive, high-quality corpus of candidate files, and then fuzz them by |
| 34 traditional means. The approach works exceptionally well, but requires such |
| 35 a corpus to be readily available. In addition, block coverage measurements |
| 36 provide only a very simplistic understanding of program state, and are less |
| 37 useful for guiding the fuzzing effort in the long haul. |
| 38 |
| 39 Other, more sophisticated research has focused on techniques such as program |
| 40 flow analysis ("concolic execution"), symbolic execution, or static analysis. |
| 41 All these methods are extremely promising in experimental settings, but tend |
| 42 to suffer from reliability and performance problems in practical uses - and |
| 43 currently do not offer a viable alternative to "dumb" fuzzing techniques. |
| 44 |
| 45 2) The afl-fuzz approach |
| 46 ------------------------ |
| 47 |
| 48 American Fuzzy Lop is a brute-force fuzzer coupled with an exceedingly simple |
| 49 but rock-solid instrumentation-guided genetic algorithm. It uses a modified |
| 50 form of edge coverage to effortlessly pick up subtle, local-scale changes to |
| 51 program control flow. |
| 52 |
| 53 Simplifying a bit, the overall algorithm can be summed up as: |
| 54 |
| 55 1) Load user-supplied initial test cases into the queue, |
| 56 |
| 57 2) Take next input file from the queue, |
| 58 |
| 59 3) Attempt to trim the test case to the smallest size that doesn't alter |
| 60 the measured behavior of the program, |
| 61 |
| 62 4) Repeatedly mutate the file using a balanced and well-researched variety |
| 63 of traditional fuzzing strategies, |
| 64 |
| 65 5) If any of the generated mutations resulted in a new state transition |
| 66 recorded by the instrumentation, add mutated output as a new entry in the |
| 67 queue. |
| 68 |
| 69 6) Go to 2. |
| 70 |
| 71 The discovered test cases are also periodically culled to eliminate ones that |
| 72 have been obsoleted by newer, higher-coverage finds; and undergo several other |
| 73 instrumentation-driven effort minimization steps. |
| 74 |
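The loop described above can be illustrated with a toy bash sketch. This is not
AFL's actual implementation: target_coverage is a hypothetical stand-in for an
instrumented program, "coverage" is reduced to a single number instead of a map
of state transitions, mutations are exhaustive single-character replacements,
and the trimming and culling steps are left out.

```shell
#!/usr/bin/env bash
# Toy illustration only; target_coverage is a hypothetical stand-in for an
# instrumented target. It reports how many leading bytes of "AFL" matched.
target_coverage() {
  case $1 in
    AFL*) echo 3 ;;
    AF*)  echo 2 ;;
    A*)   echo 1 ;;
    *)    echo 0 ;;
  esac
}

queue=("xxx")                              # step 1: initial test case
seen=" $(target_coverage "${queue[0]}") "  # coverage values observed so far
alphabet="AFLx"                            # bytes the toy mutator may write

i=0
while [ "$i" -lt "${#queue[@]}" ]; do      # steps 2 and 6: walk the queue
  entry=${queue[$i]}
  for ((pos = 0; pos < ${#entry}; pos++)); do         # step 4: mutate
    for ((c = 0; c < ${#alphabet}; c++)); do
      mutated="${entry:0:pos}${alphabet:c:1}${entry:pos+1}"
      cov=$(target_coverage "$mutated")
      case $seen in
        *" $cov "*) ;;                                # nothing new, discard
        *) seen="$seen$cov "; queue+=("$mutated") ;;  # step 5: keep it
      esac
    done
  done
  i=$((i + 1))
done

printf '%s\n' "${queue[@]}"
```

Each queue entry that reached a new coverage value becomes the starting point
for the next round, which is how the sketch gets from "xxx" to "AFL" one
character at a time without ever being told the full string.
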
| 75 As a side result of the fuzzing process, the tool creates a small, |
| 76 self-contained corpus of interesting test cases. These are extremely useful |
| 77 for seeding other, labor- or resource-intensive testing regimes - for example, |
| 78 for stress-testing browsers, office applications, graphics suites, or |
| 79 closed-source tools. |
| 80 |
| 81 The fuzzer is thoroughly tested to deliver out-of-the-box performance far |
| 82 superior to blind fuzzing or coverage-only tools. |
| 83 |
| 84 3) Instrumenting programs for use with AFL |
| 85 ------------------------------------------ |
| 86 |
| 87 When source code is available, instrumentation can be injected by a companion |
| 88 tool that works as a drop-in replacement for gcc or clang in any standard build |
| 89 process for third-party code. |
| 90 |
| 91 The instrumentation has a fairly modest performance impact; in conjunction with |
| 92 other optimizations implemented by afl-fuzz, most programs can be fuzzed as fast |
| 93 or even faster than possible with traditional tools. |
| 94 |
| 95 The correct way to recompile the target program may vary depending on the |
| 96 specifics of the build process, but a nearly-universal approach would be: |
| 97 |
| 98 $ CC=/path/to/afl/afl-gcc ./configure |
| 99 $ make clean all |
| 100 |
| 101 For C++ programs, you'd also want to set CXX=/path/to/afl/afl-g++. |
| 102 |
| 103 The clang wrappers (afl-clang and afl-clang++) can be used in the same way; |
| 104 clang users may also opt to leverage a higher-performance instrumentation mode, |
| 105 as described in llvm_mode/README.llvm. |
| 106 |
| 107 When testing libraries, you need to find or write a simple program that reads |
| 108 data from stdin or from a file and passes it to the tested library. In such a |
| 109 case, it is essential to link this executable against a static version of the |
| 110 instrumented library, or to make sure that the correct .so file is loaded at |
| 111 runtime (usually by setting LD_LIBRARY_PATH). The simplest option is a static |
| 112 build, usually possible via: |
| 113 |
| 114 $ CC=/path/to/afl/afl-gcc ./configure --disable-shared |
| 115 |
| 116 Setting AFL_HARDEN=1 when calling 'make' will cause the CC wrapper to |
| 117 automatically enable code hardening options that make it easier to detect |
| 118 simple memory bugs. |
| 119 |
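The build steps from this section can be combined into one small script. This
is only a sketch for an autoconf-style project: AFL_DIR and SRC_DIR are
placeholder variables introduced here, and the script just prints the commands
when the wrapper or ./configure cannot be found.

```shell
#!/usr/bin/env bash
# Sketch only: AFL_DIR and SRC_DIR are placeholders, not paths from the docs.
AFL_DIR=${AFL_DIR:-/path/to/afl}
SRC_DIR=${SRC_DIR:-.}

# Point both the C and the C++ compiler at the AFL wrappers.
export CC="$AFL_DIR/afl-gcc"
export CXX="$AFL_DIR/afl-g++"

if [ -x "$CC" ] && [ -x "$SRC_DIR/configure" ]; then
  # Static build, with hardening options enabled for the 'make' step.
  ( cd "$SRC_DIR" && ./configure --disable-shared && AFL_HARDEN=1 make clean all )
else
  echo "afl-gcc or ./configure not found; would run:"
  echo "  CC=$CC CXX=$CXX ./configure --disable-shared"
  echo "  AFL_HARDEN=1 make clean all"
fi
```

Projects that do not use ./configure will need a different way of passing CC
and CXX to the build.
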
| 120 PS. ASAN users are advised to review the notes_for_asan.txt file for important |
| 121 caveats. |
| 122 |
| 123 4) Instrumenting binary-only apps |
| 124 --------------------------------- |
| 125 |
| 126 When source code is *NOT* available, the fuzzer offers experimental support for |
| 127 fast, on-the-fly instrumentation of black-box binaries. This is accomplished |
| 128 with a version of QEMU running in the lesser-known "user space emulation" mode. |
| 129 |
| 130 QEMU is a project separate from AFL, but you can conveniently build the |
| 131 feature by doing: |
| 132 |
| 133 $ cd qemu_mode |
| 134 $ ./build_qemu_support.sh |
| 135 |
| 136 For additional instructions and caveats, see qemu_mode/README.qemu. |
| 137 |
| 138 The mode is approximately 2-5x slower than compile-time instrumentation, is |
| 139 less conducive to parallelization, and may have some other quirks. |
| 140 |
| 141 5) Choosing initial test cases |
| 142 ------------------------------ |
| 143 |
| 144 To operate correctly, the fuzzer requires one or more starting files that |
| 145 contain a good example of the input data normally expected by the targeted |
| 146 application. There are two basic rules: |
| 147 |
| 148 - Keep the files small. Under 1 kB is ideal, although not strictly necessary. |
| 149 For a discussion of why size matters, see perf_tips.txt. |
| 150 |
| 151 - Use multiple test cases only if they are functionally different from |
| 152 each other. There is no point in using fifty different vacation photos |
| 153 to fuzz an image library. |
| 154 |
| 155 You can find many good examples of starting files in the testcases/ subdirectory |
| 156 that comes with this tool. |
| 157 |
| 158 PS. If a large corpus of data is available for screening, you may want to use |
| 159 the afl-cmin utility to identify a subset of functionally distinct files that |
| 160 exercise different code paths in the target binary. |
| 161 |
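A possible afl-cmin invocation for screening a large corpus is sketched below.
The directory names and the target path are placeholders, and the sketch
assumes a target that takes its input file name on the command line; it only
prints the command if afl-cmin or the candidate directory is missing.

```shell
#!/usr/bin/env bash
# Sketch only: directory names and the target path are placeholders.
CANDIDATES=${CANDIDATES:-big_corpus}
SEEDS=${SEEDS:-testcase_dir}
TARGET=${TARGET:-/path/to/program}

# "--" separates afl-cmin options from the target command line;
# "@@" marks where the input file name goes, as with afl-fuzz.
cmin_cmd=(./afl-cmin -i "$CANDIDATES" -o "$SEEDS" -- "$TARGET" @@)

if [ -x ./afl-cmin ] && [ -d "$CANDIDATES" ]; then
  "${cmin_cmd[@]}"
else
  echo "would run: ${cmin_cmd[*]}"
fi
```

The resulting directory can then be passed to afl-fuzz with -i.
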
| 162 6) Fuzzing binaries |
| 163 ------------------- |
| 164 |
| 165 The fuzzing process itself is carried out by the afl-fuzz utility. This program |
| 166 requires a read-only directory with initial test cases, a separate place to |
| 167 store its findings, plus a path to the binary to test. |
| 168 |
| 169 For target binaries that accept input directly from stdin, the usual syntax is: |
| 170 |
| 171 $ ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program [...params...] |
| 172 |
| 173 For programs that take input from a file, use '@@' to mark the location in |
| 174 the target's command line where the input file name should be placed. The |
| 175 fuzzer will substitute this for you: |
| 176 |
| 177 $ ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program @@ |
| 178 |
| 179 You can also use the -f option to have the mutated data written to a specific |
| 180 file. This is useful if the program expects a particular file extension or similar. |
| 181 |
| 182 Non-instrumented binaries can be fuzzed in the QEMU mode (add -Q in the command |
| 183 line) or in a traditional, blind-fuzzer mode (specify -n). |
| 184 |
| 185 You can use -t and -m to override the default timeout and memory limit for the |
| 186 executed process; rare examples of targets that may need these settings touched |
| 187 include compilers and video decoders. |
| 188 |
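As a rough sketch, the -f, -t and -m options can be combined as shown below.
The file name, the 1000 ms timeout and the 200 MB memory limit are arbitrary
illustrative values, not recommendations; the command is only printed when
afl-fuzz or the test case directory is not present.

```shell
#!/usr/bin/env bash
# Sketch only: the paths, the 1000 ms timeout and the 200 MB limit are
# illustrative values, not recommendations.
TARGET=${TARGET:-/path/to/program}
INPUT_FILE=${INPUT_FILE:-/tmp/afl_input.png}

fuzz_cmd=(
  ./afl-fuzz
  -i testcase_dir        # read-only directory with initial test cases
  -o findings_dir        # where the fuzzer stores its findings
  -f "$INPUT_FILE"       # write mutated data to this exact file name
  -t 1000                # timeout for each run, in milliseconds
  -m 200                 # memory limit for the target, in megabytes
  "$TARGET" "$INPUT_FILE"
)

if [ -x ./afl-fuzz ] && [ -d testcase_dir ]; then
  "${fuzz_cmd[@]}"
else
  echo "would run: ${fuzz_cmd[*]}"
fi
```

Adding -Q, -n or -d to the array selects the QEMU, blind-fuzzer or
skip-deterministic modes mentioned in this section.
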
| 189 Tips for optimizing fuzzing performance are discussed in perf_tips.txt. |
| 190 |
| 191 Note that afl-fuzz starts by performing an array of deterministic fuzzing |
| 192 steps, which can take several days. If you want quick & dirty results right |
| 193 away, akin to zzuf or honggfuzz, add the -d option to the command line. |
| 194 |
| 195 7) Interpreting output |
| 196 ---------------------- |
| 197 |
| 198 See the status_screen.txt file for information on how to interpret the |
| 199 displayed stats and monitor the health of the process. Be sure to consult this |
| 200 file especially if any UI elements are highlighted in red. |
| 201 |
| 202 The fuzzing process will continue until you press Ctrl-C. At minimum, you want |
| 203 to allow the fuzzer to complete one queue cycle, which may take anywhere from a |
| 204 couple of hours to a week or so. |
| 205 |
| 206 There are three subdirectories created within the output directory and updated |
| 207 in real time: |
| 208 |
| 209 - queue/ - test cases for every distinctive execution path, plus all the |
| 210 starting files given by the user. This is the synthesized corpus |
| 211 mentioned in section 2. |
| 212 |
| 213 Before using this corpus for any other purposes, you can shrink |
| 214 it to a smaller size using the afl-cmin tool. The tool will find |
| 215 a smaller subset of files offering equivalent edge coverage. |
| 216 |
| 217 - crashes/ - unique test cases that cause the tested program to receive a |
| 218 fatal signal (e.g., SIGSEGV, SIGILL, SIGABRT). The entries are |
| 219 grouped by the received signal. |
| 220 |
| 221 - hangs/ - unique test cases that cause the tested program to time out. Note |
| 222 that when default (aggressive) timeout settings are in effect, |
| 223 this can be slightly noisy due to latency spikes and other |
| 224 natural phenomena. |
| 225 |
| 226 Crashes and hangs are considered "unique" if the associated execution paths |
| 227 involve any state transitions not seen in previously-recorded faults. If a |
| 228 single bug can be reached in multiple ways, there will be some count inflation |
| 229 early in the process, but this should quickly taper off. |
| 230 |
| 231 The file names for crashes and hangs are correlated with parent, non-faulting |
| 232 queue entries. This should help with debugging. |
| 233 |
| 234 When you can't reproduce a crash found by afl-fuzz, the most likely cause is |
| 235 that you are not setting the same memory limit as used by the tool. Try: |
| 236 |
| 237 $ LIMIT_MB=50 |
| 238 $ ( ulimit -Sv $((LIMIT_MB << 10)); /path/to/tested_binary ... ) |
| 239 |
| 240 Change LIMIT_MB to match the -m parameter passed to afl-fuzz. On OpenBSD, |
| 241 also change -Sv to -Sd. |
| 242 |
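To replay a whole directory of findings under the same limit, a loop along
these lines may help. replay_crashes is a hypothetical helper written for this
example, and it assumes that the target reads its input from stdin; for
file-based targets the redirection would need to be replaced with a file
argument.

```shell
#!/usr/bin/env bash
# Sketch only: replay_crashes is a hypothetical helper, and it assumes the
# target reads its input from stdin.
# usage: replay_crashes LIMIT_MB CRASH_DIR TARGET [ARGS...]
replay_crashes() {
  local limit_mb=$1 dir=$2 f status
  shift 2
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    status=0
    # Apply the limit in a subshell so it does not stick to this shell.
    ( ulimit -Sv $((limit_mb << 10)); "$@" < "$f" ) >/dev/null 2>&1 || status=$?
    if [ "$status" -gt 128 ]; then
      echo "$f: killed by signal $((status - 128))"
    else
      echo "$f: exit status $status (not reproduced)"
    fi
  done
}

# Example: replay_crashes 50 findings_dir/crashes /path/to/tested_binary
```

The helper relies on the shell convention of reporting a signal death as an
exit status above 128. Any file in the directory that is not a crashing test
case is simply reported as not reproduced.
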
| 243 Any existing output directory can also be used to resume aborted jobs; try: |
| 244 |
| 245 $ ./afl-fuzz -i- -o existing_output_dir [...etc...] |
| 246 |
| 247 If you have gnuplot installed, you can also generate some pretty graphs for any |
| 248 active fuzzing task using afl-plot. For an example of what this looks like, |
| 249 see http://lcamtuf.coredump.cx/afl/plot/. |
| 250 |
| 251 8) Parallelized fuzzing |
| 252 ----------------------- |
| 253 |
| 254 Every instance of afl-fuzz takes up roughly one core. This means that on |
| 255 multi-core systems, parallelization is necessary to fully utilize the hardware. |
| 256 For tips on how to fuzz a common target on multiple cores or multiple networked |
| 257 machines, please refer to parallel_fuzzing.txt. |
| 258 |
| 259 The parallel fuzzing mode also offers a simple way for interfacing AFL to other |
| 260 fuzzers, to symbolic or concolic execution engines, and so forth; again, see the |
| 261 last section of parallel_fuzzing.txt for tips. |
| 262 |
| 263 9) Fuzzer dictionaries |
| 264 ---------------------- |
| 265 |
| 266 By default, the afl-fuzz mutation engine is optimized for compact data formats - |
| 267 say, images, multimedia, compressed data, regular expression syntax, or shell |
| 268 scripts. It is somewhat less suited for languages with particularly verbose and |
| 269 redundant verbiage - notably including HTML, SQL, or JavaScript. |
| 270 |
| 271 To avoid the hassle of building syntax-aware tools, afl-fuzz provides a way to |
| 272 seed the fuzzing process with an optional dictionary of language keywords, |
| 273 magic headers, or other special tokens associated with the targeted data type |
| 274 - and use that to reconstruct the underlying grammar on the go: |
| 275 |
| 276 http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html |
| 277 |
| 278 To use this feature, you first need to create a dictionary in one of the two |
| 279 formats discussed in testcases/README.testcases; and then point the fuzzer to |
| 280 it via the -x option in the command line. |
| 281 |
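A small dictionary and the matching -x invocation might look like the sketch
below. The tokens and file names are made up for illustration, and the
name="value" layout is an assumption based on the dictionaries bundled with
AFL; testcases/README.testcases remains the authoritative description of the
format.

```shell
#!/usr/bin/env bash
# Sketch only: the tokens and file names are made up for illustration; the
# name="value" layout is assumed, so check testcases/README.testcases.
DICT=${DICT:-${TMPDIR:-/tmp}/sql_tokens.dict}

cat > "$DICT" <<'EOF'
keyword_select="SELECT"
keyword_from="FROM"
keyword_where="WHERE"
EOF

fuzz_cmd=(./afl-fuzz -i testcase_dir -o findings_dir -x "$DICT" /path/to/program)

if [ -x ./afl-fuzz ] && [ -d testcase_dir ]; then
  "${fuzz_cmd[@]}"
else
  echo "would run: ${fuzz_cmd[*]}"
fi
```
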
| 282 There is no way to provide more structured descriptions of the underlying |
| 283 syntax, but the fuzzer will likely figure out some of this based on the |
| 284 instrumentation feedback alone. This actually works in practice; for example: |
| 285 |
| 286 http://lcamtuf.blogspot.com/2015/04/finding-bugs-in-sqlite-easy-way.html |
| 287 |
| 288 PS. Even when no explicit dictionary is given, afl-fuzz will try to extract |
| 289 existing syntax tokens in the input corpus by watching the instrumentation |
| 290 very closely during deterministic byte flips. This works for some types of |
| 291 parsers and grammars, but isn't nearly as good as the -x mode. |
| 292 |
| 293 10) Crash triage |
| 294 ---------------- |
| 295 |
| 296 The coverage-based grouping of crashes usually produces a small data set that |
| 297 can be quickly triaged manually or with a very simple GDB or Valgrind script. |
| 298 Every crash is also traceable to its parent non-crashing test case in the |
| 299 queue, making it easier to diagnose faults. |
| 300 |
| 301 Having said that, it's important to acknowledge that some fuzzing crashes can be |
| 302 difficult to quickly evaluate for exploitability without a lot of debugging and |
| 303 code analysis work. To assist with this task, afl-fuzz supports a unique |
| 304 "crash exploration" mode enabled with the -C flag. |
| 305 |
| 306 In this mode, the fuzzer takes one or more crashing test cases as the input, |
| 307 and uses its feedback-driven fuzzing strategies to very quickly enumerate all |
| 308 code paths that can be reached in the program while keeping it in the |
| 309 crashing state. |
| 310 |
| 311 Mutations that do not result in a crash are rejected; so are any changes that |
| 312 do not affect the execution path. |
| 313 |
| 314 The output is a small corpus of files that can be very rapidly examined to see |
| 315 what degree of control the attacker has over the faulting address, or whether |
| 316 it is possible to get past an initial out-of-bounds read - and see what lies |
| 317 beneath. |
| 318 |
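One way to start a crash exploration session is sketched below. All paths are
placeholders. The crash of interest is copied into a fresh directory first, on
the assumption that the input directory for -C should hold nothing but
crashing test cases.

```shell
#!/usr/bin/env bash
# Sketch only: all paths are placeholders. The crash to explore is copied
# into a fresh directory so that the input directory holds nothing but
# crashing test cases.
CRASH_FILE=${CRASH_FILE:-findings_dir/crashes/some_crash}
EXPLORE_IN=${EXPLORE_IN:-explore_in}
EXPLORE_OUT=${EXPLORE_OUT:-explore_out}

explore_cmd=(./afl-fuzz -C -i "$EXPLORE_IN" -o "$EXPLORE_OUT" /path/to/program @@)

if [ -x ./afl-fuzz ] && [ -f "$CRASH_FILE" ]; then
  mkdir -p "$EXPLORE_IN"
  cp "$CRASH_FILE" "$EXPLORE_IN/"
  "${explore_cmd[@]}"
else
  echo "would run: ${explore_cmd[*]}"
fi
```
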
| 319 Oh, one more thing: for test case minimization, give afl-tmin a try. The tool |
| 320 can be operated in a very simple way: |
| 321 |
| 322 $ ./afl-tmin -i test_case -o minimized_result -- /path/to/program [...] |
| 323 |
| 324 The tool works with crashing and non-crashing test cases alike. In the crash |
| 325 mode, it will happily accept instrumented and non-instrumented binaries. In the |
| 326 non-crashing mode, the minimizer relies on standard AFL instrumentation to make |
| 327 the file simpler without altering the execution path. |
| 328 |
| 329 The minimizer accepts the -m, -t, -f and @@ syntax in a manner compatible with |
| 330 afl-fuzz. |
| 331 |
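To minimize every file in a directory, afl-tmin can be wrapped in a loop such
as the one below. minimize_all and the TMIN variable are hypothetical names
introduced for this sketch, and the ".min" suffix is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch only: minimize_all is a hypothetical helper and the paths in the
# example call are placeholders.
# usage: minimize_all IN_DIR OUT_DIR TARGET [ARGS...]
minimize_all() {
  local in_dir=$1 out_dir=$2 f
  shift 2
  mkdir -p "$out_dir"
  for f in "$in_dir"/*; do
    [ -f "$f" ] || continue
    # TMIN defaults to ./afl-tmin; override it to use another location.
    "${TMIN:-./afl-tmin}" -i "$f" -o "$out_dir/$(basename "$f").min" -- "$@"
  done
}

# Example: minimize_all findings_dir/crashes minimized /path/to/program @@
```

Everything after the output directory is passed to afl-tmin as the target
command line, so '@@' can be used there in the usual way.
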
| 332 Another recent addition to AFL is the afl-analyze tool. It takes an input |
| 333 file, attempts to sequentially flip bytes, and observes the behavior of the |
| 334 tested program. It then color-codes the input based on which sections appear to |
| 335 be critical, and which are not; while not bulletproof, it can often offer quick |
| 336 insights into complex file formats. More info about its operation can be found |
| 337 near the end of technical_details.txt. |
| 338 |
| 339 11) Common-sense risks |
| 340 ---------------------- |
| 341 |
| 342 Please keep in mind that, similarly to many other computationally-intensive |
| 343 tasks, fuzzing may put strain on your hardware and on the OS. In particular: |
| 344 |
| 345 - Your CPU will run hot and will need adequate cooling. In most cases, if |
| 346 cooling is insufficient or stops working properly, CPU speeds will be |
| 347 automatically throttled. That said, especially when fuzzing on less |
| 348 suitable hardware (laptops, smartphones, etc), it's not entirely impossible |
| 349 for something to blow up. |
| 350 |
| 351 - Targeted programs may end up erratically grabbing gigabytes of memory or |
| 352 filling up disk space with junk files. AFL tries to enforce basic memory |
| 353 limits, but can't prevent each and every possible mishap. The bottom line |
| 354 is that you shouldn't be fuzzing on systems where the prospect of data loss |
| 355 is not an acceptable risk. |
| 356 |
| 357 - Fuzzing involves billions of reads and writes to the filesystem. On modern |
| 358 systems, this will usually be heavily cached, resulting in fairly modest |
| 359 "physical" I/O - but there are many factors that may alter this equation. |
| 360 It is your responsibility to monitor for potential trouble; with very heavy |
| 361 I/O, the lifespan of many HDDs and SSDs may be reduced. |
| 362 |
| 363 A good way to monitor disk I/O on Linux is the 'iostat' command: |
| 364 |
| 365 $ iostat -d 3 -x -k [...optional disk ID...] |
| 366 |
| 367 12) Known limitations & areas for improvement |
| 368 --------------------------------------------- |
| 369 |
| 370 Here are some of the most important caveats for AFL: |
| 371 |
| 372 - AFL detects faults by checking for the first spawned process dying due to |
| 373 a signal (SIGSEGV, SIGABRT, etc). Programs that install custom handlers for |
| 374 these signals may need to have the relevant code commented out. In the same |
| 375 vein, faults in child processes spawned by the fuzzed target may evade |
| 376 detection unless you manually add some code to catch that. |
| 377 |
| 378 - As with any other brute-force tool, the fuzzer offers limited coverage if |
| 379 encryption, checksums, cryptographic signatures, or compression are used to |
| 380 wholly wrap the actual data format to be tested. |
| 381 |
| 382 To work around this, you can comment out the relevant checks (see |
| 383 experimental/libpng_no_checksum/ for inspiration); if this is not possible, |
| 384 you can also write a postprocessor, as explained in |
| 385 experimental/post_library/. |
| 386 |
| 387 - There are some unfortunate trade-offs with ASAN and 64-bit binaries. This |
| 388 isn't due to any specific fault of afl-fuzz; see notes_for_asan.txt for |
| 389 tips. |
| 390 |
| 391 - There is no direct support for fuzzing network services, background |
| 392 daemons, or interactive apps that require UI interaction to work. You may |
| 393 need to make simple code changes to make them behave in a more traditional |
| 394 way. Preeny may offer a relatively simple option, too - see: |
| 395 https://github.com/zardus/preeny |
| 396 |
| 397 Some useful tips for modifying network-based services can also be found at: |
| 398 https://www.fastly.com/blog/how-to-fuzz-server-american-fuzzy-lop |
| 399 |
| 400 - AFL doesn't output human-readable coverage data. If you want to monitor |
| 401 coverage, use afl-cov from Michael Rash: https://github.com/mrash/afl-cov |
| 402 |
| 403 Beyond this, see INSTALL for platform-specific tips. |
| 404 |
| 405 13) Special thanks |
| 406 ------------------ |
| 407 |
| 408 Many of the improvements to afl-fuzz wouldn't be possible without feedback, |
| 409 bug reports, or patches from: |
| 410 |
| 411 Jann Horn Hanno Boeck |
| 412 Felix Groebert Jakub Wilk |
| 413 Richard W. M. Jones Alexander Cherepanov |
| 414 Tom Ritter Hovik Manucharyan |
| 415 Sebastian Roschke Eberhard Mattes |
| 416 Padraig Brady Ben Laurie |
| 417 @dronesec Luca Barbato |
| 418 Tobias Ospelt Thomas Jarosch |
| 419 Martin Carpenter Mudge Zatko |
| 420 Joe Zbiciak Ryan Govostes |
| 421 Michael Rash William Robinet |
| 422 Jonathan Gray Filipe Cabecinhas |
| 423 Nico Weber Jodie Cunningham |
| 424 Andrew Griffiths Parker Thompson |
| 425 Jonathan Neuschfer Tyler Nighswander |
| 426 Ben Nagy Samir Aguiar |
| 427 Aidan Thornton Aleksandar Nikolich |
| 428 Sam Hakim Laszlo Szekeres |
| 429 David A. Wheeler Turo Lamminen |
| 430 Andreas Stieger Richard Godbee |
| 431 Louis Dassy teor2345 |
| 432 Alex Moneger Dmitry Vyukov |
| 433 Keegan McAllister Kostya Serebryany |
| 434 Richo Healey Martijn Bogaard |
| 435 rc0r Jonathan Foote |
| 436 Christian Holler Dominique Pelle |
| 437 Jacek Wielemborek Leo Barnes |
| 438 Jeremy Barnes Jeff Trull |
| 439 Guillaume Endignoux ilovezfs |
| 440 Daniel Godas-Lopez Franjo Ivancic |
| 441 Austin Seipp Daniel Komaromy |
| 442 Daniel Binderman |
| 443 |
| 444 Thank you! |
| 445 |
| 446 14) Contact |
| 447 ----------- |
| 448 |
| 449 Questions? Concerns? Bug reports? The author can usually be reached at |
| 450 <lcamtuf@google.com>. |
| 451 |
| 452 There is also a mailing list for the project; to join, send a mail to |
| 453 <afl-users+subscribe@googlegroups.com>. Or, if you prefer to browse |
| 454 archives first, try: |
| 455 |
| 456 https://groups.google.com/group/afl-users |
| 457 |
| 458 PS. If you wish to submit raw code to be incorporated into the project, please |
| 459 be aware that the copyright on most of AFL is claimed by Google. While you do |
| 460 retain copyright on your contributions, they do ask people to agree to a simple |
| 461 CLA first: |
| 462 |
| 463 https://cla.developers.google.com/clas |
| 464 |
| 465 Sorry about the hassle. Of course, no CLA is required for feature requests or |
| 466 bug reports. |