Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(10)

Side by Side Diff: third_party/afl/src/docs/README

Issue 2075883002: Add American Fuzzy Lop (afl) to third_party/afl/ (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master
Patch Set: Fix nits Created 4 years, 6 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View unified diff | Download patch
OLDNEW
(Empty)
1 ==================
2 american fuzzy lop
3 ==================
4
5 Written and maintained by Michal Zalewski <lcamtuf@google.com>
6
7 Copyright 2013, 2014, 2015, 2016 Google Inc. All rights reserved.
8 Released under terms and conditions of Apache License, Version 2.0.
9
10 For new versions and additional information, check out:
11 http://lcamtuf.coredump.cx/afl/
12
13 To compare notes with other users or get notified about major new features,
14 send a mail to <afl-users+subscribe@googlegroups.com>.
15
16 ** See QuickStartGuide.txt if you don't have time to read this file. **
17
18 1) Challenges of guided fuzzing
19 -------------------------------
20
21 Fuzzing is one of the most powerful and proven strategies for identifying
22 security issues in real-world software; it is responsible for the vast
23 majority of remote code execution and privilege escalation bugs found to date
24 in security-critical software.
25
26 Unfortunately, fuzzing is also relatively shallow; blind, random mutations
27 make it very unlikely to reach certain code paths in the tested code, leaving
28 some vulnerabilities firmly outside the reach of this technique.
29
30 There have been numerous attempts to solve this problem. One of the early
31 approaches - pioneered by Tavis Ormandy - is corpus distillation. The method
32 relies on coverage signals to select a subset of interesting seeds from a
33 massive, high-quality corpus of candidate files, and then fuzz them by
34 traditional means. The approach works exceptionally well, but requires such
35 a corpus to be readily available. In addition, block coverage measurements
36 provide only a very simplistic understanding of program state, and are less
37 useful for guiding the fuzzing effort in the long haul.
38
39 Other, more sophisticated research has focused on techniques such as program
40 flow analysis ("concolic execution"), symbolic execution, or static analysis.
41 All these methods are extremely promising in experimental settings, but tend
42 to suffer from reliability and performance problems in practical uses - and
43 currently do not offer a viable alternative to "dumb" fuzzing techniques.
44
45 2) The afl-fuzz approach
46 ------------------------
47
48 American Fuzzy Lop is a brute-force fuzzer coupled with an exceedingly simple
49 but rock-solid instrumentation-guided genetic algorithm. It uses a modified
50 form of edge coverage to effortlessly pick up subtle, local-scale changes to
51 program control flow.
52
53 Simplifying a bit, the overall algorithm can be summed up as:
54
55 1) Load user-supplied initial test cases into the queue,
56
57 2) Take next input file from the queue,
58
59 3) Attempt to trim the test case to the smallest size that doesn't alter
60 the measured behavior of the program,
61
62 4) Repeatedly mutate the file using a balanced and well-researched variety
63 of traditional fuzzing strategies,
64
65 5) If any of the generated mutations resulted in a new state transition
66 recorded by the instrumentation, add mutated output as a new entry in the
67 queue.
68
69 6) Go to 2.
70
71 The discovered test cases are also periodically culled to eliminate ones that
72 have been obsoleted by newer, higher-coverage finds; and undergo several other
73 instrumentation-driven effort minimization steps.
74
75 As a side result of the fuzzing process, the tool creates a small,
76 self-contained corpus of interesting test cases. These are extremely useful
77 for seeding other, labor- or resource-intensive testing regimes - for example,
78 for stress-testing browsers, office applications, graphics suites, or
79 closed-source tools.
80
81 The fuzzer is thoroughly tested to deliver out-of-the-box performance far
82 superior to blind fuzzing or coverage-only tools.
83
84 3) Instrumenting programs for use with AFL
85 ------------------------------------------
86
87 When source code is available, instrumentation can be injected by a companion
88 tool that works as a drop-in replacement for gcc or clang in any standard build
89 process for third-party code.
90
91 The instrumentation has a fairly modest performance impact; in conjunction with
92 other optimizations implemented by afl-fuzz, most programs can be fuzzed as fast
93 or even faster than possible with traditional tools.
94
95 The correct way to recompile the target program may vary depending on the
96 specifics of the build process, but a nearly-universal approach would be:
97
98 $ CC=/path/to/afl/afl-gcc ./configure
99 $ make clean all
100
101 For C++ programs, you'd would also want to set CXX=/path/to/afl/afl-g++.
102
103 The clang wrappers (afl-clang and afl-clang++) can be used in the same way;
104 clang users may also opt to leverage a higher-performance instrumentation mode,
105 as described in llvm_mode/README.llvm.
106
107 When testing libraries, you need to find or write a simple program that reads
108 data from stdin or from a file and passes it to the tested library. In such a
109 case, it is essential to link this executable against a static version of the
110 instrumented library, or to make sure that the correct .so file is loaded at
111 runtime (usually by setting LD_LIBRARY_PATH). The simplest option is a static
112 build, usually possible via:
113
114 $ CC=/path/to/afl/afl-gcc ./configure --disable-shared
115
116 Setting AFL_HARDEN=1 when calling 'make' will cause the CC wrapper to
117 automatically enable code hardening options that make it easier to detect
118 simple memory bugs.
119
120 PS. ASAN users are advised to review notes_for_asan.txt file for important
121 caveats.
122
123 4) Instrumenting binary-only apps
124 ---------------------------------
125
126 When source code is *NOT* available, the fuzzer offers experimental support for
127 fast, on-the-fly instrumentation of black-box binaries. This is accomplished
128 with a version of QEMU running in the lesser-known "user space emulation" mode.
129
130 QEMU is a project separate from AFL, but you can conveniently build the
131 feature by doing:
132
133 $ cd qemu_mode
134 $ ./build_qemu_support.sh
135
136 For additional instructions and caveats, see qemu_mode/README.qemu.
137
138 The mode is approximately 2-5x slower than compile-time instrumentation, is
139 less conductive to parallelization, and may have some other quirks.
140
141 5) Choosing initial test cases
142 ------------------------------
143
144 To operate correctly, the fuzzer requires one or more starting file that
145 contains a good example of the input data normally expected by the targeted
146 application. There are two basic rules:
147
148 - Keep the files small. Under 1 kB is ideal, although not strictly necessary.
149 For a discussion of why size matters, see perf_tips.txt.
150
151 - Use multiple test cases only if they are functionally different from
152 each other. There is no point in using fifty different vacation photos
153 to fuzz an image library.
154
155 You can find many good examples of starting files in the testcases/ subdirectory
156 that comes with this tool.
157
158 PS. If a large corpus of data is available for screening, you may want to use
159 the afl-cmin utility to identify a subset of functionally distinct files that
160 exercise different code paths in the target binary.
161
162 6) Fuzzing binaries
163 -------------------
164
165 The fuzzing process itself is carried out by the afl-fuzz utility. This program
166 requires a read-only directory with initial test cases, a separate place to
167 store its findings, plus a path to the binary to test.
168
169 For target binaries that accept input directly from stdin, the usual syntax is:
170
171 $ ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program [...params...]
172
173 For programs that take input from a file, use '@@' to mark the location in
174 the target's command line where the input file name should be placed. The
175 fuzzer will substitute this for you:
176
177 $ ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program @@
178
179 You can also use the -f option to have the mutated data written to a specific
180 file. This is useful if the program expects a particular file extension or so.
181
182 Non-instrumented binaries can be fuzzed in the QEMU mode (add -Q in the command
183 line) or in a traditional, blind-fuzzer mode (specify -n).
184
185 You can use -t and -m to override the default timeout and memory limit for the
186 executed process; rare examples of targets that may need these settings touched
187 include compilers and video decoders.
188
189 Tips for optimizing fuzzing performance are discussed in perf_tips.txt.
190
191 Note that afl-fuzz starts by performing an array of deterministic fuzzing
192 steps, which can take several days. If you want quick & dirty results right
193 away, akin to zzuf or honggfuzz, add the -d option to the command line.
194
195 7) Interpreting output
196 ----------------------
197
198 See the status_screen.txt file for information on how to interpret the
199 displayed stats and monitor the health of the process. Be sure to consult this
200 file especially if any UI elements are highlighted in red.
201
202 The fuzzing process will continue until you press Ctrl-C. At minimum, you want
203 to allow the fuzzer to complete one queue cycle, which may take anywhere from a
204 couple of hours to a week or so.
205
206 There are three subdirectories created within the output directory and updated
207 in real time:
208
209 - queue/ - test cases for every distinctive execution path, plus all the
210 starting files given by the user. This is the synthesized corpus
211 mentioned in section 2.
212
213 Before using this corpus for any other purposes, you can shrink
214 it to a smaller size using the afl-cmin tool. The tool will find
215 a smaller subset of files offering equivalent edge coverage.
216
217 - crashes/ - unique test cases that cause the tested program to receive a
218 fatal signal (e.g., SIGSEGV, SIGILL, SIGABRT). The entries are
219 grouped by the received signal.
220
221 - hangs/ - unique test cases that cause the tested program to time out. Note
222 that when default (aggressive) timeout settings are in effect,
223 this can be slightly noisy due to latency spikes and other
224 natural phenomena.
225
226 Crashes and hangs are considered "unique" if the associated execution paths
227 involve any state transitions not seen in previously-recorded faults. If a
228 single bug can be reached in multiple ways, there will be some count inflation
229 early in the process, but this should quickly taper off.
230
231 The file names for crashes and hangs are correlated with parent, non-faulting
232 queue entries. This should help with debugging.
233
234 When you can't reproduce a crash found by afl-fuzz, the most likely cause is
235 that you are not setting the same memory limit as used by the tool. Try:
236
237 $ LIMIT_MB=50
238 $ ( ulimit -Sv $[LIMIT_MB << 10]; /path/to/tested_binary ... )
239
240 Change LIMIT_MB to match the -m parameter passed to afl-fuzz. On OpenBSD,
241 also change -Sv to -Sd.
242
243 Any existing output directory can be also used to resume aborted jobs; try:
244
245 $ ./afl-fuzz -i- -o existing_output_dir [...etc...]
246
247 If you have gnuplot installed, you can also generate some pretty graphs for any
248 active fuzzing task using afl-plot. For an example of how this looks like,
249 see http://lcamtuf.coredump.cx/afl/plot/.
250
251 8) Parallelized fuzzing
252 -----------------------
253
254 Every instance of afl-fuzz takes up roughly one core. This means that on
255 multi-core systems, parallelization is necessary to fully utilize the hardware.
256 For tips on how to fuzz a common target on multiple cores or multiple networked
257 machines, please refer to parallel_fuzzing.txt.
258
259 The parallel fuzzing mode also offers a simple way for interfacing AFL to other
260 fuzzers, to symbolic or concolic execution engines, and so forth; again, see the
261 last section of parallel_fuzzing.txt for tips.
262
263 9) Fuzzer dictionaries
264 ----------------------
265
266 By default, afl-fuzz mutation engine is optimized for compact data formats -
267 say, images, multimedia, compressed data, regular expression syntax, or shell
268 scripts. It is somewhat less suited for languages with particularly verbose and
269 redundant verbiage - notably including HTML, SQL, or JavaScript.
270
271 To avoid the hassle of building syntax-aware tools, afl-fuzz provides a way to
272 seed the fuzzing process with an optional dictionary of language keywords,
273 magic headers, or other special tokens associated with the targeted data type
274 - and use that to reconstruct the underlying grammar on the go:
275
276 http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html
277
278 To use this feature, you first need to create a dictionary in one of the two
279 formats discussed in testcases/README.testcases; and then point the fuzzer to
280 it via the -x option in the command line.
281
282 There is no way to provide more structured descriptions of the underlying
283 syntax, but the fuzzer will likely figure out some of this based on the
284 instrumentation feedback alone. This actually works in practice, say:
285
286 http://lcamtuf.blogspot.com/2015/04/finding-bugs-in-sqlite-easy-way.html
287
288 PS. Even when no explicit dictionary is given, afl-fuzz will try to extract
289 existing syntax tokens in the input corpus by watching the instrumentation
290 very closely during deterministic byte flips. This works for some types of
291 parsers and grammars, but isn't nearly as good as the -x mode.
292
293 10) Crash triage
294 ----------------
295
296 The coverage-based grouping of crashes usually produces a small data set that
297 can be quickly triaged manually or with a very simple GDB or Valgrind script.
298 Every crash is also traceable to its parent non-crashing test case in the
299 queue, making it easier to diagnose faults.
300
301 Having said that, it's important to acknowledge that some fuzzing crashes can be
302 difficult quickly evaluate for exploitability without a lot of debugging and
303 code analysis work. To assist with this task, afl-fuzz supports a very unique
304 "crash exploration" mode enabled with the -C flag.
305
306 In this mode, the fuzzer takes one or more crashing test cases as the input,
307 and uses its feedback-driven fuzzing strategies to very quickly enumerate all
308 code paths that can be reached in the program while keeping it in the
309 crashing state.
310
311 Mutations that do not result in a crash are rejected; so are any changes that
312 do not affect the execution path.
313
314 The output is a small corpus of files that can be very rapidly examined to see
315 what degree of control the attacker has over the faulting address, or whether
316 it is possible to get past an initial out-of-bounds read - and see what lies
317 beneath.
318
319 Oh, one more thing: for test case minimization, give afl-tmin a try. The tool
320 can be operated in a very simple way:
321
322 $ ./afl-tmin -i test_case -o minimized_result -- /path/to/program [...]
323
324 The tool works with crashing and non-crashing test cases alike. In the crash
325 mode, it will happily accept instrumented and non-instrumented binaries. In the
326 non-crashing mode, the minimizer relies on standard AFL instrumentation to make
327 the file simpler without altering the execution path.
328
329 The minimizer accepts the -m, -t, -f and @@ syntax in a manner compatible with
330 afl-fuzz.
331
332 Another recent addition to AFL is the afl-analyze tool. It takes an input
333 file, attempts to sequentially flip bytes, and observes the behavior of the
334 tested program. It then color-codes the input based on which sections appear to
335 be critical, and which are not; while not bulletproof, it can often offer quick
336 insights into complex file formats. More info about its operation can be found
337 near the end of technical_details.txt.
338
339 11) Common-sense risks
340 ----------------------
341
342 Please keep in mind that, similarly to many other computationally-intensive
343 tasks, fuzzing may put strain on your hardware and on the OS. In particular:
344
345 - Your CPU will run hot and will need adequate cooling. In most cases, if
346 cooling is insufficient or stops working properly, CPU speeds will be
347 automatically throttled. That said, especially when fuzzing on less
348 suitable hardware (laptops, smartphones, etc), it's not entirely impossible
349 for something to blow up.
350
351 - Targeted programs may end up erratically grabbing gigabytes of memory or
352 filling up disk space with junk files. AFL tries to enforce basic memory
353 limits, but can't prevent each and every possible mishap. The bottom line
354 is that you shouldn't be fuzzing on systems where the prospect of data loss
355 is not an acceptable risk.
356
357 - Fuzzing involves billions of reads and writes to the filesystem. On modern
358 systems, this will be usually heavily cached, resulting in fairly modest
359 "physical" I/O - but there are many factors that may alter this equation.
360 It is your responsibility to monitor for potential trouble; with very heavy
361 I/O, the lifespan of many HDDs and SSDs may be reduced.
362
363 A good way to monitor disk I/O on Linux is the 'iostat' command:
364
365 $ iostat -d 3 -x -k [...optional disk ID...]
366
367 12) Known limitations & areas for improvement
368 ---------------------------------------------
369
370 Here are some of the most important caveats for AFL:
371
372 - AFL detects faults by checking for the first spawned process dying due to
373 a signal (SIGSEGV, SIGABRT, etc). Programs that install custom handlers for
374 these signals may need to have the relevant code commented out. In the same
375 vein, faults in child processed spawned by the fuzzed target may evade
376 detection unless you manually add some code to catch that.
377
378 - As with any other brute-force tool, the fuzzer offers limited coverage if
379 encryption, checksums, cryptographic signatures, or compression are used to
380 wholly wrap the actual data format to be tested.
381
382 To work around this, you can comment out the relevant checks (see
383 experimental/libpng_no_checksum/ for inspiration); if this is not possible,
384 you can also write a postprocessor, as explained in
385 experimental/post_library/.
386
387 - There are some unfortunate trade-offs with ASAN and 64-bit binaries. This
388 isn't due to any specific fault of afl-fuzz; see notes_for_asan.txt for
389 tips.
390
391 - There is no direct support for fuzzing network services, background
392 daemons, or interactive apps that require UI interaction to work. You may
393 need to make simple code changes to make them behave in a more traditional
394 way. Preeny may offer a relatively simple option, too - see:
395 https://github.com/zardus/preeny
396
397 Some useful tips for modifying network-based services can be also found at:
398 https://www.fastly.com/blog/how-to-fuzz-server-american-fuzzy-lop
399
400 - AFL doesn't output human-readable coverage data. If you want to monitor
401 coverage, use afl-cov from Michael Rash: https://github.com/mrash/afl-cov
402
403 Beyond this, see INSTALL for platform-specific tips.
404
405 13) Special thanks
406 ------------------
407
408 Many of the improvements to afl-fuzz wouldn't be possible without feedback,
409 bug reports, or patches from:
410
411 Jann Horn Hanno Boeck
412 Felix Groebert Jakub Wilk
413 Richard W. M. Jones Alexander Cherepanov
414 Tom Ritter Hovik Manucharyan
415 Sebastian Roschke Eberhard Mattes
416 Padraig Brady Ben Laurie
417 @dronesec Luca Barbato
418 Tobias Ospelt Thomas Jarosch
419 Martin Carpenter Mudge Zatko
420 Joe Zbiciak Ryan Govostes
421 Michael Rash William Robinet
422 Jonathan Gray Filipe Cabecinhas
423 Nico Weber Jodie Cunningham
424 Andrew Griffiths Parker Thompson
425 Jonathan Neuschfer Tyler Nighswander
426 Ben Nagy Samir Aguiar
427 Aidan Thornton Aleksandar Nikolich
428 Sam Hakim Laszlo Szekeres
429 David A. Wheeler Turo Lamminen
430 Andreas Stieger Richard Godbee
431 Louis Dassy teor2345
432 Alex Moneger Dmitry Vyukov
433 Keegan McAllister Kostya Serebryany
434 Richo Healey Martijn Bogaard
435 rc0r Jonathan Foote
436 Christian Holler Dominique Pelle
437 Jacek Wielemborek Leo Barnes
438 Jeremy Barnes Jeff Trull
439 Guillaume Endignoux ilovezfs
440 Daniel Godas-Lopez Franjo Ivancic
441 Austin Seipp Daniel Komaromy
442 Daniel Binderman
443
444 Thank you!
445
446 14) Contact
447 -----------
448
449 Questions? Concerns? Bug reports? The author can be usually reached at
450 <lcamtuf@google.com>.
451
452 There is also a mailing list for the project; to join, send a mail to
453 <afl-users+subscribe@googlegroups.com>. Or, if you prefer to browse
454 archives first, try:
455
456 https://groups.google.com/group/afl-users
457
458 PS. If you wish to submit raw code to be incorporated into the project, please
459 be aware that the copyright on most of AFL is claimed by Google. While you do
460 retain copyright on your contributions, they do ask people to agree to a simple
461 CLA first:
462
463 https://cla.developers.google.com/clas
464
465 Sorry about the hassle. Of course, no CLA is required for feature requests or
466 bug reports.
OLDNEW
« no previous file with comments | « third_party/afl/src/docs/QuickStartGuide.txt ('k') | third_party/afl/src/docs/env_variables.txt » ('j') | no next file with comments »

Powered by Google App Engine
This is Rietveld 408576698