third_party/afl/src/docs/README - Issue 2075883002: Add American Fuzzy Lop (afl) to third_party/afl/

Issue 2075883002: Add American Fuzzy Lop (afl) to third_party/afl/ (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master

Patch Set: Fix nits Created 4 years, 6 months ago

==================
american fuzzy lop
==================

Written and maintained by Michal Zalewski <lcamtuf@google.com>

Copyright 2013, 2014, 2015, 2016 Google Inc. All rights reserved.
Released under the terms and conditions of the Apache License, Version 2.0.

For new versions and additional information, check out:
http://lcamtuf.coredump.cx/afl/

To compare notes with other users or get notified about major new features,
send a mail to <afl-users+subscribe@googlegroups.com>.

See QuickStartGuide.txt if you don't have time to read this file.

1) Challenges of guided fuzzing
-------------------------------

Fuzzing is one of the most powerful and proven strategies for identifying
security issues in real-world software; it is responsible for the vast
majority of remote code execution and privilege escalation bugs found to date
in security-critical software.

Unfortunately, fuzzing is also relatively shallow; blind, random mutations
make it very unlikely to reach certain code paths in the tested code, leaving
some vulnerabilities firmly outside the reach of this technique.

There have been numerous attempts to solve this problem. One of the early
approaches - pioneered by Tavis Ormandy - is corpus distillation. The method
relies on coverage signals to select a subset of interesting seeds from a
massive, high-quality corpus of candidate files, and then fuzzes them by
traditional means. The approach works exceptionally well, but requires such
a corpus to be readily available. In addition, block coverage measurements
provide only a very simplistic understanding of program state, and are less
useful for guiding the fuzzing effort in the long haul.

Other, more sophisticated research has focused on techniques such as program
flow analysis ("concolic execution"), symbolic execution, or static analysis.
All these methods are extremely promising in experimental settings, but tend
to suffer from reliability and performance problems in practical use - and
currently do not offer a viable alternative to "dumb" fuzzing techniques.

2) The afl-fuzz approach
------------------------

American Fuzzy Lop is a brute-force fuzzer coupled with an exceedingly simple
but rock-solid instrumentation-guided genetic algorithm. It uses a modified
form of edge coverage to effortlessly pick up subtle, local-scale changes to
program control flow.
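
The sketch below illustrates what "a modified form of edge coverage" can look
like, modeled on the scheme outlined in technical_details.txt: every basic
block gets a random compile-time ID, each transition between blocks bumps a
counter in a 64 kB map indexed by XOR-ing the current and (shifted) previous
IDs, and raw hit counts are later coarsened into buckets. It is a minimal C
illustration, not the code afl-gcc actually injects; the names trace_bits,
log_edge, classify_count and reset_trace are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define MAP_SIZE 65536u   /* 64 kB map: 2^16 one-byte counters (assumed size) */

static uint8_t  trace_bits[MAP_SIZE];  /* one 8-bit hit counter per edge slot */
static uint32_t prev_location;         /* shifted ID of the previous block */

/* What the injected instrumentation conceptually does on entry to each basic
   block; cur_location stands for the block's random compile-time ID. */
static void log_edge(uint32_t cur_location) {
  cur_location &= MAP_SIZE - 1;
  trace_bits[cur_location ^ prev_location]++;   /* 8-bit counter, wraps at 256 */
  prev_location = cur_location >> 1;            /* keeps A->B distinct from B->A */
}

/* Coarsen a raw hit count into the buckets 1, 2, 3, 4-7, 8-15, 16-31,
   32-127 and 128+, each represented by a single bit. */
static uint8_t classify_count(uint8_t hits) {
  if (hits == 0)   return 0;
  if (hits <= 2)   return hits;   /* 1 -> 1, 2 -> 2 */
  if (hits == 3)   return 4;
  if (hits <= 7)   return 8;
  if (hits <= 15)  return 16;
  if (hits <= 31)  return 32;
  if (hits <= 127) return 64;
  return 128;
}

/* Clear the map and the edge state before each execution of the target. */
static void reset_trace(void) {
  memset(trace_bits, 0, sizeof trace_bits);
  prev_location = 0;
}
```

Shifting the previous ID keeps the A->B and B->A edges in different slots and
keeps a tight A->A loop from collapsing into slot 0. The buckets mean that a
loop running 5 times instead of 6 does not count as new behavior, while 3
versus 4 iterations does.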

Simplifying a bit, the overall algorithm can be summed up as:

  1) Load user-supplied initial test cases into the queue,

  2) Take the next input file from the queue,

  3) Attempt to trim the test case to the smallest size that doesn't alter
     the measured behavior of the program,

  4) Repeatedly mutate the file using a balanced and well-researched variety
     of traditional fuzzing strategies,

  5) If any of the generated mutations resulted in a new state transition
     recorded by the instrumentation, add the mutated output as a new entry
     in the queue.

  6) Go to 2.
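
Steps 1-6 can be illustrated with a toy, single-process C sketch. It assumes a
hypothetical in-process target (run_target) that marks which of four branches
it took, uses one-byte random overwrites as the only mutation, and leaves out
the trimming in step 3; afl-fuzz itself runs the real program out of process
and uses a far richer set of mutation strategies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAP  8     /* toy coverage map: one slot per branch in run_target */
#define MAXQ 16    /* toy queue capacity */
#define LEN  4     /* every toy input is exactly LEN bytes */

typedef struct { uint8_t data[LEN]; } entry_t;

static entry_t  queue[MAXQ];
static size_t   queue_len;
static uint8_t  seen[MAP];                      /* branches observed so far */
static uint64_t rng_state = 88172645463325252ull;

/* xorshift64: tiny deterministic PRNG so runs are reproducible */
static uint64_t next_rand(void) {
  rng_state ^= rng_state << 13;
  rng_state ^= rng_state >> 7;
  rng_state ^= rng_state << 17;
  return rng_state;
}

/* Hypothetical "instrumented" target: records which branches it took. */
static void run_target(const uint8_t *buf, uint8_t *map) {
  memset(map, 0, MAP);
  map[0] = 1;
  if (buf[0] == 'F') {
    map[1] = 1;
    if (buf[1] == 'U') {
      map[2] = 1;
      if (buf[2] == 'Z') map[3] = 1;
    }
  }
}

/* Did this run take any branch never seen before? Also updates `seen`. */
static int has_new_coverage(const uint8_t *map) {
  int found = 0;
  for (size_t i = 0; i < MAP; i++)
    if (map[i] && !seen[i]) { seen[i] = 1; found = 1; }
  return found;
}

/* Steps 1-6 with trimming (step 3) left out; returns the final queue length. */
static size_t fuzz_loop(const uint8_t seed[LEN], unsigned cycles, unsigned mutations) {
  uint8_t map[MAP];
  memset(seen, 0, sizeof seen);
  memcpy(queue[0].data, seed, LEN);             /* 1) load the initial test case */
  queue_len = 1;
  run_target(seed, map);
  has_new_coverage(map);

  size_t cur = 0;
  for (unsigned c = 0; c < cycles; c++) {
    entry_t parent = queue[cur];                /* 2) take the next queue entry */
    for (unsigned m = 0; m < mutations; m++) {
      entry_t child = parent;                   /* 4) mutate: overwrite one byte */
      size_t pos = (size_t)(next_rand() % LEN);
      uint8_t val = (uint8_t)next_rand();
      child.data[pos] = val;
      run_target(child.data, map);
      if (has_new_coverage(map) && queue_len < MAXQ)
        queue[queue_len++] = child;             /* 5) keep inputs reaching new branches */
    }
    cur = (cur + 1) % queue_len;                /* 6) go back to step 2 */
  }
  return queue_len;
}
```

Starting from the seed "AAAA", a new queue entry appears once a mutation
happens to put 'F' in the first byte; further progress then comes from
mutating that entry, which is how coverage feedback lets the search build up
multi-byte conditions one step at a time.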

The discovered test cases are also periodically culled to eliminate ones that
have been obsoleted by newer, higher-coverage finds, and they undergo several
other instrumentation-driven effort minimization steps.

As a side effect of the fuzzing process, the tool creates a small,
self-contained corpus of interesting test cases. These are extremely useful
for seeding other, labor- or resource-intensive testing regimes - for example,
for stress-testing browsers, office applications, graphics suites, or
closed-source tools.

The fuzzer is thoroughly tested to deliver out-of-the-box performance far
superior to blind fuzzing or coverage-only tools.

3) Instrumenting programs for use with AFL
------------------------------------------

When source code is available, instrumentation can be injected by a companion
tool that works as a drop-in replacement for gcc or clang in any standard build
process for third-party code.

The instrumentation has a fairly modest performance impact; in conjunction with
other optimizations implemented by afl-fuzz, most programs can be fuzzed as fast
as or even faster than is possible with traditional tools.

The correct way to recompile the target program may vary depending on the
specifics of the build process, but a nearly-universal approach would be:

  $ CC=/path/to/afl/afl-gcc ./configure
  $ make clean all

For C++ programs, you would also want to set CXX=/path/to/afl/afl-g++.

The clang wrappers (afl-clang and afl-clang++) can be used in the same way;
clang users may also opt to leverage a higher-performance instrumentation mode,
as described in llvm_mode/README.llvm.

When testing libraries, you need to find or write a simple program that reads
data from stdin or from a file and passes it to the tested library. In such a
case, it is essential to link this executable against a static version of the
instrumented library, or to make sure that the correct .so file is loaded at
runtime (usually by setting LD_LIBRARY_PATH). The simplest option is a static
build, usually possible via:

  $ CC=/path/to/afl/afl-gcc ./configure --disable-shared
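
A minimal sketch of such a stdin-reading harness in C. The parse_widget
function and its "WDGT" magic are hypothetical stand-ins for the library call
being fuzzed, and the 1 MB input cap is an arbitrary assumption; only the
read-everything-then-call-the-library shape of run_harness is meant to carry
over to a real harness.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_INPUT (1u << 20)   /* assumed cap: ignore anything past 1 MB */

/* Hypothetical stand-in for the library under test; a real harness would call
   into the instrumented, statically linked library instead. */
static int parse_widget(const uint8_t *data, size_t len) {
  if (len < 4 || memcmp(data, "WDGT", 4) != 0) return -1;   /* bad magic */
  return (int)(len - 4);                                    /* payload size */
}

/* Read the whole stream into a heap buffer; returns NULL on allocation failure. */
static uint8_t *read_all(FILE *in, size_t *out_len) {
  size_t cap = 4096, len = 0;
  uint8_t *buf = malloc(cap);
  if (!buf) return NULL;
  for (;;) {
    if (len == cap) {
      if (cap >= MAX_INPUT) break;               /* stop at the cap */
      uint8_t *tmp = realloc(buf, cap * 2);
      if (!tmp) { free(buf); return NULL; }
      buf = tmp;
      cap *= 2;
    }
    size_t n = fread(buf + len, 1, cap - len, in);
    if (n == 0) break;                           /* EOF or read error */
    len += n;
  }
  *out_len = len;
  return buf;
}

/* Harness body: read the input, hand it to the library, return its result. */
static int run_harness(FILE *in) {
  size_t len = 0;
  uint8_t *data = read_all(in, &len);
  if (!data) return -2;
  int rc = parse_widget(data, len);
  free(data);
  return rc;
}
```

A real harness would add something like int main(void) { run_harness(stdin);
return 0; } (or open argv[1] when fuzzing with @@), be built with afl-gcc, and
be linked against the static library from the command above. It deliberately
installs no signal handlers, so a crash inside the library still kills the
process where afl-fuzz can see it.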

Setting AFL_HARDEN=1 when calling 'make' will cause the CC wrapper to
automatically enable code hardening options that make it easier to detect
simple memory bugs.

PS. ASAN users are advised to review the notes_for_asan.txt file for important
caveats.

4) Instrumenting binary-only apps
---------------------------------

When source code is NOT available, the fuzzer offers experimental support for
fast, on-the-fly instrumentation of black-box binaries. This is accomplished
with a version of QEMU running in the lesser-known "user space emulation" mode.

QEMU is a project separate from AFL, but you can conveniently build the
feature by doing:

  $ cd qemu_mode
  $ ./build_qemu_support.sh

For additional instructions and caveats, see qemu_mode/README.qemu.

The mode is approximately 2-5x slower than compile-time instrumentation, is
less conducive to parallelization, and may have some other quirks.

5) Choosing initial test cases
------------------------------

To operate correctly, the fuzzer requires one or more starting files that
contain a good example of the input data normally expected by the targeted
application. There are two basic rules:

  - Keep the files small. Under 1 kB is ideal, although not strictly necessary.
    For a discussion of why size matters, see perf_tips.txt.

  - Use multiple test cases only if they are functionally different from
    each other. There is no point in using fifty different vacation photos
    to fuzz an image library.

You can find many good examples of starting files in the testcases/ subdirectory
that comes with this tool.

PS. If a large corpus of data is available for screening, you may want to use
the afl-cmin utility to identify a subset of functionally distinct files that
exercise different code paths in the target binary.
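
The greedy idea behind this kind of corpus screening can be sketched as
follows, assuming each file's edge coverage has already been collected into a
boolean array. This is an illustration only: afl-cmin itself is a shell script
that gathers traces with afl-showmap, and its selection logic may differ in
the details. The corpus_file type and minimize_corpus function are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_EDGES 8   /* toy edge count; real coverage maps are far larger */

typedef struct {
  const char *name;      /* file name, for reporting */
  size_t size;           /* file size in bytes */
  bool edges[N_EDGES];   /* edges this file exercised (assumed precomputed) */
} corpus_file;

/* Greedy screening: for every edge not covered yet, keep the smallest file
   that exercises it, then treat all of that file's edges as covered.
   Writes the chosen indices to `keep` (room for n entries) and returns how
   many were chosen. A chosen file can never be picked twice, because all of
   its edges are covered from then on. */
static size_t minimize_corpus(const corpus_file *files, size_t n, size_t *keep) {
  bool covered[N_EDGES] = { false };
  size_t kept = 0;
  for (size_t e = 0; e < N_EDGES; e++) {
    if (covered[e]) continue;
    size_t best = n;                            /* n means "no file hits e" */
    for (size_t i = 0; i < n; i++)
      if (files[i].edges[e] && (best == n || files[i].size < files[best].size))
        best = i;
    if (best == n) continue;                    /* edge never exercised */
    keep[kept++] = best;
    for (size_t e2 = 0; e2 < N_EDGES; e2++)
      if (files[best].edges[e2]) covered[e2] = true;
  }
  return kept;
}
```

For example, if a 900-byte file covers edges {0, 1, 2}, a 100-byte file covers
{0, 1} and a 300-byte file covers {2, 3}, the two smaller files are kept and
the large one is dropped, in line with the "keep the files small" rule above.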

6) Fuzzing binaries
-------------------

The fuzzing process itself is carried out by the afl-fuzz utility. This program
requires a read-only directory with initial test cases, a separate place to
store its findings, plus a path to the binary to test.

For target binaries that accept input directly from stdin, the usual syntax is:

  $ ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program [...params...]

For programs that take input from a file, use '@@' to mark the location in
the target's command line where the input file name should be placed. The
fuzzer will substitute this for you:

  $ ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program @@
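
The substitution itself is plain string rewriting. Below is a C sketch of
replacing "@@" in one argument with the path of the current input file;
substitute_at_at is a hypothetical name, and whether afl-fuzz also expands
"@@" embedded inside a longer argument (like --in=@@) is an assumption worth
checking against its source.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of `arg` with the first "@@" replaced by
   `path`, or NULL if `arg` contains no "@@" (or allocation fails).
   The caller frees the result. */
static char *substitute_at_at(const char *arg, const char *path) {
  const char *at = strstr(arg, "@@");
  if (!at) return NULL;                          /* leave this argument as-is */
  size_t prefix = (size_t)(at - arg);
  size_t plen = strlen(path);
  size_t suffix = strlen(at + 2);
  char *out = malloc(prefix + plen + suffix + 1);
  if (!out) return NULL;
  memcpy(out, arg, prefix);                      /* text before "@@" */
  memcpy(out + prefix, path, plen);              /* the input file path */
  memcpy(out + prefix + plen, at + 2, suffix + 1);  /* rest, including NUL */
  return out;
}
```

For example, substitute_at_at("--in=@@.png", "x") yields "--in=x.png", while
an argument without "@@" yields NULL, meaning "pass it through unchanged".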

You can also use the -f option to have the mutated data written to a specific
file. This is useful if the program expects a particular file extension or
similar.

Non-instrumented binaries can be fuzzed in the QEMU mode (add -Q in the command
line) or in a traditional, blind-fuzzer mode (specify -n).

You can use -t and -m to override the default timeout and memory limit for the
executed process; rare examples of targets that may need these settings touched
include compilers and video decoders.

Tips for optimizing fuzzing performance are discussed in perf_tips.txt.

Note that afl-fuzz starts by performing an array of deterministic fuzzing
steps, which can take several days. If you want quick & dirty results right
away, akin to zzuf or honggfuzz, add the -d option to the command line.

7) Interpreting output
----------------------

See the status_screen.txt file for information on how to interpret the
displayed stats and monitor the health of the process. Be sure to consult this
file especially if any UI elements are highlighted in red.

The fuzzing process will continue until you press Ctrl-C. At minimum, you want
to allow the fuzzer to complete one queue cycle, which may take anywhere from a
couple of hours to a week or so.

There are three subdirectories created within the output directory and updated
in real time:

  - queue/   - test cases for every distinctive execution path, plus all the
               starting files given by the user. This is the synthesized corpus
               mentioned in section 2.

               Before using this corpus for any other purposes, you can shrink
               it to a smaller size using the afl-cmin tool. The tool will find
               a smaller subset of files offering equivalent edge coverage.

  - crashes/ - unique test cases that cause the tested program to receive a
               fatal signal (e.g., SIGSEGV, SIGILL, SIGABRT). The entries are
               grouped by the received signal.

  - hangs/   - unique test cases that cause the tested program to time out. Note
               that when default (aggressive) timeout settings are in effect,
               this can be slightly noisy due to latency spikes and other
               natural phenomena.

Crashes and hangs are considered "unique" if the associated execution paths
involve any state transitions not seen in previously-recorded faults. If a
single bug can be reached in multiple ways, there will be some count inflation
early in the process, but this should quickly taper off.
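
That uniqueness rule can be sketched as a second coverage-style map that only
remembers edges seen in earlier faults. This illustration assumes hit counts
are ignored for this purpose; crash_seen and is_unique_crash are hypothetical
names, not afl-fuzz internals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE 65536u

static uint8_t crash_seen[MAP_SIZE];   /* edges seen in any earlier fault */

/* Returns true, and records the new edges, if this faulting run's trace
   contains at least one edge that no previously saved fault exercised. */
static bool is_unique_crash(const uint8_t *trace) {
  bool unique = false;
  for (size_t i = 0; i < MAP_SIZE; i++) {
    if (trace[i] && !crash_seen[i]) {
      crash_seen[i] = 1;
      unique = true;
    }
  }
  return unique;
}
```

It also shows where the early count inflation comes from: two different routes
into the same bug each count as unique as long as each brings at least one
edge that was not recorded before.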

The file names for crashes and hangs are correlated with parent, non-faulting
queue entries. This should help with debugging.

When you can't reproduce a crash found by afl-fuzz, the most likely cause is
that you are not setting the same memory limit as used by the tool. Try:

  $ LIMIT_MB=50
  $ ( ulimit -Sv $((LIMIT_MB << 10)); /path/to/tested_binary ... )

Change LIMIT_MB to match the -m parameter passed to afl-fuzz. On OpenBSD,
also change -Sv to -Sd.
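
The LIMIT_MB << 10 arithmetic converts megabytes into the kilobytes that
'ulimit -v' expects. A small C sketch of the same conversion, plus applying an
equivalent soft cap from inside a process with setrlimit(), which takes bytes.
The helpers mb_to_ulimit_kb and apply_mem_limit are hypothetical; falling back
to RLIMIT_DATA where RLIMIT_AS is missing mirrors the -Sd advice for OpenBSD.

```c
#include <sys/resource.h>

/* Cap the address space where possible; otherwise cap the data segment,
   which corresponds to the -Sd variant mentioned above. */
#ifdef RLIMIT_AS
#define MEM_RESOURCE RLIMIT_AS
#else
#define MEM_RESOURCE RLIMIT_DATA
#endif

/* Megabytes (afl-fuzz -m) to kilobytes (ulimit -v): shift left by 10. */
static unsigned long mb_to_ulimit_kb(unsigned long mb) {
  return mb << 10;
}

/* Set the soft limit to `mb` megabytes (setrlimit takes bytes, hence << 20),
   leaving the hard limit alone, like 'ulimit -S'. Returns 0 on success. */
static int apply_mem_limit(unsigned long mb) {
  struct rlimit r;
  if (getrlimit(MEM_RESOURCE, &r) != 0) return -1;
  r.rlim_cur = (rlim_t)mb << 20;
  if (r.rlim_max != RLIM_INFINITY && r.rlim_cur > r.rlim_max)
    r.rlim_cur = r.rlim_max;      /* the soft limit cannot exceed the hard one */
  return setrlimit(MEM_RESOURCE, &r);
}
```

Calling apply_mem_limit() in a small wrapper just before exec'ing the target
reproduces the cap without relying on the shell.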

Any existing output directory can also be used to resume aborted jobs; try:

  $ ./afl-fuzz -i- -o existing_output_dir [...etc...]

If you have gnuplot installed, you can also generate some pretty graphs for any
active fuzzing task using afl-plot. For an example of what this looks like,
see http://lcamtuf.coredump.cx/afl/plot/.

8) Parallelized fuzzing
-----------------------

Every instance of afl-fuzz takes up roughly one core. This means that on
multi-core systems, parallelization is necessary to fully utilize the hardware.
For tips on how to fuzz a common target on multiple cores or multiple networked
machines, please refer to parallel_fuzzing.txt.

The parallel fuzzing mode also offers a simple way for interfacing AFL with
other fuzzers, with symbolic or concolic execution engines, and so forth;
again, see the last section of parallel_fuzzing.txt for tips.

9) Fuzzer dictionaries
----------------------

By default, the afl-fuzz mutation engine is optimized for compact data
formats - say, images, multimedia, compressed data, regular expression syntax,
or shell scripts. It is somewhat less suited for languages with particularly
verbose and redundant verbiage - notably including HTML, SQL, or JavaScript.

To avoid the hassle of building syntax-aware tools, afl-fuzz provides a way to
seed the fuzzing process with an optional dictionary of language keywords,
magic headers, or other special tokens associated with the targeted data
type - and use that to reconstruct the underlying grammar on the go:

  http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html

To use this feature, you first need to create a dictionary in one of the two
formats discussed in testcases/README.testcases, and then point the fuzzer to
it via the -x option on the command line.
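
As an illustration of what loading such a dictionary involves, here is a C
sketch that decodes one line in the name="value" style, with \\, \" and \xNN
escapes. Treat the accepted syntax as an assumption rather than a full
implementation: testcases/README.testcases is the authoritative description,
and anything else it allows is not handled here. parse_dict_line is a
hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Parse one line of the assumed form  name="value"  (or just "value"),
   decoding \\, \" and \xNN escapes into `out`. Returns the token length,
   0 for blank or comment lines, or -1 on a syntax error. Anything after the
   closing quote is ignored. */
static int parse_dict_line(const char *line, unsigned char *out, size_t out_cap) {
  while (isspace((unsigned char)*line)) line++;
  if (*line == '\0' || *line == '#') return 0;        /* blank or comment */
  const char *q = strchr(line, '"');
  if (!q) return -1;                                  /* no opening quote */
  size_t n = 0;
  for (const char *p = q + 1; ; p++) {
    if (*p == '\0') return -1;                        /* unterminated value */
    if (*p == '"') break;                             /* closing quote */
    unsigned char c = (unsigned char)*p;
    if (c == '\\') {
      p++;
      if (*p == '\\' || *p == '"') {
        c = (unsigned char)*p;                        /* escaped \ or " */
      } else if (*p == 'x' && isxdigit((unsigned char)p[1]) &&
                 isxdigit((unsigned char)p[2])) {
        char hex[3] = { p[1], p[2], '\0' };
        c = (unsigned char)strtol(hex, NULL, 16);     /* \xNN byte */
        p += 2;
      } else {
        return -1;                                    /* unsupported escape */
      }
    }
    if (n == out_cap) return -1;                      /* token too long */
    out[n++] = c;
  }
  return (int)n;
}
```

For example, the line header_png="\x89PNG" decodes to the four bytes 0x89,
'P', 'N', 'G'.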

There is no way to provide more structured descriptions of the underlying
syntax, but the fuzzer will likely figure out some of this based on the
instrumentation feedback alone. This actually works in practice; for example:

  http://lcamtuf.blogspot.com/2015/04/finding-bugs-in-sqlite-easy-way.html

PS. Even when no explicit dictionary is given, afl-fuzz will try to extract
existing syntax tokens in the input corpus by watching the instrumentation
very closely during deterministic byte flips. This works for some types of
parsers and grammars, but isn't nearly as good as the -x mode.

10) Crash triage
----------------

The coverage-based grouping of crashes usually produces a small data set that
can be quickly triaged manually or with a very simple GDB or Valgrind script.
Every crash is also traceable to its parent non-crashing test case in the
queue, making it easier to diagnose faults.

Having said that, it's important to acknowledge that some fuzzing crashes can be
difficult to quickly evaluate for exploitability without a lot of debugging and
code analysis work. To assist with this task, afl-fuzz supports a very unique
"crash exploration" mode enabled with the -C flag.

In this mode, the fuzzer takes one or more crashing test cases as the input,
and uses its feedback-driven fuzzing strategies to very quickly enumerate all
code paths that can be reached in the program while keeping it in the
crashing state.

Mutations that do not result in a crash are rejected; so are any changes that
do not affect the execution path.
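
A hedged sketch of that acceptance rule in C. To decide whether a mutation
changed the execution path, it swaps in a simple technique (a 64-bit FNV-1a
hash of the coverage map, compared against the hashes of paths already kept);
afl-fuzz's own bookkeeping may differ. accept_in_crash_mode, path_hash and the
MAX_PATHS capacity are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE  65536u
#define MAX_PATHS 1024u    /* toy capacity for remembered path hashes */

static uint64_t known_paths[MAX_PATHS];
static size_t   n_known;

/* 64-bit FNV-1a over the whole coverage map: identical paths hash identically. */
static uint64_t path_hash(const uint8_t *trace) {
  uint64_t h = 14695981039346656037ull;
  for (size_t i = 0; i < MAP_SIZE; i++) {
    h ^= trace[i];
    h *= 1099511628211ull;
  }
  return h;
}

/* Keep a mutation only if the target still crashed and the execution path
   differs from every path kept so far. */
static bool accept_in_crash_mode(bool crashed, const uint8_t *trace) {
  if (!crashed) return false;                   /* no crash: rejected */
  uint64_t h = path_hash(trace);
  for (size_t i = 0; i < n_known; i++)
    if (known_paths[i] == h) return false;      /* same path: rejected */
  if (n_known < MAX_PATHS) known_paths[n_known++] = h;
  return true;
}
```

In this sketch, the first crashing run that gets recorded plays the role of
the initial crashing test case supplied with -C.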

The output is a small corpus of files that can be very rapidly examined to see
what degree of control the attacker has over the faulting address, or whether
it is possible to get past an initial out-of-bounds read - and see what lies
beneath.

Oh, one more thing: for test case minimization, give afl-tmin a try. The tool
can be operated in a very simple way:

  $ ./afl-tmin -i test_case -o minimized_result -- /path/to/program [...]

The tool works with crashing and non-crashing test cases alike. In the crash
mode, it will happily accept instrumented and non-instrumented binaries. In the
non-crashing mode, the minimizer relies on standard AFL instrumentation to make
the file simpler without altering the execution path.
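
To show the basic idea behind this kind of minimization, here is a
block-deletion sketch in C: remove chunks of decreasing size and keep each
removal that a caller-supplied predicate still accepts. afl-tmin's real
implementation has more stages than this; minimize, the 4 kB size cap and the
toy predicate contains_bug_marker (a stand-in for "the target still crashes"
or "the path is unchanged") are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Does this candidate still show the behavior we care about? */
typedef bool (*keep_fn)(const unsigned char *buf, size_t len);

/* Block-deletion minimizer: try removing chunks of decreasing size and keep
   each removal the predicate accepts. Edits buf in place; returns new length. */
static size_t minimize(unsigned char *buf, size_t len, keep_fn still_interesting) {
  unsigned char tmp[4096];                       /* assumed maximum input size */
  if (len > sizeof tmp) return len;
  for (size_t block = len / 2; block > 0; block /= 2) {
    size_t pos = 0;
    while (pos < len) {
      size_t cut = (len - pos < block) ? len - pos : block;
      memcpy(tmp, buf, pos);                     /* candidate = buf minus [pos, pos+cut) */
      memcpy(tmp + pos, buf + pos + cut, len - pos - cut);
      if (still_interesting(tmp, len - cut)) {
        memcpy(buf, tmp, len - cut);             /* accept; retry at the same pos */
        len -= cut;
      } else {
        pos += cut;                              /* this block matters; move on */
      }
    }
  }
  return len;
}

/* Toy predicate: the input still contains the marker "BUG". */
static bool contains_bug_marker(const unsigned char *buf, size_t len) {
  for (size_t i = 0; i + 3 <= len; i++)
    if (memcmp(buf + i, "BUG", 3) == 0) return true;
  return false;
}
```

Run on "xxxxxxxxBUGyyyyyyy" with that predicate, the sketch shrinks the input
down to "BUG".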

The minimizer accepts the -m, -t, -f and @@ syntax in a manner compatible with
afl-fuzz.

Another recent addition to AFL is the afl-analyze tool. It takes an input
file, attempts to sequentially flip bytes, and observes the behavior of the
tested program. It then color-codes the input based on which sections appear to
be critical, and which are not; while not bulletproof, it can often offer quick
insights into complex file formats. More info about its operation can be found
near the end of technical_details.txt.
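
A stripped-down C sketch of that byte-flipping analysis: flip each byte, rerun,
and record whether the execution path changed. afl-analyze reports
finer-grained categories than the two used here; the path_fn callback and the
toy_path target, in which only a two-byte "MZ" header affects control flow,
are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical target reduced to a function returning a checksum of the
   execution path taken for a given input. */
typedef uint32_t (*path_fn)(const uint8_t *buf, size_t len);

enum byte_class { BYTE_NO_EFFECT = 0, BYTE_CHANGES_PATH = 1 };

/* Flip each byte in turn (XOR 0xFF), rerun, and note whether the path changed.
   `out` needs room for `len` entries; buf is restored before returning. */
static void analyze_bytes(uint8_t *buf, size_t len, path_fn run, enum byte_class *out) {
  uint32_t baseline = run(buf, len);
  for (size_t i = 0; i < len; i++) {
    buf[i] ^= 0xFF;
    out[i] = (run(buf, len) == baseline) ? BYTE_NO_EFFECT : BYTE_CHANGES_PATH;
    buf[i] ^= 0xFF;                              /* restore the original byte */
  }
}

/* Toy target: only a two-byte magic header influences control flow. */
static uint32_t toy_path(const uint8_t *buf, size_t len) {
  uint32_t path = 1;
  if (len >= 1 && buf[0] == 'M') path = path * 31 + 2;
  if (len >= 2 && buf[1] == 'Z') path = path * 31 + 3;
  return path;
}
```

On the input "MZab", the two header bytes are classified as path-changing and
the remaining two as having no effect.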

11) Common-sense risks
----------------------

Please keep in mind that, similarly to many other computationally-intensive
tasks, fuzzing may put strain on your hardware and on the OS. In particular:

  - Your CPU will run hot and will need adequate cooling. In most cases, if
    cooling is insufficient or stops working properly, CPU speeds will be
    automatically throttled. That said, especially when fuzzing on less
    suitable hardware (laptops, smartphones, etc), it's not entirely impossible
    for something to blow up.

  - Targeted programs may end up erratically grabbing gigabytes of memory or
    filling up disk space with junk files. AFL tries to enforce basic memory
    limits, but can't prevent each and every possible mishap. The bottom line
    is that you shouldn't be fuzzing on systems where the prospect of data loss
    is not an acceptable risk.

  - Fuzzing involves billions of reads and writes to the filesystem. On modern
    systems, this will usually be heavily cached, resulting in fairly modest
    "physical" I/O - but there are many factors that may alter this equation.
    It is your responsibility to monitor for potential trouble; with very heavy
    I/O, the lifespan of many HDDs and SSDs may be reduced.

    A good way to monitor disk I/O on Linux is the 'iostat' command:

      $ iostat -d 3 -x -k [...optional disk ID...]

12) Known limitations & areas for improvement
---------------------------------------------

Here are some of the most important caveats for AFL:

  - AFL detects faults by checking for the first spawned process dying due to
    a signal (SIGSEGV, SIGABRT, etc). Programs that install custom handlers for
    these signals may need to have the relevant code commented out. In the same
    vein, faults in child processes spawned by the fuzzed target may evade
    detection unless you manually add some code to catch that.

  - As with any other brute-force tool, the fuzzer offers limited coverage if
    encryption, checksums, cryptographic signatures, or compression are used to
    wholly wrap the actual data format to be tested.

    To work around this, you can comment out the relevant checks (see
    experimental/libpng_no_checksum/ for inspiration); if this is not possible,
    you can also write a postprocessor, as explained in
    experimental/post_library/.

  - There are some unfortunate trade-offs with ASAN and 64-bit binaries. This
    isn't due to any specific fault of afl-fuzz; see notes_for_asan.txt for
    tips.

  - There is no direct support for fuzzing network services, background
    daemons, or interactive apps that require UI interaction to work. You may
    need to make simple code changes to make them behave in a more traditional
    way. Preeny may offer a relatively simple option, too - see:
    https://github.com/zardus/preeny

    Some useful tips for modifying network-based services can also be found at:
    https://www.fastly.com/blog/how-to-fuzz-server-american-fuzzy-lop

  - AFL doesn't output human-readable coverage data. If you want to monitor
    coverage, use afl-cov from Michael Rash: https://github.com/mrash/afl-cov
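
The first caveat above follows from how a fault is observed. A minimal POSIX
sketch of that check (not afl-fuzz's fork-server code): run a target in a
forked child, wait for it, and inspect the status with WIFSIGNALED/WTERMSIG.
The three targets are hypothetical; masking_target installs its own SIGABRT
handler that exits normally, which is exactly the kind of program whose crash
goes unnoticed.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `target` in a forked child. Returns the signal that killed the child,
   0 if it exited normally, or -1 if fork/waitpid failed. Faults in processes
   that the child itself spawns are not seen by this check. */
static int run_and_check(void (*target)(void)) {
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {                 /* child: run the target, then exit cleanly */
    target();
    _exit(0);
  }
  int status;
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Hypothetical targets used to exercise the check. */
static void ok_target(void) {}

static void aborting_target(void) { abort(); }          /* dies with SIGABRT */

static void exit_on_abrt(int sig) { (void)sig; _exit(1); }

static void masking_target(void) {                      /* hides its own crash */
  signal(SIGABRT, exit_on_abrt);
  abort();
}
```

Here run_and_check(aborting_target) reports SIGABRT, while
run_and_check(masking_target) reports 0 even though the program called abort().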

Beyond this, see INSTALL for platform-specific tips.

13) Special thanks
------------------

Many of the improvements to afl-fuzz wouldn't be possible without feedback,
bug reports, or patches from:

  Jann Horn               Hanno Boeck
  Felix Groebert          Jakub Wilk
  Richard W. M. Jones     Alexander Cherepanov
  Tom Ritter              Hovik Manucharyan
  Sebastian Roschke       Eberhard Mattes
  Padraig Brady           Ben Laurie
  @dronesec               Luca Barbato
  Tobias Ospelt           Thomas Jarosch
  Martin Carpenter        Mudge Zatko
  Joe Zbiciak             Ryan Govostes
  Michael Rash            William Robinet
  Jonathan Gray           Filipe Cabecinhas
  Nico Weber              Jodie Cunningham
  Andrew Griffiths        Parker Thompson
  Jonathan Neuschfer      Tyler Nighswander
  Ben Nagy                Samir Aguiar
  Aidan Thornton          Aleksandar Nikolich
  Sam Hakim               Laszlo Szekeres
  David A. Wheeler        Turo Lamminen
  Andreas Stieger         Richard Godbee
  Louis Dassy             teor2345
  Alex Moneger            Dmitry Vyukov
  Keegan McAllister       Kostya Serebryany
  Richo Healey            Martijn Bogaard
  rc0r                    Jonathan Foote
  Christian Holler        Dominique Pelle
  Jacek Wielemborek       Leo Barnes
  Jeremy Barnes           Jeff Trull
  Guillaume Endignoux     ilovezfs
  Daniel Godas-Lopez      Franjo Ivancic
  Austin Seipp            Daniel Komaromy
  Daniel Binderman

Thank you!

14) Contact
-----------

Questions? Concerns? Bug reports? The author can usually be reached at
<lcamtuf@google.com>.

There is also a mailing list for the project; to join, send a mail to
<afl-users+subscribe@googlegroups.com>. Or, if you prefer to browse
archives first, try:

  https://groups.google.com/group/afl-users

PS. If you wish to submit raw code to be incorporated into the project, please
be aware that the copyright on most of AFL is claimed by Google. While you do
retain copyright on your contributions, they do ask people to agree to a simple
CLA first:

  https://cla.developers.google.com/clas

Sorry about the hassle. Of course, no CLA is required for feature requests or
bug reports.