third_party/afl/src/docs/README - Issue 2075883002: Add American Fuzzy Lop (afl) to third_party/afl/

Unified Diff: third_party/afl/src/docs/README

Issue 2075883002: Add American Fuzzy Lop (afl) to third_party/afl/ (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master

Patch Set: Fix nits Created 4 years, 6 months ago

Index: third_party/afl/src/docs/README

diff --git a/third_party/afl/src/docs/README b/third_party/afl/src/docs/README

new file mode 100644

index 0000000000000000000000000000000000000000..642e57b77f8917f8b799b9fa37351e43418f5043

--- /dev/null

+++ b/third_party/afl/src/docs/README

@@ -0,0 +1,466 @@

+==================

+american fuzzy lop

+==================

+ Written and maintained by Michal Zalewski <lcamtuf@google.com>

+ Released under terms and conditions of Apache License, Version 2.0.

+ For new versions and additional information, check out:

+ http://lcamtuf.coredump.cx/afl/

+ To compare notes with other users or get notified about major new features,

+ send a mail to <afl-users+subscribe@googlegroups.com>.

+ ** See QuickStartGuide.txt if you don't have time to read this file. **

+1) Challenges of guided fuzzing

+-------------------------------

+Fuzzing is one of the most powerful and proven strategies for identifying

+security issues in real-world software; it is responsible for the vast

+majority of remote code execution and privilege escalation bugs found to date

+in security-critical software.

+Unfortunately, fuzzing is also relatively shallow; blind, random mutations

+make it very unlikely to reach certain code paths in the tested code, leaving

+some vulnerabilities firmly outside the reach of this technique.

+There have been numerous attempts to solve this problem. One of the early

+approaches - pioneered by Tavis Ormandy - is corpus distillation. The method

+relies on coverage signals to select a subset of interesting seeds from a

+massive, high-quality corpus of candidate files, and then fuzz them by

+traditional means. The approach works exceptionally well, but requires such

+a corpus to be readily available. In addition, block coverage measurements

+provide only a very simplistic understanding of program state, and are less

+useful for guiding the fuzzing effort in the long haul.

+Other, more sophisticated research has focused on techniques such as program

+flow analysis ("concolic execution"), symbolic execution, or static analysis.

+All these methods are extremely promising in experimental settings, but tend

+to suffer from reliability and performance problems in practical uses - and

+currently do not offer a viable alternative to "dumb" fuzzing techniques.

+2) The afl-fuzz approach

+------------------------

+American Fuzzy Lop is a brute-force fuzzer coupled with an exceedingly simple

+but rock-solid instrumentation-guided genetic algorithm. It uses a modified

+form of edge coverage to effortlessly pick up subtle, local-scale changes to

+program control flow.

+Simplifying a bit, the overall algorithm can be summed up as:

+ 1) Load user-supplied initial test cases into the queue,

+ 2) Take next input file from the queue,

+ 3) Attempt to trim the test case to the smallest size that doesn't alter

+ the measured behavior of the program,

+ 4) Repeatedly mutate the file using a balanced and well-researched variety

+ of traditional fuzzing strategies,

+ 5) If any of the generated mutations resulted in a new state transition

+ recorded by the instrumentation, add mutated output as a new entry in the

+ queue.

+ 6) Go to 2.

+The discovered test cases are also periodically culled to eliminate ones that

+have been obsoleted by newer, higher-coverage finds; and undergo several other

+instrumentation-driven effort minimization steps.
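
The numbered steps above can be sketched in a few lines of Python. This is a toy illustration only, not afl-fuzz's implementation: toy_target, mutate, and fuzz are hypothetical names, the "instrumentation" is a hand-written function that reports which state transitions an input triggers, and trimming (step 3) and queue culling are left out.

```python
import random


def toy_target(data: bytes) -> frozenset:
    """Stand-in for an instrumented program (assumption, not real AFL output).

    Returns the set of (previous_block, current_block) transitions hit while
    comparing the input against the magic prefix b"FUZZ" byte by byte.
    """
    edges = set()
    prev = 0
    for i, expected in enumerate(b"FUZZ"):
        if i >= len(data) or data[i] != expected:
            break
        cur = i + 1
        edges.add((prev, cur))
        prev = cur
    return frozenset(edges)


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte; real fuzzers use many more strategies."""
    if not data:
        return bytes([rng.randrange(256)])
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seeds, rounds=200, tries_per_entry=256, seed=0):
    rng = random.Random(seed)
    queue = list(seeds)                      # step 1: load initial test cases
    seen = set()
    for s in queue:
        seen |= toy_target(s)
    for n in range(rounds):
        entry = queue[n % len(queue)]        # step 2 (wrapping around is step 6)
        for _ in range(tries_per_entry):     # step 4: repeatedly mutate
            candidate = mutate(entry, rng)
            edges = toy_target(candidate)
            if not edges <= seen:            # step 5: new transition observed
                seen |= edges
                queue.append(candidate)
    return queue, seen


if __name__ == "__main__":
    queue, seen = fuzz([b"AAAA"])
    print(len(queue), "queue entries,", len(seen), "transitions seen")
```

Because an input is queued only when it adds at least one unseen transition, the queue in this sketch can never grow by more than the number of distinct transitions in the target.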

+As a side result of the fuzzing process, the tool creates a small,

+self-contained corpus of interesting test cases. These are extremely useful

+for seeding other, labor- or resource-intensive testing regimes - for example,

+for stress-testing browsers, office applications, graphics suites, or

+closed-source tools.

+The fuzzer is thoroughly tested to deliver out-of-the-box performance far

+superior to blind fuzzing or coverage-only tools.

+3) Instrumenting programs for use with AFL

+------------------------------------------

+When source code is available, instrumentation can be injected by a companion

+tool that works as a drop-in replacement for gcc or clang in any standard build

+process for third-party code.

+The instrumentation has a fairly modest performance impact; in conjunction with

+other optimizations implemented by afl-fuzz, most programs can be fuzzed as fast

+or even faster than possible with traditional tools.

+The correct way to recompile the target program may vary depending on the

+specifics of the build process, but a nearly-universal approach would be:

+$ CC=/path/to/afl/afl-gcc ./configure

+$ make clean all

+For C++ programs, you'd also want to set CXX=/path/to/afl/afl-g++.

+The clang wrappers (afl-clang and afl-clang++) can be used in the same way;

+clang users may also opt to leverage a higher-performance instrumentation mode,

+as described in llvm_mode/README.llvm.

+When testing libraries, you need to find or write a simple program that reads

+data from stdin or from a file and passes it to the tested library. In such a

+case, it is essential to link this executable against a static version of the

+instrumented library, or to make sure that the correct .so file is loaded at

+runtime (usually by setting LD_LIBRARY_PATH). The simplest option is a static

+build, usually possible via:

+$ CC=/path/to/afl/afl-gcc ./configure --disable-shared

+Setting AFL_HARDEN=1 when calling 'make' will cause the CC wrapper to

+automatically enable code hardening options that make it easier to detect

+simple memory bugs.

+PS. ASAN users are advised to review the notes_for_asan.txt file for important

+caveats.

+4) Instrumenting binary-only apps

+---------------------------------

+When source code is *NOT* available, the fuzzer offers experimental support for

+fast, on-the-fly instrumentation of black-box binaries. This is accomplished

+with a version of QEMU running in the lesser-known "user space emulation" mode.

+QEMU is a project separate from AFL, but you can conveniently build the

+feature by doing:

+$ cd qemu_mode

+$ ./build_qemu_support.sh

+For additional instructions and caveats, see qemu_mode/README.qemu.

+The mode is approximately 2-5x slower than compile-time instrumentation, is

+less conducive to parallelization, and may have some other quirks.

+5) Choosing initial test cases

+------------------------------

+To operate correctly, the fuzzer requires one or more starting files that

+contain good examples of the input data normally expected by the targeted

+application. There are two basic rules:

+ - Keep the files small. Under 1 kB is ideal, although not strictly necessary.

+ For a discussion of why size matters, see perf_tips.txt.

+ - Use multiple test cases only if they are functionally different from

+ each other. There is no point in using fifty different vacation photos

+ to fuzz an image library.

+You can find many good examples of starting files in the testcases/ subdirectory

+that comes with this tool.

+PS. If a large corpus of data is available for screening, you may want to use

+the afl-cmin utility to identify a subset of functionally distinct files that

+exercise different code paths in the target binary.
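
One way to picture this kind of screening is a greedy pass that keeps picking the file covering the most not-yet-covered behavior. The sketch below is an assumption-laden illustration and not afl-cmin's actual algorithm: minimize_corpus is a hypothetical helper, and the file names and integer "edge IDs" in the usage example are made up.

```python
def minimize_corpus(coverage):
    """Pick a subset of files whose combined coverage equals the full corpus.

    coverage maps a file name to the set of edge IDs that file exercises.
    Greedy set cover: not guaranteed to be the smallest possible subset.
    """
    remaining = set().union(*coverage.values())
    chosen = []
    while remaining:
        # Sorting the names first makes tie-breaking deterministic.
        best = max(sorted(coverage), key=lambda name: len(coverage[name] & remaining))
        chosen.append(best)
        remaining -= coverage[best]
    return chosen


if __name__ == "__main__":
    corpus = {
        "a.png": {1, 2, 3},
        "b.png": {2, 3},      # adds nothing beyond a.png
        "c.png": {4},
    }
    print(minimize_corpus(corpus))
```

In the usage example, b.png exercises nothing that a.png does not already reach, which is the "fifty vacation photos" situation described above.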

+6) Fuzzing binaries

+-------------------

+The fuzzing process itself is carried out by the afl-fuzz utility. This program

+requires a read-only directory with initial test cases, a separate place to

+store its findings, plus a path to the binary to test.

+For target binaries that accept input directly from stdin, the usual syntax is:

+$ ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program [...params...]

+For programs that take input from a file, use '@@' to mark the location in

+the target's command line where the input file name should be placed. The

+fuzzer will substitute this for you:

+$ ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program @@
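
The substitution itself is simple to picture. The following is a minimal sketch of the idea, not the fuzzer's own code: substitute_input_path and needs_stdin are hypothetical helpers, the file path in the usage example is made up, and replacing the marker inside a larger argument (such as --file=@@) is an assumption of this sketch.

```python
def substitute_input_path(argv, input_path):
    """Return a copy of argv with every '@@' marker replaced by input_path."""
    return [arg.replace("@@", input_path) for arg in argv]


def needs_stdin(argv):
    """True when no argument carries '@@', so data would be fed via stdin."""
    return not any("@@" in arg for arg in argv)


if __name__ == "__main__":
    target = ["/path/to/program", "@@"]
    print(needs_stdin(target))
    print(substitute_input_path(target, "/tmp/example_input"))
```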

+You can also use the -f option to have the mutated data written to a specific

+file. This is useful if, for example, the program expects a particular file extension.

+Non-instrumented binaries can be fuzzed in the QEMU mode (add -Q in the command

+line) or in a traditional, blind-fuzzer mode (specify -n).

+You can use -t and -m to override the default timeout and memory limit for the

+executed process; rare examples of targets that may need these settings touched

+include compilers and video decoders.

+Tips for optimizing fuzzing performance are discussed in perf_tips.txt.

+Note that afl-fuzz starts by performing an array of deterministic fuzzing

+steps, which can take several days. If you want quick & dirty results right

+away, akin to zzuf or honggfuzz, add the -d option to the command line.

+7) Interpreting output

+----------------------

+See the status_screen.txt file for information on how to interpret the

+displayed stats and monitor the health of the process. Be sure to consult this

+file especially if any UI elements are highlighted in red.

+The fuzzing process will continue until you press Ctrl-C. At minimum, you want

+to allow the fuzzer to complete one queue cycle, which may take anywhere from a

+couple of hours to a week or so.

+There are three subdirectories created within the output directory and updated

+in real time:

+ - queue/ - test cases for every distinctive execution path, plus all the

+ starting files given by the user. This is the synthesized corpus

+ mentioned in section 2.

+ Before using this corpus for any other purposes, you can shrink

+ it to a smaller size using the afl-cmin tool. The tool will find

+ a smaller subset of files offering equivalent edge coverage.

+ - crashes/ - unique test cases that cause the tested program to receive a

+ fatal signal (e.g., SIGSEGV, SIGILL, SIGABRT). The entries are

+ grouped by the received signal.

+ - hangs/ - unique test cases that cause the tested program to time out. Note

+ that when default (aggressive) timeout settings are in effect,

+ this can be slightly noisy due to latency spikes and other

+ natural phenomena.

+Crashes and hangs are considered "unique" if the associated execution paths

+involve any state transitions not seen in previously-recorded faults. If a

+single bug can be reached in multiple ways, there will be some count inflation

+early in the process, but this should quickly taper off.
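
The uniqueness rule can be expressed as a small bookkeeping structure. This is a sketch under stated assumptions, not the fuzzer's data structures: FaultDeduplicator is a hypothetical class, and transitions are modeled as plain (from, to) tuples.

```python
class FaultDeduplicator:
    """Tracks state transitions seen in previously recorded faults."""

    def __init__(self):
        self.seen_transitions = set()

    def is_unique(self, transitions):
        """Record a fault; return True if it hit any transition not seen before."""
        new = set(transitions) - self.seen_transitions
        self.seen_transitions |= new
        return bool(new)


if __name__ == "__main__":
    dedup = FaultDeduplicator()
    # The same bug at ("b", "crash") reached along two different paths:
    print(dedup.is_unique({("a", "b"), ("b", "crash")}))
    print(dedup.is_unique({("x", "b"), ("b", "crash")}))
    # Reaching it again along an already-recorded path adds nothing new:
    print(dedup.is_unique({("a", "b"), ("b", "crash")}))
```

The second call in the usage example shows where the count inflation comes from: the bug is the same, but the path to it contains a transition that no earlier fault exercised.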

+The file names for crashes and hangs are correlated with parent, non-faulting

+queue entries. This should help with debugging.

+When you can't reproduce a crash found by afl-fuzz, the most likely cause is

+that you are not setting the same memory limit as used by the tool. Try:

+$ LIMIT_MB=50

+$ ( ulimit -Sv $((LIMIT_MB << 10)); /path/to/tested_binary ... )

+Change LIMIT_MB to match the -m parameter passed to afl-fuzz. On OpenBSD,

+also change -Sv to -Sd.

+Any existing output directory can also be used to resume aborted jobs; try:

+$ ./afl-fuzz -i- -o existing_output_dir [...etc...]

+If you have gnuplot installed, you can also generate some pretty graphs for any

+active fuzzing task using afl-plot. For an example of what this looks like,

+see http://lcamtuf.coredump.cx/afl/plot/.

+8) Parallelized fuzzing

+-----------------------

+Every instance of afl-fuzz takes up roughly one core. This means that on

+multi-core systems, parallelization is necessary to fully utilize the hardware.

+For tips on how to fuzz a common target on multiple cores or multiple networked

+machines, please refer to parallel_fuzzing.txt.

+The parallel fuzzing mode also offers a simple way for interfacing AFL to other

+fuzzers, to symbolic or concolic execution engines, and so forth; again, see the

+last section of parallel_fuzzing.txt for tips.

+9) Fuzzer dictionaries

+----------------------

+By default, the afl-fuzz mutation engine is optimized for compact data formats -

+say, images, multimedia, compressed data, regular expression syntax, or shell

+scripts. It is somewhat less suited for languages with particularly verbose and

+redundant verbiage - notably including HTML, SQL, or JavaScript.

+To avoid the hassle of building syntax-aware tools, afl-fuzz provides a way to

+seed the fuzzing process with an optional dictionary of language keywords,

+magic headers, or other special tokens associated with the targeted data type

+- and use that to reconstruct the underlying grammar on the go:

+ http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html

+To use this feature, you first need to create a dictionary in one of the two

+formats discussed in testcases/README.testcases; and then point the fuzzer to

+it via the -x option in the command line.

+There is no way to provide more structured descriptions of the underlying

+syntax, but the fuzzer will likely figure out some of this based on the

+instrumentation feedback alone. This actually works in practice, say:

+ http://lcamtuf.blogspot.com/2015/04/finding-bugs-in-sqlite-easy-way.html

+PS. Even when no explicit dictionary is given, afl-fuzz will try to extract

+existing syntax tokens in the input corpus by watching the instrumentation

+very closely during deterministic byte flips. This works for some types of

+parsers and grammars, but isn't nearly as good as the -x mode.

+10) Crash triage

+----------------

+The coverage-based grouping of crashes usually produces a small data set that

+can be quickly triaged manually or with a very simple GDB or Valgrind script.

+Every crash is also traceable to its parent non-crashing test case in the

+queue, making it easier to diagnose faults.

+Having said that, it's important to acknowledge that some fuzzing crashes can be

+difficult to quickly evaluate for exploitability without a lot of debugging and

+code analysis work. To assist with this task, afl-fuzz supports a unique

+"crash exploration" mode enabled with the -C flag.

+In this mode, the fuzzer takes one or more crashing test cases as the input,

+and uses its feedback-driven fuzzing strategies to very quickly enumerate all

+code paths that can be reached in the program while keeping it in the

+crashing state.

+Mutations that do not result in a crash are rejected; so are any changes that

+do not affect the execution path.
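
The two rejection rules just described amount to a simple filter. The sketch below only illustrates that filter and is not taken from afl-fuzz: keep_mutation is a hypothetical function, and transitions are modeled as (from, to) tuples.

```python
def keep_mutation(crashed, transitions, seen_transitions):
    """Decide whether a mutated crashing input is worth keeping.

    Rejects inputs that no longer crash, and inputs whose execution path
    adds no transition beyond those in seen_transitions (updated in place).
    """
    if not crashed:
        return False
    new = set(transitions) - seen_transitions
    if not new:
        return False
    seen_transitions.update(new)
    return True


if __name__ == "__main__":
    seen = {(0, 1)}
    print(keep_mutation(False, {(0, 1), (1, 2)}, seen))  # no crash: rejected
    print(keep_mutation(True, {(0, 1)}, seen))           # same path: rejected
    print(keep_mutation(True, {(0, 1), (1, 2)}, seen))   # crash on a new path
```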

+The output is a small corpus of files that can be very rapidly examined to see

+what degree of control the attacker has over the faulting address, or whether

+it is possible to get past an initial out-of-bounds read - and see what lies

+beneath.

+Oh, one more thing: for test case minimization, give afl-tmin a try. The tool

+can be operated in a very simple way:

+$ ./afl-tmin -i test_case -o minimized_result -- /path/to/program [...]

+The tool works with crashing and non-crashing test cases alike. In the crash

+mode, it will happily accept instrumented and non-instrumented binaries. In the

+non-crashing mode, the minimizer relies on standard AFL instrumentation to make

+the file simpler without altering the execution path.

+The minimizer accepts the -m, -t, -f and @@ syntax in a manner compatible with

+afl-fuzz.

+Another recent addition to AFL is the afl-analyze tool. It takes an input

+file, attempts to sequentially flip bytes, and observes the behavior of the

+tested program. It then color-codes the input based on which sections appear to

+be critical, and which are not; while not bulletproof, it can often offer quick

+insights into complex file formats. More info about its operation can be found

+near the end of technical_details.txt.

+11) Common-sense risks

+----------------------

+Please keep in mind that, similarly to many other computationally-intensive

+tasks, fuzzing may put strain on your hardware and on the OS. In particular:

+ - Your CPU will run hot and will need adequate cooling. In most cases, if

+ cooling is insufficient or stops working properly, CPU speeds will be

+ automatically throttled. That said, especially when fuzzing on less

+ suitable hardware (laptops, smartphones, etc), it's not entirely impossible

+ for something to blow up.

+ - Targeted programs may end up erratically grabbing gigabytes of memory or

+ filling up disk space with junk files. AFL tries to enforce basic memory

+ limits, but can't prevent each and every possible mishap. The bottom line

+ is that you shouldn't be fuzzing on systems where the prospect of data loss

+ is not an acceptable risk.

+ - Fuzzing involves billions of reads and writes to the filesystem. On modern

+ systems, this will usually be heavily cached, resulting in fairly modest

+ "physical" I/O - but there are many factors that may alter this equation.

+ It is your responsibility to monitor for potential trouble; with very heavy

+ I/O, the lifespan of many HDDs and SSDs may be reduced.

+ A good way to monitor disk I/O on Linux is the 'iostat' command:

+ $ iostat -d 3 -x -k [...optional disk ID...]

+12) Known limitations & areas for improvement

+---------------------------------------------

+Here are some of the most important caveats for AFL:

+ - AFL detects faults by checking for the first spawned process dying due to

+ a signal (SIGSEGV, SIGABRT, etc). Programs that install custom handlers for

+ these signals may need to have the relevant code commented out. In the same

+ vein, faults in child processes spawned by the fuzzed target may evade

+ detection unless you manually add some code to catch that.

+ - As with any other brute-force tool, the fuzzer offers limited coverage if

+ encryption, checksums, cryptographic signatures, or compression are used to

+ wholly wrap the actual data format to be tested.

+ To work around this, you can comment out the relevant checks (see

+ experimental/libpng_no_checksum/ for inspiration); if this is not possible,

+ you can also write a postprocessor, as explained in

+ experimental/post_library/.

+ - There are some unfortunate trade-offs with ASAN and 64-bit binaries. This

+ isn't due to any specific fault of afl-fuzz; see notes_for_asan.txt for

+ tips.

+ - There is no direct support for fuzzing network services, background

+ daemons, or interactive apps that require UI interaction to work. You may

+ need to make simple code changes to make them behave in a more traditional

+ way. Preeny may offer a relatively simple option, too - see:

+ https://github.com/zardus/preeny

+ Some useful tips for modifying network-based services can also be found at:

+ https://www.fastly.com/blog/how-to-fuzz-server-american-fuzzy-lop

+ - AFL doesn't output human-readable coverage data. If you want to monitor

+ coverage, use afl-cov from Michael Rash: https://github.com/mrash/afl-cov

+Beyond this, see INSTALL for platform-specific tips.
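
To make the first caveat in the list above concrete, here is a minimal sketch of signal-based fault detection on a POSIX system. It is an illustration under assumptions rather than AFL's mechanism: run_and_classify is a hypothetical helper, it uses Python's subprocess module instead of AFL's own process handling, and the 5-second timeout is arbitrary.

```python
import subprocess
import sys


def run_and_classify(argv, data=b"", timeout=5.0):
    """Run argv with data on stdin; return 'crash', 'hang', or 'ok'.

    Only the directly spawned process is observed, so a fault in a
    grandchild process, or a signal swallowed by a custom handler,
    would be reported as 'ok'.
    """
    try:
        proc = subprocess.run(
            argv,
            input=data,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "hang"
    # On POSIX, a negative return code -N means the child died from signal N.
    return "crash" if proc.returncode < 0 else "ok"


if __name__ == "__main__":
    # A child that aborts itself stands in for a crashing target.
    print(run_and_classify([sys.executable, "-c", "import os; os.abort()"]))
```

Note that a non-zero exit status alone is not treated as a fault in this sketch; only death by signal is.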

+13) Special thanks

+------------------

+Many of the improvements to afl-fuzz wouldn't be possible without feedback,

+bug reports, or patches from:

+  Jann Horn                             Hanno Boeck
+  Felix Groebert                        Jakub Wilk
+  Richard W. M. Jones                   Alexander Cherepanov
+  Tom Ritter                            Hovik Manucharyan
+  Sebastian Roschke                     Eberhard Mattes
+  Padraig Brady                         Ben Laurie
+  @dronesec                             Luca Barbato
+  Tobias Ospelt                         Thomas Jarosch
+  Martin Carpenter                      Mudge Zatko
+  Joe Zbiciak                           Ryan Govostes
+  Michael Rash                          William Robinet
+  Jonathan Gray                         Filipe Cabecinhas
+  Nico Weber                            Jodie Cunningham
+  Andrew Griffiths                      Parker Thompson
+  Jonathan Neuschfer                    Tyler Nighswander
+  Ben Nagy                              Samir Aguiar
+  Aidan Thornton                        Aleksandar Nikolich
+  Sam Hakim                             Laszlo Szekeres
+  David A. Wheeler                      Turo Lamminen
+  Andreas Stieger                       Richard Godbee
+  Louis Dassy                           teor2345
+  Alex Moneger                          Dmitry Vyukov
+  Keegan McAllister                     Kostya Serebryany
+  Richo Healey                          Martijn Bogaard
+  rc0r                                  Jonathan Foote
+  Christian Holler                      Dominique Pelle
+  Jacek Wielemborek                     Leo Barnes
+  Jeremy Barnes                         Jeff Trull
+  Guillaume Endignoux                   ilovezfs
+  Daniel Godas-Lopez                    Franjo Ivancic
+  Austin Seipp                          Daniel Komaromy
+  Daniel Binderman

+Thank you!

+14) Contact

+-----------

+Questions? Concerns? Bug reports? The author can usually be reached at

+<lcamtuf@google.com>.

+There is also a mailing list for the project; to join, send a mail to

+<afl-users+subscribe@googlegroups.com>. Or, if you prefer to browse

+archives first, try:

+ https://groups.google.com/group/afl-users

+PS. If you wish to submit raw code to be incorporated into the project, please

+be aware that the copyright on most of AFL is claimed by Google. While you do

+retain copyright on your contributions, they do ask people to agree to a simple

+CLA first:

+ https://cla.developers.google.com/clas

+Sorry about the hassle. Of course, no CLA is required for feature requests or

+bug reports.
