================
Historical notes
================

This doc talks about the rationale of some of the high-level design decisions
for American Fuzzy Lop. It's adapted from a discussion with Rob Graham.
See README for the general instruction manual, and technical_details.txt for
additional implementation-level insights.

1) Influences
-------------

In short, afl-fuzz is inspired chiefly by the work done by Tavis Ormandy back
in 2007. Tavis did some very persuasive experiments using gcov block coverage
to select optimal test cases out of a large corpus of data, and then using
them as a starting point for traditional fuzzing workflows.

(By "persuasive", I mean: netting a significant number of interesting
vulnerabilities.)

In parallel to this, both Tavis and I were interested in evolutionary fuzzing.
Tavis had his experiments, and I was working on a tool called bunny-the-fuzzer,
released sometime in 2007.

Bunny used a generational algorithm not much different from afl-fuzz, but
also tried to reason about the relationship between various input bits and
the internal state of the program, with hopes of deriving some additional value
from that. The reasoning / correlation part was probably in part inspired by
other projects done around the same time by Will Drewry and Chris Evans.

The state correlation approach sounded very sexy on paper, but ultimately, made
the fuzzer complicated, brittle, and cumbersome to use; every other target
program would require a tweak or two. Because Bunny didn't fare a whole lot
better than less sophisticated brute-force tools, I eventually decided to write
it off. You can still find its original documentation at:

  https://code.google.com/p/bunny-the-fuzzer/wiki/BunnyDoc

There has been a fair amount of independent work, too. Most notably, a few
weeks earlier that year, Jared DeMott had a Defcon presentation about a
fuzzer that relied on code coverage as a fitness function.

Jared's approach was by no means identical to what afl-fuzz does, but it was in
the same ballpark. His fuzzer tried to explicitly solve for the maximum coverage
with a single input file; in comparison, afl simply selects for cases that do
something new (which yields better results - see technical_details.txt).

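To make the contrast concrete, below is a minimal sketch of that "select for
anything new" rule in C. This is a hypothetical illustration, not the actual
afl-fuzz code; the real logic (has_new_bits() in afl-fuzz.c) follows the same
core idea, but additionally buckets edge hit counts into coarse ranges:

  #include <stdint.h>

  #define MAP_SIZE 65536            /* Size of the coverage bitmap.      */

  static uint8_t seen[MAP_SIZE];    /* Edges hit by any earlier input.   */

  /* Returns 1 if trace[], the coverage map from a single execution of
     the target, touches any edge not seen before - and records it. A
     mutated test case is added to the queue iff this returns 1; there
     is no attempt to solve for globally maximal coverage. */

  static int is_interesting(const uint8_t* trace) {

    int ret = 0;
    uint32_t i;

    for (i = 0; i < MAP_SIZE; i++)
      if (trace[i] & ~seen[i]) { seen[i] |= trace[i]; ret = 1; }

    return ret;

  }

In other words, every queue entry earns its keep by contributing at least one
previously unseen edge; no global optimization over the corpus is attempted.
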
A few years later, Gabriel Campana released fuzzgrind, a tool that relied purely
on Valgrind and a constraint solver to maximize coverage without any brute-force
bits; and Microsoft Research folks talked extensively about their still
non-public, solver-based SAGE framework.

In the past six years or so, I've also seen a fair number of academic papers
that dealt with smart fuzzing (focusing chiefly on symbolic execution) and a
couple of papers that discussed proof-of-concept applications of genetic
algorithms with the same goals in mind. I'm unsure how practical most of
these experiments were; I suspect that many of them suffer from
bunny-the-fuzzer's curse of being cool on paper and in carefully designed
experiments, but failing the ultimate test of being able to find new,
worthwhile security bugs in otherwise well-fuzzed, real-world software.

In some ways, the baseline that the "cool" solutions have to compete against is
a lot more impressive than it may seem, making it difficult for competitors to
stand out. For a singular example, check out the work by Gynvael and Mateusz
Jurczyk, applying "dumb" fuzzing to ffmpeg, a prominent and security-critical
component of modern browsers and media players:

  http://googleonlinesecurity.blogspot.com/2014/01/ffmpeg-and-thousand-fixes.html

Effortlessly getting comparable results with state-of-the-art symbolic execution
in equally complex software still seems fairly unlikely, and hasn't been
demonstrated in practice so far.

But I digress; ultimately, attribution is hard, and glorifying the fundamental
concepts behind AFL is probably a waste of time. The devil is very much in the
often-overlooked details, which brings us to...

2) Design goals for afl-fuzz
----------------------------

In short, I believe that the current implementation of afl-fuzz takes care of
several itches that seemed impossible to scratch with other tools:

1) Speed. It's genuinely hard to compete with brute force when your "smart"
   approach is resource-intensive. If your instrumentation makes it 10x more
   likely to find a bug, but runs 100x slower, your users are getting a bad
   deal.

   To avoid starting with a handicap, afl-fuzz is meant to let you fuzz most of
   the intended targets at roughly their native speed - so even if it doesn't
   add value, you do not lose much.

   On top of this, the tool leverages instrumentation to actually reduce the
   amount of work in a couple of ways: for example, by carefully trimming the
   corpus or skipping non-functional but non-trimmable regions in the input
   files.

2) Rock-solid reliability. It's hard to compete with brute force if your
   approach is brittle and fails unexpectedly. Automated testing is attractive
   because it's simple to use and scalable; anything that goes against these
   principles is an unwelcome trade-off and means that your tool will be used
   less often and with less consistent results.

   Most of the approaches based on symbolic execution, taint tracking, or
   complex syntax-aware instrumentation are currently fairly unreliable with
   real-world targets. Perhaps more importantly, their failure modes can render
   them strictly worse than "dumb" tools, and such degradation can be difficult
   for less experienced users to notice and correct.

   In contrast, afl-fuzz is designed to be rock solid, chiefly by keeping it
   simple. In fact, at its core, it's designed to be just a very good
   traditional fuzzer with a wide range of interesting, well-researched
   strategies to go by. The fancy parts just help it focus the effort in
   places where it matters the most.

3) Simplicity. The author of a testing framework is probably the only person
   who truly understands the impact of all the settings offered by the tool -
   and who can dial them in just right. Yet, even the most rudimentary fuzzer
   frameworks often come with countless knobs and fuzzing ratios that need to
   be guessed by the operator ahead of time. This can do more harm than
   good.

   AFL is designed to avoid this as much as possible. The three knobs you
   can play with are the output file, the memory limit, and the ability to
   override the default, auto-calibrated timeout; an example invocation is
   shown after this list. The rest is just supposed to work. When it
   doesn't, user-friendly error messages outline the probable causes and
   workarounds, and get you back on track right away.

4) Chainability. Most general-purpose fuzzers can't be easily employed
   against resource-hungry or interaction-heavy tools, necessitating the
   creation of custom in-process fuzzers or the investment of massive CPU
   power (most of which is wasted on tasks not directly related to the code
   we actually want to test).

   AFL tries to scratch this itch by allowing users to use more lightweight
   targets (e.g., standalone image parsing libraries) to create small
   corpora of interesting test cases that can be fed into a manual testing
   process or a UI harness later on.

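To ground the simplicity point, here is what a typical session looks like;
the directory names and target binary are made up for the example, and -m /
-t are the memory-limit and timeout overrides mentioned in point 3:

  $ ./afl-fuzz -i testcases/ -o findings/ -m 100 -t 500 ./target_binary @@

(The @@ token marks where the name of the current input file is substituted
on the target's command line; everything else is calibrated automatically.)
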
As mentioned in technical_details.txt, AFL does all this not by systematically
applying a single overarching CS concept, but by experimenting with a variety
of small, complementary methods that were shown to reliably yield results
better than chance. The use of instrumentation is a part of that toolkit, but is
far from being the most important one.

Ultimately, what matters is that afl-fuzz is designed to find cool bugs - and
has a pretty robust track record of doing just that.