
Side by Side Diff: third_party/afl/src/docs/historical_notes.txt

Issue 2075883002: Add American Fuzzy Lop (afl) to third_party/afl/ (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master
Patch Set: Fix nits Created 4 years, 6 months ago
================
Historical notes
================

This doc talks about the rationale of some of the high-level design decisions
for American Fuzzy Lop. It's adapted from a discussion with Rob Graham.
See README for the general instruction manual, and technical_details.txt for
additional implementation-level insights.

1) Influences
-------------

In short, afl-fuzz is inspired chiefly by the work done by Tavis Ormandy back
in 2007. Tavis did some very persuasive experiments using gcov block coverage
to select optimal test cases out of a large corpus of data, and then using
them as a starting point for traditional fuzzing workflows.

(By "persuasive", I mean: netting a significant number of interesting
vulnerabilities.)

In parallel to this, both Tavis and I were interested in evolutionary fuzzing.
Tavis had his experiments, and I was working on a tool called bunny-the-fuzzer,
released sometime in 2007.

Bunny used a generational algorithm not much different from afl-fuzz, but
also tried to reason about the relationship between various input bits and
the internal state of the program, with hopes of deriving some additional value
from that. The reasoning / correlation part was probably in part inspired by
other projects done around the same time by Will Drewry and Chris Evans.

The state correlation approach sounded very sexy on paper, but ultimately it
made the fuzzer complicated, brittle, and cumbersome to use; every other target
program would require a tweak or two. Because Bunny didn't fare a whole lot
better than less sophisticated brute-force tools, I eventually decided to write
it off. You can still find its original documentation at:

https://code.google.com/p/bunny-the-fuzzer/wiki/BunnyDoc

There has been a fair amount of independent work, too. Most notably, a few
weeks earlier that year, Jared DeMott had a Defcon presentation about a
fuzzer that relied on code coverage as a fitness function.

Jared's approach was by no means identical to what afl-fuzz does, but it was in
the same ballpark. His fuzzer tried to explicitly solve for the maximum coverage
with a single input file; in comparison, afl simply selects for cases that do
something new (which yields better results - see technical_details.txt).
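
To make that distinction concrete, here is a minimal sketch of the "did it do
something new" test, in the spirit of afl-fuzz's edge-coverage bitmap. The
run_target() helper and the MAP_SIZE constant are illustrative stand-ins, not
the actual afl internals:

  #include <stddef.h>
  #include <stdint.h>

  #define MAP_SIZE (1 << 16)

  static uint8_t trace_bits[MAP_SIZE];   /* Coverage from one execution.   */
  static uint8_t virgin_bits[MAP_SIZE];  /* Bits ever seen; starts zeroed. */

  /* Hypothetical helper: run the target on buf, filling trace_bits[]. */
  extern void run_target(const uint8_t *buf, size_t len);

  /* Returns 1 if the last run touched any never-before-seen coverage.
     Note that nothing here tries to maximize total coverage - a single
     new bit is enough to make the input worth keeping. */
  static int has_new_bits(void) {
    int ret = 0;
    for (size_t i = 0; i < MAP_SIZE; i++) {
      if (trace_bits[i] & ~virgin_bits[i]) {
        virgin_bits[i] |= trace_bits[i];
        ret = 1;
      }
    }
    return ret;
  }

In this model, an input is added to the queue only when has_new_bits() returns
1 after run_target(); everything else is discarded, no matter how much total
coverage it racks up.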

A few years later, Gabriel Campana released fuzzgrind, a tool that relied purely
on Valgrind and a constraint solver to maximize coverage without any brute-force
bits; and Microsoft Research folks talked extensively about their still
non-public, solver-based SAGE framework.

In the past six years or so, I've also seen a fair number of academic papers
that dealt with smart fuzzing (focusing chiefly on symbolic execution) and a
couple of papers that discussed proof-of-concept applications of genetic
algorithms with the same goals in mind. I'm not convinced that most of these
experiments were all that practical; I suspect that many of them suffer from
bunny-the-fuzzer's curse of being cool on paper and in carefully designed
experiments, but failing the ultimate test of being able to find new,
worthwhile security bugs in otherwise well-fuzzed, real-world software.

In some ways, the baseline that the "cool" solutions have to compete against is
a lot more impressive than it may seem, making it difficult for competitors to
stand out. For a singular example, check out the work by Gynvael and Mateusz
Jurczyk, applying "dumb" fuzzing to ffmpeg, a prominent and security-critical
component of modern browsers and media players:

http://googleonlinesecurity.blogspot.com/2014/01/ffmpeg-and-thousand-fixes.html

Effortlessly getting comparable results with state-of-the-art symbolic execution
in equally complex software still seems fairly unlikely, and hasn't been
demonstrated in practice so far.

But I digress; ultimately, attribution is hard, and glorifying the fundamental
concepts behind AFL is probably a waste of time. The devil is very much in the
often-overlooked details, which brings us to...

2) Design goals for afl-fuzz
----------------------------

In short, I believe that the current implementation of afl-fuzz takes care of
several itches that seemed impossible to scratch with other tools:

1) Speed. It's genuinely hard to compete with brute force when your "smart"
   approach is resource-intensive. If your instrumentation makes it 10x more
   likely to find a bug, but runs 100x slower, your users are getting a bad
   deal: the net bug yield per unit of wall-clock time still drops tenfold.

   To avoid starting with a handicap, afl-fuzz is meant to let you fuzz most of
   the intended targets at roughly their native speed - so even if it doesn't
   add value, you do not lose much.

   On top of this, the tool leverages instrumentation to actually reduce the
   amount of work in a couple of ways: for example, by carefully trimming the
   corpus or skipping non-functional but non-trimmable regions in the input
   files.
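
   To picture the trimming idea, here is a minimal sketch under one simplifying
   assumption: run_and_checksum() is a hypothetical stand-in for "execute the
   target and hash the resulting coverage map". A deletion is kept only when
   the coverage fingerprint stays the same, so the smaller file is assumed to
   exercise the same paths:

     #include <stdint.h>
     #include <stdlib.h>
     #include <string.h>

     /* Hypothetical helper: run the target on buf and return a hash of
        the coverage map it produced. */
     extern uint32_t run_and_checksum(const uint8_t *buf, size_t len);

     /* Shrink buf in place; returns the new length. */
     static size_t trim_input(uint8_t *buf, size_t len) {
       uint32_t orig = run_and_checksum(buf, len);
       uint8_t *tmp = malloc(len);
       if (!tmp) return len;
       /* Try successively smaller chunk sizes. */
       for (size_t chunk = len / 2; chunk >= 4; chunk /= 2) {
         size_t pos = 0;
         while (pos + chunk <= len) {
           /* Candidate: the input with buf[pos .. pos+chunk) deleted. */
           memcpy(tmp, buf, pos);
           memcpy(tmp + pos, buf + pos + chunk, len - pos - chunk);
           if (run_and_checksum(tmp, len - chunk) == orig) {
             memcpy(buf, tmp, len - chunk);  /* Same coverage: keep it. */
             len -= chunk;
           } else {
             pos += chunk;  /* The region matters; skip past it. */
           }
         }
       }
       free(tmp);
       return len;
     }

   The real trimmer differs in its bookkeeping, but the invariant is the same:
   never keep a deletion that changes the observed behavior of the target.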

2) Rock-solid reliability. It's hard to compete with brute force if your
   approach is brittle and fails unexpectedly. Automated testing is attractive
   because it's simple to use and scalable; anything that goes against these
   principles is an unwelcome trade-off and means that your tool will be used
   less often and with less consistent results.

   Most of the approaches based on symbolic execution, taint tracking, or
   complex syntax-aware instrumentation are currently fairly unreliable with
   real-world targets. Perhaps more importantly, their failure modes can render
   them strictly worse than "dumb" tools, and such degradation can be difficult
   for less experienced users to notice and correct.

   In contrast, afl-fuzz is designed to be rock solid, chiefly by keeping it
   simple. In fact, at its core, it's designed to be just a very good
   traditional fuzzer with a wide range of interesting, well-researched
   strategies to go by. The fancy parts just help it focus the effort in
   places where it matters the most.

3) Simplicity. The author of a testing framework is probably the only person
   who truly understands the impact of all the settings offered by the tool -
   and who can dial them in just right. Yet, even the most rudimentary fuzzer
   frameworks often come with countless knobs and fuzzing ratios that need to
   be guessed by the operator ahead of time. This can do more harm than
   good.

   AFL is designed to avoid this as much as possible. The three knobs you
   can play with are the output file, the memory limit, and the ability to
   override the default, auto-calibrated timeout. The rest is just supposed to
   work. When it doesn't, user-friendly error messages outline the probable
   causes and workarounds, and get you back on track right away.
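
   For illustration, a representative command line (the target binary and the
   corpus directories are made up) never needs to touch anything beyond those
   knobs - here, -m caps the memory limit at 200 MB and -t overrides the
   timeout at 1000 ms, with @@ marking where the fuzzed file name is
   substituted:

     $ ./afl-fuzz -i testcases -o findings -m 200 -t 1000 ./target @@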

4) Chainability. Most general-purpose fuzzers can't be easily employed
   against resource-hungry or interaction-heavy tools, necessitating the
   creation of custom in-process fuzzers or the investment of massive CPU
   power (most of which is wasted on tasks not directly related to the code
   we actually want to test).

   AFL tries to scratch this itch by allowing users to use more lightweight
   targets (e.g., standalone image parsing libraries) to create small
   corpora of interesting test cases that can be fed into a manual testing
   process or a UI harness later on.

As mentioned in technical_details.txt, AFL does all this not by systematically
applying a single overarching CS concept, but by experimenting with a variety
of small, complementary methods that were shown to reliably yield results
better than chance. The use of instrumentation is a part of that toolkit, but is
far from being the most important one.

Ultimately, what matters is that afl-fuzz is designed to find cool bugs - and
has a pretty robust track record of doing just that.