Chromium Code Reviews

Side by Side Diff: third_party/afl/src/docs/technical_details.txt

Issue 2075883002: Add American Fuzzy Lop (afl) to third_party/afl/ (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master
Patch Set: Fix nits Created 4 years, 6 months ago
1 ===================================
2 Technical "whitepaper" for afl-fuzz
3 ===================================
4
5 This document provides a quick overview of the guts of American Fuzzy Lop.
6 See README for the general instruction manual; for a discussion of the
7 motivations and design goals behind AFL, see historical_notes.txt.
8
9 0) Design statement
10 -------------------
11
12 American Fuzzy Lop does its best not to focus on any singular principle of
13 operation and not to be a proof-of-concept for any specific theory. The tool
14 can be thought of as a collection of hacks that have been tested in practice,
15 found to be surprisingly effective, and implemented in the simplest, most
16 robust way I could think of at the time.
17
18 Many of the resulting features are made possible thanks to the availability of
19 lightweight instrumentation that served as a foundation for the tool, but this
20 mechanism should be thought of merely as a means to an end. The only true
21 governing principles are speed, reliability, and ease of use.
22
23 1) Coverage measurements
24 ------------------------
25
26 The instrumentation injected into compiled programs captures branch (edge)
27 coverage, along with coarse branch-taken hit counts. The code injected at
28 branch points is essentially equivalent to:
29
30 cur_location = <COMPILE_TIME_RANDOM>;
31 shared_mem[cur_location ^ prev_location]++;
32 prev_location = cur_location >> 1;
33
34 The cur_location value is generated randomly to simplify the process of
35 linking complex projects and keep the XOR output distributed uniformly.
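
As a rough illustration, the bookkeeping above can be simulated in a few
lines of Python (a stand-in for the injected shim, not AFL's actual
implementation; the block IDs here are arbitrary stand-ins for the
compile-time random values):

```python
MAP_SIZE = 1 << 16  # 64 kB SHM region, matching the default described below

def record_trace(block_ids):
    """Simulate the injected instrumentation for a list of
    compile-time-random branch-point IDs visited in order."""
    shared_mem = bytearray(MAP_SIZE)
    prev_location = 0
    for cur_location in block_ids:
        idx = (cur_location ^ prev_location) & (MAP_SIZE - 1)
        shared_mem[idx] = (shared_mem[idx] + 1) & 0xFF  # 8-bit counter wraps
        prev_location = cur_location >> 1
    return shared_mem
```

Thanks to the final shift, visiting A -> B and B -> A increments different
map cells, preserving the directionality of tuples.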
36
37 The shared_mem[] array is a 64 kB SHM region passed to the instrumented binary
38 by the caller. Every byte set in the output map can be thought of as a hit for
39 a particular (branch_src, branch_dst) tuple in the instrumented code.
40
41 The size of the map is chosen so that collisions are sporadic with almost all
42 of the intended targets, which usually sport between 2k and 10k discoverable
43 branch points:
44
45 Branch cnt | Colliding tuples | Example targets
46 ------------+------------------+-----------------
47 1,000 | 0.75% | giflib, lzo
48 2,000 | 1.5% | zlib, tar, xz
49 5,000 | 3.5% | libpng, libwebp
50 10,000 | 7% | libxml
51 20,000 | 14% | sqlite
52 50,000 | 30% | -
53
54 At the same time, its size is small enough to allow the map to be analyzed
55 in a matter of microseconds on the receiving end, and to effortlessly fit
56 within L2 cache.
57
58 This form of coverage provides considerably more insight into the execution
59 path of the program than simple block coverage. In particular, it trivially
60 distinguishes between the following execution traces:
61
62 A -> B -> C -> D -> E (tuples: AB, BC, CD, DE)
63 A -> B -> D -> C -> E (tuples: AB, BD, DC, CE)
64
65 This aids the discovery of subtle fault conditions in the underlying code,
66 because security vulnerabilities are more often associated with unexpected
67 or incorrect state transitions than with merely reaching a new basic block.
68
69 The reason for the shift operation in the last line of the pseudocode shown
70 earlier in this section is to preserve the directionality of tuples (without
71 this, A ^ B would be indistinguishable from B ^ A) and to retain the identity
72 of tight loops (otherwise, A ^ A would be obviously equal to B ^ B).
73
74 The absence of simple saturating arithmetic opcodes on Intel CPUs means that
75 the hit counters can sometimes wrap around to zero. Since this is a fairly
76 unlikely and localized event, it's seen as an acceptable performance trade-off.
77
78 2) Detecting new behaviors
79 --------------------------
80
81 The fuzzer maintains a global map of tuples seen in previous executions; this
82 data can be rapidly compared with individual traces and updated in just a couple
83 of dword- or qword-wide instructions and a simple loop.
84
85 When a mutated input produces an execution trace containing new tuples, the
86 corresponding input file is preserved and routed for additional processing
87 later on (see section #3). Inputs that do not trigger new local-scale state
88 transitions in the execution trace are discarded, even if their overall
89 instrumentation output pattern is unique.
90
91 This approach allows for a very fine-grained and long-term exploration of
92 program state while not having to perform any computationally intensive and
93 fragile global comparisons of complex execution traces, and while avoiding the
94 scourge of path explosion.
95
96 To illustrate the properties of the algorithm, consider that the second trace
97 shown below would be considered substantially new because of the presence of
98 new tuples (CA, AE):
99
100 #1: A -> B -> C -> D -> E
101 #2: A -> B -> C -> A -> E
102
103 At the same time, with #2 processed, the following pattern will not be seen
104 as unique, despite having a markedly different execution path:
105
106 #3: A -> B -> C -> A -> B -> C -> A -> B -> C -> D -> E
107
108 In addition to detecting new tuples, the fuzzer also considers coarse tuple
109 hit counts. These are divided into several buckets:
110
111 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
112
113 To some extent, the number of buckets is an implementation artifact: it allows
114 an in-place mapping of an 8-bit counter generated by the instrumentation to
115 an 8-position bitmap relied on by the fuzzer executable to keep track of the
116 already-seen execution counts for each tuple.
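
A sketch of that mapping (the real implementation uses a precomputed
256-entry lookup table rather than a loop; the bucket boundaries are the
ones listed above):

```python
# Bucket boundaries from the text: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+.
# Each 8-bit hit count maps in place to one bit of an 8-position bitmap.
def count_class(hits):
    buckets = [(1, 1), (2, 2), (3, 3), (4, 7),
               (8, 15), (16, 31), (32, 127), (128, 255)]
    for bit, (lo, hi) in enumerate(buckets):
        if lo <= hits <= hi:
            return 1 << bit
    return 0  # a count of zero sets no bit

# A loop going from 47 to 48 cycles stays within the 32-127 bucket,
# while going from one hit to two crosses a bucket boundary:
count_class(47) == count_class(48)   # True - ignored
count_class(1) == count_class(2)     # False - flagged as interesting
```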
117
118 Changes within the range of a single bucket are ignored; transition from one
119 bucket to another is flagged as an interesting change in program control flow,
120 and is routed to the evolutionary process outlined in the section below.
121
122 The hit count behavior provides a way to distinguish between potentially
123 interesting control flow changes, such as a block of code being executed
124 twice when it was normally hit only once. At the same time, it is fairly
125 insensitive to empirically less notable changes, such as a loop going from
126 47 cycles to 48. The counters also provide some degree of "accidental"
127 immunity against tuple collisions in dense trace maps.
128
129 The execution is policed fairly heavily through memory and execution time
130 limits; by default, the timeout is set at 5x the initially-calibrated
131 execution speed, rounded up to 20 ms. The aggressive timeouts are meant to
132 prevent dramatic fuzzer performance degradation by descending into tarpits
133 that, say, improve coverage by 1% while being 100x slower; we pragmatically
134 reject them and hope that the fuzzer will find a less expensive way to reach
135 the same code. Empirical testing strongly suggests that more generous time
136 limits are not worth the cost.
137
138 3) Evolving the input queue
139 ---------------------------
140
141 Mutated test cases that produced new state transitions within the program are
142 added to the input queue and used as a starting point for future rounds of
143 fuzzing. They supplement, but do not automatically replace, existing finds.
144
145 This approach allows the tool to progressively explore various disjoint and
146 possibly mutually incompatible features of the underlying data format, as
147 shown in this image:
148
149 http://lcamtuf.coredump.cx/afl/afl_gzip.png
150
151 Several practical examples of the results of this algorithm are discussed
152 here:
153
154 http://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-of-thin-air.html
155   http://lcamtuf.blogspot.com/2014/11/afl-fuzz-nobody-expects-cdata-sections.html
156
157 The synthetic corpus produced by this process is essentially a compact
158 collection of "hmm, this does something new!" input files, and can be used to
159 seed any other testing processes down the line (for example, to manually
160 stress-test resource-intensive desktop apps).
161
162 With this approach, the queue for most targets grows to somewhere between 1k
163 and 10k entries; approximately 10-30% of this is attributable to the discovery
164 of new tuples, and the remainder is associated with changes in hit counts.
165
166 The following table compares the relative ability to discover file syntax and
167 explore program states when using several different approaches to guided
168 fuzzing. The instrumented target was GNU patch 2.7.3 compiled with -O3 and
169 seeded with a dummy text file; the session consisted of a single pass over the
170 input queue with afl-fuzz:
171
172 Fuzzer guidance | Blocks | Edges | Edge hit | Highest-coverage
173 strategy used | reached | reached | cnt var | test case generated
174 ------------------+---------+---------+----------+---------------------------
175 (Initial file) | 156 | 163 | 1.00 | (none)
176 | | | |
177 Blind fuzzing S | 182 | 205 | 2.23 | First 2 B of RCS diff
178 Blind fuzzing L | 228 | 265 | 2.23 | First 4 B of -c mode diff
179 Block coverage | 855 | 1,130 | 1.57 | Almost-valid RCS diff
180 Edge coverage | 1,452 | 2,070 | 2.18 | One-chunk -c mode diff
181 AFL model | 1,765 | 2,597 | 4.99 | Four-chunk -c mode diff
182
183 The first entry for blind fuzzing ("S") corresponds to executing just a single
184 round of testing; the second set of figures ("L") shows the fuzzer running in a
185 loop for a number of execution cycles comparable with that of the instrumented
186 runs, which required more time to fully process the growing queue.
187
188 Roughly similar results have been obtained in a separate experiment where the
189 fuzzer was modified to compile out all the random fuzzing stages and leave just
190 a series of rudimentary, sequential operations such as walking bit flips.
191 Because this mode would be incapable of altering the size of the input file,
192 the sessions were seeded with a valid unified diff:
193
194 Queue extension | Blocks | Edges | Edge hit | Number of unique
195 strategy used | reached | reached | cnt var | crashes found
196 ------------------+---------+---------+----------+------------------
197 (Initial file) | 624 | 717 | 1.00 | -
198 | | | |
199 Blind fuzzing | 1,101 | 1,409 | 1.60 | 0
200 Block coverage | 1,255 | 1,649 | 1.48 | 0
201 Edge coverage | 1,259 | 1,734 | 1.72 | 0
202 AFL model | 1,452 | 2,040 | 3.16 | 1
203
204 Some of the earlier work on evolutionary fuzzing suggested maintaining just a
205 single test case and selecting for mutations that improve coverage. At least
206 in the tests described above, this "greedy" method appeared to offer no
207 substantial benefits over blind fuzzing.
208
209 4) Culling the corpus
210 ---------------------
211
212 The progressive state exploration approach outlined above means that some of
213 the test cases synthesized later on in the game may have edge coverage that
214 is a strict superset of the coverage provided by their ancestors.
215
216 To optimize the fuzzing effort, AFL periodically re-evaluates the queue using a
217 fast algorithm that selects a smaller subset of test cases that still cover
218 every tuple seen so far, and whose characteristics make them particularly
219 favorable to the tool.
220
221 The algorithm works by assigning every queue entry a score proportional to
222 its execution latency and file size, and then selecting the lowest-scoring
223 candidate for each tuple.
224
225 The tuples are then processed sequentially using a simple workflow:
226
227 1) Find next tuple not yet in the temporary working set,
228
229 2) Locate the winning queue entry for this tuple,
230
231 3) Register *all* tuples present in that entry's trace in the working set,
232
233 4) Go to #1 if there are any missing tuples in the set.
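
Under the assumption that the score is simply the product of execution
latency and file size (the text only says "proportional to" both), the
four-step workflow can be sketched as a greedy set cover:

```python
def cull(queue):
    """Greedy culling sketch over entries of (exec_us, file_size, tuples)."""
    all_tuples = set().union(*(t for _, _, t in queue)) if queue else set()
    # Precompute step 2: the lowest-scoring entry covering each tuple.
    winner = {tup: min((e for e in queue if tup in e[2]),
                       key=lambda e: e[0] * e[1])
              for tup in all_tuples}
    favored, covered = [], set()
    while covered != all_tuples:                 # step 4: loop until done
        tup = next(iter(all_tuples - covered))   # step 1: next missing tuple
        entry = winner[tup]                      # step 2: its winning entry
        favored.append(entry)
        covered |= set(entry[2])                 # step 3: register all tuples
    return favored
```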
234
235 The generated corpus of "favored" entries is usually 5-10x smaller than the
236 starting data set. Non-favored entries are not discarded, but they are skipped
237 with varying probabilities when encountered in the queue:
238
239 - If there are new, yet-to-be-fuzzed favorites present in the queue, 99%
240 of non-favored entries will be skipped to get to the favored ones.
241
242 - If there are no new favorites:
243
244 - If the current non-favored entry was fuzzed before, it will be skipped
245 95% of the time.
246
247 - If it hasn't gone through any fuzzing rounds yet, the odds of skipping
248 drop down to 75%.
249
250 Based on empirical testing, this provides a reasonable balance between queue
251 cycling speed and test case diversity.
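
The skip probabilities listed above translate into a very small decision
routine; this is a sketch, with the random source injectable for clarity:

```python
import random

def should_skip(is_favored, pending_favorites, entry_was_fuzzed,
                rng=random.random):
    """Return True if the current queue entry is skipped this cycle,
    using the probabilities described above."""
    if is_favored:
        return False               # favored entries are never skipped
    if pending_favorites:
        return rng() < 0.99        # new favorites waiting: skip 99%
    if entry_was_fuzzed:
        return rng() < 0.95        # already fuzzed: skip 95%
    return rng() < 0.75            # never fuzzed: odds drop to 75%
```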
252
253 Slightly more sophisticated but much slower culling can be performed on input
254 or output corpora with afl-cmin. This tool permanently discards the redundant
255 entries and produces a smaller corpus suitable for use with afl-fuzz or
256 external tools.
257
258 5) Trimming input files
259 -----------------------
260
261 File size has a dramatic impact on fuzzing performance, both because large
262 files make the target binary slower, and because they reduce the likelihood
263 that a mutation would touch important format control structures, rather than
264 redundant data blocks. This is discussed in more detail in perf_tips.txt.
265
266 Setting aside the possibility of a bad starting corpus provided by the user,
267 some types of mutations can iteratively increase the size of the generated
268 files, so it is important to counter this trend.
269
270 Luckily, the instrumentation feedback provides a simple way to automatically
271 trim down input files while ensuring that the changes made to the files have no
272 impact on the execution path.
273
274 The built-in trimmer in afl-fuzz attempts to sequentially remove blocks of data
275 with variable length and stepover; any deletion that doesn't affect the checksum
276 of the trace map is committed to disk. The trimmer is not designed to be
277 particularly thorough; instead, it tries to strike a balance between precision
278 and the number of execve() calls spent on the process. The average per-file
279 gains are around 5-20%.
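
A sketch of that loop follows; the coarse-to-fine block schedule is an
assumption (the text only says "variable length and stepover"), and
trace_checksum stands in for one execution of the instrumented target:

```python
def trim(data, trace_checksum):
    """Trimmer sketch: try removing blocks of shrinking size at walking
    offsets; commit any deletion that leaves the checksum of the trace
    map unchanged."""
    baseline = trace_checksum(data)
    block = max(len(data) // 16, 1)  # coarse-to-fine schedule (assumed)
    while block >= 1:
        pos = 0
        while pos < len(data):
            candidate = data[:pos] + data[pos + block:]
            if trace_checksum(candidate) == baseline:
                data = candidate     # committed; retry the same offset
            else:
                pos += block         # step over this block
        block //= 2
    return data
```

For example, with a toy "target" whose trace depends only on the uppercase
letters in the input, trim("AxxByyC", ...) strips the lowercase filler.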
280
281 The standalone afl-tmin tool uses a more exhaustive, iterative algorithm, and
282 also attempts to perform alphabet normalization on the trimmed files.
283
284 6) Fuzzing strategies
285 ---------------------
286
287 The feedback provided by the instrumentation makes it easy to understand the
288 value of various fuzzing strategies and optimize their parameters so that they
289 work equally well across a wide range of file types. The strategies used by
290 afl-fuzz are generally format-agnostic and are discussed in more detail here:
291
292 http://lcamtuf.blogspot.com/2014/08/binary-fuzzing-strategies-what-works.html
293
294 It is somewhat notable that especially early on, most of the work done by
295 afl-fuzz is actually highly deterministic, and progresses to random stacked
296 modifications and test case splicing only at a later stage. The deterministic
297 strategies include:
298
299 - Sequential bit flips with varying lengths and stepovers,
300
301 - Sequential addition and subtraction of small integers,
302
303   - Sequential insertion of known interesting integers (0, 1, INT_MAX, etc.).
304
305 The non-deterministic steps include stacked bit flips, insertions, deletions,
306 arithmetics, and splicing of different test cases.
307
308 Their relative yields and execve() costs have been investigated and are
309 discussed in the aforementioned blog post.
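
The first of the deterministic strategies is simple enough to sketch in
full; afl-fuzz's bitflip 1/1, 2/1, and 4/1 stages correspond to flip_len
values of 1, 2, and 4 with a one-bit stepover:

```python
def walking_bitflips(data, flip_len=1, stepover=1):
    """Yield every mutant obtained by flipping flip_len consecutive bits,
    sliding the window forward by stepover bits each time."""
    nbits = len(data) * 8
    for start in range(0, nbits - flip_len + 1, stepover):
        mutant = bytearray(data)
        for bit in range(start, start + flip_len):
            mutant[bit // 8] ^= 0x80 >> (bit % 8)  # MSB-first bit order
        yield bytes(mutant)
```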
310
311 For the reasons discussed in historical_notes.txt (chiefly, performance,
312 simplicity, and reliability), AFL generally does not try to reason about the
313 relationship between specific mutations and program states; the fuzzing steps
314 are nominally blind, and are guided only by the evolutionary design of the
315 input queue.
316
317 That said, there is one (trivial) exception to this rule: when a new queue
318 entry goes through the initial set of deterministic fuzzing steps, and some
319 regions in the file are observed to have no effect on the checksum of the
320 execution path, they may be excluded from the remaining phases of
321 deterministic fuzzing - and proceed straight to random tweaks. Especially for
322 verbose, human-readable data formats, this can reduce the number of execs by
323 10-40% or so without an appreciable drop in coverage. In extreme cases, such
324 as normally block-aligned tar archives, the gains can be as high as 90%.
325
326 Because the underlying "effector maps" are local to every queue entry and
327 remain in force only during deterministic stages that do not alter the size
328 or the general layout of the underlying file, this mechanism appears to work
329 very reliably and proved to be simple to implement.
330
331 7) Dictionaries
332 ---------------
333
334 The feedback provided by the instrumentation makes it easy to automatically
335 identify syntax tokens in some types of input files, and to detect that certain
336 combinations of predefined or auto-detected dictionary terms constitute a
337 valid grammar for the tested parser.
338
339 A discussion of how these features are implemented within afl-fuzz can be found
340 here:
341
342 http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html
343
344 In essence, when basic, typically easily-obtained syntax tokens are combined
345 together in a purely random manner, the instrumentation and the evolutionary
346 design of the queue together provide a feedback mechanism to differentiate
347 between meaningless mutations and ones that trigger new behaviors in the
348 instrumented code - and to incrementally build more complex syntax on top of
349 this discovery.
350
351 The dictionaries have been shown to enable the fuzzer to rapidly reconstruct
352 the grammar of highly verbose and complex languages such as JavaScript, SQL,
353 or XML; several examples of generated SQL statements are given in the blog
354 post mentioned above.
355
356 8) De-duping crashes
357 --------------------
358
359 De-duplication of crashes is one of the more important problems for any
360 competent fuzzing tool. Many of the naive approaches run into problems; in
361 particular, looking just at the faulting address may lead to completely
362 unrelated issues being clustered together if the fault happens in a common
363 library function (say, strcmp, strcpy); while checksumming call stack
364 backtraces can lead to extreme crash count inflation if the fault can be
365 reached through a number of different, possibly recursive code paths.
366
367 The solution implemented in afl-fuzz considers a crash unique if either of
368 two conditions is met:
369
370 - The crash trace includes a tuple not seen in any of the previous crashes,
371
372 - The crash trace is missing a tuple that was always present in earlier
373 faults.
374
375 The approach is vulnerable to some path count inflation early on, but exhibits
376 a very strong self-limiting effect, similar to the execution path analysis
377 logic that is the cornerstone of afl-fuzz.
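
The two conditions reduce to tracking the union and intersection of tuples
across all crashes seen so far; a minimal sketch:

```python
class CrashDeduper:
    """A crash is unique if its trace contains a tuple absent from every
    earlier crash, or lacks a tuple that every earlier crash contained."""
    def __init__(self):
        self.union = set()         # tuples seen in any crash
        self.intersection = None   # tuples seen in every crash

    def is_unique(self, trace_tuples):
        trace_tuples = set(trace_tuples)
        if self.intersection is None:
            unique = True          # the first crash is always kept
        else:
            unique = bool(trace_tuples - self.union) or \
                     bool(self.intersection - trace_tuples)
        self.union |= trace_tuples
        self.intersection = (trace_tuples if self.intersection is None
                             else self.intersection & trace_tuples)
        return unique
```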
378
379 9) Investigating crashes
380 ------------------------
381
382 The exploitability of many types of crashes can be ambiguous; afl-fuzz tries
383 to address this by providing a crash exploration mode where a known-faulting
384 test case is fuzzed in a manner very similar to the normal operation of the
385 fuzzer, but with a constraint that causes any non-crashing mutations to be
386 thrown away.
387
388 A detailed discussion of the value of this approach can be found here:
389
390 http://lcamtuf.blogspot.com/2014/11/afl-fuzz-crash-exploration-mode.html
391
392 The method uses instrumentation feedback to explore the state of the crashing
393 program to get past the ambiguous faulting condition and then isolate the
394 newly-found inputs for human review.
395
396 On the subject of crashes, it is worth noting that in contrast to normal
397 queue entries, crashing inputs are *not* trimmed; they are kept exactly as
398 discovered to make it easier to compare them to the parent, non-crashing entry
399 in the queue. That said, afl-tmin can be used to shrink them at will.
400
401 10) The fork server
402 -------------------
403
404 To improve performance, afl-fuzz uses a "fork server", where the fuzzed process
405 goes through execve(), linking, and libc initialization only once, and is then
406 cloned from a stopped process image by leveraging copy-on-write. The
407 implementation is described in more detail here:
408
409 http://lcamtuf.blogspot.com/2014/10/fuzzing-binaries-without-execve.html
410
411 The fork server is an integral aspect of the injected instrumentation and
412 simply stops at the first instrumented function to await commands from
413 afl-fuzz.
414
415 With fast targets, the fork server can offer considerable performance gains,
416 usually between 1.5x and 2x. It is also possible to:
417
418 - Use the fork server in manual ("deferred") mode, skipping over larger,
419 user-selected chunks of initialization code. With some targets, this can
420 produce 10x+ performance gains.
421
422 - Enable "persistent" mode, where a single process is used to try out
423 multiple inputs, greatly limiting the overhead of repetitive fork()
424 calls. As with the previous mode, this requires custom modifications,
425 but can improve the performance of fast targets by a factor of 5 or more
426 - approximating the benefits of in-process fuzzing jobs.
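
The one-time-initialization idea behind all of these variants can be
illustrated with a plain POSIX fork() loop (a heavily simplified sketch;
the real fork server stops at an instrumented location inside the target
and is driven by afl-fuzz over a control pipe):

```python
import os

def fork_loop(setup, target, inputs):
    """setup() is paid exactly once; each input then runs in a fresh
    fork()ed child that inherits the initialized state copy-on-write."""
    state = setup()                        # analog of execve/linking/libc
    statuses = []
    for data in inputs:
        pid = os.fork()
        if pid == 0:                       # child: run one test case
            os._exit(target(state, data) & 0xFF)
        _, raw = os.waitpid(pid, 0)
        statuses.append(os.WEXITSTATUS(raw))
    return statuses
```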
427
428 11) Parallelization
429 -------------------
430
431 The parallelization mechanism relies on periodically examining the queues
432 produced by independently-running instances on other CPU cores or on remote
433 machines, and then selectively pulling in the test cases that produce behaviors
434 not yet seen by the fuzzer at hand.
435
436 This allows for extreme flexibility in fuzzer setup, including running synced
437 instances against different parsers of a common data format, often with
438 synergistic effects.
439
440 For more information about this design, see parallel_fuzzing.txt.
441
442 12) Binary-only instrumentation
443 -------------------------------
444
445 Instrumentation of black-box, binary-only targets is accomplished with the
446 help of a separately-built version of QEMU in "user emulation" mode. This also
447 allows the execution of cross-architecture code - say, ARM binaries on x86.
448
449 QEMU uses basic blocks as translation units; the instrumentation is implemented
450 on top of this and uses a model roughly analogous to the compile-time hooks:
451
452 if (block_address > elf_text_start && block_address < elf_text_end) {
453
454 cur_location = (block_address >> 4) ^ (block_address << 8);
455 shared_mem[cur_location ^ prev_location]++;
456 prev_location = cur_location >> 1;
457
458 }
459
460 The shift-and-XOR-based scrambling in the second line is used to mask the
461 effects of instruction alignment.
462
463 The start-up of binary translators such as QEMU, DynamoRIO, and PIN is fairly
464 slow; to counter this, the QEMU mode leverages a fork server similar to that
465 used for compiler-instrumented code, effectively spawning copies of an
466 already-initialized process paused at _start.
467
468 First-time translation of a new basic block also incurs substantial latency. To
469 eliminate this problem, the AFL fork server is extended by providing a channel
470 between the running emulator and the parent process. The channel is used
471 to notify the parent about the addresses of any newly-encountered blocks and to
472 add them to the translation cache that will be replicated for future child
473 processes.
474
475 As a result of these two optimizations, the overhead of the QEMU mode is
476 roughly 2-5x, compared to 100x+ for PIN.
477
478 13) The afl-analyze tool
479 ------------------------
480
481 The file format analyzer is a simple extension of the minimization algorithm
482 discussed earlier on; instead of attempting to remove no-op blocks, the tool
483 performs a series of walking byte flips and then annotates runs of bytes
484 in the input file.
485
486 It uses the following classification scheme:
487
488 - "No-op blocks" - segments where bit flips cause no apparent changes to
489 control flow. Common examples may be comment sections, pixel data within
490 a bitmap file, etc.
491
492 - "Superficial content" - segments where some, but not all, bitflips
493 produce some control flow changes. Examples may include strings in rich
494 documents (e.g., XML, RTF).
495
496 - "Critical stream" - a sequence of bytes where all bit flips alter control
497 flow in different but correlated ways. This may be compressed data,
498 non-atomically compared keywords or magic values, etc.
499
500 - "Suspected length field" - small, atomic integer that, when touched in
501 any way, causes a consistent change to program control flow, suggestive
502 of a failed length check.
503
504 - "Suspected cksum or magic int" - an integer that behaves similarly to a
505 length field, but has a numerical value that makes the length explanation
506 unlikely. This is suggestive of a checksum or other "magic" integer.
507
508 - "Suspected checksummed block" - a long block of data where any change
509 always triggers the same new execution path. Likely caused by failing
510 a checksum or a similar integrity check before any subsequent parsing
511 takes place.
512
513   - "Magic value section" - a generic token where changes cause the type
514     of binary behavior outlined earlier, but that doesn't meet any of the
515     other criteria. May be an atomically compared keyword or the like.