Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(469)

Unified Diff: third_party/afl/src/docs/technical_details.txt

Issue 2238013002: Roll src/third_party/afl/src/ 2.14b..2.30b (16 versions). (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master
Patch Set: Note in "Local Modifications" that we have removed dictionaries/. Created 4 years, 4 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View side-by-side diff with in-line comments
Download patch
« no previous file with comments | « third_party/afl/src/docs/status_screen.txt ('k') | third_party/afl/src/docs/visualization/afl_gzip.png » ('j') | no next file with comments »
Expand Comments ('e') | Collapse Comments ('c') | Show Comments Hide Comments ('s')
Index: third_party/afl/src/docs/technical_details.txt
diff --git a/third_party/afl/src/docs/technical_details.txt b/third_party/afl/src/docs/technical_details.txt
index ec789a31999cb807a1063abf0dbca59540bedc43..3ec487414e7d87698cb1933d6b9023186fc58256 100644
--- a/third_party/afl/src/docs/technical_details.txt
+++ b/third_party/afl/src/docs/technical_details.txt
@@ -85,8 +85,8 @@ of dword- or qword-wide instructions and a simple loop.
When a mutated input produces an execution trace containing new tuples, the
corresponding input file is preserved and routed for additional processing
later on (see section #3). Inputs that do not trigger new local-scale state
-transitions in the execution trace are discarded, even if their overall
-instrumentation output pattern is unique.
+transitions in the execution trace (i.e., produce no new tuples) are discarded,
+even if their overall control flow sequence is unique.
This approach allows for a very fine-grained and long-term exploration of
program state while not having to perform any computationally intensive and
@@ -101,7 +101,7 @@ new tuples (CA, AE):
#2: A -> B -> C -> A -> E
At the same time, with #2 processed, the following pattern will not be seen
-as unique, despite having a markedly different execution path:
+as unique, despite having a markedly different overall execution path:
#3: A -> B -> C -> A -> B -> C -> A -> B -> C -> D -> E
@@ -142,9 +142,9 @@ Mutated test cases that produced new state transitions within the program are
added to the input queue and used as a starting point for future rounds of
fuzzing. They supplement, but do not automatically replace, existing finds.
-This approach allows the tool to progressively explore various disjoint and
-possibly mutually incompatible features of the underlying data format, as
-shown in this image:
+In contrast to more greedy genetic algorithms, this approach allows the tool
+to progressively explore various disjoint and possibly mutually incompatible
+features of the underlying data format, as shown in this image:
http://lcamtuf.coredump.cx/afl/afl_gzip.png
@@ -201,10 +201,10 @@ the sessions were seeded with a valid unified diff:
Edge coverage | 1,259 | 1,734 | 1.72 | 0
AFL model | 1,452 | 2,040 | 3.16 | 1
-Some of the earlier work on evolutionary fuzzing suggested maintaining just a
-single test case and selecting for mutations that improve coverage. At least
-in the tests described above, this "greedy" method appeared to offer no
-substantial benefits over blind fuzzing.
+At noted earlier on, some of the prior work on genetic fuzzing relied on
+maintaining a single test case and evolving it to maximize coverage. At least
+in the tests described above, this "greedy" approach appears to confer no
+substantial benefits over blind fuzzing strategies.
4) Culling the corpus
---------------------
@@ -263,9 +263,9 @@ files make the target binary slower, and because they reduce the likelihood
that a mutation would touch important format control structures, rather than
redundant data blocks. This is discussed in more detail in perf_tips.txt.
-The possibility of a bad starting corpus provided by the user aside, some
-types of mutations can have the effect of iteratively increasing the size of
-the generated files, so it is important to counter this trend.
+The possibility that the user will provide a low-quality starting corpus aside,
+some types of mutations can have the effect of iteratively increasing the size
+of the generated files, so it is important to counter this trend.
Luckily, the instrumentation feedback provides a simple way to automatically
trim down input files while ensuring that the changes made to the files have no
@@ -275,11 +275,11 @@ The built-in trimmer in afl-fuzz attempts to sequentially remove blocks of data
with variable length and stepover; any deletion that doesn't affect the checksum
of the trace map is committed to disk. The trimmer is not designed to be
particularly thorough; instead, it tries to strike a balance between precision
-and the number of execve() calls spent on the process. The average per-file
-gains are around 5-20%.
+and the number of execve() calls spent on the process, selecting the block size
+and stepover to match. The average per-file gains are around 5-20%.
The standalone afl-tmin tool uses a more exhaustive, iterative algorithm, and
-also attempts to perform alphabet normalization on the trimmed files.
+also attempts to perform alphabet normalization on the trimmed files.
6) Fuzzing strategies
---------------------
@@ -302,11 +302,16 @@ strategies include:
- Sequential insertion of known interesting integers (0, 1, INT_MAX, etc),
-The non-deterministic steps include stacked bit flips, insertions, deletions,
-arithmetics, and splicing of different test cases.
+The purpose of opening with deterministic steps is related to their tendency to
+produce compact test cases and small diffs between the non-crashing and crashing
+inputs.
-Their relative yields and execve() costs have been investigated and are
-discussed in the aforementioned blog post.
+With deterministic fuzzing out of the way, the non-deterministic steps include
+stacked bit flips, insertions, deletions, arithmetics, and splicing of different
+test cases.
+
+The relative yields and execve() costs of all these strategies have been
+investigated and are discussed in the aforementioned blog post.
For the reasons discussed in historical_notes.txt (chiefly, performance,
simplicity, and reliability), AFL generally does not try to reason about the
@@ -315,13 +320,14 @@ are nominally blind, and are guided only by the evolutionary design of the
input queue.
That said, there is one (trivial) exception to this rule: when a new queue
-entry goes through the initial set of deterministic fuzzing steps, and some
-regions in the file are observed to have no effect on the checksum of the
+entry goes through the initial set of deterministic fuzzing steps, and tweaks to
+some regions in the file are observed to have no effect on the checksum of the
execution path, they may be excluded from the remaining phases of
-deterministic fuzzing - and proceed straight to random tweaks. Especially for
-verbose, human-readable data formats, this can reduce the number of execs by
-10-40% or so without an appreciable drop in coverage. In extreme cases, such
-as normally block-aligned tar archives, the gains can be as high as 90%.
+deterministic fuzzing - and the fuzzer may proceed straight to random tweaks.
+Especially for verbose, human-readable data formats, this can reduce the number
+of execs by 10-40% or so without an appreciable drop in coverage. In extreme
+cases, such as normally block-aligned tar archives, the gains can be as high as
+90%.
Because the underlying "effector maps" are local every queue entry and remain
in force only during deterministic stages that do not alter the size or the
@@ -353,6 +359,14 @@ the grammar of highly verbose and complex languages such as JavaScript, SQL,
or XML; several examples of generated SQL statements are given in the blog
post mentioned above.
+Interestingly, the AFL instrumentation also allows the fuzzer to automatically
+isolate syntax tokens already present in an input file. It can do so by looking
+for run of bytes that, when flipped, produce a consistent change to the
+program's execution path; this is suggestive of an underlying atomic comparison
+to a predefined value baked into the code. The fuzzer relies on this signal
+to build compact "auto dictionaries" that are then used in conjunction with
+other fuzzing strategies.
+
8) De-duping crashes
--------------------
@@ -416,22 +430,25 @@ With fast targets, the fork server can offer considerable performance gains,
usually between 1.5x and 2x. It is also possible to:
- Use the fork server in manual ("deferred") mode, skipping over larger,
- user-selected chunks of initialization code. With some targets, this can
+ user-selected chunks of initialization code. It requires very modest
+ code changes to the targeted program, and With some targets, can
produce 10x+ performance gains.
- Enable "persistent" mode, where a single process is used to try out
multiple inputs, greatly limiting the overhead of repetitive fork()
- calls. As with the previous mode, this requires custom modifications,
+ calls. This generally requires some code changes to the targeted program,
but can improve the performance of fast targets by a factor of 5 or more
- - approximating the benefits of in-process fuzzing jobs.
+ - approximating the benefits of in-process fuzzing jobs while still
+ maintaining very robust isolation between the fuzzer process and the
+ targeted binary.
11) Parallelization
-------------------
The parallelization mechanism relies on periodically examining the queues
produced by independently-running instances on other CPU cores or on remote
-machines, and then selectively pulling in the test cases that produce behaviors
-not yet seen by the fuzzer at hand.
+machines, and then selectively pulling in the test cases that, when tried
+out locally, produce behaviors not yet seen by the fuzzer at hand.
This allows for extreme flexibility in fuzzer setup, including running synced
instances against different parsers of a common data format, often with
« no previous file with comments | « third_party/afl/src/docs/status_screen.txt ('k') | third_party/afl/src/docs/visualization/afl_gzip.png » ('j') | no next file with comments »

Powered by Google App Engine
This is Rietveld 408576698