Chromium Code Reviews

Side by Side Diff: third_party/afl/src/docs/parallel_fuzzing.txt

Issue 2075883002: Add American Fuzzy Lop (afl) to third_party/afl/ (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master
Patch Set: Fix nits Created 4 years, 6 months ago
1 =========================
2 Tips for parallel fuzzing
3 =========================
4
5 This document talks about synchronizing afl-fuzz jobs on a single machine
6 or across a fleet of systems. See README for the general instruction manual.
7
8 1) Introduction
9 ---------------
10
11 Every copy of afl-fuzz will take up one CPU core. This means that on an
12 n-core system, you can almost always run around n concurrent fuzzing jobs with
13 virtually no performance hit (you can use the afl-gotcpu tool to make sure).
14
15 In fact, if you rely on just a single job on a multi-core system, you will
16 be underutilizing the hardware. So, parallelization is usually the right
17 way to go.
18
19 When targeting multiple unrelated binaries or using the tool in "dumb" (-n)
20 mode, it is perfectly fine to just start up several fully separate instances
21 of afl-fuzz. The picture gets more complicated when you want to have multiple
22 fuzzers hammering a common target: if a hard-to-hit but interesting test case
23 is synthesized by one fuzzer, the remaining instances will not be able to use
24 that input to guide their work.
25
26 To help with this problem, afl-fuzz offers a simple way to synchronize test
27 cases on the fly.
28
29 2) Single-system parallelization
30 --------------------------------
31
32 If you wish to parallelize a single job across multiple cores on a local
33 system, simply create a new, empty output directory ("sync dir") that will be
34 shared by all the instances of afl-fuzz; and then come up with a naming scheme
35 for every instance - say, "fuzzer01", "fuzzer02", etc.
36
37 Run the first one ("master", -M) like this:
38
39 $ ./afl-fuzz -i testcase_dir -o sync_dir -M fuzzer01 [...other stuff...]
40
41 ...and then, start up secondary (-S) instances like this:
42
43 $ ./afl-fuzz -i testcase_dir -o sync_dir -S fuzzer02 [...other stuff...]
44 $ ./afl-fuzz -i testcase_dir -o sync_dir -S fuzzer03 [...other stuff...]
45
46 Each fuzzer will keep its state in a separate subdirectory, like so:
47
48 /path/to/sync_dir/fuzzer01/
49
50 Each instance will also periodically rescan the top-level sync directory
51 for any test cases found by other fuzzers - and will incorporate them into
52 its own fuzzing when they are deemed interesting enough.
53
54 The only difference between the -M and -S modes is that the master instance
55 will still perform deterministic checks; while the secondary instances will
56 proceed straight to random tweaks. If you don't want to do deterministic
57 fuzzing at all, it's OK to run all instances with -S. With very slow or complex
58 targets, or when running heavily parallelized jobs, this is usually a good plan.
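
The launch sequence above can be scripted. The sketch below only prints the
command lines and does not start afl-fuzz itself. The helper name
afl_launch_cmds, the directory names, and the "./target @@" command line are
illustrative assumptions, not part of afl:

```shell
#!/bin/sh
# Print afl-fuzz command lines for N instances sharing one sync dir.
# Hypothetical helper: it only echoes commands, so review them before running.
afl_launch_cmds() {
    count="$1"; in_dir="$2"; sync_dir="$3"; shift 3
    i=1
    while [ "$i" -le "$count" ]; do
        id=$(printf 'fuzzer%02d' "$i")
        # The first instance is the master (-M); the rest are secondaries (-S).
        if [ "$i" -eq 1 ]; then mode=-M; else mode=-S; fi
        echo "./afl-fuzz -i $in_dir -o $sync_dir $mode $id $*"
        i=$((i + 1))
    done
}

afl_launch_cmds 3 testcase_dir sync_dir ./target @@
```

Each printed line can then be started in its own screen or tmux window. To run
without deterministic fuzzing, the same loop could emit -S for every instance.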
59
60 You can monitor the progress of your jobs from the command line with the
61 provided afl-whatsup tool. When the instances are no longer finding new paths,
62 it's probably time to stop.
63
64 WARNING: Exercise caution when explicitly specifying the -f option. Each fuzzer
65 must use a separate temporary file; otherwise, things will go south. One safe
66 example may be:
67
68 $ ./afl-fuzz [...] -S fuzzer10 -f file10.txt ./fuzzed/binary @@
69 $ ./afl-fuzz [...] -S fuzzer11 -f file11.txt ./fuzzed/binary @@
70 $ ./afl-fuzz [...] -S fuzzer12 -f file12.txt ./fuzzed/binary @@
71
72 This is not a concern if you use @@ without -f and let afl-fuzz come up with the
73 file name.
74
75 3) Multi-system parallelization
76 -------------------------------
77
78 The basic operating principle for multi-system parallelization is similar to
79 the mechanism explained in section 2. The key difference is that you need to
80 write a simple script that performs two actions:
81
82 - Uses SSH with authorized_keys to connect to every machine and retrieve
83 a tar archive of the /path/to/sync_dir/<fuzzer_id>/queue/ directories for
84 every <fuzzer_id> local to the machine. It's best to use a naming scheme
85 that includes the host name in the fuzzer ID, so that you can do something
86 like:
87
88     for s in {1..10}; do
89       ssh user@host${s} "tar -czf - sync/host${s}_fuzzid*/[qf]*" >host${s}.tgz
90     done
91
92 - Distributes and unpacks these files on all the remaining machines, e.g.:
93
94     for s in {1..10}; do
95       for d in {1..10}; do
96         test "$s" = "$d" && continue
97         ssh user@host${d} 'tar -kxzf -' <host${s}.tgz
98       done
99     done
100
101 There is an example of such a script in experimental/distributed_fuzzing/;
102 you can also find a more featured, experimental tool developed by
103 Martijn Bogaard at:
104
105 https://github.com/MartijnB/disfuzz-afl
106
107 Another client-server implementation from Richo Healey is:
108
109 https://github.com/richo/roving
110
111 Note that these third-party tools are unsafe to run on systems exposed to the
112 Internet or to untrusted users.
113
114 When developing custom test case sync code, there are several optimizations
115 to keep in mind:
116
117 - The synchronization does not have to happen very often; running the
118 task every 30 minutes or so may be perfectly fine.
119
120 - There is no need to synchronize crashes/ or hangs/; you only need to
121 copy over queue/* (and ideally, also fuzzer_stats).
122
123 - It is not necessary (and not advisable!) to overwrite existing files;
124 the -k option in tar is a good way to avoid that.
125
126 - There is no need to fetch directories for fuzzers that are not running
127 locally on a particular machine, and were simply copied over onto that
128 system during earlier runs.
129
130 - For large fleets, you will want to consolidate tarballs for each host,
131 as this will let you use n SSH connections for sync, rather than n*(n-1).
132
133 You may also want to implement staged synchronization. For example, you
134 could have 10 groups of systems, with group 1 pushing test cases only
135 to group 2; group 2 pushing them only to group 3; and so on, with group
136 10 eventually feeding back to group 1.
137
138 This arrangement would allow interesting test cases to propagate across
139 the fleet without having to copy every fuzzer queue to every single host.
140
141 - You do not want a "master" instance of afl-fuzz on every system; you should
142 run them all with -S, and just designate a single process somewhere within
143 the fleet to run with -M.
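
The staged (ring) synchronization described in the list above can be planned
with a few lines of shell. This is a minimal sketch that only prints which
group pushes to which; the function names next_group and ring_plan are
hypothetical, and the actual transfer would still use the ssh/tar commands
shown earlier:

```shell
#!/bin/sh
# Group g pushes to group g+1; the last group wraps around to group 1.
next_group() {
    cur="$1"; total="$2"
    echo $(( cur % total + 1 ))
}

# Print the full push plan for a ring of N groups.
ring_plan() {
    total_groups="$1"
    g=1
    while [ "$g" -le "$total_groups" ]; do
        echo "group $g -> group $(next_group "$g" "$total_groups")"
        g=$((g + 1))
    done
}

ring_plan 10
```

With this layout every group makes one push per sync round, and a test case
found anywhere reaches the whole fleet after at most N-1 rounds.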
144
145 It is *not* advisable to skip the synchronization script and run the fuzzers
146 directly on a network filesystem; unexpected latency and unkillable processes
147 in I/O wait state can mess things up.
148
149 4) Remote monitoring and data collection
150 ----------------------------------------
151
152 You can use screen, nohup, tmux, or something equivalent to run remote
153 instances of afl-fuzz. If you redirect the program's output to a file, it will
154 automatically switch from a fancy UI to more limited status reports. There is
155 also basic machine-readable information always written to the fuzzer_stats file
156 in the output directory. Locally, that information can be interpreted with
157 afl-whatsup.
158
159 In principle, you can use the status screen of the master (-M) instance to
160 monitor the overall fuzzing progress and decide when to stop. In this
161 mode, the most important signal is just that no new paths are being found
162 for a long while. If you do not have a master instance, just pick any
163 single secondary instance to watch and go by that.
164
165 You can also rely on that instance's output directory to collect the
166 synthesized corpus that covers all the noteworthy paths discovered anywhere
167 within the fleet. Secondary (-S) instances do not require any special
168 monitoring, other than just making sure that they are up.
169
170 Keep in mind that crashing inputs are *not* automatically propagated to the
171 master instance, so you may still want to monitor for crashes fleet-wide
172 from within your synchronization or health checking scripts (see afl-whatsup).
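
One way to watch for crashes from a health checking script is to count the
files in every crashes/ subdirectory of the sync dir. The sketch below assumes
that crash files are named id:* (so that README.txt is skipped); the function
name count_crashes is hypothetical:

```shell
#!/bin/sh
# Count crash files across all fuzzer subdirectories of a sync dir.
# Assumption: crash files live in <fuzzer_id>/crashes/ and are named id:*.
count_crashes() {
    sync_dir="$1"
    find "$sync_dir" -path '*/crashes/id:*' -type f 2>/dev/null |
        wc -l | tr -d ' '
}

# Prints 0 when the directory does not exist or holds no crashes.
count_crashes "${1:-sync_dir}"
```

Running this on each host from the synchronization script, and alerting when
the number grows, avoids having to copy crashes/ around the fleet.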
173
174 5) Asymmetric setups
175 --------------------
176
177 It is perhaps worth noting that all of the following is permitted:
178
179 - Running afl-fuzz in conjunction with other guided tools that can extend
180 coverage (e.g., via concolic execution). Third-party tools simply need to
181 follow the protocol described above for pulling new test cases from
182 out_dir/<fuzzer_id>/queue/* and writing their own finds to sequentially
183 numbered id:nnnnnn files in out_dir/<ext_tool_id>/queue/*.
184
185 - Running some of the synchronized fuzzers with different (but related)
186 target binaries. For example, simultaneously stress-testing several
187 different JPEG parsers (say, IJG jpeg and libjpeg-turbo) while sharing
188 the discovered test cases can have synergistic effects and improve the
189 overall coverage.
190
191 (In this case, running one -M instance per binary is a good plan.)
192
193 - Having some of the fuzzers invoke the binary in different ways.
194 For example, 'djpeg' supports several DCT modes, configurable with
195 a command-line flag, while 'dwebp' supports incremental and one-shot
196 decoding. In some scenarios, going after multiple distinct modes and then
197 pooling test cases will improve coverage.
198
199 - Much less convincingly, running the synchronized fuzzers with different
200 starting test cases (e.g., progressive and standard JPEG) or dictionaries.
201 The synchronization mechanism ensures that the test sets will get fairly
202 homogeneous over time, but it introduces some initial variability.
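
For the first item in the list above, an external tool needs to drop its finds
into its own queue/ directory under sequentially numbered id:nnnnnn names. A
minimal sketch follows; the function name add_find and the ext_tool ID are
hypothetical, and the counter is not safe if several writers share one queue:

```shell
#!/bin/sh
# Copy a test case into <out_dir>/<tool_id>/queue/ under the next free
# sequential id:NNNNNN name, and print the destination path.
# Hypothetical helper; assumes a single writer per queue directory.
add_find() {
    queue="$1/$2/queue"; src="$3"
    mkdir -p "$queue"
    # The next index is the number of id:* files already present.
    n=0
    for f in "$queue"/id:*; do
        if [ -e "$f" ]; then n=$((n + 1)); fi
    done
    dest=$(printf '%s/id:%06d' "$queue" "$n")
    cp "$src" "$dest" && echo "$dest"
}
```

A call such as "add_find out_dir ext_tool new_input.bin" would create
out_dir/ext_tool/queue/id:000000 on first use, where the afl-fuzz instances
sharing out_dir can pick it up on their next rescan.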