=========================
Tips for parallel fuzzing
=========================

This document talks about synchronizing afl-fuzz jobs on a single machine
or across a fleet of systems. See README for the general instruction manual.

1) Introduction
---------------

Every copy of afl-fuzz will take up one CPU core. This means that on an
n-core system, you can almost always run around n concurrent fuzzing jobs with
virtually no performance hit (you can use the afl-gotcpu tool to make sure).

In fact, if you rely on just a single job on a multi-core system, you will
be underutilizing the hardware. So, parallelization is usually the right
way to go.

When targeting multiple unrelated binaries or using the tool in "dumb" (-n)
mode, it is perfectly fine to just start up several fully separate instances
of afl-fuzz. The picture gets more complicated when you want to have multiple
fuzzers hammering a common target: if a hard-to-hit but interesting test case
is synthesized by one fuzzer, the remaining instances will not be able to use
that input to guide their work.

To help with this problem, afl-fuzz offers a simple way to synchronize test
cases on the fly.

2) Single-system parallelization
--------------------------------

If you wish to parallelize a single job across multiple cores on a local
system, simply create a new, empty output directory ("sync dir") that will be
shared by all the instances of afl-fuzz; and then come up with a naming scheme
for every instance - say, "fuzzer01", "fuzzer02", etc.

Run the first one ("master", -M) like this:

  $ ./afl-fuzz -i testcase_dir -o sync_dir -M fuzzer01 [...other stuff...]

...and then, start up secondary (-S) instances like this:

  $ ./afl-fuzz -i testcase_dir -o sync_dir -S fuzzer02 [...other stuff...]
  $ ./afl-fuzz -i testcase_dir -o sync_dir -S fuzzer03 [...other stuff...]

Each fuzzer will keep its state in a separate subdirectory, like so:

  /path/to/sync_dir/fuzzer01/

Each instance will also periodically rescan the top-level sync directory
for any test cases found by other fuzzers - and will incorporate them into
its own fuzzing when they are deemed interesting enough.

The only difference between the -M and -S modes is that the master instance
will still perform deterministic checks; while the secondary instances will
proceed straight to random tweaks. If you don't want to do deterministic
fuzzing at all, it's OK to run all instances with -S. With very slow or complex
targets, or when running heavily parallelized jobs, this is usually a good plan.

You can monitor the progress of your jobs from the command line with the
provided afl-whatsup tool. When the instances are no longer finding new paths,
it's probably time to stop.

WARNING: Exercise caution when explicitly specifying the -f option. Each fuzzer
must use a separate temporary file; otherwise, things will go south. One safe
example may be:

  $ ./afl-fuzz [...] -S fuzzer10 -f file10.txt ./fuzzed/binary @@
  $ ./afl-fuzz [...] -S fuzzer11 -f file11.txt ./fuzzed/binary @@
  $ ./afl-fuzz [...] -S fuzzer12 -f file12.txt ./fuzzed/binary @@

This is not a concern if you use @@ without -f and let afl-fuzz come up with the
file name.
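The per-instance commands above lend themselves to a small launcher script. Here is a minimal sketch, assuming a four-core box; the testcase_dir, sync_dir, and target paths are placeholders for your setup, and DRY_RUN is left on so the script only records and prints the commands rather than executing them:

```shell
#!/bin/sh
# Sketch: build one -M master plus N-1 -S secondary command lines.
# "testcase_dir", "sync_dir", and the target path are placeholders.
N=4
DRY_RUN=1
TARGET="./fuzzed/binary"

: > launch_cmds.txt
for i in $(seq 1 "$N"); do
  id=$(printf 'fuzzer%02d' "$i")
  # first instance is the master, the rest are secondaries
  if [ "$i" -eq 1 ]; then mode="-M"; else mode="-S"; fi
  cmd="./afl-fuzz -i testcase_dir -o sync_dir $mode $id -- $TARGET @@"
  echo "$cmd" >> launch_cmds.txt
  if [ "$DRY_RUN" -ne 1 ]; then
    $cmd &   # in practice, run each instance under screen/tmux instead
  fi
done
cat launch_cmds.txt
```

In a real run you would drop DRY_RUN and start each instance in its own screen or tmux window rather than backgrounding it.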

3) Multi-system parallelization
-------------------------------

The basic operating principle for multi-system parallelization is similar to
the mechanism explained in section 2. The key difference is that you need to
write a simple script that performs two actions:

  - Uses SSH with authorized_keys to connect to every machine and retrieve
    a tar archive of the /path/to/sync_dir/<fuzzer_id>/queue/ directories for
    every <fuzzer_id> local to the machine. It's best to use a naming scheme
    that includes the host name in the fuzzer ID, so that you can do something
    like:

    for s in {1..10}; do
      ssh user@host${s} "tar -czf - sync/host${s}_fuzzid*/[qf]*" >host${s}.tgz
    done

  - Distributes and unpacks these files on all the remaining machines, e.g.:

    for s in {1..10}; do
      for d in {1..10}; do
        test "$s" = "$d" && continue
        ssh user@host${d} 'tar -kxzf -' <host${s}.tgz
      done
    done

There is an example of such a script in experimental/distributed_fuzzing/;
you can also find a more featured, experimental tool developed by
Martijn Bogaard at:

  https://github.com/MartijnB/disfuzz-afl

Another client-server implementation from Richo Healey is:

  https://github.com/richo/roving

Note that these third-party tools are unsafe to run on systems exposed to the
Internet or to untrusted users.

When developing custom test case sync code, there are several optimizations
to keep in mind:

  - The synchronization does not have to happen very often; running the
    task every 30 minutes or so may be perfectly fine.

  - There is no need to synchronize crashes/ or hangs/; you only need to
    copy over queue/* (and ideally, also fuzzer_stats).

  - It is not necessary (and not advisable!) to overwrite existing files;
    the -k option in tar is a good way to avoid that.

  - There is no need to fetch directories for fuzzers that are not running
    locally on a particular machine, and were simply copied over onto that
    system during earlier runs.

  - For large fleets, you will want to consolidate tarballs for each host,
    as this will let you use n SSH connections for sync, rather than n*(n-1).

    You may also want to implement staged synchronization. For example, you
    could have 10 groups of systems, with group 1 pushing test cases only
    to group 2; group 2 pushing them only to group 3; and so on, with group
    10 eventually feeding back to group 1.

    This arrangement would allow interesting test cases to propagate across
    the fleet without having to copy every fuzzer queue to every single host.

  - You do not want a "master" instance of afl-fuzz on every system; you should
    run them all with -S, and just designate a single process somewhere within
    the fleet to run with -M.
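The staged scheme mentioned above amounts to a ring: each group pushes only to the next one, and the last group wraps around to the first. A sketch of the group arithmetic, with the actual copy step left as a comment (the group layout and host assignment are hypothetical):

```shell
#!/bin/sh
# Sketch: staged ("ring") synchronization across NGROUPS groups.
NGROUPS=10

next_group() {
  # group g pushes its test cases to group (g % NGROUPS) + 1
  echo $(( $1 % NGROUPS + 1 ))
}

for g in $(seq 1 "$NGROUPS"); do
  dst=$(next_group "$g")
  echo "group${g} -> group${dst}"
  # here you would ssh/scp the consolidated tarball for group $g
  # to the hosts that make up group $dst
done
```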

It is *not* advisable to skip the synchronization script and run the fuzzers
directly on a network filesystem; unexpected latency and unkillable processes
in I/O wait state can mess things up.

4) Remote monitoring and data collection
----------------------------------------

You can use screen, nohup, tmux, or something equivalent to run remote
instances of afl-fuzz. If you redirect the program's output to a file, it will
automatically switch from a fancy UI to more limited status reports. There is
also basic machine-readable information always written to the fuzzer_stats file
in the output directory. Locally, that information can be interpreted with
afl-whatsup.
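For ad-hoc remote monitoring, the fuzzer_stats file is also easy to pick apart by hand. A sketch, run here against a fabricated sample file - real files contain many more "key : value" lines, and the values below are made up for illustration:

```shell
#!/bin/sh
# Sketch: extract fields from a fuzzer_stats-style file with awk.
# The sample data below is fabricated for illustration only.
cat > fuzzer_stats.sample <<'EOF'
fuzzer_pid        : 12345
execs_per_sec     : 1432.00
paths_total       : 562
unique_crashes    : 3
EOF

stat_field() {
  # print the value for a given key, with whitespace stripped
  awk -F':' -v k="$1" '$1 ~ k { gsub(/[ \t]/, "", $2); print $2 }' \
      fuzzer_stats.sample
}

echo "execs/sec: $(stat_field execs_per_sec)"
echo "crashes:   $(stat_field unique_crashes)"
```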

In principle, you can use the status screen of the master (-M) instance to
monitor the overall fuzzing progress and decide when to stop. In this
mode, the most important signal is just that no new paths are being found
for a long while. If you do not have a master instance, just pick any
single secondary instance to watch and go by that.

You can also rely on that instance's output directory to collect the
synthesized corpus that covers all the noteworthy paths discovered anywhere
within the fleet. Secondary (-S) instances do not require any special
monitoring, other than just making sure that they are up.

Keep in mind that crashing inputs are *not* automatically propagated to the
master instance, so you may still want to monitor for crashes fleet-wide
from within your synchronization or health checking scripts (see afl-whatsup).
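A fleet-side health check can simply count crash entries under the shared sync dir. A sketch against a mocked-up directory layout - the file names below imitate, but are not, real afl-fuzz output:

```shell
#!/bin/sh
# Sketch: count crash entries across all instances in a sync dir.
# The layout is mocked up here; afl-fuzz writes crashes to
# <sync_dir>/<fuzzer_id>/crashes/id:... files.
mkdir -p sync_demo/fuzzer01/crashes sync_demo/fuzzer02/crashes
touch "sync_demo/fuzzer01/crashes/id:000000,sig:11"
touch "sync_demo/fuzzer02/crashes/id:000000,sig:06"
touch "sync_demo/fuzzer02/crashes/id:000001,sig:11"

crashes=$(find sync_demo -path '*/crashes/id:*' | wc -l)
echo "total crashes: $crashes"
```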

5) Asymmetric setups
--------------------

It is perhaps worth noting that all of the following is permitted:

  - Running afl-fuzz in conjunction with other guided tools that can extend
    coverage (e.g., via concolic execution). Third-party tools simply need to
    follow the protocol described above for pulling new test cases from
    out_dir/<fuzzer_id>/queue/* and writing their own finds to sequentially
    numbered id:nnnnnn files in out_dir/<ext_tool_id>/queue/*.

  - Running some of the synchronized fuzzers with different (but related)
    target binaries. For example, simultaneously stress-testing several
    different JPEG parsers (say, IJG jpeg and libjpeg-turbo) while sharing
    the discovered test cases can have synergistic effects and improve the
    overall coverage.

    (In this case, running one -M instance per binary is a good plan.)

  - Having some of the fuzzers invoke the binary in different ways.
    For example, 'djpeg' supports several DCT modes, configurable with
    a command-line flag, while 'dwebp' supports incremental and one-shot
    decoding. In some scenarios, going after multiple distinct modes and then
    pooling test cases will improve coverage.

  - Much less convincingly, running the synchronized fuzzers with different
    starting test cases (e.g., progressive and standard JPEG) or dictionaries.
    The synchronization mechanism ensures that the test sets will get fairly
    homogeneous over time, but it introduces some initial variability.
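To illustrate the id:nnnnnn convention from the first bullet in this section: an external tool only has to drop sequentially numbered files into its own queue/ subdirectory under the sync dir. A minimal sketch - the "ext_tool" id, the out_demo directory, and the payloads are all hypothetical:

```shell
#!/bin/sh
# Sketch: write finds as sequentially numbered id:nnnnnn queue entries,
# the way an external tool would feed test cases back to afl-fuzz.
# "out_demo" and "ext_tool" are placeholder names.
QUEUE="out_demo/ext_tool/queue"
mkdir -p "$QUEUE"

add_find() {
  # write $1 to the next sequentially numbered entry
  n=$(ls "$QUEUE" | wc -l)
  printf '%s' "$1" > "$QUEUE/$(printf 'id:%06d' "$n")"
}

add_find "first interesting input"
add_find "second interesting input"
ls "$QUEUE"
```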