Index: third_party/afl/src/docs/perf_tips.txt |
diff --git a/third_party/afl/src/docs/perf_tips.txt b/third_party/afl/src/docs/perf_tips.txt |
new file mode 100644 |
index 0000000000000000000000000000000000000000..e05401d7df267d6c717669a26d8f900f1962796c |
--- /dev/null |
+++ b/third_party/afl/src/docs/perf_tips.txt |
@@ -0,0 +1,212 @@ |
+================================= |
+Tips for performance optimization |
+================================= |
+ |
+ This file provides tips for troubleshooting slow or wasteful fuzzing jobs. |
+ See README for the general instruction manual. |
+ |
+1) Keep your test cases small |
+----------------------------- |
+ |
+This is probably the single most important step to take! Large test cases do |
+not merely take more time and memory to be parsed by the tested binary, but |
+also make the fuzzing process dramatically less efficient in several other |
+ways. |
+ |
+To illustrate, let's say that you're randomly flipping bits in a file, one bit |
+at a time. Let's assume that if you flip bit #47, you will hit a security bug; |
+flipping any other bit just results in an invalid document. |
+ |
+Now, if your starting test case is 100 bytes long, you will have a 71% chance of |
+triggering the bug within the first 1,000 execs - not bad! But if the test case |
+is 1 kB long, the probability that we will randomly hit the right pattern in |
+the same timeframe goes down to 11%. And if it has 10 kB of non-essential |
+cruft, the odds plunge to 1%. |
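The odds above follow from the probability of hitting one specific bit at least once in 1,000 random single-bit flips, i.e. 1 - (1 - 1/n)^1000 for an n-bit input. A quick sanity check with awk:

```shell
# Chance of flipping one specific bit at least once in 1,000 random
# single-bit flips, for 100-byte, 1 kB, and 10 kB inputs:
for bits in 800 8192 81920; do
  awk -v n="$bits" \
    'BEGIN { printf "%6d bits: %.0f%%\n", n, 100 * (1 - (1 - 1/n)^1000) }'
done
```

This reproduces the 71%, 11%, and 1% figures quoted above.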
+ |
+On top of that, with larger inputs, the binary may now be running 5-10x |
+slower than before - so the overall drop in fuzzing efficiency may easily be |
+as high as 500x or so. |

+ |
+In practice, this means that you shouldn't fuzz image parsers with your |
+vacation photos. Generate a tiny 16x16 picture instead, and run it through |
+jpegtran or pngcrush for good measure. The same goes for most other types |
+of documents. |
+ |
+There are plenty of small starting test cases in ../testcases/* - try them |
+out or submit new ones! |
+ |
+If you want to start with a larger, third-party corpus, run afl-cmin with an |
+aggressive timeout on that data set first. |
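A minimization pass might look like this (paths and limits are placeholders; the @@ marker assumes the target takes the input file name on its command line, otherwise omit it and the corpus is fed on stdin):

```shell
# Minimize a large third-party corpus with an aggressive 100 ms timeout
# and a 100 MB memory limit before handing it to afl-fuzz:
./afl-cmin -i big_corpus/ -o minimized_corpus/ -t 100 -m 100 \
  -- /path/to/target_binary @@
```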
+ |
+2) Use a simpler target |
+----------------------- |
+ |
+Consider using a simpler target binary in your fuzzing work. For example, for |
+image formats, bundled utilities such as djpeg, readpng, or gifhisto are |
+considerably (10-20x) faster than the convert tool from ImageMagick - all while |
+exercising roughly the same library-level image parsing code. |
+ |
+Even if you don't have a lightweight harness for a particular target, remember |
+that you can always use another, related library to generate a corpus that can |
+then be fed manually to a more resource-hungry program later on. |
+ |
+3) Use LLVM instrumentation |
+--------------------------- |
+ |
+When fuzzing slow targets, you can gain a ~2x performance improvement by using |
+the LLVM-based instrumentation mode described in llvm_mode/README.llvm. Note |
+that this mode requires the use of clang and will not work with GCC. |
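For a typical autoconf-based target, switching to the LLVM mode is mostly a matter of pointing the build at afl-clang-fast; a sketch (exact steps depend on the target's build system, and the AFL path is a placeholder):

```shell
# Build afl-clang-fast from the AFL source tree, then use it as the
# compiler for the target instead of afl-gcc:
make -C llvm_mode
CC=/path/to/afl/afl-clang-fast ./configure
make clean all
```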
+ |
+The LLVM mode also offers a "persistent", in-process fuzzing mode that can |
+work well for certain types of self-contained libraries, and for fast targets, |
+can offer performance gains up to 5-10x; and a "deferred fork server" mode |
+that can offer huge benefits for programs with high startup overhead. Both |
+modes require you to edit the source code of the fuzzed program, but the |
+changes often amount to just strategically placing a single line or two. |
+ |
+4) Profile and optimize the binary |
+---------------------------------- |
+ |
+Check for any parameters or settings that obviously improve performance. For |
+example, the djpeg utility that comes with IJG jpeg and libjpeg-turbo can be |
+called with: |
+ |
+ -dct fast -nosmooth -onepass -dither none -scale 1/4 |
+ |
+...and that will speed things up. There is a corresponding drop in the quality |
+of decoded images, but it's probably not something you care about. |
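Put together, a full invocation might look like this (assuming djpeg is on the usual path; with no file name given, djpeg reads the image from stdin, which is also where afl-fuzz feeds test cases by default):

```shell
# Fuzz djpeg with cheap decoding settings to maximize execs per second:
./afl-fuzz -i testcases/images/jpeg -o findings_dir \
  -- /usr/bin/djpeg -dct fast -nosmooth -onepass -dither none -scale 1/4
```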
+ |
+In some programs, it is possible to disable output altogether, or at least use |
+an output format that is computationally inexpensive. For example, with image |
+transcoding tools, converting to a BMP file will be a lot faster than to PNG. |
+ |
+With some laid-back parsers, enabling "strict" mode (i.e., bailing out after |
+the first error) may result in smaller files and improved run time without |
+sacrificing coverage; for example, for sqlite, you may want to specify -bail. |
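A sqlite job with strict mode on might be started like this (a sketch; the sqlite3 shell reads SQL statements from stdin when given no script, and the paths are placeholders):

```shell
# -bail makes the sqlite3 shell stop after the first error instead of
# plowing through the rest of the input:
./afl-fuzz -i sql_testcases/ -o findings_dir -- /usr/bin/sqlite3 -bail
```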
+ |
+If the program is still too slow, you can use strace -tt or an equivalent |
+profiling tool to see if the targeted binary is doing anything silly. |
+Sometimes, you can speed things up simply by specifying /dev/null as the |
+config file, or disabling some compile-time features that aren't really needed |
+for the job (try ./configure --help). One notoriously resource-consuming |
+pattern is calling other utilities via exec*(), popen(), system(), or |
+equivalent calls; for example, tar can invoke external decompression tools |
+when it decides that the input file is a compressed archive. |
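A typical way to look for this (assuming ./target reads the file named on its command line):

```shell
# Log all syscalls with microsecond timestamps, following forked
# children; then look for exec-type calls and config file probing:
strace -tt -f -o trace.log ./target sample_input
grep -E 'execve|openat' trace.log | less
```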
+ |
+Some programs may also intentionally call sleep(), usleep(), or nanosleep(); |
+vim is a good example of that. |
+ |
+In programs that are slow due to unavoidable initialization overhead, you may |
+want to try the LLVM deferred forkserver mode (see llvm_mode/README.llvm), |
+which can give you speed gains up to 10x, as mentioned above. |
+ |
+Last but not least, if you are using ASAN and the performance is unacceptable, |
+consider turning it off for now, and manually examining the generated corpus |
+with an ASAN-enabled binary later on. |
+ |
+5) Instrument just what you need |
+-------------------------------- |
+ |
+Instrument just the libraries you actually want to stress-test right now, one |
+at a time. Let the program use system-wide, non-instrumented libraries for |
+any functionality you don't actually want to fuzz. For example, in most |
+cases, it doesn't make sense to instrument libgmp just because you're testing |
+a crypto app that relies on it for bignum math. |
+ |
+Beware of programs that come with oddball third-party libraries bundled with |
+their source code (Spidermonkey is a good example of this). Check ./configure |
+options to use non-instrumented system-wide copies instead. |
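For an autoconf-based target, this usually boils down to something like the following (the --with/--without flag names here are made up for illustration; check ./configure --help for the real ones):

```shell
# Build only the program itself with AFL instrumentation, linking
# against stock, non-instrumented system libraries:
CC=/path/to/afl/afl-gcc ./configure --with-system-zlib --without-bundled-js
make
```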
+ |
+6) Parallelize your fuzzers |
+--------------------------- |
+ |
+The fuzzer is designed to need ~1 core per job. This means that on a, say, |
+4-core system, you can easily run four parallel fuzzing jobs with relatively |
+little performance hit. For tips on how to do that, see parallel_fuzzing.txt. |
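The usual pattern, per parallel_fuzzing.txt, is one master instance (-M) plus one slave (-S) per remaining core, all sharing a sync directory:

```shell
# Four instances on a 4-core box; the -M instance runs the deterministic
# steps, the -S instances do random work, and all sync via out_dir/:
./afl-fuzz -i testcases/ -o out_dir -M fuzzer01 -- ./target @@ &
./afl-fuzz -i testcases/ -o out_dir -S fuzzer02 -- ./target @@ &
./afl-fuzz -i testcases/ -o out_dir -S fuzzer03 -- ./target @@ &
./afl-fuzz -i testcases/ -o out_dir -S fuzzer04 -- ./target @@ &
```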
+ |
+The afl-gotcpu utility can help you understand if you still have idle CPU |
+capacity on your system. (It won't tell you about memory bandwidth, cache |
+misses, or similar factors, but they are less likely to be a concern.) |
+ |
+7) Keep memory use and timeouts in check |
+---------------------------------------- |
+ |
+If you have increased the -m or -t limits more than truly necessary, consider |
+dialing them back down. |
+ |
+For programs that are nominally very fast, but get sluggish for some inputs, |
+you can also try setting -t values that are more punishing than what afl-fuzz |
+dares to use on its own. On fast and idle machines, going down to -t 5 may be |
+a viable plan. |
+ |
+The -m parameter is worth looking at, too. Some programs can end up spending |
+a fair amount of time allocating and initializing megabytes of memory when |
+presented with pathological inputs. Low -m values can make them give up sooner |
+and not waste CPU time. |
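Tightened limits might look like this (the values are illustrative; start from the defaults and work your way down while watching the timeout and crash counters):

```shell
# Hard 5 ms timeout and a 50 MB memory cap; only sensible on fast,
# idle machines with lean targets:
./afl-fuzz -i testcases/ -o out_dir -t 5 -m 50 -- ./target @@
```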
+ |
+8) Set CPU core affinity for AFL |
+-------------------------------- |
+ |
+Making sure that the fuzzer always runs on the same (idle) CPU core can offer |
+a significant speed bump and reduce scheduler jitter. The benefits can be even |
+more striking on true multiprocessor systems. |
+ |
+On Linux, you can assign the fuzzer to a specific core by first running |
+afl-gotcpu to see which cores are idle, and then specifying the ID of a |
+preferred core via -Z, like so: |
+ |
+ $ ./afl-fuzz -Z core_id [...other parameters...] |
+ |
+Note that this parameter needs to be used with care; accidentally forcing |
+multiple fuzzers to share the same core may result in performance that is |
+worse than what you would get without -Z. |
+ |
+(It is also possible to specify two comma-delimited values for -Z, in which |
+case, the fuzzer will run on one designated core, and the target binary will |
+be banished to another. This can sometimes offer minor benefits, but isn't |
+recommended for general use.) |
+ |
+9) Check OS configuration |
+------------------------- |
+ |
+There are several OS-level factors that may affect fuzzing speed: |
+ |
+ - High system load. Use idle machines where possible. Kill any non-essential |
+ CPU hogs (idle browser windows, media players, complex screensavers, etc). |
+ |
+ - Network filesystems, either used for fuzzer input / output, or accessed by |
+ the fuzzed binary to read configuration files (pay special attention to the |
+ home directory - many programs search it for dot-files). |
+ |
+ - On-demand CPU scaling. The Linux 'ondemand' governor performs its analysis |
+ on a particular schedule and is known to underestimate the needs of |
+ short-lived processes spawned by afl-fuzz (or any other fuzzer). On Linux, |
+ this can be fixed with: |
+ |
+ cd /sys/devices/system/cpu |
+ echo performance | tee cpu*/cpufreq/scaling_governor |
+ |
+ On other systems, the impact of CPU scaling will be different; when fuzzing, |
+ use OS-specific tools to find out if all cores are running at full speed. |
+ |
+ - Suboptimal scheduling strategies. The significance of this will vary from |
+ one target to another, but on Linux, you may want to make sure that the |
+ following options are set: |
+ |
+ echo 1 >/proc/sys/kernel/sched_child_runs_first |
+ echo 1 >/proc/sys/kernel/sched_autogroup_enabled |
+ |
+ Setting a different scheduling policy for the fuzzer process - say |
+ SCHED_RR - can usually speed things up, too, but needs to be done with |
+ care. |
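On Linux, one way to do the latter is the chrt utility from util-linux; keep the priority low and an eye on the machine, since a runaway real-time process can starve everything else:

```shell
# Launch the fuzzer under the SCHED_RR round-robin real-time policy
# at the lowest real-time priority (1):
sudo chrt --rr 1 ./afl-fuzz -i testcases/ -o out_dir -- ./target @@
```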
+ |
+10) If all other options fail, use -d |
+------------------------------------- |
+ |
+For programs that are genuinely slow, in cases where you really can't escape |
+using huge input files, or when you simply want to get quick and dirty results |
+early on, you can always resort to the -d mode. |
+ |
+This mode causes afl-fuzz to skip all the deterministic fuzzing steps, which |
+makes the output a lot less orderly and the testing a bit less in-depth, but |
+it will feel more familiar to users of other, purely random fuzzing tools. |
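In other words, when throughput trumps thoroughness:

```shell
# -d skips the deterministic stages entirely; useful for very slow
# targets or for a quick first look at a new one:
./afl-fuzz -d -i testcases/ -o out_dir -- ./target @@
```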