OLD | NEW |
(Empty) | |
| 1 ================================= |
| 2 Tips for performance optimization |
| 3 ================================= |
| 4 |
| 5 This file provides tips for troubleshooting slow or wasteful fuzzing jobs. |
| 6 See README for the general instruction manual. |
| 7 |
| 8 1) Keep your test cases small |
| 9 ----------------------------- |
| 10 |
| 11 This is probably the single most important step to take! Large test cases do |
| 12 not merely take more time and memory to be parsed by the tested binary, but |
| 13 also make the fuzzing process dramatically less efficient in several other |
| 14 ways. |
| 15 |
| 16 To illustrate, let's say that you're randomly flipping bits in a file, one bit |
| 17 at a time. Let's assume that if you flip bit #47, you will hit a security bug; |
| 18 flipping any other bit just results in an invalid document. |
| 19 |
| 20 Now, if your starting test case is 100 bytes long, you will have a 71% chance of |
| 21 triggering the bug within the first 1,000 execs - not bad! But if the test case |
| 22 is 1 kB long, the probability that you will randomly hit the right bit in |
| 23 the same timeframe goes down to 11%. And if it has 10 kB of non-essential |
| 24 cruft, the odds plunge to 1%. |
| 25 |
| 26 On top of that, with larger inputs, the binary may now be running 5-10x |
| 27 slower than before - so the overall drop in fuzzing efficiency may easily be |
| 28 as high as 500x or so. |
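
The percentages quoted above follow from a simple model, sketched here in
Python. It assumes (for illustration only; this is not how afl-fuzz actually
schedules its mutations) that every exec flips one bit chosen uniformly at
random, independently of earlier execs. The function name hit_probability is
made up for this example.

```python
def hit_probability(size_bytes: int, execs: int = 1000) -> float:
    """Chance of flipping one specific bit at least once in `execs` tries.

    Assumes each exec flips a single bit picked uniformly at random
    (a simplification for illustration, not afl-fuzz's real behavior).
    """
    bits = size_bytes * 8
    # Probability of missing the bit on every try, subtracted from 1.
    return 1.0 - (1.0 - 1.0 / bits) ** execs


if __name__ == "__main__":
    for size in (100, 1024, 10 * 1024):
        print(f"{size:>6} bytes: {hit_probability(size):.0%}")
```

Under this model, going from 100 bytes to 10 kB cuts the odds by a factor of
roughly 60; combined with a 5-10x slowdown of the binary, that gives a drop of
a few hundred times, in line with the rough estimate in the text.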
| 29 |
| 30 In practice, this means that you shouldn't fuzz image parsers with your |
| 31 vacation photos. Generate a tiny 16x16 picture instead, and run it through |
| 32 jpegtran or pngcrush for good measure. The same goes for most other types |
| 33 of documents. |
| 34 |
| 35 There are plenty of small starting test cases in ../testcases/* - try them out |
| 36 or submit new ones! |
| 37 |
| 38 If you want to start with a larger, third-party corpus, run afl-cmin with an |
| 39 aggressive timeout on that data set first. |
| 40 |
| 41 2) Use a simpler target |
| 42 ----------------------- |
| 43 |
| 44 Consider using a simpler target binary in your fuzzing work. For example, for |
| 45 image formats, bundled utilities such as djpeg, readpng, or gifhisto are |
| 46 considerably (10-20x) faster than the convert tool from ImageMagick - all while |
| 47 exercising roughly the same library-level image parsing code. |
| 48 |
| 49 Even if you don't have a lightweight harness for a particular target, remember |
| 50 that you can always use another, related library to generate a corpus that will |
| 51 then be manually fed to a more resource-hungry program later on. |
| 52 |
| 53 3) Use LLVM instrumentation |
| 54 --------------------------- |
| 55 |
| 56 When fuzzing slow targets, you can gain a 2x performance improvement by using |
| 57 the LLVM-based instrumentation mode described in llvm_mode/README.llvm. Note |
| 58 that this mode requires the use of clang and will not work with GCC. |
| 59 |
| 60 The LLVM mode also offers a "persistent", in-process fuzzing mode that can |
| 61 work well for certain types of self-contained libraries, and for fast targets, |
| 62 can offer performance gains up to 5-10x; and a "deferred fork server" mode |
| 63 that can offer huge benefits for programs with high startup overhead. Both |
| 64 modes require you to edit the source code of the fuzzed program, but the |
| 65 changes often amount to just strategically placing a single line or two. |
| 66 |
| 67 4) Profile and optimize the binary |
| 68 ---------------------------------- |
| 69 |
| 70 Check for any parameters or settings that obviously improve performance. For |
| 71 example, the djpeg utility that comes with IJG jpeg and libjpeg-turbo can be |
| 72 called with: |
| 73 |
| 74 -dct fast -nosmooth -onepass -dither none -scale 1/4 |
| 75 |
| 76 ...and that will speed things up. There is a corresponding drop in the quality |
| 77 of decoded images, but it's probably not something you care about. |
| 78 |
| 79 In some programs, it is possible to disable output altogether, or at least use |
| 80 an output format that is computationally inexpensive. For example, with image |
| 81 transcoding tools, converting to a BMP file will be a lot faster than to PNG. |
| 82 |
| 83 With some laid-back parsers, enabling "strict" mode (i.e., bailing out after |
| 84 the first error) may result in smaller files and improved run time without |
| 85 sacrificing coverage; for example, for sqlite, you may want to specify -bail. |
| 86 |
| 87 If the program is still too slow, you can use strace -tt or an equivalent |
| 88 profiling tool to see if the targeted binary is doing anything silly. |
| 89 Sometimes, you can speed things up simply by specifying /dev/null as the |
| 90 config file, or disabling some compile-time features that aren't really needed |
| 91 for the job (try ./configure --help). One of the notoriously resource-consuming |
| 92 things would be calling other utilities via exec*(), popen(), system(), or |
| 93 equivalent calls; for example, tar can invoke external decompression tools |
| 94 when it decides that the input file is a compressed archive. |
| 95 |
| 96 Some programs may also intentionally call sleep(), usleep(), or nanosleep(); |
| 97 vim is a good example of that. |
| 98 |
| 99 In programs that are slow due to unavoidable initialization overhead, you may |
| 100 want to try the LLVM deferred forkserver mode (see llvm_mode/README.llvm), |
| 101 which can give you speed gains up to 10x, as mentioned above. |
| 102 |
| 103 Last but not least, if you are using ASAN and the performance is unacceptable, |
| 104 consider turning it off for now, and manually examining the generated corpus |
| 105 with an ASAN-enabled binary later on. |
| 106 |
| 107 5) Instrument just what you need |
| 108 -------------------------------- |
| 109 |
| 110 Instrument just the libraries you actually want to stress-test right now, one |
| 111 at a time. Let the program use system-wide, non-instrumented libraries for |
| 112 any functionality you don't actually want to fuzz. For example, in most |
| 113 cases, it doesn't make sense to instrument libgmp just because you're testing a |
| 114 crypto app that relies on it for bignum math. |
| 115 |
| 116 Beware of programs that come with oddball third-party libraries bundled with |
| 117 their source code (Spidermonkey is a good example of this). Check ./configure |
| 118 options to use non-instrumented system-wide copies instead. |
| 119 |
| 120 6) Parallelize your fuzzers |
| 121 --------------------------- |
| 122 |
| 123 The fuzzer is designed to need ~1 core per job. This means that on, say, a |
| 124 4-core system, you can easily run four parallel fuzzing jobs with relatively |
| 125 little performance hit. For tips on how to do that, see parallel_fuzzing.txt. |
| 126 |
| 127 The afl-gotcpu utility can help you understand if you still have idle CPU |
| 128 capacity on your system. (It won't tell you about memory bandwidth, cache |
| 129 misses, or similar factors, but they are less likely to be a concern.) |
| 130 |
| 131 7) Keep memory use and timeouts in check |
| 132 ---------------------------------------- |
| 133 |
| 134 If you have increased the -m or -t limits more than truly necessary, consider |
| 135 dialing them back down. |
| 136 |
| 137 For programs that are nominally very fast, but get sluggish for some inputs, |
| 138 you can also try setting -t values that are more punishing than what afl-fuzz |
| 139 dares to use on its own. On fast and idle machines, going down to -t 5 may be |
| 140 a viable plan. |
| 141 |
| 142 The -m parameter is worth looking at, too. Some programs can end up spending |
| 143 a fair amount of time allocating and initializing megabytes of memory when |
| 144 presented with pathological inputs. Low -m values can make them give up sooner |
| 145 and not waste CPU time. |
| 146 |
| 147 8) Set CPU core affinity for AFL |
| 148 -------------------------------- |
| 149 |
| 150 Making sure that the fuzzer always runs on the same (idle) CPU core can offer |
| 151 a significant speed bump and reduce scheduler jitter. The benefits can be even |
| 152 more striking on true multiprocessor systems. |
| 153 |
| 154 On Linux, you can assign the fuzzer to a specific core by first running |
| 155 afl-gotcpu to see which cores are idle, and then specifying the ID of a |
| 156 preferred core via -Z, like so: |
| 157 |
| 158 $ ./afl-fuzz -Z core_id [...other parameters...] |
| 159 |
| 160 Note that this parameter needs to be used with care; accidentally forcing |
| 161 multiple fuzzers to share the same core may result in performance that is |
| 162 worse than what you would get without -Z. |
| 163 |
| 164 (It is also possible to specify two comma-delimited values for -Z, in which |
| 165 case, the fuzzer will run on one designated core, and the target binary will |
| 166 be banished to another. This can sometimes offer minor benefits, but isn't |
| 167 recommended for general use.) |
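
To guard against the core-sharing mistake described above, you can pick the
-Z values up front instead of typing them by hand. The Python helper below is
a hypothetical sketch, not part of AFL: plan_core_ids is an invented name, and
the list of idle cores is assumed to be something you wrote down after
looking at afl-gotcpu.

```python
def plan_core_ids(idle_cores, jobs):
    """Pick one distinct core ID per fuzzing job.

    `idle_cores` is a list of core IDs you judged to be idle; this helper
    is hypothetical and only refuses to hand out the same core twice.
    """
    unique = sorted(set(idle_cores))
    if jobs > len(unique):
        raise ValueError(
            f"{jobs} jobs but only {len(unique)} idle cores; "
            "sharing a core would defeat the purpose of -Z"
        )
    return unique[:jobs]


if __name__ == "__main__":
    for core in plan_core_ids([2, 3, 5], jobs=2):
        # Only the -Z part is shown; the rest of the command line is yours.
        print(f"-Z {core}")
```

The helper raises an error instead of reusing a core, since the text warns
that two fuzzers on one core can be slower than not using -Z at all.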
| 168 |
| 169 9) Check OS configuration |
| 170 ------------------------- |
| 171 |
| 172 There are several OS-level factors that may affect fuzzing speed: |
| 173 |
| 174 - High system load. Use idle machines where possible. Kill any non-essential |
| 175 CPU hogs (idle browser windows, media players, complex screensavers, etc.). |
| 176 |
| 177 - Network filesystems, either used for fuzzer input / output, or accessed by |
| 178 the fuzzed binary to read configuration files (pay special attention to the |
| 179 home directory - many programs search it for dot-files). |
| 180 |
| 181 - On-demand CPU scaling. The Linux 'ondemand' governor performs its analysis |
| 182 on a particular schedule and is known to underestimate the needs of |
| 183 short-lived processes spawned by afl-fuzz (or any other fuzzer). On Linux, |
| 184 this can be fixed with: |
| 185 |
| 186 cd /sys/devices/system/cpu |
| 187 echo performance | tee cpu*/cpufreq/scaling_governor |
| 188 |
| 189 On other systems, the impact of CPU scaling will be different; when fuzzing, |
| 190 use OS-specific tools to find out if all cores are running at full speed. |
| 191 |
| 192 - Suboptimal scheduling strategies. The significance of this will vary from |
| 193 one target to another, but on Linux, you may want to make sure that the |
| 194 following options are set: |
| 195 |
| 196 echo 1 >/proc/sys/kernel/sched_child_runs_first |
| 197 echo 1 >/proc/sys/kernel/sched_autogroup_enabled |
| 198 |
| 199 Setting a different scheduling policy for the fuzzer process - say |
| 200 SCHED_RR - can usually speed things up, too, but needs to be done with |
| 201 care. |
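
The Linux settings listed above can be checked without changing anything.
The Python sketch below only reads the same /sys and /proc files that the
commands above write to; the function names are invented for this example,
and it assumes a Linux system where those files are readable. Kernels that
do not expose one of the scheduler knobs will simply have it reported as
not set.

```python
from pathlib import Path


def _read(path):
    """Return the stripped file contents, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def non_performance_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for cores not set to 'performance'."""
    result = {}
    pattern = "cpu*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        governor = _read(gov_file)
        if governor is not None and governor != "performance":
            # gov_file is <root>/cpuN/cpufreq/scaling_governor
            result[gov_file.parent.parent.name] = governor
    return result


def unset_sched_options(kernel_dir="/proc/sys/kernel"):
    """Return the scheduler knobs from the text that do not read '1'.

    Knobs that are missing or unreadable are reported as well.
    """
    names = ("sched_child_runs_first", "sched_autogroup_enabled")
    return [n for n in names if _read(Path(kernel_dir) / n) != "1"]


if __name__ == "__main__":
    for cpu, gov in non_performance_governors().items():
        print(f"{cpu}: governor is '{gov}', not 'performance'")
    for name in unset_sched_options():
        print(f"{name} is not set to 1 (or is not available)")
```

If the script prints nothing, the governor and scheduler settings already
match the recommendations in this section.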
| 202 |
| 203 10) If all other options fail, use -d |
| 204 ------------------------------------- |
| 205 |
| 206 For programs that are genuinely slow, in cases where you really can't escape |
| 207 using huge input files, or when you simply want to get quick and dirty results |
| 208 early on, you can always resort to the -d mode. |
| 209 |
| 210 This mode causes afl-fuzz to skip all the deterministic fuzzing steps, which |
| 211 makes output a lot less neat and makes the testing a bit less in-depth, but |
| 212 it will give you an experience more familiar from other fuzzing tools. |