| Index: third_party/afl/src/libtokencap/README.tokencap
|
| diff --git a/third_party/afl/src/libtokencap/README.tokencap b/third_party/afl/src/libtokencap/README.tokencap
|
| new file mode 100644
|
| index 0000000000000000000000000000000000000000..82d80c955802316cde48cb2aba0dff9a64523237
|
| --- /dev/null
|
| +++ b/third_party/afl/src/libtokencap/README.tokencap
|
| @@ -0,0 +1,60 @@
|
| +=========================================
|
| +strcmp() / memcmp() token capture library
|
| +=========================================
|
| +
|
| + (See ../docs/README for the general instruction manual.)
|
| +
|
| +This Linux-only companion library allows you to instrument strcmp(), memcmp(),
|
| +and related functions to automatically extract syntax tokens passed to any of
|
| +these libcalls. The resulting list of tokens may then be given as a starting
|
| +dictionary to afl-fuzz (the -x option) to improve coverage on subsequent
|
| +fuzzing runs.
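|
| +
|
| +Once such a dictionary has been put together (see the walkthrough further
|
| +down), feeding it back in is just a matter of the -x option; for example,
|
| +with placeholder paths:
|
| +
|
| +  ./afl-fuzz -i testcase_dir -o findings_dir -x afl_dictionary.txt \
|
| +    /path/to/target/program [...params...]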
|
| +
|
| +This may help improve coverage in some targets, and do precisely nothing in
|
| +others. In some cases, it may even make things worse: if libtokencap picks up
|
| +syntax tokens that are not used to process the input data, but that are a part
|
| +of - say - parsing a config file... well, you're going to end up wasting a lot
|
| +of CPU time on trying them out in the input stream. In other words, use this
|
| +feature with care. Manually screening the resulting dictionary is almost
|
| +always a necessity.
|
| +
|
| +As for the actual operation: the library stores tokens, without any deduping,
|
| +by appending them to a file specified via AFL_TOKEN_FILE. If the variable is not
|
| +set, the tool uses stderr (which is probably not what you want).
|
| +
|
| +As with afl-tmin, the library is not "proprietary" and can be used with
|
| +other fuzzers or testing tools without the need for any code tweaks. It does not
|
| +require AFL-instrumented binaries to work.
|
| +
|
| +To use the library, you *need* to make sure that your fuzzing target is compiled
|
| +with -fno-builtin and is linked dynamically. If you wish to automate the first
|
| +part without mucking with CFLAGS in Makefiles, you can set AFL_NO_BUILTIN=1
|
| +when using afl-gcc. This setting specifically adds the following flags:
|
| +
|
| + -fno-builtin-strcmp -fno-builtin-strncmp -fno-builtin-strcasecmp
|
| + -fno-builtin-strncasecmp -fno-builtin-memcmp
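|
| +
|
| +For an autoconf-style build, this could be as simple as the following
|
| +(paths are placeholders; the exact invocation depends on the package):
|
| +
|
| +  export AFL_NO_BUILTIN=1
|
| +  CC=/path/to/afl-gcc ./configure
|
| +  make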
|
| +
|
| +The next step is simply loading this library via LD_PRELOAD. The optimal usage
|
| +pattern is to allow afl-fuzz to fuzz normally for a while and build up a corpus,
|
| +and then fire off the target binary, with libtokencap.so loaded, on every file
|
| +found by AFL in that earlier run. This demonstrates the basic principle:
|
| +
|
| + export AFL_TOKEN_FILE=$PWD/temp_output.txt
|
| +
|
| + for i in <out_dir>/queue/id*; do
|
| + LD_PRELOAD=/path/to/libtokencap.so \
|
| + /path/to/target/program [...params, including $i...]
|
| + done
|
| +
|
| + sort -u temp_output.txt >afl_dictionary.txt
|
| +
|
| +If you don't get any results, the target library is probably not using strcmp()
|
| +and memcmp() to parse input; or you haven't compiled it with -fno-builtin; or
|
| +the whole thing isn't dynamically linked, and LD_PRELOAD is having no effect.
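|
| +
|
| +If in doubt, two quick checks with standard Linux tools may help narrow
|
| +this down:
|
| +
|
| +  ldd /path/to/target/program                # dynamically linked at all?
|
| +  nm -D /path/to/target/program | grep -w strcmp   # 'U' means hookable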
|
| +
|
| +PS. The library is Linux-only because there is probably no particularly portable
|
| +and non-invasive way to distinguish between read-only and read-write memory
|
| +mappings. The __tokencap_load_mappings() function is the only thing that would
|
| +need to be changed for other OSes. Porting to platforms with /proc/<pid>/maps
|
| +(e.g., FreeBSD) should be trivial.
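|
| +
|
| +For reference, it is the permission column of /proc/<pid>/maps that makes
|
| +this easy on Linux; a rough shell equivalent of that check would be:
|
| +
|
| +  grep -E ' r-[-x][ps] ' /proc/self/maps   # non-writable mappings only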
|
| +
|