Index: third_party/google_benchmark/README.md |
diff --git a/third_party/google_benchmark/README.md b/third_party/google_benchmark/README.md |
new file mode 100644 |
index 0000000000000000000000000000000000000000..2430d93bf9c52aea19027f921310ab98f8a6b223 |
--- /dev/null |
+++ b/third_party/google_benchmark/README.md |
@@ -0,0 +1,726 @@ |
+# benchmark |
+[Build Status (Travis CI)](https://travis-ci.org/google/benchmark) |
+[Build Status (AppVeyor)](https://ci.appveyor.com/project/google/benchmark/branch/master) |
+[Coverage Status (Coveralls)](https://coveralls.io/r/google/benchmark) |
+ |
+A library to support the benchmarking of functions, similar to unit-tests. |
+ |
+Discussion group: https://groups.google.com/d/forum/benchmark-discuss |
+ |
+IRC channel: https://freenode.net #googlebenchmark |
+ |
+[Known issues and common problems](#known-issues) |
+ |
+[Additional Tooling Documentation](docs/tools.md) |
+ |
+## Example usage |
+### Basic usage |
+Define a function that executes the code to be measured. |
+ |
+```c++ |
+static void BM_StringCreation(benchmark::State& state) { |
+ while (state.KeepRunning()) |
+ std::string empty_string; |
+} |
+// Register the function as a benchmark |
+BENCHMARK(BM_StringCreation); |
+ |
+// Define another benchmark |
+static void BM_StringCopy(benchmark::State& state) { |
+ std::string x = "hello"; |
+ while (state.KeepRunning()) |
+ std::string copy(x); |
+} |
+BENCHMARK(BM_StringCopy); |
+ |
+BENCHMARK_MAIN(); |
+``` |
+ |
+### Passing arguments |
+Sometimes a family of benchmarks can be implemented with just one routine that |
+takes an extra argument to specify which one of the family of benchmarks to |
+run. For example, the following code defines a family of benchmarks for |
+measuring the speed of `memcpy()` calls of different lengths: |
+ |
+```c++ |
+static void BM_memcpy(benchmark::State& state) { |
+ char* src = new char[state.range(0)]; |
+ char* dst = new char[state.range(0)]; |
+ memset(src, 'x', state.range(0)); |
+ while (state.KeepRunning()) |
+ memcpy(dst, src, state.range(0)); |
+ state.SetBytesProcessed(int64_t(state.iterations()) * |
+ int64_t(state.range(0))); |
+ delete[] src; |
+ delete[] dst; |
+} |
+BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10); |
+``` |
+ |
+The preceding code is quite repetitive, and can be replaced with the following |
+short-hand. The following invocation will pick a few appropriate arguments in |
+the specified range and will generate a benchmark for each such argument. |
+ |
+```c++ |
+BENCHMARK(BM_memcpy)->Range(8, 8<<10); |
+``` |
+ |
+By default the arguments in the range are generated by multiplying by eight at |
+each step, so the command above selects [ 8, 64, 512, 4k, 8k ]. In the following |
+code the range multiplier is changed to two. |
+ |
+```c++ |
+BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10); |
+``` |
+The generated arguments are now [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ]. |
+ |
+You might have a benchmark that depends on two or more inputs. For example, the |
+following code defines a family of benchmarks for measuring the speed of set |
+insertion. |
+ |
+```c++ |
+static void BM_SetInsert(benchmark::State& state) { |
+ while (state.KeepRunning()) { |
+ state.PauseTiming(); |
+ std::set<int> data = ConstructRandomSet(state.range(0)); |
+ state.ResumeTiming(); |
+ for (int j = 0; j < state.range(1); ++j) |
+ data.insert(RandomNumber()); |
+ } |
+} |
+BENCHMARK(BM_SetInsert) |
+ ->Args({1<<10, 1}) |
+ ->Args({1<<10, 8}) |
+ ->Args({1<<10, 64}) |
+ ->Args({1<<10, 512}) |
+ ->Args({8<<10, 1}) |
+ ->Args({8<<10, 8}) |
+ ->Args({8<<10, 64}) |
+ ->Args({8<<10, 512}); |
+``` |
+ |
+The preceding code is quite repetitive, and can be replaced with the following |
+short-hand. The following invocation will pick a few appropriate arguments in the |
+product of the two specified ranges and will generate a benchmark for each such |
+pair. |
+ |
+```c++ |
+BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {1, 512}}); |
+``` |
+ |
+For more complex patterns of inputs, passing a custom function to `Apply` allows |
+programmatic specification of an arbitrary set of arguments on which to run the |
+benchmark. The following example enumerates a dense range on one parameter, |
+and a sparse range on the second. |
+ |
+```c++ |
+static void CustomArguments(benchmark::internal::Benchmark* b) { |
+ for (int i = 0; i <= 10; ++i) |
+ for (int j = 32; j <= 1024*1024; j *= 8) |
+ b->Args({i, j}); |
+} |
+BENCHMARK(BM_SetInsert)->Apply(CustomArguments); |
+``` |
+ |
+### Calculate asymptotic complexity (Big O) |
+Asymptotic complexity might be calculated for a family of benchmarks. The |
+following code will calculate the coefficient for the high-order term in the |
+running time and the normalized root-mean square error of string comparison. |
+ |
+```c++ |
+static void BM_StringCompare(benchmark::State& state) { |
+ std::string s1(state.range(0), '-'); |
+ std::string s2(state.range(0), '-'); |
+ while (state.KeepRunning()) { |
+ benchmark::DoNotOptimize(s1.compare(s2)); |
+ } |
+ state.SetComplexityN(state.range(0)); |
+} |
+BENCHMARK(BM_StringCompare) |
+ ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN); |
+``` |
+ |
+As shown in the following invocation, asymptotic complexity might also be |
+calculated automatically. |
+ |
+```c++ |
+BENCHMARK(BM_StringCompare) |
+ ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(); |
+``` |
+ |
+The following code specifies asymptotic complexity with a lambda function |
+that can be used to customize the high-order term calculation. |
+ |
+```c++ |
+BENCHMARK(BM_StringCompare)->RangeMultiplier(2) |
+ ->Range(1<<10, 1<<18)->Complexity([](int n)->double{return n; }); |
+``` |
+ |
+### Templated benchmarks |
+Templated benchmarks work the same way. This example produces and consumes |
+messages of size `sizeof(v)`, `state.range(0)` times per iteration. It also |
+outputs throughput in the absence of multiprogramming. |
+ |
+```c++ |
+template <class Q> void BM_Sequential(benchmark::State& state) { |
+ Q q; |
+ typename Q::value_type v; |
+ while (state.KeepRunning()) { |
+ for (int i = state.range(0); i--; ) |
+ q.push(v); |
+ for (int e = state.range(0); e--; ) |
+ q.Wait(&v); |
+ } |
+ // actually messages, not bytes: |
+ state.SetBytesProcessed( |
+ static_cast<int64_t>(state.iterations())*state.range(0)); |
+} |
+BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10); |
+``` |
+ |
+Three macros are provided for adding benchmark templates. |
+ |
+```c++ |
+#if __cplusplus >= 201103L // C++11 and greater. |
+#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters. |
+#else // C++ < C++11 |
+#define BENCHMARK_TEMPLATE(func, arg1) |
+#endif |
+#define BENCHMARK_TEMPLATE1(func, arg1) |
+#define BENCHMARK_TEMPLATE2(func, arg1, arg2) |
+``` |
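+ |
+As a minimal sketch of how the numbered variants might be used (here |
+`BM_TwoParam` is a hypothetical benchmark template introduced only for |
+illustration, reusing the `WaitQueue` type from the example above): |
+ |
+```c++ |
+// Hypothetical benchmark taking two template parameters. |
+template <class Q, class V> void BM_TwoParam(benchmark::State& state) { |
+  Q q; |
+  V v = V(); |
+  while (state.KeepRunning()) { |
+    q.push(v); |
+    q.Wait(&v); |
+  } |
+} |
+ |
+// One template argument; works in both C++03 and C++11. |
+BENCHMARK_TEMPLATE1(BM_Sequential, WaitQueue<int>); |
+// Two template arguments. |
+BENCHMARK_TEMPLATE2(BM_TwoParam, WaitQueue<int>, int); |
+``` |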
+ |
+## Passing arbitrary arguments to a benchmark |
+In C++11 it is possible to define a benchmark that takes an arbitrary number |
+of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)` |
+macro creates a benchmark that invokes `func` with the `benchmark::State` as |
+the first argument followed by the specified `args...`. |
+The `test_case_name` is appended to the name of the benchmark and |
+should describe the values passed. |
+ |
+```c++ |
+template <class ...ExtraArgs> |
+void BM_takes_args(benchmark::State& state, ExtraArgs&&... extra_args) { |
+ [...] |
+} |
+// Registers a benchmark named `BM_takes_args/int_string_test` that passes |
+// the specified values to `extra_args`. |
+BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc")); |
+``` |
+Note that elements of `...args` may refer to global variables. Users should |
+avoid modifying global state inside of a benchmark. |
+ |
+## Using RegisterBenchmark(name, fn, args...) |
+ |
+The `RegisterBenchmark(name, func, args...)` function provides an alternative |
+way to create and register benchmarks. |
+`RegisterBenchmark(name, func, args...)` creates, registers, and returns a |
+pointer to a new benchmark with the specified `name` that invokes |
+`func(st, args...)` where `st` is a `benchmark::State` object. |
+ |
+Unlike the `BENCHMARK` registration macros, which can only be used at global |
+scope, `RegisterBenchmark` can be called anywhere. This allows benchmarks to be |
+registered programmatically. |
+ |
+Additionally, `RegisterBenchmark` allows any callable object, including |
+capturing lambdas and function objects, to be registered as a benchmark. |
+ |
+For example: |
+```c++ |
+auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ }; |
+ |
+int main(int argc, char** argv) { |
+ for (auto& test_input : { /* ... */ }) |
+ benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input); |
+ benchmark::Initialize(&argc, argv); |
+ benchmark::RunSpecifiedBenchmarks(); |
+} |
+``` |
+ |
+### Multithreaded benchmarks |
+In a multithreaded test (benchmark invoked by multiple threads simultaneously), |
+it is guaranteed that none of the threads will start until all have called |
+`KeepRunning`, and all will have finished before `KeepRunning` returns false. As |
+such, any global setup or teardown can be wrapped in a check against the thread |
+index: |
+ |
+```c++ |
+static void BM_MultiThreaded(benchmark::State& state) { |
+ if (state.thread_index == 0) { |
+ // Setup code here. |
+ } |
+ while (state.KeepRunning()) { |
+ // Run the test as normal. |
+ } |
+ if (state.thread_index == 0) { |
+ // Teardown code here. |
+ } |
+} |
+BENCHMARK(BM_MultiThreaded)->Threads(2); |
+``` |
+ |
+If the benchmarked code itself uses threads and you want to compare it to |
+single-threaded code, you may want to use real-time ("wallclock") measurements |
+for latency comparisons: |
+ |
+```c++ |
+BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime(); |
+``` |
+ |
+Without `UseRealTime`, CPU time is used by default. |
+ |
+ |
+## Manual timing |
+For benchmarking something for which neither CPU time nor real-time are |
+correct or accurate enough, completely manual timing is supported using |
+the `UseManualTime` function. |
+ |
+When `UseManualTime` is used, the benchmarked code must call |
+`SetIterationTime` once per iteration of the `KeepRunning` loop to |
+report the manually measured time. |
+ |
+An example use case for this is benchmarking GPU execution (e.g. OpenCL |
+or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot |
+be accurately measured using CPU time or real-time. Instead, they can be |
+measured accurately using a dedicated API, and these measurement results |
+can be reported back with `SetIterationTime`. |
+ |
+```c++ |
+static void BM_ManualTiming(benchmark::State& state) { |
+ int microseconds = state.range(0); |
+ std::chrono::duration<double, std::micro> sleep_duration { |
+ static_cast<double>(microseconds) |
+ }; |
+ |
+ while (state.KeepRunning()) { |
+ auto start = std::chrono::high_resolution_clock::now(); |
+ // Simulate some useful workload with a sleep |
+ std::this_thread::sleep_for(sleep_duration); |
+ auto end = std::chrono::high_resolution_clock::now(); |
+ |
+ auto elapsed_seconds = |
+ std::chrono::duration_cast<std::chrono::duration<double>>( |
+ end - start); |
+ |
+ state.SetIterationTime(elapsed_seconds.count()); |
+ } |
+} |
+BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime(); |
+``` |
+ |
+### Preventing optimisation |
+To prevent a value or expression from being optimized away by the compiler |
+the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()` |
+functions can be used. |
+ |
+```c++ |
+static void BM_test(benchmark::State& state) { |
+ while (state.KeepRunning()) { |
+ int x = 0; |
+ for (int i=0; i < 64; ++i) { |
+ benchmark::DoNotOptimize(x += i); |
+ } |
+ } |
+} |
+``` |
+ |
+`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either |
+memory or a register. For GNU-based compilers it acts as a read/write barrier |
+on global memory. More specifically, it forces the compiler to flush pending |
+writes to memory and reload any other values as necessary. |
+ |
+Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>` |
+in any way. `<expr>` may even be removed entirely when the result is already |
+known. For example: |
+ |
+```c++ |
+ /* Example 1: `<expr>` is removed entirely. */ |
+ int foo(int x) { return x + 42; } |
+ while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42); |
+ |
+ /* Example 2: Result of '<expr>' is only reused */ |
+ int bar(int) __attribute__((const)); |
+ while (...) DoNotOptimize(bar(0)); // Optimized to: |
+ // int __result__ = bar(0); |
+ // while (...) DoNotOptimize(__result__); |
+``` |
+ |
+The second tool for preventing optimizations is `ClobberMemory()`. In essence |
+`ClobberMemory()` forces the compiler to perform all pending writes to global |
+memory. Memory managed by block scope objects must be "escaped" using |
+`DoNotOptimize(...)` before it can be clobbered. In the below example |
+`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized |
+away. |
+ |
+```c++ |
+static void BM_vector_push_back(benchmark::State& state) { |
+ while (state.KeepRunning()) { |
+ std::vector<int> v; |
+ v.reserve(1); |
+ benchmark::DoNotOptimize(v.data()); // Allow v.data() to be clobbered. |
+ v.push_back(42); |
+ benchmark::ClobberMemory(); // Force 42 to be written to memory. |
+ } |
+} |
+``` |
+ |
+Note that `ClobberMemory()` is only available for GNU or MSVC based compilers. |
+ |
+### Set time unit manually |
+If a benchmark runs for only a few milliseconds it may be hard to visually |
+compare the measured times, since the output is given in nanoseconds by |
+default. To change the time unit, specify it explicitly: |
+ |
+```c++ |
+BENCHMARK(BM_test)->Unit(benchmark::kMillisecond); |
+``` |
+ |
+## Controlling number of iterations |
+In all cases, the number of iterations for which the benchmark is run is |
+governed by the amount of time the benchmark takes. Concretely, the number of |
+iterations is at least one and at most 1e9; iterations are added until the CPU |
+time exceeds the minimum time, or the wallclock time is 5x the minimum time. |
+The minimum time is set globally with the `--benchmark_min_time` flag, or per |
+benchmark by calling `MinTime` on the registered benchmark object. |
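+ |
+For example, a per-benchmark override might look like the following sketch |
+(reusing the `BM_test` name from the earlier examples): |
+ |
+```c++ |
+// Keep iterating until at least 2 seconds of benchmark time have accumulated, |
+// instead of the default minimum time. |
+BENCHMARK(BM_test)->MinTime(2.0); |
+``` |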
+ |
+## Reporting the mean and standard deviation of repeated benchmarks |
+By default each benchmark is run once and that single result is reported. |
+However benchmarks are often noisy and a single result may not be representative |
+of the overall behavior. For this reason it's possible to repeatedly rerun the |
+benchmark. |
+ |
+The number of runs of each benchmark is specified globally by the |
+`--benchmark_repetitions` flag or on a per benchmark basis by calling |
+`Repetitions` on the registered benchmark object. When a benchmark is run |
+more than once the mean and standard deviation of the runs will be reported. |
+ |
+Additionally the `--benchmark_report_aggregates_only={true|false}` flag or |
+`ReportAggregatesOnly(bool)` function can be used to change how repeated tests |
+are reported. By default the result of each repeated run is reported. When this |
+option is `true`, only the mean and standard deviation of the runs are reported. |
+Calling `ReportAggregatesOnly(bool)` on a registered benchmark object overrides |
+the value of the flag for that benchmark. |
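+ |
+As a brief sketch (again reusing the `BM_test` name from earlier examples): |
+ |
+```c++ |
+// Run the benchmark 10 times and report only the mean and standard deviation |
+// of those runs for this benchmark, regardless of the command-line flags. |
+BENCHMARK(BM_test)->Repetitions(10)->ReportAggregatesOnly(true); |
+``` |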
+ |
+## Fixtures |
+Fixture tests are created by first defining a type that derives from |
+`::benchmark::Fixture` and then creating/registering the tests using the |
+following macros: |
+ |
+* `BENCHMARK_F(ClassName, Method)` |
+* `BENCHMARK_DEFINE_F(ClassName, Method)` |
+* `BENCHMARK_REGISTER_F(ClassName, Method)` |
+ |
+For example: |
+ |
+```c++ |
+class MyFixture : public benchmark::Fixture {}; |
+ |
+BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) { |
+ while (st.KeepRunning()) { |
+ ... |
+ } |
+} |
+ |
+BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) { |
+ while (st.KeepRunning()) { |
+ ... |
+ } |
+} |
+/* BarTest is NOT registered */ |
+BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2); |
+/* BarTest is now registered */ |
+``` |
+ |
+ |
+## User-defined counters |
+ |
+You can add your own counters with user-defined names. The example below |
+will add columns "Foo", "Bar" and "Baz" to its output: |
+ |
+```c++ |
+static void UserCountersExample1(benchmark::State& state) { |
+ double numFoos = 0, numBars = 0, numBazs = 0; |
+ while (state.KeepRunning()) { |
+ // ... count Foo,Bar,Baz events |
+ } |
+ state.counters["Foo"] = numFoos; |
+ state.counters["Bar"] = numBars; |
+ state.counters["Baz"] = numBazs; |
+} |
+``` |
+ |
+The `state.counters` object is a `std::map` with `std::string` keys |
+and `Counter` values. The latter is a `double`-like class, via an implicit |
+conversion to `double&`. Thus you can use all of the standard arithmetic |
+assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter. |
+ |
+In multithreaded benchmarks, each counter is set on the calling thread only. |
+When the benchmark finishes, the counters from each thread will be summed; |
+the resulting sum is the value which will be shown for the benchmark. |
+ |
+The `Counter` constructor accepts two parameters: the value as a `double` |
+and a bit flag which allows you to show counters as rates and/or as |
+per-thread averages: |
+ |
+```c++ |
+ // sets a simple counter |
+ state.counters["Foo"] = numFoos; |
+ |
+ // Set the counter as a rate. It will be presented divided |
+ // by the duration of the benchmark. |
+ state.counters["FooRate"] = Counter(numFoos, benchmark::Counter::kIsRate); |
+ |
+ // Set the counter as a thread-average quantity. It will |
+ // be presented divided by the number of threads. |
+ state.counters["FooAvg"] = Counter(numFoos, benchmark::Counter::kAvgThreads); |
+ |
+ // There's also a combined flag: |
+ state.counters["FooAvgRate"] = Counter(numFoos,benchmark::Counter::kAvgThreadsRate); |
+``` |
+ |
+When you're compiling in C++11 mode or later you can use `insert()` with |
+`std::initializer_list`: |
+ |
+```c++ |
+ // With C++11, this can be done: |
+ state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}}); |
+ // ... instead of: |
+ state.counters["Foo"] = numFoos; |
+ state.counters["Bar"] = numBars; |
+ state.counters["Baz"] = numBazs; |
+``` |
+ |
+### Counter reporting |
+ |
+When using the console reporter, by default, user counters are printed at |
+the end after the table, the same way as ``bytes_processed`` and |
+``items_processed``. This is best for cases in which there are few counters, |
+or where there are only a couple of lines per benchmark. Here's an example of |
+the default output: |
+ |
+``` |
+------------------------------------------------------------------------------ |
+Benchmark Time CPU Iterations UserCounters... |
+------------------------------------------------------------------------------ |
+BM_UserCounter/threads:8 2248 ns 10277 ns 68808 Bar=16 Bat=40 Baz=24 Foo=8 |
+BM_UserCounter/threads:1 9797 ns 9788 ns 71523 Bar=2 Bat=5 Baz=3 Foo=1024m |
+BM_UserCounter/threads:2 4924 ns 9842 ns 71036 Bar=4 Bat=10 Baz=6 Foo=2 |
+BM_UserCounter/threads:4 2589 ns 10284 ns 68012 Bar=8 Bat=20 Baz=12 Foo=4 |
+BM_UserCounter/threads:8 2212 ns 10287 ns 68040 Bar=16 Bat=40 Baz=24 Foo=8 |
+BM_UserCounter/threads:16 1782 ns 10278 ns 68144 Bar=32 Bat=80 Baz=48 Foo=16 |
+BM_UserCounter/threads:32 1291 ns 10296 ns 68256 Bar=64 Bat=160 Baz=96 Foo=32 |
+BM_UserCounter/threads:4 2615 ns 10307 ns 68040 Bar=8 Bat=20 Baz=12 Foo=4 |
+BM_Factorial 26 ns 26 ns 26608979 40320 |
+BM_Factorial/real_time 26 ns 26 ns 26587936 40320 |
+BM_CalculatePiRange/1 16 ns 16 ns 45704255 0 |
+BM_CalculatePiRange/8 73 ns 73 ns 9520927 3.28374 |
+BM_CalculatePiRange/64 609 ns 609 ns 1140647 3.15746 |
+BM_CalculatePiRange/512 4900 ns 4901 ns 142696 3.14355 |
+``` |
+ |
+If this doesn't suit you, you can print each counter as a table column by |
+passing the flag `--benchmark_counters_tabular=true` to the benchmark |
+application. This is best for cases in which there are a lot of counters, or |
+a lot of lines per individual benchmark. Note that this will trigger a |
+reprinting of the table header any time the counter set changes between |
+individual benchmarks. Here's an example of corresponding output when |
+`--benchmark_counters_tabular=true` is passed: |
+ |
+``` |
+--------------------------------------------------------------------------------------- |
+Benchmark Time CPU Iterations Bar Bat Baz Foo |
+--------------------------------------------------------------------------------------- |
+BM_UserCounter/threads:8 2198 ns 9953 ns 70688 16 40 24 8 |
+BM_UserCounter/threads:1 9504 ns 9504 ns 73787 2 5 3 1 |
+BM_UserCounter/threads:2 4775 ns 9550 ns 72606 4 10 6 2 |
+BM_UserCounter/threads:4 2508 ns 9951 ns 70332 8 20 12 4 |
+BM_UserCounter/threads:8 2055 ns 9933 ns 70344 16 40 24 8 |
+BM_UserCounter/threads:16 1610 ns 9946 ns 70720 32 80 48 16 |
+BM_UserCounter/threads:32 1192 ns 9948 ns 70496 64 160 96 32 |
+BM_UserCounter/threads:4 2506 ns 9949 ns 70332 8 20 12 4 |
+-------------------------------------------------------------- |
+Benchmark Time CPU Iterations |
+-------------------------------------------------------------- |
+BM_Factorial 26 ns 26 ns 26392245 40320 |
+BM_Factorial/real_time 26 ns 26 ns 26494107 40320 |
+BM_CalculatePiRange/1 15 ns 15 ns 45571597 0 |
+BM_CalculatePiRange/8 74 ns 74 ns 9450212 3.28374 |
+BM_CalculatePiRange/64 595 ns 595 ns 1173901 3.15746 |
+BM_CalculatePiRange/512 4752 ns 4752 ns 147380 3.14355 |
+BM_CalculatePiRange/4k 37970 ns 37972 ns 18453 3.14184 |
+BM_CalculatePiRange/32k 303733 ns 303744 ns 2305 3.14162 |
+BM_CalculatePiRange/256k 2434095 ns 2434186 ns 288 3.1416 |
+BM_CalculatePiRange/1024k 9721140 ns 9721413 ns 71 3.14159 |
+BM_CalculatePi/threads:8 2255 ns 9943 ns 70936 |
+``` |
+Note above the additional header printed when the benchmark changes from |
+``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does |
+not have the same counter set as ``BM_UserCounter``. |
+ |
+## Exiting Benchmarks in Error |
+ |
+When errors caused by external influences, such as file I/O and network |
+communication, occur within a benchmark, the |
+`State::SkipWithError(const char* msg)` function can be used to skip that run |
+of the benchmark and report the error. Note that only future iterations of the |
+`KeepRunning()` loop are skipped. Users may explicitly `return` to exit the |
+benchmark immediately. |
+ |
+The `SkipWithError(...)` function may be used at any point within the benchmark, |
+including before and after the `KeepRunning()` loop. |
+ |
+For example: |
+ |
+```c++ |
+static void BM_test(benchmark::State& state) { |
+ auto resource = GetResource(); |
+ if (!resource.good()) { |
+ state.SkipWithError("Resource is not good!"); |
+ // KeepRunning() loop will not be entered. |
+ } |
+ while (state.KeepRunning()) { |
+ auto data = resource.read_data(); |
+ if (!resource.good()) { |
+ state.SkipWithError("Failed to read data!"); |
+ break; // Needed to skip the rest of the iteration. |
+ } |
+ do_stuff(data); |
+ } |
+} |
+``` |
+ |
+## Running a subset of the benchmarks |
+ |
+The `--benchmark_filter=<regex>` option can be used to only run the benchmarks |
+which match the specified `<regex>`. For example: |
+ |
+```bash |
+$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32 |
+Run on (1 X 2300 MHz CPU ) |
+2016-06-25 19:34:24 |
+Benchmark Time CPU Iterations |
+---------------------------------------------------- |
+BM_memcpy/32 11 ns 11 ns 79545455 |
+BM_memcpy/32k 2181 ns 2185 ns 324074 |
+BM_memcpy/32 12 ns 12 ns 54687500 |
+BM_memcpy/32k 1834 ns 1837 ns 357143 |
+``` |
+ |
+ |
+## Output Formats |
+The library supports multiple output formats. Use the |
+`--benchmark_format=<console|json|csv>` flag to set the format type. `console` |
+is the default format. |
+ |
+The Console format is intended to be a human readable format. By default |
+the format generates color output. Context is output on stderr and the |
+tabular data on stdout. Example tabular output looks like: |
+``` |
+Benchmark Time(ns) CPU(ns) Iterations |
+---------------------------------------------------------------------- |
+BM_SetInsert/1024/1 28928 29349 23853 133.097kB/s 33.2742k items/s |
+BM_SetInsert/1024/8 32065 32913 21375 949.487kB/s 237.372k items/s |
+BM_SetInsert/1024/10 33157 33648 21431 1.13369MB/s 290.225k items/s |
+``` |
+ |
+The JSON format outputs human-readable JSON split into two top-level attributes. |
+The `context` attribute contains information about the run in general, including |
+information about the CPU and the date. |
+The `benchmarks` attribute contains a list of every benchmark run. Example JSON |
+output looks like: |
+```json |
+{ |
+ "context": { |
+ "date": "2015/03/17-18:40:25", |
+ "num_cpus": 40, |
+ "mhz_per_cpu": 2801, |
+ "cpu_scaling_enabled": false, |
+ "build_type": "debug" |
+ }, |
+ "benchmarks": [ |
+ { |
+ "name": "BM_SetInsert/1024/1", |
+ "iterations": 94877, |
+ "real_time": 29275, |
+ "cpu_time": 29836, |
+ "bytes_per_second": 134066, |
+ "items_per_second": 33516 |
+ }, |
+ { |
+ "name": "BM_SetInsert/1024/8", |
+ "iterations": 21609, |
+ "real_time": 32317, |
+ "cpu_time": 32429, |
+ "bytes_per_second": 986770, |
+ "items_per_second": 246693 |
+ }, |
+ { |
+ "name": "BM_SetInsert/1024/10", |
+ "iterations": 21393, |
+ "real_time": 32724, |
+ "cpu_time": 33355, |
+ "bytes_per_second": 1199226, |
+ "items_per_second": 299807 |
+ } |
+ ] |
+} |
+``` |
+ |
+The CSV format outputs comma-separated values. The `context` is output on stderr |
+and the CSV itself on stdout. Example CSV output looks like: |
+``` |
+name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label |
+"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942, |
+"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115, |
+"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06, |
+``` |
+ |
+## Output Files |
+The library supports writing the output of the benchmark to a file specified |
+by `--benchmark_out=<filename>`. The format of the output can be specified |
+using `--benchmark_out_format={json|console|csv}`. Specifying |
+`--benchmark_out` does not suppress the console output. |
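+ |
+For instance, the following invocation (the binary and file names are just |
+placeholders) writes JSON results to a file while still printing to the console: |
+ |
+```bash |
+$ ./run_benchmarks.x --benchmark_out=results.json --benchmark_out_format=json |
+``` |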
+ |
+## Debug vs Release |
+By default, benchmark builds as a debug library. You will see a warning in the output when this is the case. To build it as a release library instead, use: |
+ |
+``` |
+cmake -DCMAKE_BUILD_TYPE=Release |
+``` |
+ |
+To enable link-time optimisation, use |
+ |
+``` |
+cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true |
+``` |
+ |
+## Linking against the library |
+When using gcc, it is necessary to link against pthread to avoid runtime exceptions. |
+This is due to how gcc implements `std::thread`. |
+See [issue #67](https://github.com/google/benchmark/issues/67) for more details. |
+ |
+## Compiler Support |
+ |
+Google Benchmark uses C++11 when building the library. As such we require |
+a modern C++ toolchain, both compiler and standard library. |
+ |
+The following minimum versions are strongly recommended to build the library: |
+ |
+* GCC 4.8 |
+* Clang 3.4 |
+* Visual Studio 2013 |
+* Intel 2015 Update 1 |
+ |
+Anything older *may* work. |
+ |
+Note: Using the library and its headers in C++03 is supported. C++11 is only |
+required to build the library. |
+ |
+# Known Issues |
+ |
+### Windows |
+ |
+* Users must manually link `shlwapi.lib`. Failure to do so may result |
+in unresolved symbols. |
+ |