OLD | NEW |
(Empty) | |
| 1 # benchmark |
| 2 [](https://travis-ci.org/google/benchmark) |
| 3 [](https://ci.appveyor.com/project/google/benchmark/branch/master) |
| 4 [](https://coveralls.io/r/google/benchmark) |
| 5 |
| 6 A library to support the benchmarking of functions, similar to unit tests. |
| 7 |
| 8 Discussion group: https://groups.google.com/d/forum/benchmark-discuss |
| 9 |
| 10 IRC channel: https://freenode.net #googlebenchmark |
| 11 |
| 12 [Known issues and common problems](#known-issues) |
| 13 |
| 14 [Additional Tooling Documentation](docs/tools.md) |
| 15 |
| 16 ## Example usage |
| 17 ### Basic usage |
| 18 Define a function that executes the code to be measured. |
| 19 |
| 20 ```c++ |
| 21 static void BM_StringCreation(benchmark::State& state) { |
| 22   while (state.KeepRunning()) |
| 23     std::string empty_string; |
| 24 } |
| 25 // Register the function as a benchmark |
| 26 BENCHMARK(BM_StringCreation); |
| 27 |
| 28 // Define another benchmark |
| 29 static void BM_StringCopy(benchmark::State& state) { |
| 30   std::string x = "hello"; |
| 31   while (state.KeepRunning()) |
| 32     std::string copy(x); |
| 33 } |
| 34 BENCHMARK(BM_StringCopy); |
| 35 |
| 36 BENCHMARK_MAIN(); |
| 37 ``` |
| 38 |
| 39 ### Passing arguments |
| 40 Sometimes a family of benchmarks can be implemented with just one routine that |
| 41 takes an extra argument to specify which one of the family of benchmarks to |
| 42 run. For example, the following code defines a family of benchmarks for |
| 43 measuring the speed of `memcpy()` calls of different lengths: |
| 44 |
| 45 ```c++ |
| 46 static void BM_memcpy(benchmark::State& state) { |
| 47   char* src = new char[state.range(0)]; |
| 48   char* dst = new char[state.range(0)]; |
| 49   memset(src, 'x', state.range(0)); |
| 50   while (state.KeepRunning()) |
| 51     memcpy(dst, src, state.range(0)); |
| 52   state.SetBytesProcessed(int64_t(state.iterations()) * |
| 53                           int64_t(state.range(0))); |
| 54   delete[] src; |
| 55   delete[] dst; |
| 56 } |
| 57 BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10); |
| 58 ``` |
| 59 |
| 60 The preceding code is quite repetitive, and can be replaced with the following |
| 61 short-hand. The following invocation will pick a few appropriate arguments in |
| 62 the specified range and will generate a benchmark for each such argument. |
| 63 |
| 64 ```c++ |
| 65 BENCHMARK(BM_memcpy)->Range(8, 8<<10); |
| 66 ``` |
| 67 |
| 68 By default the arguments in the range are generated in multiples of eight and |
| 69 the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the |
| 70 range multiplier is changed to multiples of two. |
| 71 |
| 72 ```c++ |
| 73 BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10); |
| 74 ``` |
| 75 Now arguments generated are [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ]. |
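To make the argument selection concrete, here is a standalone sketch (no Google Benchmark dependency) that reproduces both lists quoted above. It assumes the rule is as follows: keep the lower bound, add every power of the multiplier that lies strictly between the bounds, then add the upper bound. `RangeArgs` is a hypothetical helper, not part of the library's API.

```c++
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a library function): lists the arguments that
// Range(lo, hi) is documented to produce, under the rule assumed above.
// Assumes mult >= 2; otherwise the loop below would not terminate.
std::vector<std::int64_t> RangeArgs(std::int64_t lo, std::int64_t hi,
                                    std::int64_t mult) {
  std::vector<std::int64_t> args;
  args.push_back(lo);  // the lower bound is always kept
  // Keep every power of `mult` that lies strictly between lo and hi.
  for (std::int64_t p = 1; p < hi; p *= mult) {
    if (p > lo) args.push_back(p);
  }
  if (hi != lo) args.push_back(hi);  // the upper bound is always kept
  return args;
}
```

Because `8<<10` (8192) is not a power of 8, it appears in the first list only as the explicit upper bound.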
| 76 |
| 77 You might have a benchmark that depends on two or more inputs. For example, the |
| 78 following code defines a family of benchmarks for measuring the speed of set |
| 79 insertion. |
| 80 |
| 81 ```c++ |
| 82 static void BM_SetInsert(benchmark::State& state) { |
| 83   while (state.KeepRunning()) { |
| 84     state.PauseTiming(); |
| 85     std::set<int> data = ConstructRandomSet(state.range(0)); |
| 86     state.ResumeTiming(); |
| 87     for (int j = 0; j < state.range(1); ++j) |
| 88       data.insert(RandomNumber()); |
| 89   } |
| 90 } |
| 91 BENCHMARK(BM_SetInsert) |
| 92     ->Args({1<<10, 1}) |
| 93     ->Args({1<<10, 8}) |
| 94     ->Args({1<<10, 64}) |
| 95     ->Args({1<<10, 512}) |
| 96     ->Args({8<<10, 1}) |
| 97     ->Args({8<<10, 8}) |
| 98     ->Args({8<<10, 64}) |
| 99     ->Args({8<<10, 512}); |
| 100 ``` |
| 101 |
| 102 The preceding code is quite repetitive, and can be replaced with the following |
| 103 short-hand. The following invocation will pick a few appropriate arguments in the |
| 104 product of the two specified ranges and will generate a benchmark for each such |
| 105 pair. |
| 106 |
| 107 ```c++ |
| 108 BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {1, 512}}); |
| 109 ``` |
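The `Ranges` shorthand expands per-dimension argument lists into their cross product. The sketch below uses only standard C++. It shows that expansion on the two value lists behind the eight explicit `Args` calls above. `CrossProduct` is a hypothetical helper, not a library function. Each dimension's list in `Ranges` is picked by the same kind of range rule as `Range`, so the real expansion may contain more pairs than these eight.

```c++
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: every combination that takes one value from each
// dimension, in the same order as the explicit Args({...}) calls.
std::vector<std::vector<std::int64_t>> CrossProduct(
    const std::vector<std::vector<std::int64_t>>& dims) {
  std::vector<std::vector<std::int64_t>> result = {{}};  // one empty tuple
  for (const auto& values : dims) {
    std::vector<std::vector<std::int64_t>> next;
    for (const auto& prefix : result) {
      for (std::int64_t v : values) {
        auto tuple = prefix;  // extend each existing tuple by one value
        tuple.push_back(v);
        next.push_back(tuple);
      }
    }
    result = next;
  }
  return result;
}
```

Calling `CrossProduct({{1 << 10, 8 << 10}, {1, 8, 64, 512}})` yields the eight pairs from `{1024, 1}` to `{8192, 512}`.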
| 110 |
| 111 For more complex patterns of inputs, passing a custom function to `Apply` allows |
| 112 programmatic specification of an arbitrary set of arguments on which to run the |
| 113 benchmark. The following example enumerates a dense range on one parameter, |
| 114 and a sparse range on the second. |
| 115 |
| 116 ```c++ |
| 117 static void CustomArguments(benchmark::internal::Benchmark* b) { |
| 118   for (int i = 0; i <= 10; ++i) |
| 119     for (int j = 32; j <= 1024*1024; j *= 8) |
| 120       b->Args({i, j}); |
| 121 } |
| 122 BENCHMARK(BM_SetInsert)->Apply(CustomArguments); |
| 123 ``` |
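To see which argument pairs `CustomArguments` registers, the same nested loops can be run outside the library and their pairs collected. `CollectCustomArguments` is a hypothetical stand-in. It records the pairs instead of calling `b->Args`.

```c++
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for CustomArguments: identical loops, but the pairs
// are collected instead of being passed to Benchmark::Args.
std::vector<std::pair<int, int>> CollectCustomArguments() {
  std::vector<std::pair<int, int>> pairs;
  for (int i = 0; i <= 10; ++i)                 // dense: 0, 1, ..., 10
    for (int j = 32; j <= 1024 * 1024; j *= 8)  // sparse: 32, 256, ..., 1<<20
      pairs.emplace_back(i, j);
  return pairs;
}
```

The first argument takes 11 values and the second takes 6 (32, 256, 2048, 16384, 131072, 1048576), so the `Apply` call registers 66 instances.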
| 124 |
| 125 ### Calculate asymptotic complexity (Big O) |
| 126 Asymptotic complexity can be calculated for a family of benchmarks. The |
| 127 following code will calculate the coefficient for the high-order term in the |
| 128 running time and the normalized root-mean-square error of string comparison. |
| 129 |
| 130 ```c++ |
| 131 static void BM_StringCompare(benchmark::State& state) { |
| 132   std::string s1(state.range(0), '-'); |
| 133   std::string s2(state.range(0), '-'); |
| 134   while (state.KeepRunning()) { |
| 135     benchmark::DoNotOptimize(s1.compare(s2)); |
| 136   } |
| 137   state.SetComplexityN(state.range(0)); |
| 138 } |
| 139 BENCHMARK(BM_StringCompare) |
| 140     ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN); |
| 141 ``` |
| 142 |
| 143 As shown in the following invocation, asymptotic complexity might also be |
| 144 calculated automatically. |
| 145 |
| 146 ```c++ |
| 147 BENCHMARK(BM_StringCompare) |
| 148     ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(); |
| 149 ``` |
| 150 |
| 151 The following code specifies asymptotic complexity with a lambda function, |
| 152 which can be used to customize the high-order term calculation. |
| 153 |
| 154 ```c++ |
| 155 BENCHMARK(BM_StringCompare)->RangeMultiplier(2) |
| 156     ->Range(1<<10, 1<<18)->Complexity([](int n)->double{return n; }); |
| 157 ``` |
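The coefficient and error mentioned above can be understood as a one-parameter least-squares fit of the measured times against the chosen complexity function g(n). The sketch below is an assumption about the approach, not the library's code. It fits `time = coef * g(n)` and reports the root-mean-square residual divided by the mean time. `FitComplexity` and `FitResult` are hypothetical names.

```c++
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

struct FitResult {
  double coef;            // coefficient of the high-order term
  double normalized_rms;  // RMS residual divided by the mean measured time
};

// Hypothetical sketch: least-squares fit of times[i] ~ coef * g(ns[i]).
FitResult FitComplexity(const std::vector<int>& ns,
                        const std::vector<double>& times,
                        const std::function<double(int)>& g) {
  double sum_tg = 0, sum_gg = 0, sum_t = 0;
  for (std::size_t i = 0; i < ns.size(); ++i) {
    const double gi = g(ns[i]);
    sum_tg += times[i] * gi;
    sum_gg += gi * gi;
    sum_t += times[i];
  }
  // Minimizing sum((t - c*g)^2) over c gives c = sum(t*g) / sum(g*g).
  const double coef = sum_tg / sum_gg;
  double sq_err = 0;
  for (std::size_t i = 0; i < ns.size(); ++i) {
    const double r = times[i] - coef * g(ns[i]);
    sq_err += r * r;
  }
  const double n = static_cast<double>(ns.size());
  const double rms = std::sqrt(sq_err / n);
  return {coef, rms / (sum_t / n)};
}
```

Passing `[](int n) -> double { return n; }` as `g` mirrors the lambda given to `Complexity` above.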
| 158 |
| 159 ### Templated benchmarks |
| 160 Templated benchmarks work the same way. This example produces and consumes |
| 161 messages of size `sizeof(v)` `state.range(0)` times. It also outputs throughput in the |
| 162 absence of multiprogramming. |
| 163 |
| 164 ```c++ |
| 165 template <class Q> void BM_Sequential(benchmark::State& state) { |
| 166   Q q; |
| 167   typename Q::value_type v{};  // value-initialized so the pushed value is defined |
| 168   while (state.KeepRunning()) { |
| 169     for (int i = state.range(0); i--; ) |
| 170       q.push(v); |
| 171     for (int e = state.range(0); e--; ) |
| 172       q.Wait(&v); |
| 173   } |
| 174   // actually messages, not bytes: |
| 175   state.SetBytesProcessed( |
| 176       static_cast<int64_t>(state.iterations()) * state.range(0)); |
| 177 } |
| 178 BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10); |
| 179 ``` |
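`WaitQueue<int>` is not defined in this README. For `BM_Sequential` to compile, `Q` needs a `value_type`, a `push`, and a blocking `Wait(T*)`. The following is a hypothetical minimal queue with that interface, built on `std::mutex` and `std::condition_variable`. It is one possible shape, not the queue the original example used.

```c++
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>

// Hypothetical queue providing the interface BM_Sequential relies on.
template <class T>
class WaitQueue {
 public:
  using value_type = T;

  void push(const T& value) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      items_.push(value);
    }
    cv_.notify_one();  // wake one waiting consumer, if any
  }

  // Blocks until an element is available, then copies the oldest one to *out.
  void Wait(T* out) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !items_.empty(); });
    *out = items_.front();
    items_.pop();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<T> items_;
};
```

`BM_Sequential` pushes all messages before consuming them, so in that single-threaded benchmark `Wait` never actually has to block.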
| 180 |
| 181 Three macros are provided for adding benchmark templates. |
| 182 |
| 183 ```c++ |
| 184 #if __cplusplus >= 201103L // C++11 and greater. |
| 185 #define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters. |
| 186 #else // C++ < C++11 |
| 187 #define BENCHMARK_TEMPLATE(func, arg1) |
| 188 #endif |
| 189 #define BENCHMARK_TEMPLATE1(func, arg1) |
| 190 #define BENCHMARK_TEMPLATE2(func, arg1, arg2) |
| 191 ``` |
| 192 |
| 193 ## Passing arbitrary arguments to a benchmark |
| 194 In C++11 it is possible to define a benchmark that takes an arbitrary number |
| 195 of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)` |
| 196 macro creates a benchmark that invokes `func` with the `benchmark::State` as |
| 197 the first argument followed by the specified `args...`. |
| 198 The `test_case_name` is appended to the name of the benchmark and |
| 199 should describe the values passed. |
| 200 |
| 201 ```c++ |
| 202 template <class ...ExtraArgs> |
| 203 void BM_takes_args(benchmark::State& state, ExtraArgs&&... extra_args) { |
| 204   // [...] |
| 205 } |
| 206 // Registers a benchmark named "BM_takes_args/int_string_test" that passes |
| 207 // the specified values to `extra_args`. |
| 208 BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc")); |
| 209 ``` |
| 210 Note that elements of `...args` may refer to global variables. Users should |
| 211 avoid modifying global state inside of a benchmark. |
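Conceptually, a benchmark registered with `BENCHMARK_CAPTURE` behaves like a callable that holds the extra arguments and forwards them after the state. The sketch below models that idea with a lambda and a stand-in state type. `FakeState`, `Capture` and `CaptureName` are illustrative assumptions, not the macro's actual expansion. The model copies the values once, up front. That is a choice of the sketch, not a statement about how the macro evaluates its arguments.

```c++
#include <cassert>
#include <string>

// Stand-in for benchmark::State; it only records what the body received.
struct FakeState {
  std::string record;
};

// Hypothetical model: bind the extra arguments now, supply the state later.
template <class Fn, class... Args>
auto Capture(Fn fn, Args... args) {
  return [fn, args...](FakeState& st) { fn(st, args...); };
}

// Name rule taken from the text above: "<func>/<test_case_name>".
std::string CaptureName(const std::string& func, const std::string& test_case) {
  return func + "/" + test_case;
}

// Example body with extra parameters, shaped like BM_takes_args.
void BM_takes_args(FakeState& st, int n, const std::string& s) {
  st.record = std::to_string(n) + ":" + s;
}
```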
| 212 |
| 213 ## Using RegisterBenchmark(name, fn, args...) |
| 214 |
| 215 The `RegisterBenchmark(name, func, args...)` function provides an alternative |
| 216 way to create and register benchmarks. |
| 217 `RegisterBenchmark(name, func, args...)` creates, registers, and returns a |
| 218 pointer to a new benchmark with the specified `name` that invokes |
| 219 `func(st, args...)` where `st` is a `benchmark::State` object. |
| 220 |
| 221 Unlike the `BENCHMARK` registration macros, which can only be used at the global |
| 222 scope, `RegisterBenchmark` can be called anywhere. This allows for |
| 223 benchmark tests to be registered programmatically. |
| 224 |
| 225 Additionally, `RegisterBenchmark` allows any callable object to be registered |
| 226 as a benchmark, including capturing lambdas and function objects. This |
| 227 allows the creation |
| 228 |
| 229 For example: |
| 230 ```c++ |
| 231 auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };  // `auto` parameter needs C++14 |
| 232 |
| 233 int main(int argc, char** argv) { |
| 234   for (auto& test_input : { /* ... */ }) |
| 235     benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input); |
| 236   benchmark::Initialize(&argc, argv); |
| 237   benchmark::RunSpecifiedBenchmarks(); |
| 238 } |
| 239 ``` |
| 240 |
| 241 ### Multithreaded benchmarks |
| 242 In a multithreaded test (benchmark invoked by multiple threads simultaneously), |
| 243 it is guaranteed that none of the threads will start until all have called |
| 244 `KeepRunning`, and all will have finished before `KeepRunning` returns false. As |
| 245 such, any global setup or teardown can be wrapped in a check against the thread |
| 246 index: |
| 247 |
| 248 ```c++ |
| 249 static void BM_MultiThreaded(benchmark::State& state) { |
| 250   if (state.thread_index == 0) { |
| 251     // Setup code here. |
| 252   } |
| 253   while (state.KeepRunning()) { |
| 254     // Run the test as normal. |
| 255   } |
| 256   if (state.thread_index == 0) { |
| 257     // Teardown code here. |
| 258   } |
| 259 } |
| 260 BENCHMARK(BM_MultiThreaded)->Threads(2); |
| 261 ``` |
| 262 |
| 263 If the benchmarked code itself uses threads and you want to compare it to |
| 264 single-threaded code, you may want to use real-time ("wallclock") measurements |
| 265 for latency comparisons: |
| 266 |
| 267 ```c++ |
| 268 BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime(); |
| 269 ``` |
| 270 |
| 271 Without `UseRealTime`, CPU time is used by default. |
| 272 |
| 273 |
| 274 ## Manual timing |
| 275 For benchmarking something for which neither CPU time nor real-time is |
| 276 correct or accurate enough, completely manual timing is supported using |
| 277 the `UseManualTime` function. |
| 278 |
| 279 When `UseManualTime` is used, the benchmarked code must call |
| 280 `SetIterationTime` once per iteration of the `KeepRunning` loop to |
| 281 report the manually measured time. |
| 282 |
| 283 An example use case for this is benchmarking GPU execution (e.g. OpenCL |
| 284 or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot |
| 285 be accurately measured using CPU time or real-time. Instead, they can be |
| 286 measured accurately using a dedicated API, and these measurement results |
| 287 can be reported back with `SetIterationTime`. |
| 288 |
| 289 ```c++ |
| 290 static void BM_ManualTiming(benchmark::State& state) { |
| 291   int microseconds = state.range(0); |
| 292   std::chrono::duration<double, std::micro> sleep_duration { |
| 293     static_cast<double>(microseconds) |
| 294   }; |
| 295 |
| 296   while (state.KeepRunning()) { |
| 297     auto start = std::chrono::high_resolution_clock::now(); |
| 298     // Simulate some useful workload with a sleep |
| 299     std::this_thread::sleep_for(sleep_duration); |
| 300     auto end = std::chrono::high_resolution_clock::now(); |
| 301 |
| 302     auto elapsed_seconds = |
| 303       std::chrono::duration_cast<std::chrono::duration<double>>( |
| 304         end - start); |
| 305 |
| 306     state.SetIterationTime(elapsed_seconds.count()); |
| 307   } |
| 308 } |
| 309 BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime(); |
| 310 ``` |
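The start/stop bookkeeping in the loop above can be factored into a small helper. The helper returns elapsed seconds as a `double`, the unit passed to `SetIterationTime` in that example. `TimeSeconds` is a hypothetical helper. It uses `std::chrono::steady_clock`, which is guaranteed to be monotonic; `high_resolution_clock` carries no such guarantee.

```c++
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: runs `work` once and returns its wall-clock duration
// in seconds, measured with a monotonic clock.
template <class Work>
double TimeSeconds(Work&& work) {
  const auto start = std::chrono::steady_clock::now();
  work();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(end - start).count();
}
```

Inside the benchmark loop, the body could then read `state.SetIterationTime(TimeSeconds([&] { std::this_thread::sleep_for(sleep_duration); }));`.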
| 311 |
| 312 ### Preventing optimisation |
| 313 To prevent a value or expression from being optimized away by the compiler, |
| 314 the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()` |
| 315 functions can be used. |
| 316 |
| 317 ```c++ |
| 318 static void BM_test(benchmark::State& state) { |
| 319   while (state.KeepRunning()) { |
| 320     int x = 0; |
| 321     for (int i=0; i < 64; ++i) { |
| 322       benchmark::DoNotOptimize(x += i); |
| 323     } |
| 324   } |
| 325 } |
| 326 ``` |
| 327 |
| 328 `DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either |
| 329 memory or a register. For GNU-based compilers it acts as a read/write barrier |
| 330 for global memory. More specifically, it forces the compiler to flush pending |
| 331 writes to memory and reload any other values as necessary. |
| 332 |
| 333 Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>` |
| 334 in any way. `<expr>` may even be removed entirely when the result is already |
| 335 known. For example: |
| 336 |
| 337 ```c++ |
| 338 /* Example 1: `<expr>` is removed entirely. */ |
| 339 int foo(int x) { return x + 42; } |
| 340 while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42); |
| 341 |
| 342 /* Example 2: Result of '<expr>' is only reused */ |
| 343 int bar(int) __attribute__((const)); |
| 344 while (...) DoNotOptimize(bar(0)); // Optimized to: |
| 345 // int __result__ = bar(0); |
| 346 // while (...) DoNotOptimize(__result__); |
| 347 ``` |
| 348 |
| 349 The second tool for preventing optimizations is `ClobberMemory()`. In essence |
| 350 `ClobberMemory()` forces the compiler to perform all pending writes to global |
| 351 memory. Memory managed by block scope objects must be "escaped" using |
| 352 `DoNotOptimize(...)` before it can be clobbered. In the below example |
| 353 `ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized |
| 354 away. |
| 355 |
| 356 ```c++ |
| 357 static void BM_vector_push_back(benchmark::State& state) { |
| 358   while (state.KeepRunning()) { |
| 359     std::vector<int> v; |
| 360     v.reserve(1); |
| 361     benchmark::DoNotOptimize(v.data()); // Allow v.data() to be clobbered. |
| 362     v.push_back(42); |
| 363     benchmark::ClobberMemory(); // Force 42 to be written to memory. |
| 364   } |
| 365 } |
| 366 ``` |
| 367 |
| 368 Note that `ClobberMemory()` is only available for GNU- or MSVC-based compilers. |
| 369 |
| 370 ### Set time unit manually |
| 371 If a benchmark runs for a few milliseconds, it may be hard to visually compare the |
| 372 measured times, since the output data is given in nanoseconds by default. To |
| 373 set the time unit explicitly, specify it on the registered benchmark: |
| 374 |
| 375 ```c++ |
| 376 BENCHMARK(BM_test)->Unit(benchmark::kMillisecond); |
| 377 ``` |
| 378 |
| 379 ## Controlling number of iterations |
| 380 In all cases, the number of iterations for which the benchmark is run is |
| 381 governed by the amount of time the benchmark takes. Concretely, the number of |
| 382 iterations is at least one and at most 1e9, and it is increased until the CPU time |
| 383 exceeds the minimum time or the wall-clock time reaches 5x the minimum time. The minimum time is |
| 384 set as a flag `--benchmark_min_time` or per benchmark by calling `MinTime` on |
| 385 the registered benchmark object. |
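The stopping rule described above can be written as a predicate. The sketch below encodes only that rule. `ShouldStopIterating` and its parameters are hypothetical. This README does not describe how the library chooses the next iteration count between attempts, so that part is left out.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical encoding of the documented rule: stop once the iteration cap
// is reached, the CPU time exceeds the minimum time, or the wall-clock time
// reaches five times the minimum time.
bool ShouldStopIterating(std::int64_t iterations, double cpu_seconds,
                         double wall_seconds, double min_time_seconds) {
  const std::int64_t kMaxIterations = 1000000000;  // 1e9
  return iterations >= kMaxIterations ||
         cpu_seconds > min_time_seconds ||
         wall_seconds >= 5 * min_time_seconds;
}
```

The time values used to exercise it below are arbitrary.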
| 386 |
| 387 ## Reporting the mean and standard deviation by repeated benchmarks |
| 388 By default each benchmark is run once and that single result is reported. |
| 389 However, benchmarks are often noisy and a single result may not be representative |
| 390 of the overall behavior. For this reason it's possible to repeatedly rerun the |
| 391 benchmark. |
| 392 |
| 393 The number of runs of each benchmark is specified globally by the |
| 394 `--benchmark_repetitions` flag or on a per benchmark basis by calling |
| 395 `Repetitions` on the registered benchmark object. When a benchmark is run |
| 396 more than once the mean and standard deviation of the runs will be reported. |
| 397 |
| 398 Additionally the `--benchmark_report_aggregates_only={true|false}` flag or |
| 399 `ReportAggregatesOnly(bool)` function can be used to change how repeated tests |
| 400 are reported. By default the result of each repeated run is reported. When this |
| 401 option is `true`, only the mean and standard deviation of the runs are reported. |
| 402 Calling `ReportAggregatesOnly(bool)` on a registered benchmark object overrides |
| 403 the value of the flag for that benchmark. |
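For reference, the two aggregates can be computed from the per-repetition times as follows. This is a standalone sketch using the sample (n - 1) form of the standard deviation. This README does not say whether the library divides by n or by n - 1. `Aggregates` and `Summarize` are hypothetical names.

```c++
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

struct Aggregates {
  double mean;
  double stddev;
};

// Hypothetical sketch: mean and sample standard deviation of repeated runs.
Aggregates Summarize(const std::vector<double>& runs) {
  const double n = static_cast<double>(runs.size());
  const double mean = std::accumulate(runs.begin(), runs.end(), 0.0) / n;
  double sq = 0.0;
  for (double r : runs) sq += (r - mean) * (r - mean);
  // The sample standard deviation is undefined for a single run; report 0.
  const double stddev = runs.size() > 1 ? std::sqrt(sq / (n - 1)) : 0.0;
  return {mean, stddev};
}
```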
| 404 |
| 405 ## Fixtures |
| 406 Fixture tests are created by |
| 407 first defining a type that derives from `::benchmark::Fixture` and then |
| 408 creating/registering the tests using the following macros: |
| 409 |
| 410 * `BENCHMARK_F(ClassName, Method)` |
| 411 * `BENCHMARK_DEFINE_F(ClassName, Method)` |
| 412 * `BENCHMARK_REGISTER_F(ClassName, Method)` |
| 413 |
| 414 For example: |
| 415 |
| 416 ```c++ |
| 417 class MyFixture : public benchmark::Fixture {}; |
| 418 |
| 419 BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) { |
| 420   while (st.KeepRunning()) { |
| 421     // ... |
| 422   } |
| 423 } |
| 424 |
| 425 BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) { |
| 426   while (st.KeepRunning()) { |
| 427     // ... |
| 428   } |
| 429 } |
| 430 /* BarTest is NOT registered */ |
| 431 BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2); |
| 432 /* BarTest is now registered */ |
| 433 ``` |
| 434 |
| 435 |
| 436 ## User-defined counters |
| 437 |
| 438 You can add your own counters with user-defined names. The example below |
| 439 will add columns "Foo", "Bar" and "Baz" in its output: |
| 440 |
| 441 ```c++ |
| 442 static void UserCountersExample1(benchmark::State& state) { |
| 443   double numFoos = 0, numBars = 0, numBazs = 0; |
| 444   while (state.KeepRunning()) { |
| 445     // ... count Foo,Bar,Baz events |
| 446   } |
| 447   state.counters["Foo"] = numFoos; |
| 448   state.counters["Bar"] = numBars; |
| 449   state.counters["Baz"] = numBazs; |
| 450 } |
| 451 ``` |
| 452 |
| 453 The `state.counters` object is a `std::map` with `std::string` keys |
| 454 and `Counter` values. The latter is a `double`-like class, via an implicit |
| 455 conversion to `double&`. Thus you can use all of the standard arithmetic |
| 456 assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter. |
| 457 |
| 458 In multithreaded benchmarks, each counter is set on the calling thread only. |
| 459 When the benchmark finishes, the counters from each thread will be summed; |
| 460 the resulting sum is the value which will be shown for the benchmark. |
| 461 |
| 462 The `Counter` constructor accepts two parameters: the value as a `double` |
| 463 and a bit flag which allows you to show counters as rates and/or as |
| 464 per-thread averages: |
| 465 |
| 466 ```c++ |
| 467 // sets a simple counter |
| 468 state.counters["Foo"] = numFoos; |
| 469 |
| 470 // Set the counter as a rate. It will be presented divided |
| 471 // by the duration of the benchmark. |
| 472 state.counters["FooRate"] = benchmark::Counter(numFoos, benchmark::Counter::kIsRate); |
| 473 |
| 474 // Set the counter as a thread-average quantity. It will |
| 475 // be presented divided by the number of threads. |
| 476 state.counters["FooAvg"] = benchmark::Counter(numFoos, benchmark::Counter::kAvgThreads); |
| 477 |
| 478 // There's also a combined flag: |
| 479 state.counters["FooAvgRate"] = benchmark::Counter(numFoos, benchmark::Counter::kAvgThreadsRate); |
| 480 ``` |
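One way to read these flags is as follows. The reported value is the sum over threads. `kIsRate` divides it by the elapsed time, `kAvgThreads` divides it by the number of threads, and `kAvgThreadsRate` divides it by both. The sketch below models that arithmetic in standard C++. `ModelFlags` and `ReportedValue` are hypothetical, and so is the choice of which elapsed time is used. This is not the library's implementation.

```c++
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical mirror of the documented Counter flags.
enum ModelFlags : unsigned {
  kDefaults = 0,
  kIsRate = 1u << 0,
  kAvgThreads = 1u << 1,
  kAvgThreadsRate = kIsRate | kAvgThreads,
};

// Sums the per-thread values, then applies the scaling the flags describe.
double ReportedValue(const std::vector<double>& per_thread, unsigned flags,
                     double elapsed_seconds) {
  double v = std::accumulate(per_thread.begin(), per_thread.end(), 0.0);
  if (flags & kIsRate) v /= elapsed_seconds;
  if (flags & kAvgThreads) v /= static_cast<double>(per_thread.size());
  return v;
}
```

With four threads that each count 8 events over 2 seconds, the model reports 32, 16, 8 and 4 for the four flag settings.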
| 481 |
| 482 When you're compiling in C++11 mode or later you can use `insert()` with |
| 483 `std::initializer_list`: |
| 484 |
| 485 ```c++ |
| 486 // With C++11, this can be done: |
| 487 state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}}); |
| 488 // ... instead of: |
| 489 state.counters["Foo"] = numFoos; |
| 490 state.counters["Bar"] = numBars; |
| 491 state.counters["Baz"] = numBazs; |
| 492 ``` |
| 493 |
| 494 ### Counter reporting |
| 495 |
| 496 When using the console reporter, by default, user counters are printed at |
| 497 the end after the table, the same way as ``bytes_processed`` and |
| 498 ``items_processed``. This is best for cases in which there are few counters, |
| 499 or where there are only a couple of lines per benchmark. Here's an example of |
| 500 the default output: |
| 501 |
| 502 ``` |
| 503 ------------------------------------------------------------------------------ |
| 504 Benchmark Time CPU Iterations UserCounters... |
| 505 ------------------------------------------------------------------------------ |
| 506 BM_UserCounter/threads:8 2248 ns 10277 ns 68808 Bar=16 Bat=40 Baz=24 Foo=8 |
| 507 BM_UserCounter/threads:1 9797 ns 9788 ns 71523 Bar=2 Bat=5 Baz=3 Foo=1024m |
| 508 BM_UserCounter/threads:2 4924 ns 9842 ns 71036 Bar=4 Bat=10 Baz=6 Foo=2 |
| 509 BM_UserCounter/threads:4 2589 ns 10284 ns 68012 Bar=8 Bat=20 Baz=12 Foo=4 |
| 510 BM_UserCounter/threads:8 2212 ns 10287 ns 68040 Bar=16 Bat=40 Baz=24 Foo=8 |
| 511 BM_UserCounter/threads:16 1782 ns 10278 ns 68144 Bar=32 Bat=80 Baz=48 Foo=16 |
| 512 BM_UserCounter/threads:32 1291 ns 10296 ns 68256 Bar=64 Bat=160 Baz=96 Foo=32 |
| 513 BM_UserCounter/threads:4 2615 ns 10307 ns 68040 Bar=8 Bat=20 Baz=12 Foo=4 |
| 514 BM_Factorial 26 ns 26 ns 26608979 40320 |
| 515 BM_Factorial/real_time 26 ns 26 ns 26587936 40320 |
| 516 BM_CalculatePiRange/1 16 ns 16 ns 45704255 0 |
| 517 BM_CalculatePiRange/8 73 ns 73 ns 9520927 3.28374 |
| 518 BM_CalculatePiRange/64 609 ns 609 ns 1140647 3.15746 |
| 519 BM_CalculatePiRange/512 4900 ns 4901 ns 142696 3.14355 |
| 520 ``` |
| 521 |
| 522 If this doesn't suit you, you can print each counter as a table column by |
| 523 passing the flag `--benchmark_counters_tabular=true` to the benchmark |
| 524 application. This is best for cases in which there are a lot of counters, or |
| 525 a lot of lines per individual benchmark. Note that this will trigger a |
| 526 reprinting of the table header any time the counter set changes between |
| 527 individual benchmarks. Here's an example of corresponding output when |
| 528 `--benchmark_counters_tabular=true` is passed: |
| 529 |
| 530 ``` |
| 531 --------------------------------------------------------------------------------------- |
| 532 Benchmark Time CPU Iterations Bar Bat Baz Foo |
| 533 --------------------------------------------------------------------------------------- |
| 534 BM_UserCounter/threads:8 2198 ns 9953 ns 70688 16 40 24 8 |
| 535 BM_UserCounter/threads:1 9504 ns 9504 ns 73787 2 5 3 1 |
| 536 BM_UserCounter/threads:2 4775 ns 9550 ns 72606 4 10 6 2 |
| 537 BM_UserCounter/threads:4 2508 ns 9951 ns 70332 8 20 12 4 |
| 538 BM_UserCounter/threads:8 2055 ns 9933 ns 70344 16 40 24 8 |
| 539 BM_UserCounter/threads:16 1610 ns 9946 ns 70720 32 80 48 16 |
| 540 BM_UserCounter/threads:32 1192 ns 9948 ns 70496 64 160 96 32 |
| 541 BM_UserCounter/threads:4 2506 ns 9949 ns 70332 8 20 12 4 |
| 542 -------------------------------------------------------------- |
| 543 Benchmark Time CPU Iterations |
| 544 -------------------------------------------------------------- |
| 545 BM_Factorial 26 ns 26 ns 26392245 40320 |
| 546 BM_Factorial/real_time 26 ns 26 ns 26494107 40320 |
| 547 BM_CalculatePiRange/1 15 ns 15 ns 45571597 0 |
| 548 BM_CalculatePiRange/8 74 ns 74 ns 9450212 3.28374 |
| 549 BM_CalculatePiRange/64 595 ns 595 ns 1173901 3.15746 |
| 550 BM_CalculatePiRange/512 4752 ns 4752 ns 147380 3.14355 |
| 551 BM_CalculatePiRange/4k 37970 ns 37972 ns 18453 3.14184 |
| 552 BM_CalculatePiRange/32k 303733 ns 303744 ns 2305 3.14162 |
| 553 BM_CalculatePiRange/256k 2434095 ns 2434186 ns 288 3.1416 |
| 554 BM_CalculatePiRange/1024k 9721140 ns 9721413 ns 71 3.14159 |
| 555 BM_CalculatePi/threads:8 2255 ns 9943 ns 70936 |
| 556 ``` |
| 557 Note above the additional header printed when the benchmark changes from |
| 558 ``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does |
| 559 not have the same counter set as ``BM_UserCounter``. |
| 560 |
| 561 ## Exiting Benchmarks in Error |
| 562 |
| 563 When errors caused by external influences, such as file I/O and network |
| 564 communication, occur within a benchmark, the |
| 565 `State::SkipWithError(const char* msg)` function can be used to skip that run |
| 566 of the benchmark and report the error. Note that only future iterations of the |
| 567 `KeepRunning()` loop are skipped. Users may explicitly return to exit the |
| 568 benchmark immediately. |
| 569 |
| 570 The `SkipWithError(...)` function may be used at any point within the benchmark, |
| 571 including before and after the `KeepRunning()` loop. |
| 572 |
| 573 For example: |
| 574 |
| 575 ```c++ |
| 576 static void BM_test(benchmark::State& state) { |
| 577   auto resource = GetResource(); |
| 578   if (!resource.good()) { |
| 579     state.SkipWithError("Resource is not good!"); |
| 580     // KeepRunning() loop will not be entered. |
| 581   } |
| 582   while (state.KeepRunning()) { |
| 583     auto data = resource.read_data(); |
| 584     if (!resource.good()) { |
| 585       state.SkipWithError("Failed to read data!"); |
| 586       break; // Needed to skip the rest of the iteration. |
| 587     } |
| 588     do_stuff(data); |
| 589   } |
| 590 } |
| 591 ``` |
| 592 |
| 593 ## Running a subset of the benchmarks |
| 594 |
| 595 The `--benchmark_filter=<regex>` option can be used to only run the benchmarks |
| 596 which match the specified `<regex>`. For example: |
| 597 |
| 598 ```bash |
| 599 $ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32 |
| 600 Run on (1 X 2300 MHz CPU ) |
| 601 2016-06-25 19:34:24 |
| 602 Benchmark Time CPU Iterations |
| 603 ---------------------------------------------------- |
| 604 BM_memcpy/32 11 ns 11 ns 79545455 |
| 605 BM_memcpy/32k 2181 ns 2185 ns 324074 |
| 606 BM_memcpy/32 12 ns 12 ns 54687500 |
| 607 BM_memcpy/32k 1834 ns 1837 ns 357143 |
| 608 ``` |
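The output above includes `BM_memcpy/32k` because the pattern matches anywhere in a benchmark name, not only against the whole name. The sketch below reproduces that behavior with `std::regex_search`. It assumes that the library's matching is equivalent to an unanchored ECMAScript search. `FilterBenchmarks` is a hypothetical helper. Anchoring the pattern with `$` restricts it to names that end at that point.

```c++
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: keeps the names that contain a match for `pattern`.
std::vector<std::string> FilterBenchmarks(const std::vector<std::string>& names,
                                          const std::string& pattern) {
  const std::regex re(pattern);
  std::vector<std::string> selected;
  for (const auto& name : names)
    if (std::regex_search(name, re)) selected.push_back(name);  // unanchored
  return selected;
}
```

Assuming the library behaves like the sketch, passing `--benchmark_filter='BM_memcpy/32$'` would select only `BM_memcpy/32`. The single quotes keep the shell from interpreting `$`.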
| 609 |
| 610 |
| 611 ## Output Formats |
| 612 The library supports multiple output formats. Use the |
| 613 `--benchmark_format=<console|json|csv>` flag to set the format type. `console` |
| 614 is the default format. |
| 615 |
| 616 The Console format is intended to be a human readable format. By default |
| 617 the format generates color output. Context is output on stderr and the |
| 618 tabular data on stdout. Example tabular output looks like: |
| 619 ``` |
| 620 Benchmark Time(ns) CPU(ns) Iterations |
| 621 ---------------------------------------------------------------------- |
| 622 BM_SetInsert/1024/1 28928 29349 23853 133.097kB/s 33.2742k items/s |
| 623 BM_SetInsert/1024/8 32065 32913 21375 949.487kB/s 237.372k items/s |
| 624 BM_SetInsert/1024/10 33157 33648 21431 1.13369MB/s 290.225k items/s |
| 625 ``` |
| 626 |
| 627 The JSON format outputs human-readable JSON split into two top-level attributes. |
| 628 The `context` attribute contains information about the run in general, including |
| 629 information about the CPU and the date. |
| 630 The `benchmarks` attribute contains a list of every benchmark run. Example JSON |
| 631 output looks like: |
| 632 ```json |
| 633 { |
| 634   "context": { |
| 635     "date": "2015/03/17-18:40:25", |
| 636     "num_cpus": 40, |
| 637     "mhz_per_cpu": 2801, |
| 638     "cpu_scaling_enabled": false, |
| 639     "build_type": "debug" |
| 640   }, |
| 641   "benchmarks": [ |
| 642     { |
| 643       "name": "BM_SetInsert/1024/1", |
| 644       "iterations": 94877, |
| 645       "real_time": 29275, |
| 646       "cpu_time": 29836, |
| 647       "bytes_per_second": 134066, |
| 648       "items_per_second": 33516 |
| 649     }, |
| 650     { |
| 651       "name": "BM_SetInsert/1024/8", |
| 652       "iterations": 21609, |
| 653       "real_time": 32317, |
| 654       "cpu_time": 32429, |
| 655       "bytes_per_second": 986770, |
| 656       "items_per_second": 246693 |
| 657     }, |
| 658     { |
| 659       "name": "BM_SetInsert/1024/10", |
| 660       "iterations": 21393, |
| 661       "real_time": 32724, |
| 662       "cpu_time": 33355, |
| 663       "bytes_per_second": 1199226, |
| 664       "items_per_second": 299807 |
| 665     } |
| 666   ] |
| 667 } |
| 668 ``` |
| 669 |
| 670 The CSV format outputs comma-separated values. The `context` is output on stderr |
| 671 and the CSV itself on stdout. Example CSV output looks like: |
| 672 ``` |
| 673 name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label |
| 674 "BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942, |
| 675 "BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115, |
| 676 "BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06, |
| 677 ``` |
| 678 |
| 679 ## Output Files |
| 680 The library supports writing the output of the benchmark to a file specified |
| 681 by `--benchmark_out=<filename>`. The format of the output can be specified |
| 682 using `--benchmark_out_format={json|console|csv}`. Specifying |
| 683 `--benchmark_out` does not suppress the console output. |
| 684 |
| 685 ## Debug vs Release |
| 686 By default, benchmark builds as a debug library. You will see a warning in the output when this is the case. To build it as a release library instead, use: |
| 687 |
| 688 ``` |
| 689 cmake -DCMAKE_BUILD_TYPE=Release |
| 690 ``` |
| 691 |
| 692 To enable link-time optimisation, use |
| 693 |
| 694 ``` |
| 695 cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true |
| 696 ``` |
| 697 |
| 698 ## Linking against the library |
| 699 When using gcc, it is necessary to link against pthread to avoid runtime exceptions. |
| 700 This is due to how gcc implements std::thread. |
| 701 See [issue #67](https://github.com/google/benchmark/issues/67) for more details. |
| 702 |
| 703 ## Compiler Support |
| 704 |
| 705 Google Benchmark uses C++11 when building the library. As such we require |
| 706 a modern C++ toolchain, both compiler and standard library. |
| 707 |
| 708 The following minimum versions are strongly recommended to build the library: |
| 709 |
| 710 * GCC 4.8 |
| 711 * Clang 3.4 |
| 712 * Visual Studio 2013 |
| 713 * Intel 2015 Update 1 |
| 714 |
| 715 Anything older *may* work. |
| 716 |
| 717 Note: Using the library and its headers in C++03 is supported. C++11 is only |
| 718 required to build the library. |
| 719 |
| 720 # Known Issues |
| 721 |
| 722 ### Windows |
| 723 |
| 724 * Users must manually link `shlwapi.lib`. Failure to do so may result |
| 725 in unresolved symbols. |
| 726 |