| OLD | NEW |
| 1 IMPORTANT NOTE FOR 64-BIT USERS | 1 IMPORTANT NOTE FOR 64-BIT USERS |
| 2 ------------------------------- | 2 ------------------------------- |
| 3 There are known issues with some perftools functionality on x86_64 | 3 There are known issues with some perftools functionality on x86_64 |
| 4 systems. See 64-BIT ISSUES, below. | 4 systems. See 64-BIT ISSUES, below. |
| 5 | 5 |
| 6 | 6 |
| 7 TCMALLOC | 7 TCMALLOC |
| 8 -------- | 8 -------- |
| 9 Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of | 9 Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of |
| 10 tcmalloc -- a replacement for malloc and new. See below for some | 10 tcmalloc -- a replacement for malloc and new. See below for some |
| 11 environment variables you can use with tcmalloc, as well. | 11 environment variables you can use with tcmalloc, as well. |
| 12 | 12 |
| 13 tcmalloc functionality is available on all systems we've tested; see | 13 tcmalloc functionality is available on all systems we've tested; see |
| 14 INSTALL for more details. See README_windows.txt for instructions on | 14 INSTALL for more details. See README_windows.txt for instructions on |
| 15 using tcmalloc on Windows. | 15 using tcmalloc on Windows. |
| 16 | 16 |
| 17 NOTE: When compiling with programs with gcc, that you plan to link | 17 NOTE: When compiling with programs with gcc, that you plan to link |
| 18 with libtcmalloc, it's safest to pass in the flags | 18 with libtcmalloc, it's safest to pass in the flags |
| 19 | 19 |
| 20 -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free | 20 -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free |
| 21 | 21 |
| 22 when compiling. gcc makes some optimizations assuming it is using its | 22 when compiling. gcc makes some optimizations assuming it is using its |
| 23 own, built-in malloc; that assumption obviously isn't true with | 23 own, built-in malloc; that assumption obviously isn't true with |
| 24 tcmalloc. In practice, we haven't seen any problems with this, but | 24 tcmalloc. In practice, we haven't seen any problems with this, but |
| 25 the expected risk is highest for users who register their own malloc | 25 the expected risk is highest for users who register their own malloc |
| 26 hooks with tcmalloc (using google/malloc_hook.h). The risk is lowest | 26 hooks with tcmalloc (using gperftools/malloc_hook.h). The risk is |
| 27 for folks who use tcmalloc_minimal (or, of course, who pass in the | 27 lowest for folks who use tcmalloc_minimal (or, of course, who pass in |
| 28 above flags :-) ). | 28 the above flags :-) ). |
| 29 | 29 |
| 30 | 30 |
| 31 HEAP PROFILER | 31 HEAP PROFILER |
| 32 ------------- | 32 ------------- |
| 33 See doc/heap-profiler.html for information about how to use tcmalloc's | 33 See doc/heap-profiler.html for information about how to use tcmalloc's |
| 34 heap profiler and analyze its output. | 34 heap profiler and analyze its output. |
| 35 | 35 |
| 36 As a quick-start, do the following after installing this package: | 36 As a quick-start, do the following after installing this package: |
| 37 | 37 |
| 38 1) Link your executable with -ltcmalloc | 38 1) Link your executable with -ltcmalloc |
| (...skipping 181 matching lines...) |
| 220 Its likeliness increases the more dlopen() commands an executable has. | 220 Its likeliness increases the more dlopen() commands an executable has. |
| 221 Most executables don't have any, though several library routines like | 221 Most executables don't have any, though several library routines like |
| 222 getgrgid() call dlopen() behind the scenes. | 222 getgrgid() call dlopen() behind the scenes. |
| 223 | 223 |
| 224 2) On x86-64 64-bit systems, while tcmalloc itself works fine, the | 224 2) On x86-64 64-bit systems, while tcmalloc itself works fine, the |
| 225 cpu-profiler tool is unreliable: it will sometimes work, but sometimes | 225 cpu-profiler tool is unreliable: it will sometimes work, but sometimes |
| 226 cause a segfault. I'll explain the problem first, and then some | 226 cause a segfault. I'll explain the problem first, and then some |
| 227 workarounds. | 227 workarounds. |
| 228 | 228 |
| 229 Note that this only affects the cpu-profiler, which is a | 229 Note that this only affects the cpu-profiler, which is a |
| 230 google-perftools feature you must turn on manually by setting the | 230 gperftools feature you must turn on manually by setting the |
| 231 CPUPROFILE environment variable. If you do not turn on cpu-profiling, | 231 CPUPROFILE environment variable. If you do not turn on cpu-profiling, |
| 232 you shouldn't see any crashes due to perftools. | 232 you shouldn't see any crashes due to perftools. |
| 233 | 233 |
| 234 The gory details: The underlying problem is in the backtrace() | 234 The gory details: The underlying problem is in the backtrace() |
| 235 function, which is a built-in function in libc. | 235 function, which is a built-in function in libc. |
| 236 Backtracing is fairly straightforward in the normal case, but can run | 236 Backtracing is fairly straightforward in the normal case, but can run |
| 237 into problems when having to backtrace across a signal frame. | 237 into problems when having to backtrace across a signal frame. |
| 238 Unfortunately, the cpu-profiler uses signals in order to register a | 238 Unfortunately, the cpu-profiler uses signals in order to register a |
| 239 profiling event, so every backtrace that the profiler does crosses a | 239 profiling event, so every backtrace that the profiler does crosses a |
| 240 signal frame. | 240 signal frame. |
| (...skipping 15 matching lines...) |
| 256 your code, rather than setting CPUPROFILE. This will profile only | 256 your code, rather than setting CPUPROFILE. This will profile only |
| 257 those sections of the codebase. Though we haven't done much testing, | 257 those sections of the codebase. Though we haven't done much testing, |
| 258 in theory this should reduce the chance of crashes by limiting the | 258 in theory this should reduce the chance of crashes by limiting the |
| 259 signal generation to only a small part of the codebase. Ideally, you | 259 signal generation to only a small part of the codebase. Ideally, you |
| 260 would not use ProfilerStart()/ProfilerStop() around code that spawns | 260 would not use ProfilerStart()/ProfilerStop() around code that spawns |
| 261 new threads, or is otherwise likely to cause a call to | 261 new threads, or is otherwise likely to cause a call to |
| 262 pthread_mutex_lock! | 262 pthread_mutex_lock! |
| 263 | 263 |
| 264 --- | 264 --- |
| 265 17 May 2011 | 265 17 May 2011 |
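
Illustration (not part of the diff): the README text above suggests calling
ProfilerStart()/ProfilerStop() around selected code instead of setting
CPUPROFILE. A minimal sketch of that idea follows. It assumes the renamed
header path gperftools/profiler.h, and that ProfilerStart() returns nonzero
when profiling actually started. USE_GPERFTOOLS, ScopedCpuProfile,
SumOfSquares and ProfiledSumOfSquares are hypothetical names made up for
this example; without USE_GPERFTOOLS defined, no-op stand-ins are used so
the sketch builds on a machine without the library.

```cpp
// Sketch: limit cpu-profiling to one region of code.
// USE_GPERFTOOLS is a hypothetical opt-in macro used only by this example.
#ifdef USE_GPERFTOOLS
#include <gperftools/profiler.h>  // real API; link with -lprofiler
#else
// No-op stand-ins so the sketch compiles without gperftools installed.
inline int ProfilerStart(const char*) { return 1; }
inline void ProfilerStop() {}
#endif

#include <cassert>
#include <cstdint>

// Hypothetical RAII helper: starts profiling on construction and stops it on
// destruction, so the profiled region is exactly the enclosing scope.
class ScopedCpuProfile {
 public:
  explicit ScopedCpuProfile(const char* path)
      : started_(ProfilerStart(path) != 0) {}  // assumption: nonzero == started
  ~ScopedCpuProfile() {
    if (started_) ProfilerStop();
  }
  ScopedCpuProfile(const ScopedCpuProfile&) = delete;
  ScopedCpuProfile& operator=(const ScopedCpuProfile&) = delete;
  bool started() const { return started_; }

 private:
  bool started_;
};

// Hypothetical CPU-bound work. It spawns no threads and takes no locks,
// which is the kind of region the README recommends profiling this way.
std::uint64_t SumOfSquares(std::uint64_t n) {
  std::uint64_t total = 0;
  for (std::uint64_t i = 1; i <= n; ++i) total += i * i;
  return total;
}

// Runs the work with profiling enabled only for the duration of the call.
std::uint64_t ProfiledSumOfSquares(std::uint64_t n, const char* profile_path) {
  assert(profile_path != nullptr);
  ScopedCpuProfile profile(profile_path);
  return SumOfSquares(n);
}
```

To try it against the real library, build with -DUSE_GPERFTOOLS and link
with -lprofiler, then call ProfiledSumOfSquares(n, "/tmp/example.prof")
from your own code (the output path is only an example). Thread creation
should stay outside the guarded scope, as the README advises.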