# Breakpad Processor Library

## Objective

The Breakpad processor library is an open-source framework to access the
information contained within crash dumps from multiple platforms, and to use that
information to produce stack traces showing the call chain of each thread in a
process. After processing, this data is made available to users of the library.

## Background

The Breakpad processor is intended to sit at the core of a comprehensive
crash-reporting system that does not require debugging information to be
provided to the users of the applications being monitored. Some existing
crash-reporting systems, such as [GNOME](http://www.gnome.org/)’s Bug-Buddy and
[Apple](http://www.apple.com/)’s
[CrashReporter](http://developer.apple.com/technotes/tn2004/tn2123.html), require
symbolic information to be present on the end user’s computer; in the case of
CrashReporter, the reports are transmitted only to Apple, not to third-party
developers. Other systems, such as [Microsoft](http://www.microsoft.com/)’s
[Windows Error Reporting](http://msdn.microsoft.com/isv/resources/wer/) and
SupportSoft’s Talkback, transmit only a snapshot of a crashed process’ state,
which can later be combined with symbolic debugging information without the need
for it to be present on end users’ computers. Because symbolic debugging
information consumes a large amount of space and is otherwise not needed during
the normal operation of software, and because some developers are reluctant to
release debugging symbols to their customers, Breakpad follows the latter
approach.

We know of no currently-maintained crash-reporting systems that meet our
requirements, which are to:

* allow for symbols to be separate from the application,
* handle crash reports from multiple platforms,
* allow developers to operate their own crash-reporting platform, and to
* be open-source.

Windows Error Reporting only functions for Microsoft products, and requires the
involvement of Microsoft’s servers. Talkback, while cross-platform, has not been
maintained and at this point does not support Mac OS X on x86, which we consider
to be a significant platform. Talkback is also closed-source commercial
software, and has very specific requirements for its server platform.

We are aware of Windows-only crash-reporting systems that leverage Microsoft’s
debugging interfaces. Such systems, even if extended to support dumps from other
platforms, are tied to using Windows for at least a portion of the processor
platform.

## Overview

The Breakpad processor itself is written in standard C++ and will work on a
variety of platforms. The dumps it accepts may also have been created on a
variety of systems. The library is able to combine dumps with symbolic debugging
information to create stack traces that include function signatures. The
processor library includes simple command-line tools to examine dumps and
process them, producing stack traces. It also exposes several layers of APIs
enabling crash-reporting systems to be built around the Breakpad processor.

## Detailed Design

### Dump Files

In the processor, the dump data is of primary significance. Dumps typically
contain:

* CPU context (register data) as it was at the time the crash occurred, and an
  indication of which thread caused the crash. General-purpose registers are
  included, as are special-purpose registers such as the instruction pointer
  (program counter).
* Information about each thread of execution within a crashed process,
  including:
  * The memory region used for each thread’s stack.
  * CPU context for each thread, which for various reasons is not the same
    as the crash context in the case of the crashed thread.
* A list of loaded code segments (or modules), including:
  * The name of the file (`.so`, `.exe`, `.dll`, etc.) which provides the
    code.
  * The boundaries of the memory region in which the code segment is visible
    to the process.
  * A reference to the debugging information for the code module, when such
    information is available.
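
To illustrate how these pieces relate, the following C++ sketch models the contents listed above with hypothetical structures (`CodeModule`, `ThreadRecord`, `DumpContents`, and `ModuleForAddress` are illustrative names, not Breakpad classes). It shows how a module’s recorded boundaries let a processor attribute an address, such as a thread’s instruction pointer, to the module that contains it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of the data a dump carries; these are not
// Breakpad's own classes.
struct CodeModule {
  std::string file_name;  // e.g. "libfoo.so" or "foo.dll"
  uint64_t base_address;  // start of the region where the code is mapped
  uint64_t size;          // length of that region in bytes
  std::string debug_id;   // reference to the module's debugging information
};

struct ThreadRecord {
  uint32_t thread_id = 0;
  uint64_t stack_start = 0;          // lowest address of the captured stack
  std::vector<uint8_t> stack_bytes;  // snapshot of the stack memory
  uint64_t instruction_pointer = 0;  // from the thread's CPU context
  uint64_t stack_pointer = 0;
};

struct DumpContents {
  std::vector<ThreadRecord> threads;
  std::vector<CodeModule> modules;
  uint32_t crashed_thread_id = 0;  // which thread caused the crash
};

// Returns the module whose mapped region contains `address`, or nullptr if
// the address lies outside every loaded module.
const CodeModule* ModuleForAddress(const DumpContents& dump, uint64_t address) {
  for (const CodeModule& module : dump.modules) {
    // Written as a subtraction so that base_address + size cannot overflow.
    if (address >= module.base_address &&
        address - module.base_address < module.size) {
      return &module;
    }
  }
  return nullptr;
}
```

Attributing each address to a module in this way is what lets a processor decide which module’s debugging information to consult for a given stack frame.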

Ordinarily, dumps are produced as a result of a crash, but other triggers may be
set to produce dumps at any time a developer deems appropriate. The Breakpad
processor can handle dumps in the minidump format, either generated by a
[Breakpad client “handler”](client_design.md) implementation, or by another
implementation that produces dumps in this format. The
[DbgHelp.dll!MiniDumpWriteDump](http://msdn2.microsoft.com/en-us/library/ms680360.aspx)
function on Windows produces dumps in this format, and is the basis for the
Breakpad handler implementation on that platform.

The [minidump format](http://msdn.microsoft.com/en-us/library/ms679293%28VS.85%29.aspx)
is essentially a simple container format, organized as a series of streams. Each
stream contains some type of data relevant to the crash. A typical “normal”
minidump contains streams for the thread list, the module list, the CPU context
at the time of the crash, and various bits of additional system information.
Other types of minidump can be generated, such as a full-memory minidump, which
in addition to stack memory contains snapshots of all of a process’ mapped
memory regions.
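
The container structure can be seen in a minimal reader for the fixed-size header and the stream directory that follows it. The offsets below assume the documented `MINIDUMP_HEADER` and `MINIDUMP_DIRECTORY` layouts (little-endian, a 32-byte header, and 12-byte directory entries). `ReadStreamDirectory` is a hypothetical helper, not part of Breakpad’s `Minidump` classes, and it does not interpret the contents of any stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr uint32_t kMinidumpSignature = 0x504d444d;  // "MDMP" read little-endian
constexpr std::size_t kHeaderSize = 32;
constexpr std::size_t kDirectoryEntrySize = 12;

struct StreamEntry {
  uint32_t stream_type;  // e.g. 3 = thread list, 4 = module list
  uint32_t data_size;    // size of the stream data in bytes
  uint32_t rva;          // file offset of the stream data
};

static uint32_t ReadU32LE(const std::vector<uint8_t>& buf, std::size_t offset) {
  return static_cast<uint32_t>(buf[offset]) |
         (static_cast<uint32_t>(buf[offset + 1]) << 8) |
         (static_cast<uint32_t>(buf[offset + 2]) << 16) |
         (static_cast<uint32_t>(buf[offset + 3]) << 24);
}

// Returns the stream directory, or std::nullopt if the buffer does not look
// like a well-formed minidump.
std::optional<std::vector<StreamEntry>> ReadStreamDirectory(
    const std::vector<uint8_t>& file) {
  if (file.size() < kHeaderSize || ReadU32LE(file, 0) != kMinidumpSignature) {
    return std::nullopt;
  }
  const uint32_t count = ReadU32LE(file, 8);     // number of streams
  const uint32_t dir_rva = ReadU32LE(file, 12);  // offset of the directory
  if (dir_rva > file.size() ||
      (file.size() - dir_rva) / kDirectoryEntrySize < count) {
    return std::nullopt;  // the directory would run past the end of the file
  }
  std::vector<StreamEntry> streams;
  for (uint32_t i = 0; i < count; ++i) {
    const std::size_t at = dir_rva + static_cast<std::size_t>(i) * kDirectoryEntrySize;
    streams.push_back({ReadU32LE(file, at), ReadU32LE(file, at + 4),
                       ReadU32LE(file, at + 8)});
  }
  return streams;
}
```

A reader that does not recognize a stream type can skip that directory entry, which is part of what makes this container easy to extend.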

The minidump format was chosen as Breakpad’s dump format because it has an
established track record on Windows, and it can be adapted to meet the needs of
the other platforms that Breakpad supports. Most other operating systems use
“core” files as their native dump formats, but the capabilities of core files
vary across platforms, and because core files are usually presented in a
platform’s native executable format, there are complications involved in
accessing the data contained therein without the benefit of the header files
that define an executable format’s entire structure. Because minidumps are
leaner than a typical executable format, a redefinition of the format in a
cross-platform header file, `minidump_format.h`, was a straightforward task.
Similarly, the capabilities of the minidump format are well understood, and
because it provides an extensible container, any of Breakpad’s needs that could
not be met directly by the standard minidump format could likely be met by
extending it as needed. Finally, using this format means that the dump file is
compatible with native debugging tools, at least on Windows. A possible future
avenue for exploration is the conversion of minidumps to core files, to enable
this same benefit on other platforms.

We have already provided an extension to the minidump format that allows it to
carry dumps generated on systems with PowerPC processors. The format already
allows for different CPU types, so our work in this area was limited to defining
a context structure sufficient to represent the execution state of a PowerPC. We
have also defined an extension that allows a non-crash minidump to indicate
which thread of execution requested that the dump be produced.
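
An extension of this kind can be pictured as an extra record whose fields are guarded by a validity mask, so that a reader can tell a value that was never recorded apart from a thread ID of zero. The record below is a hypothetical sketch of that idea; `DumpRequestInfo`, `RequestingThread`, and the bit values are illustrative assumptions, not definitions copied from `minidump_format.h`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical validity bits: each field is meaningful only if its bit is set.
constexpr uint32_t kValidDumpThreadId = 1u << 0;
constexpr uint32_t kValidRequestingThreadId = 1u << 1;

struct DumpRequestInfo {
  uint32_t validity = 0;              // bitmask of the flags above
  uint32_t dump_thread_id = 0;        // thread that wrote the dump
  uint32_t requesting_thread_id = 0;  // thread that asked for the dump
};

// For a non-crash dump, returns the thread whose request produced it, or
// std::nullopt if the writer did not record one.
std::optional<uint32_t> RequestingThread(const DumpRequestInfo& info) {
  if (info.validity & kValidRequestingThreadId) {
    return info.requesting_thread_id;
  }
  return std::nullopt;
}
```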

Often, the information contained within a dump alone is sufficient to produce a
full stack backtrace for each thread. However, certain optimizations that
compilers employ in producing code frustrate this process. Specifically, the
“frame pointer omission” optimization of x86 compilers can make it impossible to
produce useful stack traces given only a stack snapshot and CPU context. In
these cases, however, compiler-emitted debugging information can aid in
producing useful stack traces. The Breakpad processor is able to take advantage
of this debugging information as supplied by Microsoft’s C/C++ compiler, the
only compiler to apply such optimizations by default. As a result, the Breakpad
processor can produce useful stack traces even from code with frame pointer
omission optimizations as produced by this compiler.
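
To see why frame pointer omission matters, consider the conventional x86 walk that relies on frame pointers alone. The sketch below assumes a toy 32-bit stack model (`StackMemory` and `WalkWithFramePointers` are hypothetical names) in which every function begins with `push ebp; mov ebp, esp`, so each frame stores the caller’s `EBP` at `[EBP]` and the return address at `[EBP + 4]`. A function compiled with frame pointer omission may use `EBP` for other data, which breaks exactly the chain this walk follows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Toy model of 32-bit x86 stack memory: address -> 4-byte word. A real
// stackwalker would read these words from the stack snapshot in the dump.
using StackMemory = std::map<uint32_t, uint32_t>;

// Frame-pointer walk, assuming every function starts with
// `push ebp; mov ebp, esp`: [ebp] then holds the caller's EBP and
// [ebp + 4] holds the return address into the caller.
std::vector<uint32_t> WalkWithFramePointers(const StackMemory& stack,
                                            uint32_t eip, uint32_t ebp,
                                            std::size_t max_frames = 64) {
  std::vector<uint32_t> frames{eip};  // frame 0: where the thread stopped
  while (frames.size() < max_frames) {
    auto return_address = stack.find(ebp + 4);
    if (return_address == stack.end() || return_address->second == 0) break;
    frames.push_back(return_address->second);

    auto saved_ebp = stack.find(ebp);
    // The stack grows down, so the caller's frame must sit at a higher
    // address. Anything else means the chain is broken, for example because
    // code built with frame pointer omission reused EBP for other data.
    if (saved_ebp == stack.end() || saved_ebp->second <= ebp) break;
    ebp = saved_ebp->second;
  }
  return frames;
}
```

The walk stops at the first implausible link rather than guessing; recovering the remaining frames is the job of the compiler-emitted debugging information described above, which this sketch does not model.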

### Symbol Files

The [symbol files](symbol_files.md) that the Breakpad processor accepts allow
for frame pointer omission data, but this is only one of their capabilities.
Each symbol file also includes information about the functions, source files,
and source code line numbers for a single module of code. A module is an
individually loadable chunk of code: these can be executables containing a main
program (`.exe` files on Windows) or shared libraries (`.so` files on Linux,
`.dylib` files, frameworks, and bundles on Mac OS X, and `.dll` files on
Windows). Dumps contain information about which of these modules were loaded at
the time the dump was produced, and given this information, the Breakpad
processor attempts to locate debugging symbols for each module through a
user-supplied function embodied in a “symbol supplier.” Breakpad includes a
sample symbol supplier, called `SimpleSymbolSupplier`, that is used by its
command-line tools; this supplier locates symbol files by pathname.
`SimpleSymbolSupplier` is also available to other users of the Breakpad
processor library. This allows for the use of a simple reference implementation,
but preserves flexibility for users who may have more demanding symbol file
storage needs.
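
The division of labor between the processor and a symbol supplier can be sketched as an abstract lookup interface with a pathname-based implementation behind it. `ModuleId`, `SymbolLocator`, and `DirectoryLocator` are hypothetical names, not Breakpad’s `SymbolSupplier` API, and the directory layout used here (`<root>/<debug file>/<debug identifier>/<debug file without .pdb>.sym`) is an assumption about the kind of layout a pathname-based supplier such as `SimpleSymbolSupplier` searches.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Identifies one module's debugging information, as recorded in the dump.
struct ModuleId {
  std::string debug_file;        // e.g. "app.pdb" or "libfoo.so"
  std::string debug_identifier;  // build-specific identifier
};

// Hypothetical interface: the processor asks for a module's symbol file and
// leaves the storage policy entirely to the implementation.
class SymbolLocator {
 public:
  virtual ~SymbolLocator() = default;
  virtual std::optional<std::string> Locate(const ModuleId& module) const = 0;
};

// Pathname-based implementation. The layout is an assumption:
//   <root>/<debug_file>/<debug_identifier>/<debug_file minus ".pdb">.sym
// This sketch only builds the path; it does not check that the file exists.
class DirectoryLocator : public SymbolLocator {
 public:
  explicit DirectoryLocator(std::string root) : root_(std::move(root)) {}

  std::optional<std::string> Locate(const ModuleId& module) const override {
    if (module.debug_file.empty() || module.debug_identifier.empty()) {
      return std::nullopt;  // not enough information to build a path
    }
    std::string base = module.debug_file;
    const std::string pdb = ".pdb";
    if (base.size() > pdb.size() &&
        base.compare(base.size() - pdb.size(), pdb.size(), pdb) == 0) {
      base.erase(base.size() - pdb.size());  // "app.pdb" -> "app"
    }
    return root_ + "/" + module.debug_file + "/" + module.debug_identifier +
           "/" + base + ".sym";
  }

 private:
  std::string root_;
};
```

A user with more demanding storage needs would supply a different `SymbolLocator` implementation, for example one backed by a database, without changing the code that consumes the interface.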

Breakpad’s symbol file format is text-based, and was defined to be fairly
human-readable and to encompass the needs of multiple platforms. The Breakpad
processor itself does not operate directly with native symbol formats
([DWARF](http://dwarf.freestandards.org/) and
[STABS](http://sourceware.org/gdb/current/onlinedocs/stabs.html) on most
Unix-like systems,
[.pdb files](http://msdn2.microsoft.com/en-us/library/yd4f8bd1%28VS.80%29.aspx)
on Windows), because of the complications in accessing potentially complex
symbol formats with slight variations between platforms, stored within different
types of binary formats. In the case of `.pdb` files, the debugging format is
not even documented. Instead, Breakpad’s symbol files are produced on each
platform, using specific debugging APIs where available, to convert native
symbols to Breakpad’s cross-platform format.
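
Because the format is line-oriented text, reading it requires little more than splitting on whitespace. The sketch below parses a single function record, assuming a `FUNC [m] address size parameter_size name` layout with the three numeric fields in hexadecimal, an optional `m` marker before the address, and the name occupying the rest of the line. `FuncRecord` and `ParseFuncLine` are hypothetical names, and all other record types are simply rejected.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

struct FuncRecord {
  uint64_t address = 0;         // offset from the module's load address
  uint64_t size = 0;            // length of the function's code in bytes
  uint64_t parameter_size = 0;  // bytes of arguments passed on the stack
  std::string name;             // may contain spaces, so it ends the line
};

// Parses one "FUNC [m] <address> <size> <parameter_size> <name>" record.
// The numeric fields are hexadecimal; any other record type is rejected.
std::optional<FuncRecord> ParseFuncLine(const std::string& line) {
  std::istringstream in(line);
  std::string keyword;
  std::string address;
  if (!(in >> keyword) || keyword != "FUNC" || !(in >> address)) {
    return std::nullopt;
  }
  if (address == "m" && !(in >> address)) {  // skip the optional marker
    return std::nullopt;
  }
  FuncRecord record;
  std::istringstream address_in(address);
  if (!(address_in >> std::hex >> record.address) ||
      !(in >> std::hex >> record.size >> record.parameter_size)) {
    return std::nullopt;
  }
  in >> std::ws;                  // drop the separator before the name
  std::getline(in, record.name);  // the name runs to the end of the line
  if (record.name.empty()) return std::nullopt;
  return record;
}
```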

### Processing

Most commonly, a developer will enable an application to use Breakpad by
building it with a platform-specific [client “handler”](client_design.md)
library. After building the application, the developer will create symbol files
for Breakpad with the included `dump_syms` or `symupload` tools, or another
suitable tool, and place the symbol files where the processor’s symbol supplier
will be able to locate them.

When a dump file is given to the processor’s `MinidumpProcessor` class, it
reads the file using its included minidump reader, contained in the `Minidump`
family of classes. It collects information about the operating system and CPU
that produced the dump, and determines whether the dump was produced as a result
of a crash or at the direct request of the application itself. It then loops
over all of the threads in the process, attempting to walk the stack associated
with each thread. This is done by the processor’s `Stackwalker` components, of
which there is a slightly different implementation for each CPU type that the
processor is able to handle dumps from. Beginning with a thread’s context, and
possibly using debugging data, the stackwalker produces a list of stack frames,
each identifying the instruction address at that point in the call chain. These
instructions are matched up with the modules that contributed them to the
process, and the `SymbolSupplier` is invoked to locate a symbol file for each
module. The symbol file is given to a `SourceLineResolver`, which matches the
instruction up with a specific function name, source file, and line number,
resulting in a representation of a stack frame that can easily be used to
identify which code was executing.
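
The final lookup in this sequence amounts to a range search over module-relative addresses. The hypothetical `ModuleResolver` below is a simplified stand-in for the job a `SourceLineResolver` performs, not its actual interface: it subtracts the module’s load address taken from the dump, then uses `std::map::upper_bound` to find the function and line ranges that cover the result.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

struct SourceLocation {
  std::string function;
  std::string file;  // empty if no line record covers the address
  int line = 0;
};

// Hypothetical per-module resolver built from a module's symbol data.
// All addresses stored here are relative to the module's load address.
class ModuleResolver {
 public:
  void AddFunction(uint64_t rva, uint64_t size, std::string name) {
    functions_[rva] = Function{size, std::move(name)};
  }
  void AddLine(uint64_t rva, uint64_t size, std::string file, int line) {
    lines_[rva] = Line{size, std::move(file), line};
  }

  // `instruction` is an absolute address from a stack frame; `module_base`
  // is where the dump says the module was loaded.
  std::optional<SourceLocation> Resolve(uint64_t instruction,
                                        uint64_t module_base) const {
    if (instruction < module_base) return std::nullopt;
    const uint64_t rva = instruction - module_base;
    auto function = FindCovering(functions_, rva);
    if (function == functions_.end()) return std::nullopt;
    SourceLocation location;
    location.function = function->second.name;
    auto line = FindCovering(lines_, rva);
    if (line != lines_.end()) {
      location.file = line->second.file;
      location.line = line->second.line;
    }
    return location;
  }

 private:
  struct Function { uint64_t size; std::string name; };
  struct Line { uint64_t size; std::string file; int line; };

  // Returns the entry whose [start, start + size) range contains rva.
  template <typename Map>
  static typename Map::const_iterator FindCovering(const Map& map, uint64_t rva) {
    auto it = map.upper_bound(rva);  // first entry starting after rva
    if (it == map.begin()) return map.end();
    --it;                            // last entry starting at or before rva
    return rva - it->first < it->second.size ? it : map.end();
  }

  std::map<uint64_t, Function> functions_;
  std::map<uint64_t, Line> lines_;
};
```

Keeping the ranges in sorted maps makes each lookup logarithmic in the number of records, which matters when a single module contributes many frames.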

The results of processing are made available in a `ProcessState` object, which
contains a vector of threads, each containing a vector of stack frames.
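
The shape of these results, and the kind of text a command-line tool might derive from them, can be sketched with simple stand-in types. `FrameInfo`, `ThreadStack`, `ProcessResult`, and `FormatThread` are hypothetical and are not Breakpad’s actual classes; the output format only loosely resembles that of `minidump_stackwalk`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the processing results described above.
struct FrameInfo {
  uint64_t instruction = 0;
  std::string module;    // empty if no module contains the address
  std::string function;  // empty if no symbols were found
  std::string source_file;
  int source_line = 0;
};

struct ThreadStack {
  std::vector<FrameInfo> frames;  // frames[0] is the innermost frame
};

struct ProcessResult {
  bool crashed = false;
  int requesting_thread = -1;  // index of the crashed or requesting thread
  std::vector<ThreadStack> threads;
};

// Formats one thread, falling back to the raw address when no symbols exist.
std::string FormatThread(const ThreadStack& thread) {
  std::ostringstream out;
  for (std::size_t i = 0; i < thread.frames.size(); ++i) {
    const FrameInfo& f = thread.frames[i];
    out << i << "  ";
    if (!f.module.empty()) out << f.module << "!";
    if (!f.function.empty()) {
      out << f.function;
      if (!f.source_file.empty()) {
        out << " [" << f.source_file << " : " << f.source_line << "]";
      }
    } else {
      out << "0x" << std::hex << f.instruction << std::dec;
    }
    out << "\n";
  }
  return out.str();
}
```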

For small-scale use of the Breakpad processor, and for testing and debugging,
the `minidump_stackwalk` tool is provided. It invokes the processor and displays
the full results of processing, optionally allowing symbols to be provided to
the processor by a pathname-based symbol supplier, `SimpleSymbolSupplier`.

For lower-level testing and debugging, the processor library also includes a
`minidump_dump` tool, which walks through an entire minidump file and displays
its contents in somewhat readable form.

### Platform Support

The Breakpad processor library is able to process dumps produced on Mac OS X
systems running on x86, x86-64, and PowerPC processors, on Windows and Linux
systems running on x86 or x86-64 processors, and on Android systems running ARM
or x86 processors. The processor library itself is written in standard C++, and
should function properly in most Unix-like environments. It has been tested on
Linux and Mac OS X.

## Future Plans

There are currently no firm plans or timetables to implement any of these
features, although they are possible avenues for future exploration.

The symbol file format could be extended to carry information about the
locations of parameters and local variables as stored in stack frames and
registers, and the processor could use this information to provide enhanced
stack traces showing function arguments and variable values.

On Mac OS X and Linux, we could provide tools to convert files from the minidump
format into the native core format. This would enable developers to open dump
files in a native debugger, just as they are presently able to do with minidumps
on Windows.