OLD | NEW |
(Empty) | |
| 1 # Breakpad Client Libraries |
| 2 |
| 3 ## Objective |
| 4 |
| 5 The Breakpad client libraries are responsible for monitoring an application for |
| 6 crashes (exceptions), handling them when they occur by generating a dump, and |
| 7 providing a means to upload dumps to a crash reporting server. These tasks are |
| 8 divided between the “handler” (short for “exception handler”) library linked in |
| 9 to an application being monitored for crashes, and the “sender” library, |
| 10 intended to be linked in to a separate external program. |
| 11 |
| 12 ## Background |
| 13 |
| 14 As one of the chief tasks of the client handler is to generate a dump, an |
| 15 understanding of [dump files](processor_design.md) will aid in understanding the |
| 16 handler. |
| 17 |
| 18 ## Overview |
| 19 |
| 20 Breakpad provides client libraries for each of its target platforms. Currently, |
| 21 these exist for Windows on x86 and Mac OS X on both x86 and PowerPC. A Linux |
| 22 implementation has been written and is currently under review. |
| 23 |
| 24 Because the mechanisms for catching exceptions and the methods for obtaining the |
| 25 information that a dump contains vary between operating systems, each target |
| 26 operating system requires a completely different handler implementation. Where |
| 27 multiple CPUs are supported for a single operating system, the handler |
| 28 implementation will likely also require separate code for each processor type to |
| 29 extract CPU-specific information. One of the goals of the Breakpad handler is to |
| 30 provide a prepackaged cross-platform system that masks many of these |
| 31 system-level differences and quirks from the application developer. Although the |
| 32 underlying implementations differ, the handler library for each system follows |
| 33 the same set of principles and exposes a similar interface. |
| 34 |
| 35 Code that wishes to take advantage of Breakpad should be linked against the |
| 36 handler library, and should, at an appropriate time, install a Breakpad handler. |
| 37 For applications, it is generally desirable to install the handler as early in |
| 38 the start-up process as possible. Developers of library code using Breakpad to |
| 39 monitor itself may wish to install a Breakpad handler when the library is |
| 40 loaded, or may only want to install a handler when calls are made into the |
| 41 library. |
| 42 |
| 43 The handler can be triggered to generate a dump either by catching an exception |
| 44 or at the request of the application itself. The latter case may be useful in |
| 45 debugging assertions or other conditions where developers want to know how a |
| 46 program got into a specific non-crash state. After generating a dump, the |
| 47 handler calls a user-specified callback function. The callback function may |
| 48 collect additional data about the program’s state, quit the program, launch a |
| 49 crash reporter application, or perform other tasks. Allowing for this |
| 50 functionality to be dictated by a callback function preserves flexibility. |
| 51 |
| 52 The sender library also has a separate implementation for each supported |
| 53 platform, because of the varying interfaces for accessing network resources on |
| 54 different operating systems. The sender transmits a dump along with other |
| 55 application-defined information to a crash report server via HTTP. Because dumps |
| 56 may contain sensitive data, the sender allows for the use of HTTPS. |
| 57 |
| 58 The canonical example of the entire client system would be for a monitored |
| 59 application to link against the handler library, install a Breakpad handler from |
| 60 its main function, and provide a callback to launch a small crash reporter |
| 61 program. The crash reporter program would be linked against the sender library, |
| 62 and would send the crash dump when launched. A separate process is recommended |
| 63 for this function because of the unreliability inherent in doing any significant |
| 64 amount of work from a crashed process. |
| 65 |
| 66 ## Detailed Design |
| 67 |
| 68 ### Exception Handler Installation |
| 69 |
| 70 The mechanisms for installing an exception handler vary between operating |
| 71 systems. On Windows, it’s a relatively simple matter of making one call to |
| 72 register a [top-level exception filter] |
| 73 (http://msdn.microsoft.com/library/en-us/debug/base/setunhandledexceptionfilter.asp) |
| 74 callback function. On most Unix-like systems such as Linux, processes are |
| 75 informed of exceptions by the delivery of a signal, so an exception handler |
| 76 takes the form of a signal handler. The native mechanism to catch exceptions on |
| 77 Mac OS X requires a large amount of code to set up a Mach port, identify it as |
| 78 the exception port, and assign a thread to listen for an exception on that port. |
| 79 Just as the preparation of exception handlers differs, the manner in which they |
| 80 are called differs as well. On Windows and most Unix-like systems, the handler |
| 81 is called on the thread that caused the exception. On Mac OS X, the thread |
| 82 listening to the exception port is notified that an exception has occurred. The |
| 83 different implementations of the Breakpad handler libraries perform these tasks |
| 84 in the appropriate ways on each platform, while exposing a similar interface on |
| 85 each. |
| 86 |
| 87 A Breakpad handler is embodied in an `ExceptionHandler` object. Because it’s a |
| 88 C++ object, `ExceptionHandler`s may be created as local variables, allowing them |
| 89 to be installed and removed as functions are called and return. This provides |
| 90 one possible way for a developer to monitor only a portion of an application for |
| 91 crashes. |
| 92 |
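The scoped install-and-remove pattern described above can be sketched with a plain POSIX signal handler. This is a minimal illustration, not Breakpad's `ExceptionHandler`: the `ScopedSignalHandler` class, the `RecordSignal` stand-in handler, and the `CurrentHandler` helper are all hypothetical names, and the sketch assumes a platform that provides `sigaction`.

```cpp
#include <signal.h>  // sigaction, sigemptyset, sig_atomic_t (POSIX)

using SignalFn = void (*)(int);

// Hypothetical RAII wrapper: installs a handler for one signal in its
// constructor and restores the previous disposition in its destructor.
class ScopedSignalHandler {
 public:
  ScopedSignalHandler(int signo, SignalFn handler) : signo_(signo) {
    struct sigaction action = {};
    action.sa_handler = handler;
    sigemptyset(&action.sa_mask);
    installed_ = sigaction(signo_, &action, &previous_) == 0;
  }

  ~ScopedSignalHandler() {
    // Put back whatever was registered before this object existed.
    if (installed_) sigaction(signo_, &previous_, nullptr);
  }

  ScopedSignalHandler(const ScopedSignalHandler&) = delete;
  ScopedSignalHandler& operator=(const ScopedSignalHandler&) = delete;

  bool installed() const { return installed_; }

 private:
  int signo_;
  struct sigaction previous_ = {};
  bool installed_ = false;
};

// Stand-in for a real crash handler; it only records that it ran.
volatile sig_atomic_t g_signal_seen = 0;
void RecordSignal(int) { g_signal_seen = 1; }

// Returns the handler currently registered for `signo`.
SignalFn CurrentHandler(int signo) {
  struct sigaction current = {};
  sigaction(signo, nullptr, &current);
  return current.sa_handler;
}
```

Declaring a `ScopedSignalHandler` as a local variable at the top of a function monitors only that function and its callees: the handler is registered on entry and the previous one is restored on every exit path, including early returns.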
| 93 ### Exception Basics |
| 94 |
| 95 Once an application encounters an exception, it is in an indeterminate and |
| 96 possibly hazardous state. Consequently, any code that runs after an exception |
| 97 occurs must take extreme care to avoid performing operations that might fail, |
| 98 hang, or cause additional exceptions. This task is not at all straightforward, |
| 99 and the Breakpad handler library seeks to do it properly, accounting for all of |
| 100 the minute details while allowing other application developers, even those with |
| 101 little systems programming experience, to reap the benefits. All of the Breakpad |
| 102 handler code that executes after an exception occurs has been written according |
| 103 to the following guidelines for safety at exception time: |
| 104 |
| 105 * Use of the application heap is forbidden. The heap may be corrupt or |
| 106 otherwise unusable, and allocators may not function. |
| 107 * Resource allocation must be severely limited. The handler may create a new |
| 108 file to contain the dump, and it may attempt to launch a process to continue |
| 109 handling the crash. |
| 110 * Execution on the thread that caused the exception is significantly limited. |
| 111 The only code permitted to execute on this thread is the code necessary to |
| 112 transition handling to a dedicated preallocated handler thread, and the code |
| 113 to return from the exception handler. |
| 114 * Handlers shouldn’t handle crashes by attempting to walk stacks themselves, |
| 115 as stacks may be in inconsistent states. Dump generation should be performed |
| 116 by interfacing with the operating system’s memory manager and code module |
| 117 manager. |
| 118 * Library code, including runtime library code, must be avoided unless it |
| 119 provably meets the above guidelines. For example, this means that the STL |
| 120 string class may not be used, because it performs operations that attempt to |
| 121 allocate and use heap memory. It also means that many C runtime functions |
| 122 must be avoided, particularly on Windows, because of heap operations that |
| 123 they may perform. |
| 124 |
| 125 A dedicated handler thread is used to preserve the state of the exception thread |
| 126 when an exception occurs: during dump generation, it is difficult if not |
| 127 impossible for a thread to accurately capture its own state. Performing all |
| 128 exception-handling functions on a separate thread is also critical when handling |
| 129 stack-limit-exceeded exceptions. It would be hazardous to run out of stack space |
| 130 while attempting to handle an exception. Because of the rule against allocating |
| 131 resources at exception time, the Breakpad handler library creates its handler |
| 132 thread when it installs its exception handler. On Mac OS X, this handler thread |
| 133 is created during the normal setup of the exception handler, and the handler |
| 134 thread will be signaled directly in the event of an exception. On Windows and |
| 135 Linux, the handler thread is signaled by a small amount of code that executes on |
| 136 the exception thread. Because the code that executes on the exception thread in |
| 137 this case is small and safe, this does not pose a problem. Even when an |
| 138 exception is caused by exceeding stack size limits, this code is sufficiently |
| 139 compact to execute entirely within the stack’s guard page without causing an |
| 140 exception. |
| 141 |
| 142 The handler thread may also be triggered directly by a user call, even when no |
| 143 exception occurs, to allow dumps to be generated at any point deemed |
| 144 interesting. |
| 145 |
| 146 ### Filter Callback |
| 147 |
| 148 When the handler thread begins handling an exception, it calls an optional |
| 149 user-defined filter callback function, which is responsible for judging whether |
| 150 Breakpad’s handler should continue handling the exception or not. This mechanism |
| 151 is provided for the benefit of library or plug-in code, whose developers may not |
| 152 be interested in reports of crashes that occur outside of their modules but |
| 153 within processes hosting their code. If the filter callback indicates that it is |
| 154 not interested in the exception, the Breakpad handler arranges for it to be |
| 155 delivered to any previously installed handler. |
| 156 |
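The plug-in scenario above might look roughly like the following sketch, in which a filter accepts only crashes whose address falls inside the plug-in's own code. The `FilterCallback` signature, the `ModuleRange` type, and the `Dispatch` function are assumptions made for illustration; they are not Breakpad's actual interface.

```cpp
#include <cstdint>

// Hypothetical description of where a library's code is mapped.
struct ModuleRange {
  std::uintptr_t begin;  // first address belonging to the module
  std::uintptr_t end;    // one past the last address
};

// Hypothetical filter signature: return true to let the handler continue and
// write a dump, false to pass the exception on to the previous handler.
using FilterCallback = bool (*)(const void* context,
                                std::uintptr_t crash_address);

// Example filter: only crashes inside the module described by `context`
// are of interest.
bool CrashIsInModule(const void* context, std::uintptr_t crash_address) {
  const ModuleRange* module = static_cast<const ModuleRange*>(context);
  return crash_address >= module->begin && crash_address < module->end;
}

enum class Disposition { kWriteDump, kForwardToPrevious };

// Stand-in for the handler's decision step: a missing filter counts as
// approval, matching the "approves (or does not exist)" rule.
Disposition Dispatch(FilterCallback filter, const void* context,
                     std::uintptr_t crash_address) {
  if (filter != nullptr && !filter(context, crash_address)) {
    return Disposition::kForwardToPrevious;
  }
  return Disposition::kWriteDump;
}
```

Because the filter runs at exception time, it is kept to a pointer cast and two comparisons, in line with the safety guidelines for handler code.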
| 157 ### Dump Generation |
| 158 |
| 159 Assuming that the filter callback approves (or does not exist), the handler |
| 160 writes a dump in a directory specified by the application developer when the |
| 161 handler was installed, using a previously generated unique identifier to avoid |
| 162 name collisions. The mechanics of dump generation also vary between platforms, |
| 163 but in general, the process involves enumerating each thread of execution, and |
| 164 capturing its state, including processor context and the active portion of its |
| 165 stack area. The dump also includes a list of the code modules loaded into the |
| 166 application, and an indicator of which thread generated the exception or |
| 167 requested the dump. In order to avoid allocating memory during this process, the |
| 168 dump is written in place on disk. |
| 169 |
| 170 ### Post-Dump Behavior |
| 171 |
| 172 Upon completion of writing the dump, a second callback function is called. This |
| 173 callback may be used to launch a separate crash reporting program or to collect |
| 174 additional data from the application. The callback may also be used to influence |
| 175 whether Breakpad will treat the exception as handled or unhandled. Even after a |
| 176 dump is successfully generated, Breakpad can be made to behave as though it |
| 177 didn’t actually handle an exception. This function may be useful for developers |
| 178 who want to test their applications with Breakpad enabled but still retain the |
| 179 ability to use traditional debugging techniques. It also allows a |
| 180 Breakpad-enabled application to coexist with a platform’s native crash reporting |
| 181 system, such as Mac OS X’s [CrashReporter] |
| 182 (http://developer.apple.com/technotes/tn2004/tn2123.html) and [Windows Error |
| 183 Reporting](http://msdn.microsoft.com/isv/resources/wer/). |
| 184 |
| 185 Typically, when Breakpad handles an exception fully and no debuggers are |
| 186 involved, the crashed process will terminate. |
| 187 |
| 188 Authors of both callback functions that execute within a Breakpad handler are |
| 189 cautioned that their code will be run at exception time, and that as a result, |
| 190 they should observe the same programming practices that the Breakpad handler |
| 191 itself adheres to. Notably, if a callback is to be used to collect additional |
| 192 data from an application, it should take care to read only “safe” data. This |
| 193 might involve accessing only static memory locations that are updated |
| 194 periodically during the course of normal program execution. |
| 195 |
| 196 ### Sender Library |
| 197 |
| 198 The Breakpad sender library provides a single function to send a crash report to |
| 199 a crash server. It accepts a crash server’s URL, a map of key-value parameters |
| 200 that will accompany the dump, and the path to the dump file itself. Each of the |
| 201 key-value parameters and the dump file are sent as distinct parts of a multipart |
| 202 HTTP POST request to the specified URL using the platform’s native HTTP |
| 203 facilities. On Linux, [libcurl](http://curl.haxx.se/) is used for this function, |
| 204 as it is the closest thing to a standard HTTP library available on that |
| 205 platform. |
| 206 |
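The layout of such a multipart request body can be sketched as follows. This is not the sender library's code: `BuildMultipartBody` is a hypothetical helper, the `dump.dmp` filename is a placeholder, and for self-containment the sketch takes the dump's bytes directly rather than a path. It only builds the body; sending it over HTTP or HTTPS is left to the platform's facilities.

```cpp
#include <map>
#include <string>

// Hypothetical helper: lays out each key-value parameter and the dump as
// separate parts of a multipart/form-data body. The caller must choose a
// boundary that does not occur inside any part; names are not escaped.
std::string BuildMultipartBody(
    const std::map<std::string, std::string>& parameters,
    const std::string& file_part_name,
    const std::string& dump_contents,
    const std::string& boundary) {
  std::string body;

  // One part per key-value parameter.
  for (const auto& parameter : parameters) {
    body += "--" + boundary + "\r\n";
    body += "Content-Disposition: form-data; name=\"" + parameter.first +
            "\"\r\n\r\n";
    body += parameter.second + "\r\n";
  }

  // The dump travels as a binary file part.
  body += "--" + boundary + "\r\n";
  body += "Content-Disposition: form-data; name=\"" + file_part_name +
          "\"; filename=\"dump.dmp\"\r\n";
  body += "Content-Type: application/octet-stream\r\n\r\n";
  body += dump_contents + "\r\n";

  // The closing delimiter ends the body.
  body += "--" + boundary + "--\r\n";
  return body;
}
```

The request would carry a `Content-Type: multipart/form-data; boundary=...` header naming the same boundary. Using `std::string` and `std::map` is acceptable here because the sender is meant to run in a separate crash reporter process, not in the crashed process at exception time.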
| 207 ## Future Plans |
| 208 |
| 209 Although we’ve had great success with in-process dump generation by following |
| 210 our guidelines for safe code at exception time, we are exploring options for |
| 211 allowing dumps to be generated in a separate process, to further enhance the |
| 212 handler library’s robustness. |
| 213 |
| 214 On Windows, we intend to offer tools to make it easier for Breakpad’s settings |
| 215 to be managed by the native group policy management system. |
| 216 |
| 217 We also plan to offer tools that many developers would find desirable in the |
| 218 context of handling crashes, such as a mechanism to determine at launch if the |
| 219 program last terminated in a crash, and a way to calculate “crashiness” in terms |
| 220 of crashes over time or the number of application launches between crashes. |
| 221 |
| 222 We are also investigating methods to capture crashes that occur early in an |
| 223 application’s launch sequence, including crashes that occur before a program’s |
| 224 main function begins executing. |