OLD | NEW |
(Empty) | |
| 1 The goal of this document is to give an overview of the exception handling |
| 2 options in Breakpad. |
| 3 |
| 4 # Basics |
| 5 |
| 6 Exception handling is a mechanism designed to handle the occurrence of |
| 7 exceptions, special conditions that change the normal flow of program execution. |
| 8 |
| 9 On Windows, Breakpad installs its own filter for unhandled exceptions with |
| 10 `SetUnhandledExceptionFilter`. TODO: More on first and second chance and vectored v. try/catch. |
| 11 |
| 12 There are two main types of exception handling across all platforms: in-process |
| 13 and out-of-process. |
| 14 |
| 15 # In-Process |
| 16 |
| 17 In-process exception handling is relatively simple since the crashing process |
| 18 handles crash reporting. It is generally considered unsafe to write a minidump |
| 19 from a crashed process. For example, key data structures could be corrupted or |
| 20 the stack on which the exception handler runs could have been overwritten. For |
| 21 this reason all platforms also support some level of out-of-process exception |
| 22 handling. |
| 23 |
| 24 ## Windows |
| 25 |
| 26 For in-process exception handling, Breakpad creates a 'handler thread' that |
| 27 waits indefinitely on a semaphore at start up. When this thread is woken it |
| 28 writes the minidump and signals to the excepting thread that it may continue. |
| 29 A filter tells the OS to kill the process if the minidump is written |
| 30 successfully. Otherwise the OS continues its search for another handler. |
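The handoff between the excepting thread and the handler thread can be modelled in a few lines. This is only an illustrative Python sketch, not Breakpad's C++ implementation: the class name, the `write_minidump` callback, and the `"terminate"` / `"continue_search"` return values are all hypothetical stand-ins for the filter's decision.

```python
import threading


class InProcessHandler:
    """Toy model of a dedicated handler thread (hypothetical, not Breakpad code)."""

    def __init__(self, write_minidump):
        # write_minidump is a hypothetical callback that returns True on success.
        self._write_minidump = write_minidump
        self._start = threading.Semaphore(0)   # handler thread sleeps on this
        self._finish = threading.Semaphore(0)  # excepting thread sleeps on this
        self._succeeded = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            self._start.acquire()              # wait until an exception wakes us
            self._succeeded = self._write_minidump()
            self._finish.release()             # let the excepting thread continue

    def on_exception(self):
        """Runs on the excepting thread and returns the filter's decision."""
        self._start.release()                  # wake the handler thread
        self._finish.acquire()                 # block until the dump attempt ends
        return "terminate" if self._succeeded else "continue_search"
```

Writing the dump on a separate, pre-created thread means the work does not run on the stack of the thread that crashed, which may already be damaged.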
| 31 |
| 32 # Out-of-Process |
| 33 |
| 34 Out-of-process exception handling is more complicated than in-process exception |
| 35 handling because of the need to set up a separate process that can read the |
| 36 state of the crashing process. |
| 37 |
| 38 ## Windows |
| 39 |
| 40 Breakpad uses two abstractions around the exception handler to make things work: |
| 41 `CrashGenerationServer` and `CrashGenerationClient`. The constructor of each |
| 42 takes the name of a named pipe. |
| 43 |
| 44 During start up the server creates a named pipe and registers callbacks for |
| 45 client connections. The named pipe is used for registration and all IO on |
| 46 the pipe is done asynchronously. `OnPipeConnected` is called when a client |
| 47 attempts to connect (calls `CreateFile` on the pipe). `OnPipeConnected` drives the |
| 48 state machine from `Initial` to `Connecting` and on through |
| 49 `Reading`, `Reading_Done`, `Writing`, `Writing_Done`, `Reading_ACK`, and |
| 50 `Disconnecting`. |
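The state progression named above can be written down as a small enumeration. This is a hedged Python sketch of the happy path only: the `PipeState` and `next_state` names are hypothetical, error transitions are not modelled, and what the real server does after `Disconnecting` is left out.

```python
from enum import Enum


class PipeState(Enum):
    """States in the order the text lists them (names mirror the document)."""
    INITIAL = "Initial"
    CONNECTING = "Connecting"
    READING = "Reading"
    READING_DONE = "Reading_Done"
    WRITING = "Writing"
    WRITING_DONE = "Writing_Done"
    READING_ACK = "Reading_ACK"
    DISCONNECTING = "Disconnecting"


_ORDER = list(PipeState)  # Enum iteration follows definition order


def next_state(state):
    """Return the following happy-path state, or None after Disconnecting."""
    index = _ORDER.index(state)
    return _ORDER[index + 1] if index + 1 < len(_ORDER) else None
```

For example, `next_state(PipeState.INITIAL)` yields `PipeState.CONNECTING`.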
| 51 |
| 52 When registering callbacks, the client passes in two pointers to pointers: |
| 53 1. A pointer to the `EXCEPTION_INFO` pointer |
| 54 1. A pointer to the `MDRawAssertionInfo` which handles various non-exception failures like assertions |
| 55 |
| 56 The essence of registration is adding a "`ClientInfo`" object that contains handles used for synchronization with the crashing process to an |
| 57 array maintained by the server. This is how we can keep track of all the clients on the system that have registered for minidumps. These handles are: |
| 58 * `server_died` (mutex) |
| 59 * `dump_requested` (event) |
| 60 * `dump_generated` (event) |
| 61 |
| 62 The server registers asynchronous waits on these events with the `ClientInfo` object as the |
| 63 callback context. When the client sets the `dump_requested` event, the `OnDumpRequested()` |
| 64 callback is called. The server uses the handles inside `ClientInfo` to communicate with the |
| 65 client process. Once the client sets the event, it waits for two objects: |
| 66 1. the `dump_generated` event |
| 67 1. the `server_died` mutex |
| 68 |
| 69 In the end the handles are duplicated ("duped") into the client process, and clients use |
| 70 `SetEvent` to request a dump, then wait on either the `dump_generated` event or the |
| 71 `server_died` mutex. |
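The request/wait handshake can be imitated with ordinary thread events. The Python sketch below is an analogy rather than the Windows implementation: `threading.Event` stands in for the duplicated handles (including the `server_died` mutex), and the polling loop and `timeout` parameter are assumptions added so the example always terminates.

```python
import threading
import time


class ClientInfo:
    """Hypothetical stand-in for the per-client synchronization handles."""

    def __init__(self):
        self.dump_requested = threading.Event()
        self.dump_generated = threading.Event()
        self.server_died = threading.Event()  # models the mutex as a flag


def request_dump(info, timeout=5.0, poll=0.01):
    """Client side: request a dump, then wait for whichever object fires first."""
    info.dump_requested.set()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if info.dump_generated.is_set():
            return "generated"
        if info.server_died.is_set():
            return "server_died"
        time.sleep(poll)
    return "timeout"


def serve_once(info, write_minidump):
    """Server side: react to one dump request, then signal completion."""
    info.dump_requested.wait()
    write_minidump()
    info.dump_generated.set()
```

Waiting on the server's liveness as well as on completion keeps a crashing client from blocking forever when the server has already exited.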
| 72 |
| 73 ## Linux |
| 74 |
| 75 ### Current Status |
| 76 |
| 77 As of July 2011, Linux had a minidump generator that was not entirely |
| 78 out-of-process. The minidump was generated from a separate process, but one that |
| 79 shared an address space, file descriptors, signal handlers and much else with the |
| 80 crashing process. It worked by using the `clone()` system call to duplicate the |
| 81 crashing process, and then used `ptrace()` and the `/proc` file system to |
| 82 retrieve the information required to write the minidump. Since then Breakpad has |
| 83 updated Linux exception handling to provide more benefits of out-of-process |
| 84 report generation. |
| 85 |
| 86 ### Proposed Design |
| 87 |
| 88 #### Overview |
| 89 |
| 90 Breakpad would use a per-user daemon to write out a minidump, one that does not |
| 91 have to interact with or depend on the crashing process. We don't want to start a new |
| 92 separate process every time a user launches a Breakpad-enabled process. A single |
| 93 daemon per machine is unacceptable because of security concerns: one user |
| 94 could initiate minidump generation for another user's process. |
| 95 |
| 96 #### Client/Server Communication |
| 97 |
| 98 On Breakpad initialization in a process, the initializer would check if the |
| 99 daemon is running and, if not, start it. The race condition between the check |
| 100 and the initialization is not a problem because each daemon can check if |
| 101 the IPC endpoint already exists and if a server is listening. Even if multiple |
| 102 copies of the daemon try to `bind()` a socket to the same filesystem name, all but |
| 103 one will fail and can terminate. |
| 104 |
| 105 This point is relevant for error handling. Linux does not clean up the |
| 106 file system representation of a UNIX domain socket even if both endpoints |
| 107 terminate, so checking for existence is not strong enough. However, checking the |
| 108 process list or sending a ping on the socket can handle this. |
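Both behaviours, the losing `bind()` and the stale socket file, can be observed with Python's standard `socket` module on Linux. The helper names `try_bind` and `daemon_is_listening` are hypothetical and exist only for this sketch. The "ping" here is simply a connection attempt, which is one possible reading of the text.

```python
import socket


def try_bind(path):
    """Try to claim `path`; return a listening socket, or None if the name is taken."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(path)            # fails if the filesystem name already exists
    except OSError:
        server.close()
        return None
    server.listen(5)
    return server


def daemon_is_listening(path):
    """Return True only if something accepts connections on the socket at `path`."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
        return True
    except (FileNotFoundError, ConnectionRefusedError):
        return False                 # no file, or a stale file with no listener
    finally:
        probe.close()
```

After the listening socket is closed, the file at `path` remains on disk, so `try_bind` keeps failing while `daemon_is_listening` reports False. A starting daemon would have to treat that combination as a stale endpoint and clean it up, which this sketch does not attempt.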
| 109 |
| 110 Breakpad uses UNIX domain sockets since they support full duplex communication |
| 111 (unlike on Windows, named pipes on Linux are half duplex) and the kernel automatically |
| 112 creates a private channel between the client and server once the client calls |
| 113 `connect()`. |
| 114 |
| 115 #### Minidump Generation |
| 116 |
| 117 Breakpad could use the current system with `ptrace()` and `/proc` within the |
| 118 daemon executable. Overall the operations look like: |
| 119 1. Signal from OS indicating crash |
| 120 1. Signal handler suspends all threads except itself |
| 121 1. Signal handler sends `CRASH_DUMP_REQUEST` message to server and waits for response |
| 122 1. Server inspects the crashing process |
| 123 1. Minidump is asynchronously written to disk by the server |
| 124 1. Server responds indicating inspection is done |
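The message exchange in this sequence can be sketched over a connected pair of UNIX domain sockets. This Python example is a simplified model under several assumptions: the `DUMP_DONE` reply, the `inspect` and `write_minidump` callbacks, and both function names are hypothetical, and thread suspension and real `ptrace()` inspection are left out.

```python
import socket
import threading

REQUEST = b"CRASH_DUMP_REQUEST"   # message name taken from the text
DONE = b"DUMP_DONE"               # hypothetical reply; the text does not name it


def serve_one_request(conn, inspect, write_minidump):
    """Server side: inspect, start the asynchronous write, then reply."""
    if conn.recv(64) != REQUEST:
        return None
    snapshot = inspect()          # gather state while the client is blocked
    writer = threading.Thread(target=write_minidump, args=(snapshot,))
    writer.start()                # the dump is written in the background
    conn.sendall(DONE)            # inspection is finished, release the client
    return writer


def request_crash_dump(conn):
    """Client side (signal handler): send the request and wait for the reply."""
    conn.sendall(REQUEST)
    return conn.recv(64) == DONE
```

Replying as soon as inspection finishes, rather than after the write completes, lets the crashed process exit without waiting on disk IO.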
| 125 |
| 126 ## Mac OS X |
| 127 |
| 128 Out-of-process exception handling is fully supported on Mac. |