Side by Side Diff: third_party/crashpad/crashpad/doc/overview_design.md

Issue 2773813002: Update Crashpad to 8e37886d418dd042c3c7bfadac99214739ee4d98 (Closed)
Patch Set: Update Crashpad to 8e37886d418dd042c3c7bfadac99214739ee4d98 Created 3 years, 9 months ago
<!--
Copyright 2017 The Crashpad Authors. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Crashpad Overview Design

[TOC]

## Objective

Crashpad is a library for capturing, storing and transmitting postmortem crash
reports from a client to an upstream collection server. Crashpad aims to make it
possible for clients to capture process state at the time of a crash with the
best possible fidelity and coverage, and with minimal fuss.

Crashpad also provides a facility for clients to capture dumps of process state
on demand for diagnostic purposes.

Crashpad additionally provides minimal facilities for clients to adorn their
crashes with application-specific metadata in the form of per-process key/value
pairs. More sophisticated clients can augment crash reports further through
extensibility points, such as nominating extra memory ranges or adding custom
minidump streams.

## Background

It’s an unfortunate truth that any large piece of software will contain bugs
that will cause it to occasionally crash. Even in the absence of bugs, software
incompatibilities can cause program instability.

Fixing bugs and incompatibilities in client software that ships to millions of
users around the world is a daunting task. User reports and manual reproduction
of crashes can work, but even given a user report, the problem is often not
readily reproducible. This can happen for various reasons, such as a particular
system version or an incompatibility with third-party software, or because the
problem is caused by a race of some sort. Users are also unlikely to report the
problems they encounter, and user reports are often of poor quality, as most
users have little experience writing good bug reports.

Automatic crash telemetry has been the best solution to the problem so far, as
it relieves users of the burden of manual reporting while capturing the
hardware and software state at the time of the crash.

TODO(siggi): examples of this?

Crash telemetry involves capturing postmortem crash dumps and transmitting them
to a backend collection server. On the server they can be stackwalked,
symbolized, evaluated and aggregated in various ways. Stackwalking and
symbolizing the reports on an upstream server has several benefits over
performing these tasks on the client. High-fidelity stackwalking requires access
to bulky unwind data, and it may be undesirable to ship this data to end users
because of its effect on application size. Symbolization requires access to
debugging symbols, which can be quite large, and the symbolization process can
consume considerable other resources. Transmitting un-stackwalked and
un-symbolized postmortem dumps to the collection server also allows deep
analysis of individual dumps, which is often necessary to resolve the bug
causing the crash.

Transmitting reports to the collection server allows aggregating crashes by
cause, which in turn allows assessing the importance of different crashes in
terms of their occurrence rate and, for example, their potential security
impact.

A postmortem crash dump must contain the program state at the time of the crash
with sufficient fidelity to allow diagnosing and fixing the problem. As the full
program state is usually too large to transmit to an upstream server, the
postmortem dump captures a heuristically selected subset of the full state.

The crashed program is in an indeterminate state and, in fact, has often crashed
because of corrupt global state, such as a corrupt heap. It’s therefore
important to generate crash reports with as little execution in the crashed
process as possible. Operating systems vary in the facilities they provide for
this.

## Overview

Crashpad is a client-side library that focuses on capturing machine and program
state in a postmortem crash report, and transmitting this report to a backend
server, the “collection server”. The Crashpad library is embedded by the client
application. Conceptually, Crashpad breaks down into the handler and the client.
The handler runs in a separate process from the client or clients. It is
responsible for snapshotting the crashing client process’ state on a crash,
saving it to a crash dump, and transmitting the crash dump to an upstream
server. Clients register with the handler to allow it to capture and upload
their crashes.

### The Crashpad handler

The Crashpad handler is instantiated in a process supplied by the embedding
application. Clients register with it over some form of IPC or, where the
operating system supports it, by arranging for crash notifications to be
delivered directly to the handler. On a crash, the handler snapshots the
crashed client process’ state, writes it to a postmortem dump in a database,
and, if so configured, may also transmit the dump to an upstream server.

The Crashpad handler can handle cross-bitness requests and generate crash dumps
across bitness, for example where the handler is a 64-bit process and the client
is a 32-bit process, or vice versa. On Windows, this is limited by the OS such
that a 32-bit handler can only generate crash dumps for 32-bit clients, but a
64-bit handler can acquire nearly all of the detail for a 32-bit process.

### The Crashpad client

The Crashpad client provides two main facilities:

1. Registration with the Crashpad handler.
2. Metadata communication to the Crashpad handler on crash.

A Crashpad embedder links the Crashpad client library into one or more
executables, whether loadable libraries or program files. The client process
then registers with the Crashpad handler through some form of IPC or other
operating system-specific support.

On a crash, metadata is communicated to the Crashpad handler via the
CrashpadInfo structure. Each client executable module linking the Crashpad
client library embeds a CrashpadInfo structure, which the client can update with
whatever state it wishes to record with a crash.

![Overview image](overview.png)

The picture above shows the conceptual relationships between the embedder (in
light blue), client modules (darker blue), and Crashpad (in green). Note that
multiple client modules can contain a CrashpadInfo structure, but only one
registration is necessary.

## Detailed Design

### Requirements

The purpose of Crashpad is to capture machine, OS and application state in
sufficient detail and fidelity to allow developers to diagnose and, where
possible, fix the issue causing the crash.

Each distinct crash report is assigned a globally unique ID, so that the report
can be associated with a user’s own report, referenced in bug reports, and so
on.

It’s critical to safeguard the user’s privacy by ensuring that no crash report
is ever uploaded without user consent. Likewise, it’s important to ensure that
Crashpad never captures or uploads reports from non-client processes.

### Concepts

* **Client ID**. A UUID tied to a single instance of a Crashpad database. When
  creating a crash report, the Crashpad handler includes the client ID stored
  in the database. This provides a means to determine how many individual end
  users are affected by a specific crash signature.

* **Crash ID**. A UUID representing a single crash report. Uploaded crash
  reports also receive a “server ID.” The Crashpad database indexes both the
  locally-generated and server-generated IDs.

* **Collection Server**. See the [crash server
  documentation](https://goto.google.com/crash-server-overview).

* **Client Process**. Any process that has registered with a Crashpad handler.

* **Handler process**. A process hosting the Crashpad handler library. This may
  be a dedicated executable, or it may be hosted within a client executable
  with control passed to it based on special signaling under the client’s
  control, such as a command-line parameter.

* **CrashpadInfo**. A structure used by client modules to provide information to
  the handler.

* **Annotations**. Each CrashpadInfo structure points to a dictionary of
  {string, string} annotations that the client can use to communicate
  application state in the case of a crash.

* **Database**. The Crashpad database contains persistent client settings as
  well as crash dumps pending upload.

TODO(siggi): moar concepts?

### Overview Picture

Here is a rough overview picture of the various Crashpad constructs, their
layering and intended use by clients.

![Layering image](layering.png)

Dark blue boxes are interfaces, light blue boxes are implementation. Gray is the
embedding client application. Note that wherever possible, implementation that
is necessarily OS-specific exposes OS-agnostic interfaces to the rest of
Crashpad and the client.

### Registration

The particulars of how a client registers with the handler vary across
operating systems.

#### macOS

At registration time, the client designates a Mach port monitored by the
Crashpad handler as the EXC_CRASH exception port for the client. The port may be
acquired by launching a new handler process or by retrieving a service already
registered with the system. The registration is maintained by the kernel and is
inherited by subprocesses at creation time by default, so only the topmost
process of a process tree needs to register.

Crashpad provides a facility for a process to disassociate (unregister) from an
existing crash handler, which can be necessary when an older client spawns an
updated version.

#### Windows

There are two modes of registration on Windows. In both cases the handler is
advised of the address of a set of structures in the client process’ address
space. These structures include a pair of ExceptionInformation structs, one for
generating a postmortem dump for a crashing process, and another one for
generating a dump for a non-crashing process.

##### Normal registration

In the normal registration mode, the client connects to a named pipe by a
pre-arranged name. A registration request is written to the pipe. During
registration, the handler creates a set of events, duplicates them to the
registering client, then returns the handle values in the registration response.
This is a blocking operation.

##### Initial Handler Creation

To avoid blocking client startup on the creation and initialization of the
handler, a different mode of registration can be used when the handler is
created. In this mode, the client creates a set of event handles and lets the
newly created handler process inherit them. The handler process is advised of
the handle values and the location of the ExceptionInformation structures by way
of command-line arguments.
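
As an illustration of this mode, a handler might recover the inherited handle
values and the structure address from a single command-line argument, as in the
minimal sketch below. The flag name `--example-initial-data=`, its
comma-separated hexadecimal layout, and the `InitialClientData` struct are all
hypothetical; they are not Crashpad's actual command-line syntax.

```cpp
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical payload: two event handle values inherited from the client,
// plus the client address of its ExceptionInformation structures.
struct InitialClientData {
  uint64_t crash_event_handle;
  uint64_t non_crash_event_handle;
  uint64_t exception_info_address;
};

// Parses "--example-initial-data=<hex>,<hex>,<hex>" (a made-up flag).
// Returns std::nullopt if the argument is malformed.
std::optional<InitialClientData> ParseInitialClientData(const std::string& arg) {
  const std::string prefix = "--example-initial-data=";
  if (arg.compare(0, prefix.size(), prefix) != 0) return std::nullopt;

  std::vector<uint64_t> values;
  std::stringstream stream(arg.substr(prefix.size()));
  std::string field;
  while (std::getline(stream, field, ',')) {
    if (field.empty()) return std::nullopt;
    char* end = nullptr;
    uint64_t value = std::strtoull(field.c_str(), &end, 16);
    if (*end != '\0') return std::nullopt;  // Reject non-hex characters.
    values.push_back(value);
  }
  if (values.size() != 3) return std::nullopt;
  return InitialClientData{values[0], values[1], values[2]};
}
```

Passing raw handle values this way only works because the handles are
inherited, so the values are valid in the child process without duplication.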

#### Linux/Android

TODO(mmentovai): describe this. See this preliminary doc.

### Capturing Exceptions

The details of how Crashpad captures the exceptions leading to crashes vary
between operating systems.

#### macOS

On macOS, the operating system notifies the handler of client crashes via the
Mach port set as the client process’ exception port. Because the kernel
dispatches exceptions to that Mach port, exceptions can be handled entirely
within the Crashpad handler, without running any code in the crashing process
at the time of the exception.

#### Windows

On Windows, the OS dispatches exceptions in the context of the crashing thread.
To notify the handler of exceptions, the Crashpad client registers an
UnhandledExceptionFilter (UEF) in the client process. When an exception trickles
up to the UEF, the filter stores the exception information and the crashing
thread’s ID in the ExceptionInformation structure registered with the handler.
It then sets an event handle to signal the handler to process the exception.

##### Caveats

* If the crashing thread’s stack is smashed when an exception occurs, the
  exception cannot be dispatched. In this case the OS will summarily terminate
  the process, without the handler having an opportunity to generate a crash
  report.
* If an exception is handled in the crashing thread, it will never propagate
  to the UEF, and thus a crash report won’t be generated. This happens fairly
  often on Windows, as system libraries often dispatch callbacks under a
  structured exception handler. This occurs during window message dispatching
  on some system configurations, as well as during, for example, DLL entry
  point notifications.
* A growing number of conditions exist in the system and runtime where
  detected corruption or illegal calls result in summary termination of the
  process, in which case no crash report will be generated.

###### Out-Of-Process Exception Handling

There exists a mechanism in Windows Error Reporting (WER) that allows a client
process to have its exceptions handled outside the crashing process.
Unfortunately this mechanism is difficult to use, and doesn’t provide coverage
for many of the caveats above. [Details
here](https://crashpad.chromium.org/bug/133).

#### Linux/Android

TODO(mmentovai): describe this. See [this preliminary
doc](https://goto.google.com/crashpad-android-dd).

### The CrashpadInfo structure

The CrashpadInfo structure is used to communicate information from the client to
the handler. Each executable module in a client process can contain a
CrashpadInfo structure. On a crash, the handler crawls all modules in the
crashing process to locate all CrashpadInfo structures present. The CrashpadInfo
structures are linked into a special, named section of the executable, where the
handler can readily find them.

The CrashpadInfo structure has a magic signature, and contains a size and a
version field. The intent is to allow backwards compatibility from older client
modules to newer handlers. It may also be necessary to provide forwards
compatibility from newer clients to older handlers, though this need hasn’t
arisen yet.

The CrashpadInfo structure contains such properties as the cap for how much
memory to include in the crash dump, some tristate flags for controlling the
handler’s behavior, a pointer to an annotation dictionary and so on.
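
The size and version fields make it possible for a handler to read structures
from client modules built against a different revision. The minimal sketch below
assumes a hypothetical `ExampleCrashpadInfo` layout (not the real CrashpadInfo
layout) and an arbitrary magic value. The handler validates the signature,
copies at most its own `sizeof` worth of bytes, and leaves zeroes in any fields
that an older, smaller structure lacks.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical layout for illustration only; the real CrashpadInfo differs.
struct ExampleCrashpadInfo {
  uint32_t signature;         // Magic value identifying the structure.
  uint32_t size;              // sizeof() as compiled into the client module.
  uint32_t version;           // Layout version.
  uint32_t memory_cap_bytes;  // Cap on extra memory to include in the dump.
  uint8_t forward_to_system_reporter;  // Tristate: 0 unset, 1 on, 2 off.
  uint8_t gather_extra_memory;         // Tristate, same encoding.
  uint8_t reserved[2];
  uint64_t annotations_address;  // Client address of the annotation dictionary.
};

// 'C','P','a','d' as bytes; chosen arbitrarily for this sketch.
constexpr uint32_t kExampleSignature = 0x43506164;

// Interprets |bytes|, as read from the client's address space, as an
// ExampleCrashpadInfo. Fields past the client's |size| stay zero (older
// clients); bytes past sizeof(ExampleCrashpadInfo) are ignored (newer clients).
std::optional<ExampleCrashpadInfo> ReadCrashpadInfo(
    const std::vector<uint8_t>& bytes) {
  constexpr size_t kHeaderSize = 2 * sizeof(uint32_t);  // signature + size
  if (bytes.size() < kHeaderSize) return std::nullopt;

  uint32_t signature = 0;
  uint32_t size = 0;
  std::memcpy(&signature, bytes.data(), sizeof(signature));
  std::memcpy(&size, bytes.data() + sizeof(signature), sizeof(size));
  if (signature != kExampleSignature) return std::nullopt;
  if (size < kHeaderSize || size > bytes.size()) return std::nullopt;

  ExampleCrashpadInfo info{};  // Zero-initialized defaults for unknown fields.
  std::memcpy(&info, bytes.data(), std::min<size_t>(size, sizeof(info)));
  return info;
}
```

Zero as the "unset" value for each tristate flag is what makes this work: an
old client that never heard of a flag implicitly leaves it at the default.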

### Snapshot

Snapshot is a layer of interfaces that represent the machine and OS entities
that Crashpad cares about. Different concrete implementations of snapshot can
then be backed in different ways, such as from the in-memory representation of
a crashed process, or from the contents of a minidump.
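
The point of the snapshot layer is that consumers such as the minidump writer
depend only on the interfaces. In the minimal sketch below, `ProcessSnapshot`,
`ModuleInfo` and both implementations are hypothetical and far simpler than
Crashpad's real snapshot classes; the second implementation parses a made-up
text format that stands in for a minidump.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified module record.
struct ModuleInfo {
  std::string name;
  uint64_t base_address;
};

// OS-agnostic interface: consumers see only this.
class ProcessSnapshot {
 public:
  virtual ~ProcessSnapshot() = default;
  virtual uint32_t ProcessID() const = 0;
  virtual std::vector<ModuleInfo> Modules() const = 0;
};

// Backing #1: state captured from a live process. Real code would fill this
// in with OS-specific calls while the target is suspended; here it simply
// holds the values it is given.
class CapturedProcessSnapshot final : public ProcessSnapshot {
 public:
  CapturedProcessSnapshot(uint32_t pid, std::vector<ModuleInfo> modules)
      : pid_(pid), modules_(std::move(modules)) {}
  uint32_t ProcessID() const override { return pid_; }
  std::vector<ModuleInfo> Modules() const override { return modules_; }

 private:
  uint32_t pid_;
  std::vector<ModuleInfo> modules_;
};

// Backing #2: state parsed from a serialized dump. The stand-in format is
// "pid name:hexaddr name:hexaddr ..."; std::stoull throws on bad input.
class SerializedProcessSnapshot final : public ProcessSnapshot {
 public:
  explicit SerializedProcessSnapshot(const std::string& serialized) {
    std::istringstream in(serialized);
    in >> pid_;
    std::string entry;
    while (in >> entry) {
      size_t colon = entry.rfind(':');
      if (colon == std::string::npos) continue;
      uint64_t address =
          static_cast<uint64_t>(std::stoull(entry.substr(colon + 1), nullptr, 16));
      modules_.push_back({entry.substr(0, colon), address});
    }
  }
  uint32_t ProcessID() const override { return pid_; }
  std::vector<ModuleInfo> Modules() const override { return modules_; }

 private:
  uint32_t pid_ = 0;
  std::vector<ModuleInfo> modules_;
};

// An OS-agnostic consumer depends only on the interface.
std::string Describe(const ProcessSnapshot& snapshot) {
  std::ostringstream out;
  out << "pid " << snapshot.ProcessID() << ", " << snapshot.Modules().size()
      << " module(s)";
  return out.str();
}
```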

### Crash Dump Creation

To create a crash dump, a subset of the machine, OS and application state is
captured from the crashed process into an in-memory snapshot structure in the
handler process. Since the full application state is typically too large for
capturing to disk and transmitting to an upstream server, the snapshot contains
a heuristically selected subset of the full state.

The precise details of what’s captured vary between operating systems, but
generally include the following:

* The set of modules (executable, shared libraries) that are loaded into the
  crashing process.
* An enumeration of the threads running in the crashing process, including the
  register contents and the contents of stack memory of each thread.
* A selection of the OS-related state of the process, such as the command
  line, environment and so on.
* A selection of memory potentially referenced from registers and from stack.
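
The last item, memory potentially referenced from registers and the stack, can
be approximated with a simple heuristic: treat each register value and stack
word as a possible pointer, and capture a small window around it if it lands in
readable memory. The sketch below is one such heuristic under stated
assumptions. The `Range` type, the `SelectReferencedMemory` function and the
window size are hypothetical, and a real implementation would presumably also
honor the memory cap set through CrashpadInfo.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical half-open address range [begin, end).
struct Range {
  uint64_t begin;
  uint64_t end;
};

// For each candidate value (register contents or stack words) that points
// into a readable region, selects a window of |window| bytes centered on it,
// clamped to that region. Overlapping or adjacent windows are merged.
std::vector<Range> SelectReferencedMemory(const std::vector<uint64_t>& candidates,
                                          const std::vector<Range>& readable,
                                          uint64_t window) {
  const uint64_t half = window / 2;
  std::vector<Range> selected;
  for (uint64_t value : candidates) {
    for (const Range& region : readable) {
      if (value < region.begin || value >= region.end) continue;
      // Compare distances rather than computing value - half, to avoid
      // underflow/overflow at the region edges.
      uint64_t begin = value - region.begin > half ? value - half : region.begin;
      uint64_t end = region.end - value > half ? value + half : region.end;
      selected.push_back({begin, end});
      break;
    }
  }
  std::sort(selected.begin(), selected.end(),
            [](const Range& a, const Range& b) { return a.begin < b.begin; });
  std::vector<Range> merged;
  for (const Range& r : selected) {
    if (!merged.empty() && r.begin <= merged.back().end) {
      merged.back().end = std::max(merged.back().end, r.end);
    } else {
      merged.push_back(r);
    }
  }
  return merged;
}
```

Values that do not land in any readable region, such as small integers or null
pointers, are simply skipped, so false positives cost only a little extra dump
size.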

To capture a crash dump, the crashing process is first suspended, then a
snapshot is created in the handler process. The snapshot includes the
CrashpadInfo structures of the modules loaded into the process, and their
contents are used to control the level of detail captured for the crash dump.

Once the snapshot has been constructed, it is written to a minidump file, which
is added to the database. The process is resumed after the minidump file has
been written. In the case of a crash (as opposed to a client request to produce
a dump without crashing), the process is then terminated, either by the
operating system or by the Crashpad handler.
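
The suspend, snapshot, write and resume ordering lends itself to a scoped guard,
so that the process is resumed even if snapshotting fails partway. In this
minimal sketch, `ProcessControl`, `ScopedProcessSuspend` and `CaptureDump` are
hypothetical names, and the OS-specific suspension calls are hidden behind an
abstract interface.

```cpp
#include <functional>

// Hypothetical process-control interface; real suspension is OS-specific.
class ProcessControl {
 public:
  virtual ~ProcessControl() = default;
  virtual void Suspend() = 0;
  virtual void Resume() = 0;
};

// Keeps the target suspended for the lifetime of the guard, so it is resumed
// even if the snapshot or write step returns early or throws.
class ScopedProcessSuspend {
 public:
  explicit ScopedProcessSuspend(ProcessControl& process) : process_(process) {
    process_.Suspend();
  }
  ~ScopedProcessSuspend() { process_.Resume(); }
  ScopedProcessSuspend(const ScopedProcessSuspend&) = delete;
  ScopedProcessSuspend& operator=(const ScopedProcessSuspend&) = delete;

 private:
  ProcessControl& process_;
};

// Suspend, then snapshot and write the minidump, then resume.
// |snapshot_and_write| stands in for the snapshot and minidump steps.
bool CaptureDump(ProcessControl& process,
                 const std::function<bool()>& snapshot_and_write) {
  ScopedProcessSuspend suspend(process);
  return snapshot_and_write();
}  // |suspend| is destroyed here, resuming the process after the write.
```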

In general the snapshotting process has to be very intimate with the operating
system it’s working with, so there is a set of concrete implementation classes
for each operating system, many of them deriving from the snapshot interfaces.

### Minidump

The minidump implementation is responsible for writing a snapshot to a
serialized on-disk file in the minidump format. The minidump implementation is
OS-agnostic, as it works on an OS-agnostic Snapshot interface.

TODO(siggi): Talk about two-phase writes and contents ordering here.

### Database

The Crashpad database contains persistent client settings, including a unique
crash client identifier and the upload-enabled bit. Note that the crash client
identifier is assigned by Crashpad, and is distinct from any identifiers the
client application uses to identify users, installs, machines and so on, if any.
The expectation is that the client application will manage the user’s upload
consent, and inform Crashpad of changes in consent.

The unique client identifier is set at the time of database creation. It is then
recorded into every crash report collected by the handler and communicated to
the upstream server.

The database stores a configurable number of recorded crash dumps, up to a
configurable maximum aggregate size. For each crash dump it records whether the
dump has been uploaded, and for successfully uploaded crash dumps it also stores
the server-assigned ID.
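
One possible retention policy for the count and aggregate-size limits is to keep
the newest reports that fit within both limits and prune everything older. The
sketch below assumes that policy; `StoredReport` and `ReportsToPrune` are
hypothetical, and Crashpad's actual pruning conditions may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of one stored report.
struct StoredReport {
  std::string uuid;       // Locally generated crash ID.
  int64_t creation_time;  // Seconds since the epoch.
  uint64_t size_bytes;
  bool uploaded;
  std::string server_id;  // Set only after a successful upload.
};

// Returns the UUIDs of reports to delete so that at most |max_count| reports
// totaling at most |max_total_bytes| remain. The newest reports are kept; once
// one report does not fit, it and every older report are pruned.
std::vector<std::string> ReportsToPrune(std::vector<StoredReport> reports,
                                        size_t max_count,
                                        uint64_t max_total_bytes) {
  std::sort(reports.begin(), reports.end(),
            [](const StoredReport& a, const StoredReport& b) {
              return a.creation_time > b.creation_time;  // Newest first.
            });
  std::vector<std::string> doomed;
  size_t kept = 0;
  uint64_t total = 0;
  bool keeping = true;
  for (const StoredReport& report : reports) {
    keeping = keeping && kept < max_count &&
              total + report.size_bytes <= max_total_bytes;
    if (keeping) {
      ++kept;
      total += report.size_bytes;
    } else {
      doomed.push_back(report.uuid);
    }
  }
  return doomed;
}
```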

The database consists of a settings file named "settings.dat" with binary
contents (see crashpad::Settings::Data for the file format), as well as a
directory containing the crash dumps. Additionally, each crash dump is adorned
with properties relating to the state of the dump for upload and such. The
details of how these properties are stored vary between platforms.

#### macOS

The macOS implementation simply stores database properties on the minidump files
in filesystem extended attributes.

#### Windows

The Windows implementation stores database properties in a binary file named
“metadata” at the top level of the database directory.

### Report Format

Crash reports are recorded in the Windows minidump format with extensions to
support Crashpad additions, such as Annotations.

### Upload to collection server

#### Wire Format

For the time being, Crashpad uses the Breakpad wire protocol, which is
essentially a MIME multipart message communicated over HTTP(S). To support this,
the annotations from all the CrashpadInfo structures found in the crashing
process are merged to create the Breakpad “crash keys” as form data. The
postmortem minidump is then attached as an “application/octet-stream”
attachment with the name “upload_file_minidump”. The entire request body,
including the minidump, can be gzip-compressed to reduce transmission time and
increase transmission reliability. Note that by convention there is a set of
“crash keys” that are used to communicate the product, version, client ID and
other relevant data about the client to the server. Crashpad normally stores
these values in the minidump file itself, but retrieves them from the minidump
and supplies them as form data for compatibility with the Breakpad-style server.
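
A minimal sketch of assembling such a request body follows. The helper names,
the first-module-wins rule for colliding keys, and the `filename` value are
assumptions; only the multipart structure, the `application/octet-stream`
content type and the `upload_file_minidump` part name come from the description
above.

```cpp
#include <map>
#include <string>
#include <vector>

using Annotations = std::map<std::string, std::string>;

// Merges the annotation dictionaries of all modules into crash keys. On key
// collisions the first module's value is kept (an assumption of this sketch).
Annotations MergeAnnotations(const std::vector<Annotations>& per_module) {
  Annotations merged;
  for (const Annotations& module : per_module) {
    for (const auto& [key, value] : module) merged.emplace(key, value);
  }
  return merged;
}

// Builds a multipart/form-data body: one form field per crash key, then the
// minidump as a file part named "upload_file_minidump". |boundary| must not
// occur anywhere in the content.
std::string BuildMultipartBody(const Annotations& crash_keys,
                               const std::string& minidump,
                               const std::string& boundary) {
  std::string body;
  for (const auto& [key, value] : crash_keys) {
    body += "--" + boundary + "\r\n";
    body += "Content-Disposition: form-data; name=\"" + key + "\"\r\n\r\n";
    body += value + "\r\n";
  }
  body += "--" + boundary + "\r\n";
  body += "Content-Disposition: form-data; name=\"upload_file_minidump\"; "
          "filename=\"dump\"\r\n";
  body += "Content-Type: application/octet-stream\r\n\r\n";
  body += minidump + "\r\n";
  body += "--" + boundary + "--\r\n";
  return body;
}
```

The request itself would carry a `Content-Type: multipart/form-data;
boundary=...` header, plus `Content-Encoding: gzip` if the body is compressed.
A production implementation would also pick a boundary that cannot collide with
the binary minidump and escape quotes in field names.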

This is a temporary compatibility measure to allow the current Breakpad-based
upstream server to handle Crashpad reports. In the fullness of time, the wire
protocol is expected to change to remove this redundant transmission and
processing of the Annotations.

#### Transport

The embedding client controls the URL of the collection server by the command
line passed to the handler. The handler can upload crashes with HTTP or HTTPS,
depending on the client’s preference. Using HTTPS for crash uploads is strongly
recommended, to protect the user’s privacy against man-in-the-middle snoopers.

TODO(mmentovai): Certificate pinning.

#### Throttling & Retry Strategy

To protect the collection server from DDoS, and to protect clients from
unreasonable data transfer demands, the handler implements a client-side
throttling strategy. At the moment the strategy is very simple: it limits
uploads to one per hour, and failed uploads are abandoned rather than retried.
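
The one-upload-per-hour rule amounts to a rate limiter with no retry queue. In
the minimal sketch below, `UploadThrottle` is a hypothetical class. It keeps the
last attempt time in memory, whereas a handler that must survive restarts would
need to persist it, for example alongside the database settings.

```cpp
#include <chrono>
#include <optional>

// Hypothetical client-side throttle: at most one upload attempt per
// |interval|. Failed attempts still consume the slot; nothing is retried.
class UploadThrottle {
 public:
  using Clock = std::chrono::system_clock;

  explicit UploadThrottle(Clock::duration interval = std::chrono::hours(1))
      : interval_(interval) {}

  // Returns true, and records the attempt, if an upload may proceed at |now|.
  bool TryAcquire(Clock::time_point now) {
    if (last_attempt_ && now - *last_attempt_ < interval_) return false;
    last_attempt_ = now;
    return true;
  }

 private:
  Clock::duration interval_;
  std::optional<Clock::time_point> last_attempt_;
};
```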

An experiment has been conducted to lift all throttling. Analysis of the
aggregate data this produced shows that multiple crashes within a short timespan
on the same client are nearly always due to the same cause. Therefore there is
very little loss of signal due to the throttling, though the ability to
reconstruct at least the full crash count is highly desirable.

The lack of retry is expected to [change
soon](https://crashpad.chromium.org/bug/23), as it creates blind spots for
client crashes that occur exclusively during, for example, network outages or
suspend and resume.

### Extensibility

Clients can extend the generated crash reports in two ways, by manipulating
their CrashpadInfo structure. The two extensibility points are:

1. Nominating a set of address ranges for inclusion in the crash report.
2. Adding user-defined minidump streams for inclusion in the crash report.

In both cases the CrashpadInfo structure has to be updated before a crash
occurs.
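
To make the two extension points concrete, the sketch below models them as
plain lists of address ranges and user streams. `ExampleClientExtensions`,
`AddObjectToReport` and `AddUserStream` are hypothetical and are not Crashpad's
client API; stream type values are assumed to be chosen by the client.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical extension data a client would attach to its info structure
// ahead of time, for the handler to read after a crash.
struct ExtraMemoryRange {
  uint64_t address;
  uint64_t size;
};

struct UserStream {
  uint32_t stream_type;   // Client-chosen minidump stream type.
  uint64_t data_address;  // Where the stream's bytes live in the client.
  uint64_t data_size;
};

struct ExampleClientExtensions {
  std::vector<ExtraMemoryRange> extra_memory;
  std::vector<UserStream> user_streams;
};

// Extension point 1: nominate an object's memory for inclusion in the report.
template <typename T>
void AddObjectToReport(ExampleClientExtensions& ext, const T& object) {
  ext.extra_memory.push_back(
      {static_cast<uint64_t>(reinterpret_cast<uintptr_t>(&object)), sizeof(T)});
}

// Extension point 2: register a user-defined stream. The data must stay alive
// and in place until the process exits, since it is read only after a crash.
void AddUserStream(ExampleClientExtensions& ext, uint32_t type,
                   const void* data, uint64_t size) {
  ext.user_streams.push_back(
      {type, static_cast<uint64_t>(reinterpret_cast<uintptr_t>(data)), size});
}
```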

### Dependencies

Aside from system headers and APIs, when used outside of Chromium, Crashpad has
a dependency on “mini_chromium”, which is a subset of the Chromium base library.
This allows non-Chromium clients to use Crashpad without taking a direct
dependency on Chromium base, while allowing Chromium projects to use Crashpad
with minimal code duplication or hassle. When Crashpad is used as part of
Chromium, Chromium’s own copy of the base library is used instead of
mini_chromium.

The downside is that mini_chromium must be kept up to date with interface and
implementation changes in Chromium base, for the subset of functionality used by
Crashpad.

## Caveats

TODO(anyone): You may need to describe what you did not do or why simpler
approaches don't work. Mention other things to watch out for (if any).

## Security Considerations

Crashpad may be used to capture the state of sandboxed processes, and it writes
minidumps to disk. It may therefore straddle security boundaries, so it’s
important that Crashpad handle all data it reads out of the crashed process with
extreme care. The Crashpad handler accesses client address spaces through
specially-designed accessors that check pointer validity and confine accesses
to prescribed bounds. The flow of information into the Crashpad handler is
exclusively one-way: Crashpad never communicates anything back to its clients,
aside from providing single-bit indications of completion.
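
The bounds-checking idea can be sketched with an accessor that validates every
request against the captured range before touching any bytes, computing offsets
so that the check itself cannot overflow. `CheckedRemoteMemory` is hypothetical,
and for simplicity the client's memory is modeled here as a buffer that has
already been copied into the handler.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical bounds-checked view over bytes taken from a client's address
// space, covering remote addresses [base, base + data.size()).
class CheckedRemoteMemory {
 public:
  CheckedRemoteMemory(uint64_t base, std::vector<uint8_t> data)
      : base_(base), data_(std::move(data)) {}

  // Copies |size| bytes at remote |address| into |out|. Returns false, without
  // reading anything, if any part of the request lies outside the range.
  bool Read(uint64_t address, size_t size, void* out) const {
    if (address < base_) return false;
    uint64_t offset = address - base_;
    // Compare against the remaining length instead of computing
    // address + size, which a hostile pointer could overflow.
    if (offset > data_.size() || size > data_.size() - offset) return false;
    std::memcpy(out, data_.data() + offset, size);
    return true;
  }

  // Reads a trivially copyable value of type T at |address|.
  template <typename T>
  bool ReadValue(uint64_t address, T* value) const {
    return Read(address, sizeof(T), value);
  }

 private:
  uint64_t base_;
  std::vector<uint8_t> data_;
};
```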

## Privacy Considerations

Crashpad may capture arbitrary contents from a crashed process’ memory,
including user IDs and passwords, credit card information, URLs and whatever
other content users have trusted the crashing program with. The client program
must acquire and honor the user’s consent to upload crash reports, and
appropriately manage the upload state in Crashpad’s database.

Crashpad must also be careful not to upload crashes for arbitrary processes on
the user’s system. To this end, Crashpad will never upload a report for a
process that hasn’t registered with the handler, but note that registrations
are inherited by child processes on some operating systems.