Chromium Code Reviews
Side by Side Diff: docs/symbol_files.md

Issue 1357773004: [Docs] add markdown docs (converted from Wiki) (Closed) Base URL: https://chromium.googlesource.com/breakpad/breakpad.git@master
Patch Set: whoops' Created 5 years, 3 months ago
1 # Introduction
2
3 Given a minidump file, the Breakpad processor produces stack traces that include
4 function names and source locations. However, minidump files contain only the
5 byte-by-byte contents of threads' registers and stacks, without function names
6 or machine-code-to-source mapping data. The processor consults Breakpad symbol
7 files for the information it needs to produce human-readable stack traces from
8 the binary-only minidump file.
9
10 The platform-specific symbol dumping tools parse the debugging information the
11 compiler provides (whether as DWARF or STABS sections in an ELF file or as
12 stand-alone PDB files), and write that information back out in the Breakpad
13 symbol file format. This format is much simpler and less detailed than compiler
14 debugging information, and values legibility over compactness.
15
16 # Overview
17
18 Breakpad symbol files are ASCII text files, with lines delimited as appropriate
19 for the host platform. Each line is a _record_, divided into fields by single
20 spaces; in some cases, the last field of the record can contain spaces. The
21 first field is a string indicating what sort of record the line represents
22 (except for line records, which are common enough that omitting the type string
23 saves space). Some fields hold decimal or hexadecimal numbers; hexadecimal numbers
24 have no "0x" prefix, and use lower-case letters.
25
26 Breakpad symbol files contain the following record types. With some
27 restrictions, these may appear in any order.
28
29 * A `MODULE` record describes the executable file or shared library from which
30 this data was derived, for use by symbol suppliers. A `MODULE` record should
31 be the first record in the file.
32
33 * A `FILE` record gives a source file name, and assigns it a number by which
34 other records can refer to it.
35
36 * A `FUNC` record describes a function present in the source code.
37
38 * A line record indicates to which source file and line a given range of
39 machine code should be attributed. The line is attributed to the function
40 defined by the most recent `FUNC` record.
41
42 * A `PUBLIC` record gives the address of a linker symbol.
43
44 * A `STACK` record provides information necessary to produce stack traces.
45
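As a rough illustration of the rule that every record except a line record begins with a type string, the sketch below classifies a single line. The helper name `classify_record` is hypothetical and is not part of Breakpad; it assumes fields are separated by single spaces as described above.

```python
def classify_record(line: str) -> str:
    """Return the record type of one symbol-file line (hypothetical helper)."""
    fields = line.rstrip("\r\n").split(" ")
    keyword = fields[0]
    if keyword in ("MODULE", "FILE", "FUNC", "PUBLIC"):
        return keyword
    if keyword == "STACK" and len(fields) > 1:
        # STACK records carry a second keyword: WIN or CFI.
        return "STACK " + fields[1]
    # Anything else is assumed to be a line record, whose first field is a
    # hexadecimal address. int() raises ValueError for non-numeric text.
    int(keyword, 16)
    return "LINE"


print(classify_record("c184 7 59 4"))
```

Treating the line record as the fallback case mirrors the format itself: the absence of a type string is what marks it.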
46 # `MODULE` records
47
48 A `MODULE` record provides meta-information about the module the symbol file
49 describes. It has the form:
50
51 > `MODULE` _operatingsystem_ _architecture_ _id_ _name_
52
53 For example: `MODULE Linux x86 D3096ED481217FD4C16B29CD9BC208BA0 firefox-bin`
54 These records provide meta-information about the executable or shared library
55 from which this symbol file was generated. A symbol supplier might use this
56 information to find the correct symbol files to use to interpret a given
57 minidump, or to perform other sorts of validation. If present, a `MODULE` record
58 should be the first line in the file.
59
60 The fields are separated by spaces, and cannot contain spaces themselves, except
61 for _name_.
62
63 * The _operatingsystem_ field names the operating system on which the
64 executable or shared library was intended to run. This field should have one
65 of the following values:

| **Value** | **Meaning**       |
|:----------|:------------------|
| Linux     | Linux             |
| mac       | Macintosh OS X    |
| windows   | Microsoft Windows |
68
69 * The _architecture_ field indicates what processor architecture the
70 executable or shared library contains machine code for. This field should
71 have one of the following values:

| **Value** | **Instruction Set Architecture** |
|:----------|:---------------------------------|
| x86       | Intel IA-32                      |
| x86\_64   | AMD64/Intel 64                   |
| ppc       | 32-bit PowerPC                   |
| ppc64     | 64-bit PowerPC                   |
| unknown   | unknown                          |
75
76 * The _id_ field is a sequence of hexadecimal digits that identifies the exact
77 executable or library whose contents the symbol file describes. The way in
78 which it is computed varies from platform to platform.
79
80 * The _name_ field contains the base name (the final component of the
81 directory path) of the executable or library. It may contain spaces, and
82 extends to the end of the line.
83
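Because only the final _name_ field may contain spaces, a `MODULE` record can be split with a bounded number of splits. The following is a minimal sketch under that assumption; `ModuleRecord` and `parse_module` are hypothetical names, not Breakpad APIs.

```python
from typing import NamedTuple


class ModuleRecord(NamedTuple):
    operating_system: str
    architecture: str
    module_id: str
    name: str


def parse_module(line: str) -> ModuleRecord:
    """Split a MODULE record into its four fields (hypothetical helper)."""
    # maxsplit=4 keeps any spaces inside the trailing name field intact.
    keyword, os_name, arch, module_id, name = line.rstrip("\r\n").split(" ", 4)
    if keyword != "MODULE":
        raise ValueError("not a MODULE record")
    return ModuleRecord(os_name, arch, module_id, name)


record = parse_module(
    "MODULE Linux x86 D3096ED481217FD4C16B29CD9BC208BA0 firefox-bin")
print(record.module_id, record.name)
```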
84 # `FILE` records
85
86 A `FILE` record holds a source file name for other records to refer to. It has
87 the form:
88
89 > `FILE` _number_ _name_
90
91 For example: `FILE 2 /home/jimb/mc/in/browser/app/nsBrowserApp.cpp`
92
93
94 A `FILE` record provides the name of a source file, and assigns it a number
95 which other records (line records, in particular) can use to refer to that file
96 name. The _number_ field is a decimal number. The _name_ field is the name of
97 the file; it may contain spaces.
98
99 # `FUNC` records
100
101 A `FUNC` record describes a source-language function. It has the form:
102
103 > `FUNC` _address_ _size_ _parameter\_size_ _name_
104
105 For example: `FUNC c184 30 0 nsQueryInterfaceWithError::operator()(nsID const&,
106 void**) const`
107
108
109 The _address_ and _size_ fields are hexadecimal numbers indicating the start
110 address and length in bytes of the machine code instructions the function
111 occupies. (Breakpad symbol files cannot accurately describe functions whose code
112 is not contiguous.) The start address is relative to the module's load address.
113
114 The _parameter\_size_ field is a hexadecimal number indicating the size, in
115 bytes, of the arguments pushed on the stack for this function. Some calling
116 conventions, like the Microsoft Windows `stdcall` convention, require the called
117 function to pop parameters passed to it on the stack from its caller before
118 returning. The stack walker uses this value, along with data from `STACK`
119 records, to step from the called function's frame to the caller's frame.
120
121 The _name_ field is the name of the function. In languages that use linker
122 symbol name mangling like C++, this should be the source language name (the
123 "unmangled" form). This field may contain spaces.
124
125 # Line records
126
127 A line record describes the source file and line number to which a given range
128 of machine code should be attributed. It has the form:
129
130 > _address_ _size_ _line_ _filenum_
131
132 For example: `c184 7 59 4`
133
134
135 Because they are so common, line records do not begin with a string indicating
136 the record type. All other record types' names use upper-case letters;
137 hexadecimal numbers, like a line record's _address_, use lower-case letters.
138
139 The _address_ and _size_ fields are hexadecimal numbers indicating the start
140 address and length in bytes of the machine code. The address is relative to the
141 module's load address.
142
143 The _line_ field is the line number to which the machine code should be
144 attributed, in decimal; the first line of the source file is line number 1. The
145 _filenum_ field is a decimal number appearing in a prior `FILE` record; the name
146 given in that record is the source file name for the machine code.
147
148 The line is assumed to belong to the function described by the last preceding
149 `FUNC` record. Line records may not appear before the first `FUNC` record.
150
151 No two line records in a symbol file cover the same range of addresses. However,
152 there may be many line records with identical line and file numbers, as a given
153 source line may contribute many non-contiguous blocks of machine code.
154
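To show how `FILE`, `FUNC`, and line records combine, the sketch below maps a module-relative address to a function name, source file, and line number. It is an illustration under stated assumptions, not the processor's implementation: the helper names are hypothetical, and the `FILE 4 example.cpp` record is invented so that the sample line record has a file to refer to.

```python
def load_symbols(lines):
    """Collect FILE and line records, tagging each line record with its FUNC."""
    files = {}          # file number (decimal) -> source file name
    line_records = []   # (address, size, line, filenum, function name)
    current_func = None
    for raw in lines:
        fields = raw.rstrip("\r\n").split(" ")
        if fields[0] == "FILE":
            # FILE number name; the name may contain spaces.
            files[int(fields[1])] = " ".join(fields[2:])
        elif fields[0] == "FUNC":
            # FUNC address size parameter_size name; only the name is kept here.
            current_func = " ".join(fields[4:])
        elif fields[0] in ("MODULE", "PUBLIC", "STACK"):
            continue
        else:
            # Line record: address and size are hexadecimal, line and
            # filenum are decimal.
            if current_func is None:
                raise ValueError("line record before the first FUNC record")
            address, size = int(fields[0], 16), int(fields[1], 16)
            line_records.append(
                (address, size, int(fields[2]), int(fields[3]), current_func))
    return files, line_records


def source_location(address, files, line_records):
    """Return (function, file, line) for an address, or None if uncovered."""
    for start, size, line, filenum, func in line_records:
        if start <= address < start + size:
            return func, files[filenum], line
    return None


SAMPLE = [
    "FILE 4 example.cpp",  # hypothetical file name
    "FUNC c184 30 0 nsQueryInterfaceWithError::operator()(nsID const&, void**) const",
    "c184 7 59 4",
]
files, line_records = load_symbols(SAMPLE)
print(source_location(0xc186, files, line_records))
```

A linear scan keeps the sketch short; since no two line records overlap, a sorted list with binary search would work equally well for large files.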
155 # `PUBLIC` records
156
157 A `PUBLIC` record describes a publicly visible linker symbol, such as that used
158 to identify an assembly language entry point or region of memory. It has the
159 form:
160
161 > `PUBLIC` _address_ _parameter\_size_ _name_
162
163 For example: `PUBLIC 2160 0 Public2_1`
164
165
166 The Breakpad processor essentially treats a `PUBLIC` record as defining a
167 function with no line number data and an indeterminate size: the code extends to
168 the next address mentioned. If a given address is covered by both a `PUBLIC`
169 record and a `FUNC` record, the processor uses the `FUNC` data.
170
171 The _address_ field is a hexadecimal number indicating the symbol's address,
172 relative to the module's load address.
173
174 The _parameter\_size_ field is a hexadecimal number indicating the size of the
175 parameters passed to the code whose entry point the symbol marks, if known. This
176 field has the same meaning as the _parameter\_size_ field of a `FUNC` record;
177 see that description for more details.
178
179 The _name_ field is the name of the symbol. In languages that use linker symbol
180 name mangling like C++, this should be the source language name (the "unmangled"
181 form). This field may contain spaces.
182
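The precedence between `FUNC` and `PUBLIC` records can be sketched as below. This is a simplified reading, not the processor's code: `find_symbol` is a hypothetical helper, and "the next address mentioned" is assumed here to mean the next `FUNC` or `PUBLIC` start address.

```python
def find_symbol(address, funcs, publics):
    """Name the code at `address`, preferring FUNC data over PUBLIC data.

    funcs:   list of (address, size, name) taken from FUNC records
    publics: list of (address, name) taken from PUBLIC records
    """
    # A FUNC record has an explicit size, so it wins whenever it covers
    # the address.
    for start, size, name in funcs:
        if start <= address < start + size:
            return name
    # A PUBLIC symbol has no size; assume it extends to the next start
    # address mentioned by any FUNC or PUBLIC record.
    mentioned = sorted({s for s, _, _ in funcs} | {s for s, _ in publics})
    for start, name in publics:
        later = [a for a in mentioned if a > start]
        end = later[0] if later else None
        if start <= address and (end is None or address < end):
            return name
    return None


funcs = [(0xc184, 0x30, "f")]                          # hypothetical FUNC data
publics = [(0x2160, "Public2_1"), (0x3000, "Other")]   # "Other" is hypothetical
print(find_symbol(0x2170, funcs, publics))
```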
183 # `STACK WIN` records
184
185 Given a stack frame, a `STACK WIN` record indicates how to find the frame that
186 called it. It has the form:
187
188 > `STACK WIN` _type_ _rva_ _code\_size_ _prologue\_size_ _epilogue\_size_
189 > _parameter\_size_ _saved\_register\_size_ _local\_size_ _max\_stack\_size_
190 > _has\_program\_string_ _program\_string\_OR\_allocates\_base\_pointer_
191
192 For example: `STACK WIN 4 2170 14 1 0 0 0 0 0 1 $eip 4 + ^ = $esp $ebp 8 + =
193 $ebp $ebp ^ =`
194
195
196 All fields of a `STACK WIN` record, except for the last, are hexadecimal
197 numbers.
198
199 The _type_ field indicates what sort of stack frame data this record holds. Its
200 value should be one of the values of the
201 [StackFrameTypeEnum](http://msdn.microsoft.com/en-us/library/bc5207xw%28VS.100%29.aspx)
202 type in Microsoft's
203 [Debug Interface Access (DIA)](http://msdn.microsoft.com/en-us/library/x93ctkx8%28VS.100%29.aspx) API.
204 Breakpad uses only records of type 4 (`FrameTypeFrameData`) and 0
205 (`FrameTypeFPO`); it ignores others. These types differ only in whether the last
206 field is an _allocates\_base\_pointer_ flag (`FrameTypeFPO`) or a program string
207 (`FrameTypeFrameData`). If more than one record covers a given address, Breakpad
208 prefers `FrameTypeFrameData` records over `FrameTypeFPO` records.
209
210 The _rva_ and _code\_size_ fields give the starting address and length in bytes
211 of the machine code covered by this record. The starting address is relative to
212 the module's load address.
213
214 The _prologue\_size_ and _epilogue\_size_ fields give the length, in bytes, of
215 the prologue and epilogue machine code within the record's range. Breakpad does
216 not use these values.
217
218 The _parameter\_size_ field gives the number of argument bytes this function
219 expects to have been passed. This field has the same meaning as the
220 _parameter\_size_ field of a `FUNC` record; see that description for more
221 details.
222
223 The _saved\_register\_size_ field gives the number of bytes in the stack frame
224 dedicated to preserving the values of any callee-saves registers used by this
225 function.
226
227 The _local\_size_ field gives the number of bytes in the stack frame dedicated
228 to holding the function's local variables and temporary values.
229
230 The _max\_stack\_size_ field gives the maximum number of bytes pushed on the
231 stack in the frame. Breakpad does not use this value.
232
233 If the _has\_program\_string_ field is zero, then the `STACK WIN` record's final
234 field is an _allocates\_base\_pointer_ flag, as a hexadecimal number; this is
235 expected for records whose _type_ is 0. Otherwise, the final field is a program
236 string.
237
238 ## Interpreting a `STACK WIN` record
239
240 Given the register values for a frame F, we can find the calling frame as
241 follows:
242
243 * If the _has\_program\_string_ field of a `STACK WIN` record is zero, then
244 the final field is _allocates\_base\_pointer_, a flag indicating whether the
245 frame uses the frame pointer register, `%ebp`, as a general-purpose
246 register.
247 * If _allocates\_base\_pointer_ is true, then `%ebp` does not point to the
248 frame's base address. Instead,
249 * Let _next\_parameter\_size_ be the parameter size of the function
250 frame F called (**not** this record's _parameter\_size_ field), or
251 zero if F is the youngest frame on the stack. You must find this
252 value in F's callee's `FUNC`, `STACK WIN`, or `PUBLIC` records.
253 * Let _frame\_size_ be the sum of the _local\_size_ field, the
254 _saved\_register\_size_ field, and _next\_parameter\_size_. With
255 those definitions in place, we can recover the calling frame as
256 follows:
257 * F's return address is at `%esp +`_frame\_size_,
258 * the caller's value of `%ebp` is saved at `%esp
259 +`_next\_parameter\_size_`+`_saved\_register\_size_`- 8`, and
260 * the caller's value of `%esp` just before the call instruction was
261 `%esp +`_frame\_size_`+ 4`. (Why do we include
262 _next\_parameter\_size_ in the sum when computing _frame\_size_ and
263 the address of the saved `%ebp`? When a function A has called a
264 function B, the arguments that A pushed for B are considered part of
265 A's stack frame: A's value for `%esp` points at the last argument
266 pushed for B. Thus, we must include the size of those arguments
267 (given by the debugging info for B) along with the size of A's
268 register save area and local variable area (given by the debugging
269 info for A) when computing the overall size of A's frame.)
270 * If _allocates\_base\_pointer_ is false, then F's function doesn't use
271 `%ebp` at all. You may recover the calling frame as above, except that
272 the caller's value of `%ebp` is the same as F's value for `%ebp`, so no
273 steps are necessary to recover it.
274 * If the _has\_program\_string_ field of a `STACK WIN` record is not zero,
275 then the record's final field is a string containing a program to be
276 interpreted to recover the caller's frame. The comments in the
277 [postfix\_evaluator.h](http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/postfix_evaluator.h#40)
279 header file explain the language in which the program is written. You should
280 place the following variables in the dictionary before interpreting the
281 program:
282 * `$ebp` and `$esp` should be the values of the `%ebp` and `%esp`
283 registers in F.
284 * `.cbParams`, `.cbSavedRegs`, and `.cbLocals`, should be the values of
285 the `STACK WIN` record's _parameter\_size_, _saved\_register\_size_, and
286 _local\_size_ fields.
287 * `.raSearchStart` should be set to the address on the stack to begin
288 scanning for a return address, if necessary. The Breakpad processor sets
289 this to the value of `%esp` in F, plus the _frame\_size_ value mentioned
290 above.
291
292 > If the program stores values for `$eip`, `$esp`, `$ebp`, `$ebx`, `$esi`, or
293 > `$edi`, then those are the values of the given registers in the caller. If the
294 > value of `$eip` is zero, that indicates that the end of the stack has been
295 > reached.
296
297 The Breakpad processor checks that the value yielded by the above for the
298 calling frame's instruction address refers to known code; if the address seems
299 to be bogus, then it uses a heuristic search to find F's return address and
300 stack base.
301
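For records whose _has\_program\_string_ field is zero, the frame arithmetic described above can be sketched as follows. This is only an illustration of the stated formulas: `unwind_fpo` and the `read_u32` callback (which returns the 32-bit value stored at a stack address) are hypothetical, and the sample addresses and sizes are invented.

```python
def unwind_fpo(esp, ebp, local_size, saved_register_size,
               next_parameter_size, allocates_base_pointer, read_u32):
    """Recover the caller's registers for a frame F, per the text above."""
    # The arguments F pushed for its callee count as part of F's frame.
    frame_size = local_size + saved_register_size + next_parameter_size
    return_address = read_u32(esp + frame_size)
    if allocates_base_pointer:
        # %ebp was used as a general-purpose register, so the caller's
        # value was saved in the register save area.
        caller_ebp = read_u32(
            esp + next_parameter_size + saved_register_size - 8)
    else:
        # F never touched %ebp; the caller's value is still in the register.
        caller_ebp = ebp
    caller_esp = esp + frame_size + 4
    return {"eip": return_address, "esp": caller_esp, "ebp": caller_ebp}


# Invented stack contents, for illustration only.
memory = {0x1018: 0x401234, 0x1008: 0xBFFF0000}
caller = unwind_fpo(esp=0x1000, ebp=0x7, local_size=8, saved_register_size=12,
                    next_parameter_size=4, allocates_base_pointer=True,
                    read_u32=memory.__getitem__)
print({name: hex(value) for name, value in caller.items()})
```

The processor additionally validates the recovered instruction address and falls back to a heuristic scan, which this sketch omits.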
302 # `STACK CFI` records
303
304 `STACK CFI` ("Call Frame Information") records describe how to walk the stack
305 when execution is at a given machine instruction. These records take one of two
306 forms:
307
308 > `STACK CFI INIT` _address_ _size_ _register<sub>1</sub>_:
309 > _expression<sub>1</sub>_ _register<sub>2</sub>_: _expression<sub>2</sub>_ ...
310 >
311 > `STACK CFI` _address_ _register<sub>1</sub>_: _expression<sub>1</sub>_
312 > _register<sub>2</sub>_: _expression<sub>2</sub>_ ...
313
314 For example:
315
316 ```
317 STACK CFI INIT 804c4b0 40 .cfa: $esp 4 + $eip: .cfa 4 - ^
318 STACK CFI 804c4b1 .cfa: $esp 8 + $ebp: .cfa 8 - ^
319 ```
320
321 The _address_ and _size_ fields are hexadecimal numbers. Each
322 _register_<sub>i</sub> is the name of a register or pseudoregister. Each
323 _expression_ is a Breakpad postfix expression, which may contain spaces, but
324 never ends with a colon. (The appropriate register names for a given
325 architecture are determined when `STACK CFI` records are first enabled for that
326 architecture, and should be documented in the appropriate
327 `stackwalker_`_architecture_`.cc` source file.)
328
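Since register names always end with a colon and expression tokens never do, a `STACK CFI` record can be split into rules with a single pass over its tokens. The sketch below assumes whitespace-separated tokens; `parse_stack_cfi` is a hypothetical helper, not a Breakpad function.

```python
def parse_stack_cfi(line):
    """Return (address, size, rules) for a STACK CFI record.

    size is None for records without INIT; rules maps each register name
    to the list of tokens in its postfix expression.
    """
    fields = line.split()
    if fields[:2] != ["STACK", "CFI"]:
        raise ValueError("not a STACK CFI record")
    if fields[2] == "INIT":
        address, size, rest = int(fields[3], 16), int(fields[4], 16), fields[5:]
    else:
        address, size, rest = int(fields[2], 16), None, fields[3:]
    rules = {}
    current = None
    for token in rest:
        if token.endswith(":"):
            # A trailing colon marks a register name and starts a new rule.
            current = token[:-1]
            rules[current] = []
        elif current is None:
            raise ValueError("expression token before any register name")
        else:
            rules[current].append(token)
    return address, size, rules


print(parse_stack_cfi(
    "STACK CFI INIT 804c4b0 40 .cfa: $esp 4 + $eip: .cfa 4 - ^"))
```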
329 STACK CFI records describe, at each machine instruction in a given function, how
330 to recover the values the machine registers had in the function's caller.
331 Naturally, some registers' values are simply lost, but there are three cases in
332 which they can be recovered:
333
334 * You can always recover the program counter, because that's the function's
335 return address. If the function is ever going to return, the PC must be
336 saved somewhere.
337
338 * You can always recover the stack pointer. The function is responsible for
339 popping its stack frame before it returns to the caller, so it must be able
340 to restore this, as well.
341
342 * You should be able to recover the values of callee-saves registers. These
343 are registers whose values the callee must preserve, either by saving them
344 in its own stack frame before using them and re-loading them before
345 returning, or by not using them at all.
346
347 (As an exception, note that functions which never return may not save any of
348 this data. It may not be possible to walk the stack past such functions' stack
349 frames.)
350
351 Given rules for recovering the values of a function's caller's registers, we can
352 walk up the stack. Starting with the current set of registers --- the PC of the
353 instruction we're currently executing, the current stack pointer, etc. --- we
354 use CFI to recover the values those registers had in the caller of the current
355 frame. This gives us a PC in the caller whose CFI we can look up; we apply the
356 process again to find that function's caller; and so on.
357
358 Concretely, CFI records represent a table with a row for each machine
359 instruction address and a column for each register. The table entry for a given
360 address and register contains a rule describing how, when the PC is at that
361 address, to restore the value that register had in the caller.
362
363 There are some special columns:
364
365 * A column named `.cfa`, for "Canonical Frame Address", tells how to compute
366 the base address of the frame; other entries can refer to the CFA in their
367 rules.
368
369 * A column named `.ra` represents the return address.
370
371 For example, suppose we have a machine with 32-bit registers, one-byte
372 instructions, a stack that grows downwards, and an assembly language that
373 resembles C. Suppose further that we have a function whose machine code looks
374 like this:
375
376 ```
377 func: ; entry point; return address at sp
378 func+0: sp -= 16 ; allocate space for stack frame
379 func+1: sp[12] = r0 ; save 4-byte r0 at sp+12
380 ... ; stuff that doesn't affect stack
381 func+10: sp -= 4; *sp = x ; push some 4-byte x on the stack
382 ... ; stuff that doesn't affect stack
383 func+20: r0 = sp[16] ; restore saved r0
384 func+21: sp += 20 ; pop whole stack frame
385 func+22: pc = *sp; sp += 4 ; pop return address and jump to it
386 ```
387
388 The following table would describe the function above:
389
390 **code address** | **.cfa** | **r0** | **r1** | ... | **.ra**
391 :--------------- | :------- | :----- | :----- | :-- | :-------
392 func+0 | sp | | | | `cfa[0]`
393 func+1 | sp+16 | | | | `cfa[0]`
394 func+2 | sp+16 | `cfa[-4]` | | | `cfa[0]`
395 func+11 | sp+20 | `cfa[-4]` | | | `cfa[0]`
396 func+21 | sp+20 | | | | `cfa[0]`
397 func+22 | sp | | | | `cfa[0]`
398
399 Some things to note here:
400
401 * Each row describes the state of affairs **before** executing the instruction
402 at the given address. Thus, the row for func+0 describes the state before we
403 execute the first instruction, which allocates the stack frame. In the next
404 row, the formula for computing the CFA has changed, reflecting the
405 allocation.
406
407 * The other entries are written in terms of the CFA; this allows them to
408 remain unchanged as the stack pointer gets bumped around. For example, to
409 find the caller's value for r0 at func+2, we would first
410 compute the CFA by adding 16 to the sp, and then subtract four from that to
411 find the address at which r0 was saved.
412
413 * Although the example doesn't show this, most calling conventions designate
414 "callee-saves" and "caller-saves" registers. The callee must restore the
415 values of "callee-saves" registers before returning (if it uses them at
416 all), whereas the callee is free to use "caller-saves" registers without
417 restoring their values. A function that uses caller-saves registers
418 typically does not save their original values at all; in this case, the CFI
419 marks such registers' values as "unrecoverable".
420
421 * Exactly where the CFA points in the frame --- at the return address? below
422 it? At some fixed point within the frame? --- is a question of definition
423 that depends on the architecture and ABI in use. But by definition, the CFA
424 remains constant throughout the lifetime of the frame. It's up to
425 architecture-specific code to know what significance to assign the CFA, if
426 any.
427
428 To save space, the most common type of CFI record only mentions the table
429 entries at which changes take place. So for the above, the CFI data would only
430 actually mention the non-blank entries here:
431
432 **insn** | **.cfa** | **r0** | **r1** | ... | **.ra**
433 :------- | :------- | :----- | :----- | :-- | :-------
434 func+0 | sp | | | | `cfa[0]`
435 func+1 | sp+16 | | | |
436 func+2 | | `cfa[-4]` | | |
437 func+11 | sp+20 | | | |
438 func+21 | | r0 | | |
439 func+22 | sp | | | |
440
441 A `STACK CFI INIT` record indicates that, at the machine instruction at
442 _address_, belonging to some function, the value that _register<sub>n</sub>_ had
443 in that function's caller can be recovered by evaluating
444 _expression<sub>n</sub>_. The values of any callee-saves registers not mentioned
445 are assumed to be unchanged. (`STACK CFI` records never mention caller-saves
446 registers.) These rules apply starting at _address_ and continue up to, but not
447 including, the address given in the next `STACK CFI` record. The _size_ field is
448 the total number of bytes of machine code covered by this record and any
449 subsequent `STACK CFI` records (until the next `STACK CFI INIT` record). The
450 _address_ field is relative to the module's load address.
451
452 A `STACK CFI` record (no `INIT`) is the same, except that it mentions only those
453 registers whose recovery rules have changed from the previous CFI record. There
454 must be a prior `STACK CFI INIT` or `STACK CFI` record in the symbol file. The
455 _address_ field of this record must be greater than that of the previous record,
456 and it must not be at or beyond the end of the range given by the most recent
457 `STACK CFI INIT` record. The address is relative to the module's load address.
458
459 Each expression is a Breakpad-style postfix expression. Expressions may contain
460 spaces, but their tokens may not end with colons. When an expression mentions a
461 register, it refers to the value of that register in the callee, even if a prior
462 name/expression pair gives that register's value in the caller. The exception is
463 `.cfa`, which refers to the canonical frame address computed by the .cfa rule in
464 force at the current instruction.
465
466 The special expression `.undef` indicates that the given register's value cannot
467 be recovered.
468
469 The register names preceding the expressions are always followed by colons. The
470 expressions themselves never contain tokens ending with colons.
471
472 There are two special register names:
473
474 * `.cfa` ("Canonical Frame Address") is the base address of the stack frame.
475 Other registers' rules may refer to this. If no rule is provided for the
476 stack pointer, the value of `.cfa` is the caller's stack pointer.
477
478 * `.ra` is the return address. This is the value of the restored program
479 counter. We use `.ra` instead of the architecture-specific name for the
480 program counter.
481
482 The Breakpad stack walker requires that there be rules in force for `.cfa` and
483 `.ra` at every code address from which it unwinds. If those rules are not
484 present, the stack walker will ignore the `STACK CFI` data, and try to use a
485 different strategy.
486
487 So the CFI for the example function above would be as follows, if `func` were at
488 address 0x1000 (relative to the module's load address):
489
490 ```
491 STACK CFI INIT 1000 17 .cfa: $sp .ra: .cfa ^
492 STACK CFI 1001 .cfa: $sp 16 +
493 STACK CFI 1002 $r0: .cfa 4 - ^
494 STACK CFI 100b .cfa: $sp 20 +
495 STACK CFI 1015 $r0: $r0
496 STACK CFI 1016 .cfa: $sp
497 ```
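
To make the table-driven unwinding concrete, the sketch below applies CFI records for the example function to one frame. It is a simplified illustration, not the Breakpad stack walker: the function names and the `read_word` callback are hypothetical, integer literals are treated as decimal (so that `$sp 16 +` matches `sp+16` in the table), only the `+`, `-`, and `^` operators are handled, and the `INIT` record carries a size field of 17 (hexadecimal, for 23 one-byte instructions) because the record format calls for one.

```python
RECORDS = [
    "STACK CFI INIT 1000 17 .cfa: $sp .ra: .cfa ^",
    "STACK CFI 1001 .cfa: $sp 16 +",
    "STACK CFI 1002 $r0: .cfa 4 - ^",
    "STACK CFI 100b .cfa: $sp 20 +",
    "STACK CFI 1015 $r0: $r0",
    "STACK CFI 1016 .cfa: $sp",
]


def rules_at(records, pc):
    """Accumulate the rules in force at pc (records sorted by address)."""
    rules = {}
    for line in records:
        fields = line.split()
        is_init = fields[2] == "INIT"
        address = int(fields[3] if is_init else fields[2], 16)
        if address > pc:
            break
        if is_init:
            rules = {}  # an INIT record starts a fresh set of rules
        tokens = fields[5:] if is_init else fields[3:]
        name = None
        for token in tokens:
            if token.endswith(":"):
                name = token[:-1]
                rules[name] = []
            else:
                rules[name].append(token)
    return rules


def evaluate(tokens, registers, cfa, read_word):
    """Evaluate a postfix expression against the callee's registers."""
    stack = []
    for token in tokens:
        if token in ("+", "-"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if token == "+" else left - right)
        elif token == "^":
            stack.append(read_word(stack.pop()))  # dereference an address
        elif token == ".cfa":
            stack.append(cfa)
        elif token.startswith("$"):
            stack.append(registers[token])
        else:
            stack.append(int(token))
    (value,) = stack
    return value


def unwind(records, pc, registers, read_word):
    """Return the caller's register values for a frame executing at pc."""
    rules = rules_at(records, pc)
    cfa = evaluate(rules[".cfa"], registers, None, read_word)
    caller = dict(registers)  # registers without a rule are unchanged
    caller["$sp"] = cfa       # default when no rule names the stack pointer
    for name, tokens in rules.items():
        if name == ".cfa":
            continue
        if tokens == [".undef"]:
            caller[name] = None  # value cannot be recovered
        else:
            caller[name] = evaluate(tokens, registers, cfa, read_word)
    return caller


# Invented stack contents: return address at 0x2000, saved r0 at 0x1ffc.
memory = {0x2000: 0x5555, 0x1FFC: 0x77}
print(unwind(RECORDS, 0x100C, {"$sp": 0x1FEC, "$r0": 1}, memory.__getitem__))
```

The sketch ignores the address range given by the `INIT` record's size field; a fuller version would reject a `pc` outside that range.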