Chromium Code Reviews
Side by Side Diff: tracing/bin/symbolize_trace

Issue 2810523002: symbolize_trace: support new heap dump format. (Closed)
Patch Set: Address comments Created 3 years, 7 months ago
1 #!/usr/bin/env python 1 #!/usr/bin/env python
2 # Copyright 2016 The Chromium Authors. All rights reserved. 2 # Copyright 2016 The Chromium Authors. All rights reserved.
3 # Use of this source code is governed by a BSD-style license that can be 3 # Use of this source code is governed by a BSD-style license that can be
4 # found in the LICENSE file. 4 # found in the LICENSE file.
5 5
6 """
7 This script processes trace files and symbolizes stack frames generated by
8 Chrome's native heap profiler.
9
10 === Overview ===
11
12 A trace file is essentially a giant JSON array of dictionaries (events).
13 Events have some predefined keys (e.g. 'pid'), but otherwise are free to
14 have anything inside. A trace file contains events from all Chrome
15 processes that were sampled during the tracing period.
16
17 This script cares only about memory dump events generated with the
18 memory-infra category enabled.
19
20 When Chrome native heap profiling is enabled, some memory dump events
21 include the following extra information:
22
23 * (Per allocator) Information about live allocations at the moment of the
24 memory dump (the information includes backtraces, types / categories,
25 sizes, and counts of allocations). There are several allocators in
26 Chrome: e.g. malloc, blink_gc, partition_alloc.
27
28 * (Per process) Stack frame tree of all functions that called allocators
29 above.
30
31 This script does the following:
32
33 1. Parses the given trace file (loads JSON).
34 2. Finds memory dump events and parses stack frame tree for each process.
35 3. Finds stack frames that have PC addresses instead of function names.
36 4. Symbolizes PCs and modifies loaded JSON.
37 5. Writes modified JSON back to the file.
38
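As a rough sketch of steps 1-3 (the file name below is hypothetical, and
details such as gzip support and incremental parsing are omitted), the
relevant events can be located like this:

  import json

  with open('trace.json') as trace_file:
    events = json.load(trace_file)
  if not isinstance(events, list):
    events = events['traceEvents']

  # Memory dump events have phase 'v'; heap profiler data lives under
  # args.dumps.heaps_v2 (modern format) or args.dumps.heaps (legacy).
  memory_dumps = [e for e in events
                  if e.get('ph') == 'v' and 'dumps' in e.get('args', {})]
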
39 The script supports trace files from the following platforms:
40 * Android (the script itself must be run on Linux)
41 * Linux
42 * macOS
43 * Windows
44
45 Important note - the script does not check that it symbolizes the same
46 binaries that were used when the trace was taken. I.e. if you take a trace,
47 then change and rebuild Chrome, the script will blindly use the new binaries.
48
49 === Details ===
50
51 There are two formats of heap profiler information: legacy and modern. The
52 main differences relevant to this script are:
53
54 * In the modern format the stack frame tree, type name mapping, and string
55 mapping nodes are dumped incrementally. These nodes are dumped in each
56 memory dump event and carry updates that occurred since the last event.
57
58 For example, let's say that when the first memory dump event is generated
59 we only know about a function foo() (called from main()) allocating objects
60 of type "int":
61
62 {
63 "args": {
64 "dumps": {
65 "heaps_v2": {
66 "maps": {
67 "nodes": [
68 { "id": 1, "name_sid": 1 },
69 { "id": 2, "parent": 1, "name_sid": 3 },
70 ],
71 "types": [
72 { "id": 1, "name_sid": 2 },
73 ],
74 "strings": [
75 { "id": 1, "string": "main()" },
76 { "id": 2, "string": "int" },
77 { "id": 3, "string": "foo()" },
78 ]
79 },
80 "allocators": { ...live allocations per allocator... },
81 ...
82 },
83 ...
84 }
85 },
86 ...
87 }
88
89 Here:
90 * 'nodes' node encodes stack frame tree
91 * 'types' node encodes type name mappings
92 * 'strings' node encodes string mapping (explained below)
93
94 Then, by the time the second memory dump event is generated, we learn about
95 bar() (called from main()), which also allocated "int" objects. Only the
96 new information is dumped, i.e. the bar() stack frame:
97
98 {
99 "args": {
100 "dumps": {
101 "heaps_v2": {
102 "maps": {
103 "nodes": [
104 { "id": 2, "parent": 1, "name_sid": 4 },
105 ],
106 "types": [],
107 "strings": [
108 { "id": 4, "string": "bar()" },
109 ]
110 },
111 "allocators": { ...live allocations per allocator... },
112 ...
113 },
114 ...
115 }
116 },
117 ...
118 }
119
120 Note that the 'types' node is empty, since there were no updates. All three
121 nodes ('nodes', 'types', and 'strings') can be empty if there were no
122 updates to them.
123
124 For simplicity, when the script updates incremental nodes, it puts the
125 updated content in the first node and clears all the others. I.e. the
126 following stack frame nodes:
127
128 'nodes': [
129 { "id": 1, "name_sid": 1 },
130 { "id": 2, "parent": 1, "name_sid": 2 },
131 ]
132 'nodes': [
133 { "id": 3, "parent": 2, "name_sid": 3 },
134 ]
135 'nodes': [
136 { "id": 4, "parent": 3, "name_sid": 4 },
137 { "id": 5, "parent": 1, "name_sid": 5 },
138 ]
139
140 After symbolization, they are written as:
141
142 'nodes': [
143 { "id": 1, "name_sid": 1 },
144 { "id": 2, "parent": 1, "name_sid": 2 },
145 { "id": 3, "parent": 2, "name_sid": 3 },
146 { "id": 4, "parent": 3, "name_sid": 4 },
147 { "id": 5, "parent": 1, "name_sid": 5 },
148 ]
149 'nodes': []
150 'nodes': []
151
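A minimal sketch of that consolidation (ConsolidateNodes is a hypothetical
helper; nodes_lists is assumed to be the list of 'nodes' arrays from
consecutive events, in order):

  def ConsolidateNodes(nodes_lists):
    # Collect every entry, clear each array in place (so the trace JSON
    # still references the same list objects), then refill the first one.
    merged = []
    for nodes in nodes_lists:
      merged.extend(nodes)
      del nodes[:]
    nodes_lists[0].extend(merged)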
152
153 * In contrast, in the legacy format the stack frame tree and type mappings
154 are dumped separately from memory dump events, once per process.
155
156 Here is what a trace file with two memory dump events looks like in the
157 legacy format:
158
159 {
160 "args": {
161 "dumps": {
162 "heaps": { ...live allocations per allocator... },
163 ...
164 }
165 },
166 ...
167 }
168
169 {
170 "args": {
171 "dumps": {
172 "heaps": { ...live allocations per allocator... },
173 ...
174 }
175 },
176 ...
177 }
178
179 {
180 "args": {
181 "typeNames": {
182 1: "int",
183 }
184 },
185 "cat": "__metadata",
186 "name": "typeNames",
187 ...
188 }
189
190 {
191 "args": {
192 "stackFrames": {
193 1: { "name": "main" },
194 2: { "name": "foo", "parent": 1 },
195 3: { "name": "bar", "parent": 1 },
196 }
197 },
198 "cat": "__metadata",
199 "name": "stackFrames",
200 ...
201 }
202
203
204 * Another change in the modern format is the 'strings' node, which was
205 added to deduplicate stack frame names (mainly to reduce trace file size).
206 For consistency, the 'types' node also uses string mappings.
207
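To illustrate the indirection (a sketch; 'maps' here stands for the
"heaps_v2" / "maps" dictionary from the first example above):

  string_by_id = {s['id']: s['string'] for s in maps['strings']}
  for node in maps['nodes']:
    print node['id'], string_by_id[node['name_sid']]
  # 1 main()
  # 2 foo()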
208
209 See crbug.com/708930 for more information about the modern format.
210 """
211
6 import argparse 212 import argparse
7 import bisect 213 import bisect
8 import collections 214 import collections
9 import gzip 215 import gzip
216 import itertools
10 import json 217 import json
11 import os 218 import os
12 import re 219 import re
13 import subprocess 220 import subprocess
14 import sys 221 import sys
15 222
16 _SYMBOLS_PATH = os.path.abspath(os.path.join( 223 _SYMBOLS_PATH = os.path.abspath(os.path.join(
17 os.path.dirname(os.path.realpath(__file__)), 224 os.path.dirname(os.path.realpath(__file__)),
18 '..', 225 '..',
19 'third_party', 226 'third_party',
20 'symbols')) 227 'symbols'))
21 sys.path.append(_SYMBOLS_PATH) 228 sys.path.append(_SYMBOLS_PATH)
22 # pylint: disable=import-error 229 # pylint: disable=import-error
23 import symbols.elf_symbolizer as elf_symbolizer 230 import symbols.elf_symbolizer as elf_symbolizer
24 231
25 import symbolize_trace_atos_regex 232 import symbolize_trace_atos_regex
26 import symbolize_trace_macho_reader 233 import symbolize_trace_macho_reader
27 234
28 235
29 # Relevant trace event phases from Chromium's 236 class NodeWrapper(object):
30 # src/base/trace_event/common/trace_event_common.h. 237 """Wraps one or more event data nodes.
31 TRACE_EVENT_PHASE_METADATA = 'M' 238
32 TRACE_EVENT_PHASE_MEMORY_DUMP = 'v' 239 A node is a reference into a trace event JSON. Wrappers parse nodes to
240 provide convenient APIs and update nodes when asked to propagate changes
241 back (see ApplyModifications() below).
242
243 Here is an example of a legacy metadata event that contains a stack frame tree:
244
245 {
246 "args": {
247 "stackFrames": { ... }
248 },
249 "cat": "__metadata",
250 "name": "stackFrames",
251 "ph": "M",
252 ...
253 }
254
255 When this event is encountered, a reference to the "stackFrames" dictionary
256 is obtained and passed down to a specific wrapper class, which knows how to
257 parse / update the dictionary.
258
259 There are two parsing patterns depending on whether the node is serialized
260 incrementally:
261
262 * If the node is not incremental, then parsing is done by __init__();
263 see MemoryMap for an example.
264
265 * If the node is incremental, then __init__() does nothing, and instead
266 the ParseNext() method is called each time the next node (from a later
267 event) is encountered.
268
269 Some wrappers can also modify nodes they parsed. In such cases they have
270 additional APIs:
271
272 * 'modified' flag, which indicates whether the wrapper was changed.
273
274 * 'ApplyModifications' method, which propagates changes made to the wrapper
275 back to nodes. Successful invocation of ApplyModifications() resets
276 'modified' flag.
277
278 """
279 pass
33 280
34 281
35 # Matches Android library paths, supports both K (/data/app-lib/<>/lib.so) 282 class MemoryMap(NodeWrapper):
36 # as well as L+ (/data/app/<>/lib/<>/lib.so). Library name is available 283 """Wraps 'process_mmaps' node.
37 # via 'name' group.
38 ANDROID_PATH_MATCHER = re.compile(
39 r'^/data/(?:'
40 r'app/[^/]+/lib/[^/]+/|'
41 r'app-lib/[^/]+/|'
42 r'data/[^/]+/incremental-install-files/lib/'
43 r')(?P<name>.*\.so)')
44 284
45 # Subpath of output path where unstripped libraries are stored. 285 'process_mmaps' node contains information about file mappings.
46 ANDROID_UNSTRIPPED_SUBPATH = 'lib.unstripped'
47 286
48 287 "process_mmaps": {
49 def FindInSystemPath(binary_name): 288 "vm_regions": [
50 paths = os.environ['PATH'].split(os.pathsep) 289 {
51 for path in paths: 290 "mf": "<file_path>",
52 binary_path = os.path.join(path, binary_name) 291 "sa": "<start_address>",
53 if os.path.isfile(binary_path): 292 "sz": "<size>",
54 return binary_path 293 ...
55 return None 294 },
56 295 ...
57 296 ]
58 class Symbolizer(object): 297 }
59 # Encapsulates platform-specific symbolization logic. 298 """
60 def __init__(self):
61 self.is_mac = sys.platform == 'darwin'
62 self.is_win = sys.platform == 'win32'
63 if self.is_mac:
64 self.binary = 'atos'
65 self._matcher = symbolize_trace_atos_regex.AtosRegexMatcher()
66 elif self.is_win:
67 self.binary = 'addr2line-pdb.exe'
68 else:
69 self.binary = 'addr2line'
70 self.symbolizer_path = FindInSystemPath(self.binary)
71
72 def _SymbolizeLinuxAndAndroid(self, symfile, unsymbolized_name):
73 def _SymbolizerCallback(sym_info, frames):
74 # Unwind inline chain to the top.
75 while sym_info.inlined_by:
76 sym_info = sym_info.inlined_by
77
78 symbolized_name = sym_info.name if sym_info.name else unsymbolized_name
79 for frame in frames:
80 frame.name = symbolized_name
81
82 symbolizer = elf_symbolizer.ELFSymbolizer(symfile.symbolizable_path,
83 self.symbolizer_path,
84 _SymbolizerCallback,
85 inlines=True)
86
87 for address, frames in symfile.frames_by_address.iteritems():
88 # SymbolizeAsync() asserts that the type of address is int. We operate
89 # on longs (since they are raw pointers possibly from 64-bit processes).
90 # It's OK to cast here because we're passing relative PC, which should
91 # always fit into int.
92 symbolizer.SymbolizeAsync(int(address), frames)
93
94 symbolizer.Join()
95
96
97 def _SymbolizeMac(self, symfile):
98 chars_max = int(subprocess.check_output("getconf ARG_MAX", shell=True))
99
100 # 16 for the address, 2 for "0x", 1 for the space
101 chars_per_address = 19
102
103 load_address = (symbolize_trace_macho_reader.
104 ReadMachOTextLoadAddress(symfile.symbolizable_path))
105 assert load_address is not None
106
107 cmd_base = [self.symbolizer_path, '-arch', 'x86_64', '-l',
108 '0x%x' % load_address, '-o',
109 symfile.symbolizable_path]
110 chars_for_other_arguments = len(' '.join(cmd_base)) + 1
111
112 # The maximum number of inputs that can be processed at once is limited by
113 # ARG_MAX. This currently evaluates to ~13000 on macOS.
114 max_inputs = (chars_max - chars_for_other_arguments) / chars_per_address
115
116 all_keys = symfile.frames_by_address.keys()
117 processed_keys_count = 0
118 while len(all_keys):
119 input_count = min(len(all_keys), max_inputs)
120 keys_to_process = all_keys[0:input_count]
121
122 cmd = list(cmd_base)
123 cmd.extend([hex(int(x) + load_address)
124 for x in keys_to_process])
125 output_array = subprocess.check_output(cmd).split('\n')
126 for i in range(len(keys_to_process)):
127 for frame in (symfile.frames_by_address.values()
128 [i + processed_keys_count]):
129 frame.name = self._matcher.Match(output_array[i])
130 processed_keys_count += len(keys_to_process)
131 all_keys = all_keys[input_count:]
132
133
134 def _SymbolizeWin(self, symfile):
135 """Invoke symbolizer binary on windows and write all input in one go.
136
137 Unlike Linux, on Windows symbolization talks through a shared system
138 service that handles communication with the NT symbol servers. This
139 creates an explicit serialization (and therefore lock contention) of
140 any process using the symbol API for files that do not have a local PDB.
141
142 Thus, even though the Windows symbolizer binary can be made command-line
143 compatible with the POSIX addr2line interface, parallelizing the
144 symbolization does not yield the same performance effects. Running
145 just one symbolizer seems good enough for now. We can optimize later
146 if this becomes a bottleneck.
147 """
148 cmd = [self.symbolizer_path, '--functions', '--demangle', '--exe',
149 symfile.symbolizable_path]
150
151 proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE,
152 stderr=sys.stderr)
153 addrs = ["%x" % relative_pc for relative_pc in
154 symfile.frames_by_address.keys()]
155 (stdout_data, stderr_data) = proc.communicate('\n'.join(addrs))
156 stdout_data = stdout_data.split('\n')
157
158 # This is known to be in the same order as stderr_data.
159 for i, addr in enumerate(addrs):
160 for frame in symfile.frames_by_address[int(addr, 16)]:
161 # Output of addr2line with --functions is always 2 outputs per
162 # symbol, function name followed by source line number. Only grab
163 # the function name as line info is not always available.
164 frame.name = stdout_data[i * 2]
165
166
167 def Symbolize(self, symfile, unsymbolized_name):
168 if self.is_mac:
169 self._SymbolizeMac(symfile)
170 if self.is_win:
171 self._SymbolizeWin(symfile)
172 else:
173 self._SymbolizeLinuxAndAndroid(symfile, unsymbolized_name)
174
175
176 def IsSymbolizableFile(self, file_path):
177 if self.is_win:
178 extension = os.path.splitext(file_path)[1].lower()
179 return extension in ['.dll', '.exe']
180 else:
181 result = subprocess.check_output(['file', '-0', file_path])
182 type_string = result[result.find('\0') + 1:]
183 return bool(re.match(r'.*(ELF|Mach-O) (32|64)-bit\b.*',
184 type_string, re.DOTALL))
185
186
187 class ProcessMemoryMaps(object):
188 """Represents 'process_mmaps' trace file entry."""
189 299
190 class Region(object): 300 class Region(object):
191 def __init__(self, start_address, size, file_path): 301 def __init__(self, start_address, size, file_path):
192 self._start_address = start_address 302 self._start_address = start_address
193 self._size = size 303 self._size = size
194 self._file_path = file_path 304 self._file_path = file_path
195 305
196 @property 306 @property
197 def start_address(self): 307 def start_address(self):
198 return self._start_address 308 return self._start_address
(...skipping 15 matching lines...) Expand all
214 return long(self._start_address).__cmp__(long(other._start_address)) 324 return long(self._start_address).__cmp__(long(other._start_address))
215 elif isinstance(other, (long, int)): 325 elif isinstance(other, (long, int)):
216 return long(self._start_address).__cmp__(long(other)) 326 return long(self._start_address).__cmp__(long(other))
217 else: 327 else:
218 raise Exception('Cannot compare with %s' % type(other)) 328 raise Exception('Cannot compare with %s' % type(other))
219 329
220 def __repr__(self): 330 def __repr__(self):
221 return 'Region(0x{:X} - 0x{:X}, {})'.format( 331 return 'Region(0x{:X} - 0x{:X}, {})'.format(
222 self.start_address, self.end_address, self.file_path) 332 self.start_address, self.end_address, self.file_path)
223 333
224 def __init__(self, process_mmaps): 334 def __init__(self, process_mmaps_node):
225 """Parses 'process_mmaps' dictionary."""
226
227 regions = [] 335 regions = []
228 for region_value in process_mmaps['vm_regions']: 336 for region_node in process_mmaps_node['vm_regions']:
229 regions.append(self.Region( 337 regions.append(self.Region(
230 long(region_value['sa'], 16), 338 long(region_node['sa'], 16),
231 long(region_value['sz'], 16), 339 long(region_node['sz'], 16),
232 region_value['mf'])) 340 region_node['mf']))
233 regions.sort() 341 regions.sort()
234 342
235 # Copy regions without duplicates and check for overlaps. 343 # Copy regions without duplicates and check for overlaps.
236 self._regions = [] 344 self._regions = []
237 previous_region = None 345 previous_region = None
238 for region in regions: 346 for region in regions:
239 if previous_region is not None: 347 if previous_region is not None:
240 if region == previous_region: 348 if region == previous_region:
241 continue 349 continue
242 assert region.start_address >= previous_region.end_address, \ 350 assert region.start_address >= previous_region.end_address, \
243 'Regions {} and {} overlap.'.format(previous_region, region) 351 'Regions {} and {} overlap.'.format(previous_region, region)
244 previous_region = region 352 previous_region = region
245 self._regions.append(region) 353 self._regions.append(region)
246 354
247 @property 355 @property
248 def regions(self): 356 def regions(self):
249 return self._regions 357 return self._regions
250 358
251 def FindRegion(self, address): 359 def FindRegion(self, address):
252 """Finds region containing |address|. Returns None if none found.""" 360 """Finds region containing |address|. Returns None if none found."""
253 361
254 region_index = bisect.bisect_right(self._regions, address) - 1 362 region_index = bisect.bisect_right(self._regions, address) - 1
255 if region_index >= 0: 363 if region_index >= 0:
256 region = self._regions[region_index] 364 region = self._regions[region_index]
257 if address >= region.start_address and address < region.end_address: 365 if address >= region.start_address and address < region.end_address:
258 return region 366 return region
259 return None 367 return None
260 368
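# For illustration, a hypothetical use of MemoryMap (the addresses and file
# paths below are made up): 'sa'/'sz' are hex strings, and FindRegion()
# returns the containing Region, or None when the address is unmapped.
#
#   memory_map = MemoryMap({'vm_regions': [
#       {'sa': '400000', 'sz': '1000', 'mf': '/hypothetical/libfoo.so'},
#       {'sa': '800000', 'sz': '2000', 'mf': '/hypothetical/libbar.so'},
#   ]})
#   memory_map.FindRegion(0x4005ab).file_path  # '/hypothetical/libfoo.so'
#   memory_map.FindRegion(0x700000)            # None - between the regions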
261 369
262 class StackFrames(object): 370 class UnsupportedHeapDumpVersionError(Exception):
263 """Represents 'stackFrames' trace file entry.""" 371 """Helper exception class to signal unsupported heap dump version."""
264 372
265 class PCFrame(object): 373 def __init__(self, version):
266 def __init__(self, pc, frame): 374 message = 'Unsupported heap dump version: {}'.format(version)
375 super(UnsupportedHeapDumpVersionError, self).__init__(message)
376
377
378 class StringMap(NodeWrapper):
379 """Wraps all 'strings' nodes for a process.
380
381 'strings' node contains incremental mappings between integer ids and strings.
382
383 "strings": [
384 {
385 "id": <string_id>,
386 "string": <string>
387 },
388 ...
389 ]
390 """
391
392 def __init__(self):
393 self._modified = False
394 self._strings_nodes = []
395 self._string_by_id = {}
396 self._id_by_string = {}
397 self._max_string_id = 0
398
399 @property
400 def modified(self):
401 """Returns True if the wrapper was modified (see NodeWrapper)."""
402 return self._modified
403
404 @property
405 def string_by_id(self):
406 return self._string_by_id
407
408 def ParseNext(self, heap_dump_version, strings_node):
409 """Parses and interns next node (see NodeWrapper)."""
410
411 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1:
412 raise UnsupportedHeapDumpVersionError(heap_dump_version)
413
414 self._strings_nodes.append(strings_node)
415 for string_node in strings_node:
416 self._Insert(string_node['id'], string_node['string'])
417
418 def Clear(self):
419 """Clears all string mappings."""
420 if self._string_by_id:
421 self._modified = True
422 # ID #0 means 'no entry' and must always be present. Carry it over.
423 null_string = self._string_by_id[0]
424 self._string_by_id = {}
425 self._id_by_string = {}
426 self._Insert(0, null_string)
427 self._max_string_id = 0
428
429 def AddString(self, string):
430 """Adds a string (if it doesn't exist) and returns its integer id."""
431 string_id = self._id_by_string.get(string)
432 if string_id is None:
433 string_id = self._max_string_id + 1
434 self._Insert(string_id, string)
435 self._modified = True
436 return string_id
437
438 def ApplyModifications(self):
439 """Propagates modifications back to nodes (see NodeWrapper)."""
440 if not self.modified:
441 return
442
443 assert self._strings_nodes, 'no nodes'
444
445 # Serialize into the first node, and clear all others.
446
447 for strings_node in self._strings_nodes:
448 del strings_node[:]
449 strings_node = self._strings_nodes[0]
450 for string_id, string in self._string_by_id.iteritems():
451 strings_node.append({'id': string_id, 'string': string})
452
453 self._modified = False
454
455 def _Insert(self, string_id, string):
456 self._id_by_string[string] = string_id
457 self._string_by_id[string_id] = string
458 self._max_string_id = max(self._max_string_id, string_id)
459
460
461 class TypeNameMap(NodeWrapper):
462 """Wraps all 'types' nodes for a process.
463
464 'types' nodes encode mappings between integer type ids and integer
465 string ids (from 'strings' nodes).
466
467 "types": [
468 {
469 "id": <type_id>,
470 "name_sid": <name_string_id>
471 }
472 ...
473 ]
474
475 For simplicity string ids are translated into strings during parsing,
476 and then translated back to ids in ApplyModifications().
477 """
478 def __init__(self):
479 self._modified = False
480 self._type_name_nodes = []
481 self._name_by_id = {}
482 self._id_by_name = {}
483 self._max_type_id = 0
484
485 @property
486 def modified(self):
487 """Returns True if the wrapper was modified (see NodeWrapper)."""
488 return self._modified
489
490 @property
491 def name_by_id(self):
492 """Returns {id -> name} dict (must not be changed directly)."""
493 return self._name_by_id
494
495 def ParseNext(self, heap_dump_version, type_name_node, string_map):
496 """Parses and interns next node (see NodeWrapper).
497
498 |string_map| - A StringMap object to use to translate string ids
499 to strings.
500 """
501 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1:
502 raise UnsupportedHeapDumpVersionError(heap_dump_version)
503
504 self._type_name_nodes.append(type_name_node)
505 for type_node in type_name_node:
506 self._Insert(type_node['id'],
507 string_map.string_by_id[type_node['name_sid']])
508
509 def AddType(self, type_name):
510 """Adds a type name (if it doesn't exist) and returns its id."""
511 type_id = self._id_by_name.get(type_name)
512 if type_id is None:
513 type_id = self._max_type_id + 1
514 self._Insert(type_id, type_name)
515 self._modified = True
516 return type_id
517
518 def ApplyModifications(self, string_map, force=False):
519 """Propagates modifications back to nodes.
520
521 |string_map| - A StringMap object to use to translate strings to ids.
522 |force| - Whether to propagate changes regardless of 'modified' flag.
523 """
524 if not self.modified and not force:
525 return
526
527 assert self._type_name_nodes, 'no nodes'
528
529 # Serialize into the first node, and clear all others.
530
531 for types_node in self._type_name_nodes:
532 del types_node[:]
533 types_node = self._type_name_nodes[0]
534 for type_id, type_name in self._name_by_id.iteritems():
535 types_node.append({
536 'id': type_id,
537 'name_sid': string_map.AddString(type_name)})
538
539 self._modified = False
540
541 def _Insert(self, type_id, type_name):
542 self._id_by_name[type_name] = type_id
543 self._name_by_id[type_id] = type_name
544 self._max_type_id = max(self._max_type_id, type_id)
545
546
547 class StackFrameMap(NodeWrapper):
548 """ Wraps stack frame tree nodes for a process.
549
550 For the legacy format this wrapper expects a single 'stackFrames' node
551 (which comes from metadata event):
552
553 "stackFrames": {
554 "<frame_id>": {
555 "name": "<frame_name>"
556 "parent": "<parent_frame_id>"
557 },
558 ...
559 }
560
561 For the modern format this wrapper expects several 'nodes' nodes:
562
563 "nodes": [
564 {
565 "id": <frame_id>,
566 "parent": <parent_frame_id>,
567 "name_sid": <name_string_id>
568 },
569 ...
570 ]
571
572 In both formats the frame name is a string. The native heap profiler
573 generates specially formatted frame names (e.g. "pc:10eb78dba") for
574 function addresses (PCs). The inner Frame class below parses the name and
575 extracts the PC, if present.
576 """
577 class Frame(object):
578 def __init__(self, frame_id, name, parent_frame_id):
267 self._modified = False 579 self._modified = False
268 self._pc = pc 580 self._id = frame_id
269 self._frame = frame 581 self._name = name
582 self._pc = self._ParsePC(name)
583 self._parent_id = parent_frame_id
584 self._ext = None
270 585
271 @property 586 @property
272 def modified(self): 587 def modified(self):
588 """Returns True if the frame was modified.
589
590 For example changing frame's name sets this flag (since the change
591 needs to be propagated back to nodes).
592 """
273 return self._modified 593 return self._modified
274 594
275 @property 595 @property
596 def id(self):
597 """Frame id (integer)."""
598 return self._id
599
600 @property
276 def pc(self): 601 def pc(self):
602 """Parsed (integer) PC of the frame, or None."""
277 return self._pc 603 return self._pc
278 604
279 @property 605 @property
280 def name(self): 606 def name(self):
281 return self._frame['name'] 607 """Name of the frame (see above)."""
608 return self._name
282 609
283 @name.setter 610 @name.setter
284 def name(self, value): 611 def name(self, value):
612 """Changes the name. Doesn't affect value of |pc|."""
285 self._modified = True 613 self._modified = True
286 self._frame['name'] = value 614 self._name = value
287 615
288 def __init__(self, stack_frames): 616 @property
289 """Constructs object using 'stackFrames' dictionary.""" 617 def parent_id(self):
290 self._pc_frames = [] 618 """Parent frame id (integer)."""
291 for frame in stack_frames.itervalues(): 619 return self._parent_id
292 pc_frame = self._ParsePCFrame(frame) 620
293 if pc_frame: 621 _PC_TAG = 'pc:'
294 self._pc_frames.append(pc_frame) 622
295 623 def _ParsePC(self, name):
296 @property 624 if not name.startswith(self._PC_TAG):
297 def pc_frames(self): 625 return None
298 return self._pc_frames 626 return long(name[len(self._PC_TAG):], 16)
627
628 def _ClearModified(self):
629 self._modified = False
630
631 def __init__(self):
632 self._modified = False
633 self._heap_dump_version = None
634 self._stack_frames_nodes = []
635 self._frame_by_id = {}
299 636
300 @property 637 @property
301 def modified(self): 638 def modified(self):
302 return any(f.modified for f in self._pc_frames) 639 """Returns True if the wrapper or any of its frames were modified."""
303 640 return (self._modified or
304 _PC_TAG = 'pc:' 641 any(f.modified for f in self._frame_by_id.itervalues()))
305 642
306 @classmethod 643 @property
307 def _ParsePCFrame(self, frame): 644 def frame_by_id(self):
308 name = frame['name'] 645 """Returns {id -> frame} dict (must not be modified directly)."""
309 if not name.startswith(self._PC_TAG): 646 return self._frame_by_id
310 return None 647
311 pc = long(name[len(self._PC_TAG):], 16) 648 def ParseNext(self, heap_dump_version, stack_frames_node, string_map):
312 return self.PCFrame(pc, frame) 649 """Parses the next stack frames node (see NodeWrapper).
313 650
314 651 For the modern format |string_map| is used to translate string ids
315 class Process(object): 652 to strings.
316 """Holds various bits of information about a process in a trace file.""" 653 """
317 654
318 def __init__(self, pid): 655 frame_by_id = {}
319 self.pid = pid 656 if heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY:
320 self.name = None 657 if self._stack_frames_nodes:
321 self.mmaps = None 658 raise Exception('Legacy stack frames node is expected only once.')
322 self.stack_frames = None 659 for frame_id, frame_node in stack_frames_node.iteritems():
323 660 frame = self.Frame(frame_id,
324 661 frame_node['name'],
325 def CollectProcesses(trace): 662 frame_node.get('parent'))
326 """Parses trace dictionary and returns pid->Process map of all processes 663 frame_by_id[frame.id] = frame
327 suitable for symbolization (which have both mmaps and stack_frames). 664 else:
665 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1:
666 raise UnsupportedHeapDumpVersionError(heap_dump_version)
667 for frame_node in stack_frames_node:
668 frame = self.Frame(frame_node['id'],
669 string_map.string_by_id[frame_node['name_sid']],
670 frame_node.get('parent'))
671 frame_by_id[frame.id] = frame
672
673 self._heap_dump_version = heap_dump_version
674 self._stack_frames_nodes.append(stack_frames_node)
675
676 self._frame_by_id = frame_by_id
677
678 def ApplyModifications(self, string_map, force=False):
679 """Applies modifications back to nodes (see NodeWrapper)."""
680
681 if not self.modified and not force:
682 return
683
684 assert self._stack_frames_nodes, 'no nodes'
685 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY:
686 assert string_map is None, \
687 'string_map should not be used with the legacy format'
688
689 # Serialize frames into the first node, clear all others.
690
691 for frames_node in self._stack_frames_nodes:
692 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY:
693 frames_node.clear()
694 else:
695 del frames_node[:]
696
697 frames_node = self._stack_frames_nodes[0]
698 for frame in self._frame_by_id.itervalues():
699 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY:
700 frame_node = {'name': frame.name}
701 frames_node[frame.id] = frame_node
702 else:
703 frame_node = {
704 'id': frame.id,
705 'name_sid': string_map.AddString(frame.name)
706 }
707 frames_node.append(frame_node)
708 if frame.parent_id is not None:
709 frame_node['parent'] = frame.parent_id
710 frame._ClearModified()
711
712 self._modified = False
713
714
715 class Trace(NodeWrapper):
716 """Wrapper for the root trace node (i.e. the trace JSON itself).
717
718 This wrapper parses select nodes from memory-infra events and groups
719 parsed data per-process (see inner Process class below).
328 """ 720 """
329 721
330 process_map = {} 722 # Indicates legacy heap dump format.
331 723 HEAP_DUMP_VERSION_LEGACY = 'Legacy'
332 # Android traces produced via 'chrome://inspect/?tracing#devices' are 724
333 # just list of events. 725 # Indicates variation of a modern heap dump format.
334 events = trace if isinstance(trace, list) else trace['traceEvents'] 726 HEAP_DUMP_VERSION_1 = 1
335 for event in events: 727
336 name = event.get('name') 728 class Process(object):
337 if not name: 729 """Collection of per-process data and wrappers."""
338 continue 730
339 731 def __init__(self, pid):
340 pid = event['pid'] 732 self._pid = pid
341 process = process_map.get(pid) 733 self._name = None
342 if process is None: 734 self._memory_map = None
343 process = Process(pid) 735 self._stack_frame_map = StackFrameMap()
344 process_map[pid] = process 736 self._type_name_map = TypeNameMap()
345 737 self._string_map = StringMap()
346 phase = event['ph'] 738 self._heap_dump_version = None
347 if phase == TRACE_EVENT_PHASE_METADATA: 739
348 if name == 'process_name': 740 @property
349 process.name = event['args']['name'] 741 def modified(self):
350 elif name == 'stackFrames': 742 return self._stack_frame_map.modified or self._type_name_map.modified
351 process.stack_frames = StackFrames(event['args']['stackFrames']) 743
352 elif phase == TRACE_EVENT_PHASE_MEMORY_DUMP: 744 @property
353 process_mmaps = event['args']['dumps'].get('process_mmaps') 745 def pid(self):
354 if process_mmaps: 746 return self._pid
355 # TODO(dskiba): this parses all process_mmaps, but retains only the 747
356 # last one. We need to parse only once (lazy parsing?). 748 @property
357 process.mmaps = ProcessMemoryMaps(process_mmaps) 749 def name(self):
358 750 return self._name
359 return [p for p in process_map.itervalues() if p.mmaps and p.stack_frames] 751
752 @property
753 def unique_name(self):
754 """Returns string that includes both process name and its pid."""
755 name = self._name if self._name else 'UnnamedProcess'
756 return '{}({})'.format(name, self._pid)
757
758 @property
759 def memory_map(self):
760 return self._memory_map
761
762 @property
763 def stack_frame_map(self):
764 return self._stack_frame_map
765
766 @property
767 def type_name_map(self):
768 return self._type_name_map
769
770 def ApplyModifications(self):
771 """Calls ApplyModifications() on contained wrappers."""
772 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY:
773 self._stack_frame_map.ApplyModifications(None)
774 else:
775 if self._stack_frame_map.modified or self._type_name_map.modified:
776 self._string_map.Clear()
777 self._stack_frame_map.ApplyModifications(self._string_map, force=True)
778 self._type_name_map.ApplyModifications(self._string_map, force=True)
779 self._string_map.ApplyModifications()
780
781 def __init__(self, trace_node):
782 self._trace_node = trace_node
783 self._processes = []
784 self._heap_dump_version = None
785
786 # Misc per-process information needed only during parsing.
787 class ProcessExt(object):
788 def __init__(self, pid):
789 self.process = Trace.Process(pid)
790 self.mapped_entry_names = set()
791 self.process_mmaps_node = None
792 self.seen_strings_node = False
793
794 process_ext_by_pid = {}
795
796 # Android traces produced via 'chrome://inspect/?tracing#devices' are
797 # just list of events.
798 events = trace_node if isinstance(trace_node, list) \
799 else trace_node['traceEvents']
800 for event in events:
801 name = event.get('name')
802 if not name:
803 continue
804
805 pid = event['pid']
806 process_ext = process_ext_by_pid.get(pid)
807 if process_ext is None:
808 process_ext = ProcessExt(pid)
809 process_ext_by_pid[pid] = process_ext
810 process = process_ext.process
811
812 phase = event['ph']
813 if phase == self._EVENT_PHASE_METADATA:
814 if name == 'process_name':
815 process._name = event['args']['name']
816 elif name == 'stackFrames':
817 process._stack_frame_map.ParseNext(
818 self._UseHeapDumpVersion(self.HEAP_DUMP_VERSION_LEGACY),
819 event['args']['stackFrames'],
820 process._string_map)
821 elif phase == self._EVENT_PHASE_MEMORY_DUMP:
822 dumps = event['args']['dumps']
823 process_mmaps = dumps.get('process_mmaps')
824 if process_mmaps:
825 # We want the most recent memory map, so parsing happens later
826 # once we finished reading all events.
827 process_ext.process_mmaps_node = process_mmaps
828 heaps = dumps.get('heaps_v2')
829 if heaps:
830 version = self._UseHeapDumpVersion(heaps['version'])
831 maps = heaps.get('maps')
832 if maps:
833 process_ext.mapped_entry_names.update(maps.iterkeys())
834 types = maps.get('types')
835 stack_frames = maps.get('nodes')
836 strings = maps.get('strings')
837 if (strings is None and (types or stack_frames)
838 and not process_ext.seen_strings_node):
839 # ApplyModifications() for TypeNameMap and StackFrameMap puts
840 # everything into the first node and depends on StringMap. So
841 # we need to make sure that 'strings' node is there if any of
842 # other two nodes present.
843 strings = []
844 maps['strings'] = strings
845 if strings is not None:
846 process_ext.seen_strings_node = True
847 process._string_map.ParseNext(version, strings)
848 if types:
849 process._type_name_map.ParseNext(
850 version, types, process._string_map)
851 if stack_frames:
852 process._stack_frame_map.ParseNext(
853 version, stack_frames, process._string_map)
854
855 self._processes = []
856 for pe in process_ext_by_pid.itervalues():
857 pe.process._heap_dump_version = self._heap_dump_version
858 if pe.process_mmaps_node:
859 # Now parse the most recent memory map.
860 pe.process._memory_map = MemoryMap(pe.process_mmaps_node)
861 self._processes.append(pe.process)
862
863 @property
864 def node(self):
865 """Root node (that was passed to the __init__)."""
866 return self._trace_node
867
868 @property
869 def modified(self):
870 """Returns True if trace file needs to be updated.
871
872 Before writing trace JSON back to a file ApplyModifications() needs
873 to be called.
874 """
875 return any(p.modified for p in self._processes)
876
877 @property
878 def processes(self):
879 return self._processes
880
881 @property
882 def heap_dump_version(self):
883 return self._heap_dump_version
884
885 def ApplyModifications(self):
886 """Propagates modifications back to the trace JSON."""
887 for process in self._processes:
888 process.ApplyModifications()
889 assert not self.modified, 'still modified'
890
891 # Relevant trace event phases from Chromium's
892 # src/base/trace_event/common/trace_event_common.h.
893 _EVENT_PHASE_METADATA = 'M'
894 _EVENT_PHASE_MEMORY_DUMP = 'v'
895
896 def _UseHeapDumpVersion(self, version):
897 if self._heap_dump_version is None:
898 self._heap_dump_version = version
899 return version
900 elif self._heap_dump_version != version:
901 raise Exception(
902 ("Inconsistent trace file: first saw '{}' heap dump version, "
903 "then '{}'.").format(self._heap_dump_version, version))
904 else:
905 return version
360 906
361 907
362 class SymbolizableFile(object): 908 class SymbolizableFile(object):
363 """Holds file path, addresses to symbolize and stack frames to update. 909 """Holds file path, addresses to symbolize and stack frames to update.
364 910
365 This class is a link between ELFSymbolizer and a trace file: it specifies 911 This class is a link between ELFSymbolizer and a trace file: it specifies
366 what to symbolize (addresses) and what to update with the symbolization 912 what to symbolize (addresses) and what to update with the symbolization
367 result (frames). 913 result (frames).
368 """ 914 """
369 def __init__(self, file_path): 915 def __init__(self, file_path):
370 self.path = file_path 916 self.path = file_path
371 self.symbolizable_path = file_path # path to use for symbolization 917 self.symbolizable_path = file_path # path to use for symbolization
372 self.frames_by_address = collections.defaultdict(list) 918 self.frames_by_address = collections.defaultdict(list)
373 919
374 920
375 def ResolveSymbolizableFiles(processes): 921 def ResolveSymbolizableFiles(processes):
376 """Resolves and groups PCs into list of SymbolizableFiles. 922 """Resolves and groups PCs into list of SymbolizableFiles.
377 923
378 As part of the grouping process, this function resolves PC from each stack 924 As part of the grouping process, this function resolves PC from each stack
379 frame to the corresponding mmap region. Stack frames that failed to resolve 925 frame to the corresponding mmap region. Stack frames that failed to resolve
380 are symbolized with '<unresolved>'. 926 are symbolized with '<unresolved>'.
381 """ 927 """
382 symfile_by_path = {} 928 symfile_by_path = {}
383 for process in processes: 929 for process in processes:
384 for frame in process.stack_frames.pc_frames: 930 if not process.memory_map:
385 region = process.mmaps.FindRegion(frame.pc) 931 continue
932 for frame in process.stack_frame_map.frame_by_id.itervalues():
933 if frame.pc is None:
934 continue
935 region = process.memory_map.FindRegion(frame.pc)
386 if region is None: 936 if region is None:
387 frame.name = '<unresolved>' 937 frame.name = '<unresolved>'
388 continue 938 continue
389 939
390 symfile = symfile_by_path.get(region.file_path) 940 symfile = symfile_by_path.get(region.file_path)
391 if symfile is None: 941 if symfile is None:
392 symfile = SymbolizableFile(region.file_path) 942 symfile = SymbolizableFile(region.file_path)
393 symfile_by_path[symfile.path] = symfile 943 symfile_by_path[symfile.path] = symfile
394 944
395 relative_pc = frame.pc - region.start_address 945 relative_pc = frame.pc - region.start_address
396 symfile.frames_by_address[relative_pc].append(frame) 946 symfile.frames_by_address[relative_pc].append(frame)
397 return symfile_by_path.values() 947 return symfile_by_path.values()
398 948
399 949
950 def FindInSystemPath(binary_name):
951 paths = os.environ['PATH'].split(os.pathsep)
952 for path in paths:
953 binary_path = os.path.join(path, binary_name)
954 if os.path.isfile(binary_path):
955 return binary_path
956 return None
957
958
959 class Symbolizer(object):
960 """Encapsulates platform-specific symbolization logic."""
961
962 def __init__(self):
963 self.is_mac = sys.platform == 'darwin'
964 self.is_win = sys.platform == 'win32'
965 if self.is_mac:
966 self.binary = 'atos'
967 self._matcher = symbolize_trace_atos_regex.AtosRegexMatcher()
968 elif self.is_win:
969 self.binary = 'addr2line-pdb.exe'
970 else:
971 self.binary = 'addr2line'
972 self.symbolizer_path = FindInSystemPath(self.binary)
973
974 def _SymbolizeLinuxAndAndroid(self, symfile, unsymbolized_name):
975 def _SymbolizerCallback(sym_info, frames):
976 # Unwind inline chain to the top.
977 while sym_info.inlined_by:
978 sym_info = sym_info.inlined_by
979
980 symbolized_name = sym_info.name if sym_info.name else unsymbolized_name
981 for frame in frames:
982 frame.name = symbolized_name
983 frame.ext.source_path = sym_info.source_path
984
985 symbolizer = elf_symbolizer.ELFSymbolizer(symfile.symbolizable_path,
986 self.symbolizer_path,
987 _SymbolizerCallback,
988 inlines=True)
989
990 for address, frames in symfile.frames_by_address.iteritems():
991 # SymbolizeAsync() asserts that the type of address is int. We operate
992 # on longs (since they are raw pointers possibly from 64-bit processes).
993 # It's OK to cast here because we're passing relative PC, which should
994 # always fit into int.
995 symbolizer.SymbolizeAsync(int(address), frames)
996
997 symbolizer.Join()
998
999
1000 def _SymbolizeMac(self, symfile):
1001 chars_max = int(subprocess.check_output("getconf ARG_MAX", shell=True))
1002
1003 # 16 for the address, 2 for "0x", 1 for the space
1004 chars_per_address = 19
1005
1006 load_address = (symbolize_trace_macho_reader.
1007 ReadMachOTextLoadAddress(symfile.symbolizable_path))
1008 assert load_address is not None
1009
1010 cmd_base = [self.symbolizer_path, '-arch', 'x86_64', '-l',
1011 '0x%x' % load_address, '-o',
1012 symfile.symbolizable_path]
1013 chars_for_other_arguments = len(' '.join(cmd_base)) + 1
1014
1015 # The maximum number of inputs that can be processed at once is limited by
1016 # ARG_MAX. This currently evaluates to ~13000 on macOS.
1017 max_inputs = (chars_max - chars_for_other_arguments) / chars_per_address
1018
1019 all_keys = symfile.frames_by_address.keys()
1020 processed_keys_count = 0
1021 while len(all_keys):
1022 input_count = min(len(all_keys), max_inputs)
1023 keys_to_process = all_keys[0:input_count]
1024 cmd = list(cmd_base)
1025 cmd.extend([hex(int(x) + load_address)
1026 for x in keys_to_process])
1027 output_array = subprocess.check_output(cmd).split('\n')
1028 for i in range(len(keys_to_process)):
1029 for frame in (symfile.frames_by_address.values()
1030 [i + processed_keys_count]):
1031 frame.name = self._matcher.Match(output_array[i])
1032 processed_keys_count += len(keys_to_process)
1033 all_keys = all_keys[input_count:]
1034
1035 def _SymbolizeWin(self, symfile):
1036 """Invoke symbolizer binary on windows and write all input in one go.
1037
1038 Unlike Linux, on Windows symbolization talks through a shared system
1039 service that handles communication with the NT symbol servers. This
1040 creates an explicit serialization (and therefore lock contention) of
1041 any process using the symbol API for files that do not have a local PDB.
1042
1043 Thus, even though the Windows symbolizer binary can be made command-line
1044 compatible with the POSIX addr2line interface, parallelizing the
1045 symbolization does not yield the same performance effects. Running
1046 just one symbolizer seems good enough for now. We can optimize later
1047 if this becomes a bottleneck.
1048 """
1049 cmd = [self.symbolizer_path, '--functions', '--demangle', '--exe',
1050 symfile.symbolizable_path]
1051
1052 proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE,
1053 stderr=sys.stderr)
1054 addrs = ["%x" % relative_pc for relative_pc in
1055 symfile.frames_by_address.keys()]
1056 (stdout_data, stderr_data) = proc.communicate('\n'.join(addrs))
1057 stdout_data = stdout_data.split('\n')
1058
1059 # This is known to be in the same order as stderr_data.
1060 for i, addr in enumerate(addrs):
1061 for frame in symfile.frames_by_address[int(addr, 16)]:
1062 # Output of addr2line with --functions is always 2 outputs per
1063 # symbol, function name followed by source line number. Only grab
1064 # the function name as line info is not always available.
1065 frame.name = stdout_data[i * 2]
1066
1067 def Symbolize(self, symfile, unsymbolized_name):
1068 if self.is_mac:
1069 self._SymbolizeMac(symfile)
1070 elif self.is_win:
1071 self._SymbolizeWin(symfile)
1072 else:
1073 self._SymbolizeLinuxAndAndroid(symfile, unsymbolized_name)
1074
1075 def IsSymbolizableFile(self, file_path):
1076 if self.is_win:
1077 extension = os.path.splitext(file_path)[1].lower()
1078 return extension in ['.dll', '.exe']
1079 else:
1080 result = subprocess.check_output(['file', '-0', file_path])
1081 type_string = result[result.find('\0') + 1:]
1082 return bool(re.match(r'.*(ELF|Mach-O) (32|64)-bit\b.*',
1083 type_string, re.DOTALL))
1084
1085
400 def SymbolizeFiles(symfiles, symbolizer): 1086 def SymbolizeFiles(symfiles, symbolizer):
401 """Symbolizes each file in the given list of SymbolizableFiles 1087 """Symbolizes each file in the given list of SymbolizableFiles
402 and updates stack frames with symbolization results.""" 1088 and updates stack frames with symbolization results."""
1089
1090 if not symfiles:
1091 print 'Nothing to symbolize.'
1092 return
1093
403 print 'Symbolizing...' 1094 print 'Symbolizing...'
404 1095
405 def _SubPrintf(message, *args): 1096 def _SubPrintf(message, *args):
406 print (' ' + message).format(*args) 1097 print (' ' + message).format(*args)
407 1098
408 symbolized = False
409 for symfile in symfiles: 1099 for symfile in symfiles:
410 unsymbolized_name = '<{}>'.format( 1100 unsymbolized_name = '<{}>'.format(
411 symfile.path if symfile.path else 'unnamed') 1101 symfile.path if symfile.path else 'unnamed')
412 1102
413 problem = None 1103 problem = None
414 if not os.path.isabs(symfile.symbolizable_path): 1104 if not os.path.isabs(symfile.symbolizable_path):
415 problem = 'not a file' 1105 problem = 'not a file'
416 elif not os.path.isfile(symfile.symbolizable_path): 1106 elif not os.path.isfile(symfile.symbolizable_path):
417 problem = "file doesn't exist" 1107 problem = "file doesn't exist"
418 elif not symbolizer.IsSymbolizableFile(symfile.symbolizable_path): 1108 elif not symbolizer.IsSymbolizableFile(symfile.symbolizable_path):
419 problem = 'file is not symbolizable' 1109 problem = 'file is not symbolizable'
420 if problem: 1110 if problem:
421 _SubPrintf("Won't symbolize {} PCs for '{}': {}.", 1111 _SubPrintf("Won't symbolize {} PCs for '{}': {}.",
422 len(symfile.frames_by_address), 1112 len(symfile.frames_by_address),
423 symfile.symbolizable_path, 1113 symfile.symbolizable_path,
424 problem) 1114 problem)
425 for frames in symfile.frames_by_address.itervalues(): 1115 for frames in symfile.frames_by_address.itervalues():
426 for frame in frames: 1116 for frame in frames:
427 frame.name = unsymbolized_name 1117 frame.name = unsymbolized_name
428 continue 1118 continue
429 1119
430 _SubPrintf('Symbolizing {} PCs from {}...', 1120 _SubPrintf('Symbolizing {} PCs from {}...',
431 len(symfile.frames_by_address), 1121 len(symfile.frames_by_address),
432 symfile.path) 1122 symfile.path)
433 1123
434 symbolizer.Symbolize(symfile, unsymbolized_name) 1124 symbolizer.Symbolize(symfile, unsymbolized_name)
435 symbolized = True
436 1125
437 return symbolized 1126
1127 # Matches Android library paths, supports both K (/data/app-lib/<>/lib.so)
1128 # as well as L+ (/data/app/<>/lib/<>/lib.so). Library name is available
1129 # via 'name' group.
1130 ANDROID_PATH_MATCHER = re.compile(
1131 r'^/data/(?:'
1132 r'app/[^/]+/lib/[^/]+/|'
1133 r'app-lib/[^/]+/|'
1134 r'data/[^/]+/incremental-install-files/lib/'
1135 r')(?P<name>.*\.so)')
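# For example (a hypothetical path), the matcher extracts the library name:
#   ANDROID_PATH_MATCHER.match(
#       '/data/app/com.example.app-1/lib/arm64/libchrome.so').group('name')
#   # -> 'libchrome.so'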
1136
1137 # Subpath of output path where unstripped libraries are stored.
1138 ANDROID_UNSTRIPPED_SUBPATH = 'lib.unstripped'
438 1139
439 1140
440 def HaveFilesFromAndroid(symfiles): 1141 def HaveFilesFromAndroid(symfiles):
441 return any(ANDROID_PATH_MATCHER.match(f.path) for f in symfiles) 1142 return any(ANDROID_PATH_MATCHER.match(f.path) for f in symfiles)
442 1143
443 1144
444 def RemapAndroidFiles(symfiles, output_path): 1145 def RemapAndroidFiles(symfiles, output_path):
445 for symfile in symfiles: 1146 for symfile in symfiles:
446 match = ANDROID_PATH_MATCHER.match(symfile.path) 1147 match = ANDROID_PATH_MATCHER.match(symfile.path)
447 if match: 1148 if match:
448 name = match.group('name') 1149 name = match.group('name')
449 symfile.symbolizable_path = os.path.join( 1150 symfile.symbolizable_path = os.path.join(
450 output_path, ANDROID_UNSTRIPPED_SUBPATH, name) 1151 output_path, ANDROID_UNSTRIPPED_SUBPATH, name)
451 else: 1152 else:
452 # Clobber file path to trigger "not a file" problem in SymbolizeFiles(). 1153 # Clobber file path to trigger "not a file" problem in SymbolizeFiles().
453 # Without this, files would fail to symbolize with a "file not found" problem, 1154 # Without this, files would fail to symbolize with a "file not found" problem,
454 # which is not accurate. 1155 # which is not accurate.
455 symfile.symbolizable_path = 'android://{}'.format(symfile.path) 1156 symfile.symbolizable_path = 'android://{}'.format(symfile.path)
456 1157
457 1158
1159 def Symbolize(options, trace, symbolizer):
1160 symfiles = ResolveSymbolizableFiles(trace.processes)
1161
1162 # Android trace files don't have any indication they are from Android.
1163 # So we're checking for Android-specific paths.
1164 if HaveFilesFromAndroid(symfiles):
1165 if not options.output_directory:
1166 sys.exit('The trace file appears to be from Android. Please '
1167 'specify output directory to properly symbolize it.')
1168 RemapAndroidFiles(symfiles, os.path.abspath(options.output_directory))
1169
1170 SymbolizeFiles(symfiles, symbolizer)
1171
1172
1173 def OpenTraceFile(file_path, mode):
1174 if file_path.endswith('.gz'):
1175 return gzip.open(file_path, mode + 'b')
1176 else:
1177 return open(file_path, mode + 't')
1178
1179
458 # Suffix used for backup files. 1180 # Suffix used for backup files.
459 BACKUP_FILE_TAG = '.BACKUP' 1181 BACKUP_FILE_TAG = '.BACKUP'
460 1182
461 def main(): 1183 def main():
462 parser = argparse.ArgumentParser() 1184 parser = argparse.ArgumentParser()
463 parser.add_argument('file', 1185 parser.add_argument(
464 help='Trace file to symbolize (.json or .json.gz)') 1186 'file',
465 parser.add_argument('--no-backup', 1187 help='Trace file to symbolize (.json or .json.gz)')
466 dest='backup', default='true', action='store_false',
467 help="Don't create {} files".format(BACKUP_FILE_TAG))
468 parser.add_argument('--output-directory',
469 help='The path to the build output directory, such ' +
470 'as out/Debug. Only needed for Android.')
471 options = parser.parse_args()
472 1188
473 trace_file_path = options.file 1189 parser.add_argument(
474 def _OpenTraceFile(mode): 1190 '--no-backup', dest='backup', default='true', action='store_false',
475 if trace_file_path.endswith('.gz'): 1191 help="Don't create {} files".format(BACKUP_FILE_TAG))
476 return gzip.open(trace_file_path, mode + 'b') 1192
477 else: 1193 parser.add_argument(
478 return open(trace_file_path, mode + 't') 1194 '--output-directory',
1195 help='The path to the build output directory, such as out/Debug.')
479 1196
480 symbolizer = Symbolizer() 1197 symbolizer = Symbolizer()
481 if symbolizer.symbolizer_path is None: 1198 if symbolizer.symbolizer_path is None:
482 sys.exit("Can't symbolize - no %s in PATH." % symbolizer.binary) 1199 sys.exit("Can't symbolize - no %s in PATH." % symbolizer.binary)
483 1200
1201 options = parser.parse_args()
1202
1203 trace_file_path = options.file
1204
484 print 'Reading trace file...' 1205 print 'Reading trace file...'
485 with _OpenTraceFile('r') as trace_file: 1206 with OpenTraceFile(trace_file_path, 'r') as trace_file:
486 trace = json.load(trace_file) 1207 trace = Trace(json.load(trace_file))
487 1208
488 processes = CollectProcesses(trace) 1209 Symbolize(options, trace, symbolizer)
489 symfiles = ResolveSymbolizableFiles(processes)
490 1210
491 # Android trace files don't have any indication they are from Android. 1211 if trace.modified:
492 # So we're checking for Android-specific paths. 1212 trace.ApplyModifications()
493 if HaveFilesFromAndroid(symfiles):
494 if not options.output_directory:
495 parser.error('The trace file appears to be from Android. Please '
496 "specify output directory (e.g. 'out/Debug') to properly "
497 'symbolize it.')
498 RemapAndroidFiles(symfiles, os.path.abspath(options.output_directory))
499 1213
500 if SymbolizeFiles(symfiles, symbolizer):
501 if options.backup: 1214 if options.backup:
502 backup_file_path = trace_file_path + BACKUP_FILE_TAG 1215 backup_file_path = trace_file_path + BACKUP_FILE_TAG
503 print 'Backing up trace file to {}...'.format(backup_file_path) 1216 print 'Backing up trace file to {}'.format(backup_file_path)
504 os.rename(trace_file_path, backup_file_path) 1217 os.rename(trace_file_path, backup_file_path)
505 1218
506 print 'Updating trace file...' 1219 print 'Updating the trace file...'
507 with _OpenTraceFile('w') as trace_file: 1220 with OpenTraceFile(trace_file_path, 'w') as trace_file:
508 json.dump(trace, trace_file) 1221 json.dump(trace.node, trace_file)
509 else: 1222 else:
510 print 'No PCs symbolized - not updating trace file.' 1223 print 'No modifications were made - not updating the trace file.'
511 1224
512 1225
513 if __name__ == '__main__': 1226 if __name__ == '__main__':
514 main() 1227 main()