Chromium Code Reviews

Side by Side Diff: tracing/bin/symbolize_trace

Issue 2810523002: symbolize_trace: support new heap dump format. (Closed)
Patch Set: We need to go deeper Created 3 years, 7 months ago
OLD | NEW
1 #!/usr/bin/env python 1 #!/usr/bin/env python
2 # Copyright 2016 The Chromium Authors. All rights reserved. 2 # Copyright 2016 The Chromium Authors. All rights reserved.
3 # Use of this source code is governed by a BSD-style license that can be 3 # Use of this source code is governed by a BSD-style license that can be
4 # found in the LICENSE file. 4 # found in the LICENSE file.
5 5
6 """
7 This script processes trace files and symbolizes stack frames generated by
8 Chrome's native heap profiler.
9
10 === Overview ===
11
12 A trace file is essentially a giant JSON array of dictionaries (events).
13 Events have some predefined keys (e.g. 'pid'), but are otherwise free to
14 contain anything. A trace file contains events from all Chrome processes
15 that were sampled during the tracing period.
16
17 This script cares only about memory dump events generated with the
18 memory-infra category enabled.
19
20 When Chrome native heap profiling is enabled, some memory dump events
21 include the following extra information:
22
23 * (Per allocator) Information about live allocations at the moment of the
24 memory dump (the information includes backtraces, types / categories,
25 sizes, and counts of allocations). There are several allocators in
26 Chrome: malloc, blink_gc, and partition_alloc.
27
28 * (Per process) Stack frame tree of all functions that called allocators
29 above.
30
31 This script does the following:
32
33 1. Parses the given trace file (loads JSON).
34 2. Finds memory dump events and parses stack frame tree for each process.
35 3. Finds stack frames that have PC addresses instead of function names.
36 4. Symbolizes PCs and modifies loaded JSON.
37 5. Writes modified JSON back to the file.
38
39 === Details ===
40
41 There are two formats of heap profiler information: legacy and modern. The
42 main differences relevant to this script are:
43
44 * In the modern format the stack frame tree, type name mapping, and string
Wez 2017/05/03 00:17:10 nit: " ... modern format the stack frame ..." Oth
DmitrySkiba 2017/05/04 00:30:55 Done.
45 mapping nodes are dumped incrementally. These nodes are dumped in each
46 memory dump event and carry updates that occurred since the last event.
47
48 For example, let's say that when the first memory dump event is generated
49 we only know about a function foo() (called from main()) allocating objects
50 of type "int":
51
52 {
53 "args": {
54 "dumps": {
55 "heaps_v2": {
56 "maps": {
57 "nodes": [
58 { "id": 1, "name_sid": 1 },
59 { "id": 2, "parent": 1, "name_sid": 3 },
60 ],
61 "types": [
62 { "id": 1, "name_sid": 2 },
63 ],
64 "strings": [
65 { "id": 1, "string": "main()" },
66 { "id": 2, "string": "int" },
67 { "id": 3, "string": "foo()" },
68 ]
69 },
70 "allocators": { ...live allocations per allocator... },
71 ...
72 },
73 ...
74 }
75 },
76 ...
77 }
78
79 Here:
80 * the 'nodes' node encodes the stack frame tree
81 * the 'types' node encodes type name mappings
82 * the 'strings' node encodes string mappings (explained below)
83
84 Then, by the time the second memory dump event is generated, we learn about
85 bar() (called from main()), which also allocated "int" objects. Only the
86 new information is dumped, i.e. the bar() stack frame:
87
88 {
89 "args": {
90 "dumps": {
91 "heaps_v2": {
92 "maps": {
93 "nodes": [
94 { "id": 2, "parent": 1, "name_sid": 4 },
95 ],
96 "types": [],
97 "strings": [
98 { "id": 4, "string": "bar()" },
99 ]
100 },
101 "allocators": { ...live allocations per allocator... },
102 ...
103 },
104 ...
105 }
106 },
107 ...
108 }
109
110 Note that the 'types' node is empty, since there were no updates. All three
111 nodes ('nodes', 'types', and 'strings') can be empty if there were no
112 updates to them.
113
114 For simplicity, when the script updates incremental nodes, it puts the
115 updated content in the first node and clears all the others. I.e. the
116 following stack frame nodes:
117
118 'nodes': [
119 { "id": 1, "name_sid": 1 },
120 { "id": 2, "parent": 1, "name_sid": 2 },
121 ]
122 'nodes': [
123 { "id": 3, "parent": 2, "name_sid": 3 },
124 ]
125 'nodes': [
126 { "id": 4, "parent": 3, "name_sid": 4 },
127 { "id": 5, "parent": 1, "name_sid": 5 },
128 ]
129
130 After symbolization, they are written as:
131
132 'nodes': [
133 { "id": 1, "name_sid": 1 },
134 { "id": 2, "parent": 1, "name_sid": 2 },
135 { "id": 3, "parent": 2, "name_sid": 3 },
136 { "id": 4, "parent": 3, "name_sid": 4 },
137 { "id": 5, "parent": 1, "name_sid": 5 },
138 ]
139 'nodes': []
140 'nodes': []
141
142
143 * In contrast, in the legacy format the stack frame tree and type mappings
144 are dumped separately from memory dump events, once per process.
145
146 Here is what a trace file with two memory dump events looks like in the
147 legacy format:
148
149 {
150 "args": {
151 "dumps": {
152 "heaps": { ...live allocations per allocator... },
153 ...
154 }
155 },
156 ...
157 }
158
159 {
160 "args": {
161 "dumps": {
162 "heaps": { ...live allocations per allocator... },
163 ...
164 }
165 },
166 ...
167 }
168
169 {
170 "args": {
171 "typeNames": {
172 1: "int",
173 }
174 },
175 "cat": "__metadata",
176 "name": "typeNames",
177 ...
178 }
179
180 {
181 "args": {
182 "stackFrames": {
183 1: { "name": "main" },
184 2: { "name": "foo", "parent": 1 },
185 3: { "name": "bar", "parent": 1 },
186 }
187 },
188 "cat": "__metadata",
189 "name": "stackFrames",
190 ...
191 }
192
193
194 * Another change in the modern format is the 'strings' node, which was
195 added to deduplicate stack frame names (mainly to reduce trace file size).
196 For consistency the 'types' node also uses string mappings.
197
198
199 See crbug.com/708930 for more information about the modern format.
200 """
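The "serialize into the first node, clear all others" behavior described in the Details section can be sketched roughly as follows. This is an illustrative standalone helper, not part of the script; `merge_incremental_nodes` and its exact shape are assumptions:

```python
def merge_incremental_nodes(node_lists):
    """Merge incremental 'nodes' lists into the first list, clearing the rest.

    node_lists is a list of lists, each holding {'id': ...} dicts as they
    appear in successive memory dump events. Later updates for the same id
    win, matching the "carry updates since the last event" semantics.
    """
    merged = {}  # id -> node
    for nodes in node_lists:
        for node in nodes:
            merged[node['id']] = node
    for nodes in node_lists:
        del nodes[:]  # clear every list in place, keeping JSON references
    node_lists[0].extend(merged[node_id] for node_id in sorted(merged))
```

Clearing with `del nodes[:]` (rather than rebinding) matters because each list is a live reference into the loaded trace JSON.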
201
6 import argparse 202 import argparse
7 import bisect 203 import bisect
8 import collections 204 import collections
9 import gzip 205 import gzip
206 import itertools
10 import json 207 import json
11 import os 208 import os
12 import re 209 import re
13 import subprocess 210 import subprocess
14 import sys 211 import sys
15 212
16 _SYMBOLS_PATH = os.path.abspath(os.path.join( 213 _SYMBOLS_PATH = os.path.abspath(os.path.join(
17 os.path.dirname(os.path.realpath(__file__)), 214 os.path.dirname(os.path.realpath(__file__)),
18 '..', 215 '..',
19 'third_party', 216 'third_party',
20 'symbols')) 217 'symbols'))
21 sys.path.append(_SYMBOLS_PATH) 218 sys.path.append(_SYMBOLS_PATH)
22 # pylint: disable=import-error 219 # pylint: disable=import-error
23 import symbols.elf_symbolizer as elf_symbolizer 220 import symbols.elf_symbolizer as elf_symbolizer
24 221
25 import symbolize_trace_atos_regex 222 import symbolize_trace_atos_regex
26 import symbolize_trace_macho_reader 223 import symbolize_trace_macho_reader
27 224
28 225
29 # Relevant trace event phases from Chromium's 226 class NodeWrapper(object):
30 # src/base/trace_event/common/trace_event_common.h. 227 """Wraps an event data node(s).
31 TRACE_EVENT_PHASE_METADATA = 'M' 228
32 TRACE_EVENT_PHASE_MEMORY_DUMP = 'v' 229 A node is a reference into a trace event JSON. Wrappers parse nodes to
230 provide convenient APIs and update nodes when asked to propagate changes
231 back (see ApplyModifications() below).
232
233 Here is an example of a legacy metadata event that contains a stack frame tree:
234
235 {
236 "args": {
237 "stackFrames": { ... }
238 },
239 "cat": "__metadata",
240 "name": "stackFrames",
241 "ph": "M",
242 ...
243 }
244
245 When this event is encountered, a reference to the "stackFrames" dictionary
246 is obtained and passed down to a specific wrapper class, which knows how to
247 parse / update the dictionary.
248
249 There are two parsing patterns, depending on whether the node is serialized
250 incrementally:
251
252 * If the node is not incremental, parsing is done by __init__();
253 see MemoryMap for an example.
254
255 * If the node is incremental, __init__() does nothing, and ParseNext()
256 is called whenever the next node (from the next event) is encountered.
257
258 Some wrappers can also modify the nodes they parsed. In such cases they have
259 additional APIs:
260
261 * 'modified' flag, which indicates whether the wrapper was changed.
262
263 * 'ApplyModifications' method, which propagates changes made to the wrapper
264 back to nodes. Successful invocation of ApplyModifications() resets
265 'modified' flag.
266
267 """
268
269 # def __init__(self, node):
Primiano Tucci (use gerrit) 2017/05/03 17:25:05 Are these commented lines intentional ? I think th
DmitrySkiba 2017/05/04 00:30:56 The thing is that their exact shape is not determi
270 # ...
271
272 # def ParseNext(self, node, ...):
273 # ...
274
275 # @property
276 # def modified(self):
277 # ...
278
279 # def ApplyModifications(self, ...):
280 # ...
281
282 pass
33 283
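Since the NodeWrapper interface above is intentionally left as comments (its exact shape varies per subclass), here is a hypothetical minimal wrapper that follows the incremental contract (ParseNext / modified / ApplyModifications). The class name and node shape are invented for illustration:

```python
class IdSetWrapper(object):
    """Hypothetical NodeWrapper-style class interning [{'id': n}, ...] nodes."""

    def __init__(self):
        self._nodes = []      # live references into the trace JSON
        self._ids = set()
        self._modified = False

    def ParseNext(self, node):
        """Parses the next incremental node from the next event."""
        self._nodes.append(node)
        self._ids.update(item['id'] for item in node)

    def AddId(self, new_id):
        """Modifies the wrapper; the change is propagated later."""
        if new_id not in self._ids:
            self._ids.add(new_id)
            self._modified = True

    @property
    def modified(self):
        return self._modified

    def ApplyModifications(self):
        """Serializes into the first node, clears all others, resets the flag."""
        if not self.modified:
            return
        for node in self._nodes:
            del node[:]
        self._nodes[0].extend({'id': i} for i in sorted(self._ids))
        self._modified = False
```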
34 284
35 # Matches Android library paths, supports both K (/data/app-lib/<>/lib.so) 285 class MemoryMap(NodeWrapper):
36 # as well as L+ (/data/app/<>/lib/<>/lib.so). Library name is available 286 """Wraps 'process_mmaps' node.
37 # via 'name' group.
38 ANDROID_PATH_MATCHER = re.compile(
39 r'^/data/(?:'
40 r'app/[^/]+/lib/[^/]+/|'
41 r'app-lib/[^/]+/|'
42 r'data/[^/]+/incremental-install-files/lib/'
43 r')(?P<name>.*\.so)')
44 287
45 # Subpath of output path where unstripped libraries are stored. 288 'process_mmaps' node contains information about file mappings.
46 ANDROID_UNSTRIPPED_SUBPATH = 'lib.unstripped'
47 289
48 290 "process_mmaps": {
49 def FindInSystemPath(binary_name): 291 "vm_regions": [
50 paths = os.environ['PATH'].split(os.pathsep) 292 {
51 for path in paths: 293 "mf": "<file_path>",
52 binary_path = os.path.join(path, binary_name) 294 "sa": "<start_address>",
53 if os.path.isfile(binary_path): 295 "sz": "<size>",
54 return binary_path 296 ...
55 return None 297 },
56 298 ...
57 299 ]
58 class Symbolizer(object): 300 }
59 # Encapsulates platform-specific symbolization logic. 301 """
60 def __init__(self):
61 self.is_mac = sys.platform == 'darwin'
62 self.is_win = sys.platform == 'win32'
63 if self.is_mac:
64 self.binary = 'atos'
65 self._matcher = symbolize_trace_atos_regex.AtosRegexMatcher()
66 elif self.is_win:
67 self.binary = 'addr2line-pdb.exe'
68 else:
69 self.binary = 'addr2line'
70 self.symbolizer_path = FindInSystemPath(self.binary)
71
72 def _SymbolizeLinuxAndAndroid(self, symfile, unsymbolized_name):
73 def _SymbolizerCallback(sym_info, frames):
74 # Unwind inline chain to the top.
75 while sym_info.inlined_by:
76 sym_info = sym_info.inlined_by
77
78 symbolized_name = sym_info.name if sym_info.name else unsymbolized_name
79 for frame in frames:
80 frame.name = symbolized_name
81
82 symbolizer = elf_symbolizer.ELFSymbolizer(symfile.symbolizable_path,
83 self.symbolizer_path,
84 _SymbolizerCallback,
85 inlines=True)
86
87 for address, frames in symfile.frames_by_address.iteritems():
88 # SymbolizeAsync() asserts that the type of address is int. We operate
89 # on longs (since they are raw pointers possibly from 64-bit processes).
90 # It's OK to cast here because we're passing relative PC, which should
91 # always fit into int.
92 symbolizer.SymbolizeAsync(int(address), frames)
93
94 symbolizer.Join()
95
96
97 def _SymbolizeMac(self, symfile):
98 chars_max = int(subprocess.check_output("getconf ARG_MAX", shell=True))
99
100 # 16 for the address, 2 for "0x", 1 for the space
101 chars_per_address = 19
102
103 load_address = (symbolize_trace_macho_reader.
104 ReadMachOTextLoadAddress(symfile.symbolizable_path))
105 assert load_address is not None
106
107 cmd_base = [self.symbolizer_path, '-arch', 'x86_64', '-l',
108 '0x%x' % load_address, '-o',
109 symfile.symbolizable_path]
110 chars_for_other_arguments = len(' '.join(cmd_base)) + 1
111
112 # The maximum number of inputs that can be processed at once is limited by
113 # ARG_MAX. This currently evaluates to ~13000 on macOS.
114 max_inputs = (chars_max - chars_for_other_arguments) / chars_per_address
115
116 all_keys = symfile.frames_by_address.keys()
117 processed_keys_count = 0
118 while len(all_keys):
119 input_count = min(len(all_keys), max_inputs)
120 keys_to_process = all_keys[0:input_count]
121
122 cmd = list(cmd_base)
123 cmd.extend([hex(int(x) + load_address)
124 for x in keys_to_process])
125 output_array = subprocess.check_output(cmd).split('\n')
126 for i in range(len(keys_to_process)):
127 for frame in (symfile.frames_by_address.values()
128 [i + processed_keys_count]):
129 frame.name = self._matcher.Match(output_array[i])
130 processed_keys_count += len(keys_to_process)
131 all_keys = all_keys[input_count:]
132
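The ARG_MAX-driven batching above (19 characters per address, per the comment in the code) can be sketched as a standalone chunking helper; `batch_addresses` is an illustrative name, not part of the script:

```python
def batch_addresses(addresses, chars_max, chars_for_other_arguments,
                    chars_per_address=19):
    """Yield chunks of addresses small enough that the formatted command
    line stays under the ARG_MAX limit, using _SymbolizeMac's arithmetic."""
    max_inputs = (chars_max - chars_for_other_arguments) // chars_per_address
    for start in range(0, len(addresses), max_inputs):
        yield addresses[start:start + max_inputs]
```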
133
134 def _SymbolizeWin(self, symfile):
135 """Invoke symbolizer binary on Windows and write all input in one go.
136
137 Unlike Linux, on Windows symbolization talks through a shared system
138 service that handles communication with the NT symbol servers. This
139 creates an explicit serialization (and therefore lock contention) of
140 any process using the symbol API for files that do not have a local PDB.
141
142 Thus, even though the Windows symbolizer binary can be made command-line
143 compatible with the POSIX addr2line interface, parallelizing the
144 symbolization does not yield the same performance gains. Running
145 just one symbolizer seems good enough for now; this can be optimized
146 later if it becomes a bottleneck.
147 """
148 cmd = [self.symbolizer_path, '--functions', '--demangle', '--exe',
149 symfile.symbolizable_path]
150
151 proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE,
152 stderr=sys.stderr)
153 addrs = ["%x" % relative_pc for relative_pc in
154 symfile.frames_by_address.keys()]
155 (stdout_data, stderr_data) = proc.communicate('\n'.join(addrs))
156 stdout_data = stdout_data.split('\n')
157
158 # This is known to be in the same order as stderr_data.
159 for i, addr in enumerate(addrs):
160 for frame in symfile.frames_by_address[int(addr, 16)]:
161 # Output of addr2line with --functions is always 2 outputs per
162 # symbol, function name followed by source line number. Only grab
163 # the function name as line info is not always available.
164 frame.name = stdout_data[i * 2]
165
166
167 def Symbolize(self, symfile, unsymbolized_name):
168 if self.is_mac:
169 self._SymbolizeMac(symfile)
170 elif self.is_win:
171 self._SymbolizeWin(symfile)
172 else:
173 self._SymbolizeLinuxAndAndroid(symfile, unsymbolized_name)
174
175
176 def IsSymbolizableFile(self, file_path):
177 if self.is_win:
178 extension = os.path.splitext(file_path)[1].lower()
179 return extension in ['.dll', '.exe']
180 else:
181 result = subprocess.check_output(['file', '-0', file_path])
182 type_string = result[result.find('\0') + 1:]
183 return bool(re.match(r'.*(ELF|Mach-O) (32|64)-bit\b.*',
184 type_string, re.DOTALL))
185
186
187 class ProcessMemoryMaps(object):
188 """Represents 'process_mmaps' trace file entry."""
189 302
190 class Region(object): 303 class Region(object):
191 def __init__(self, start_address, size, file_path): 304 def __init__(self, start_address, size, file_path):
192 self._start_address = start_address 305 self._start_address = start_address
193 self._size = size 306 self._size = size
194 self._file_path = file_path 307 self._file_path = file_path
195 308
196 @property 309 @property
197 def start_address(self): 310 def start_address(self):
198 return self._start_address 311 return self._start_address
(...skipping 15 matching lines...) Expand all
214 return long(self._start_address).__cmp__(long(other._start_address)) 327 return long(self._start_address).__cmp__(long(other._start_address))
215 elif isinstance(other, (long, int)): 328 elif isinstance(other, (long, int)):
216 return long(self._start_address).__cmp__(long(other)) 329 return long(self._start_address).__cmp__(long(other))
217 else: 330 else:
218 raise Exception('Cannot compare with %s' % type(other)) 331 raise Exception('Cannot compare with %s' % type(other))
219 332
220 def __repr__(self): 333 def __repr__(self):
221 return 'Region(0x{:X} - 0x{:X}, {})'.format( 334 return 'Region(0x{:X} - 0x{:X}, {})'.format(
222 self.start_address, self.end_address, self.file_path) 335 self.start_address, self.end_address, self.file_path)
223 336
224 def __init__(self, process_mmaps): 337 def __init__(self, process_mmaps_node):
225 """Parses 'process_mmaps' dictionary."""
226
227 regions = [] 338 regions = []
228 for region_value in process_mmaps['vm_regions']: 339 for region_node in process_mmaps_node['vm_regions']:
229 regions.append(self.Region( 340 regions.append(self.Region(
230 long(region_value['sa'], 16), 341 long(region_node['sa'], 16),
231 long(region_value['sz'], 16), 342 long(region_node['sz'], 16),
232 region_value['mf'])) 343 region_node['mf']))
233 regions.sort() 344 regions.sort()
234 345
235 # Copy regions without duplicates and check for overlaps. 346 # Copy regions without duplicates and check for overlaps.
236 self._regions = [] 347 self._regions = []
237 previous_region = None 348 previous_region = None
238 for region in regions: 349 for region in regions:
239 if previous_region is not None: 350 if previous_region is not None:
240 if region == previous_region: 351 if region == previous_region:
241 continue 352 continue
242 assert region.start_address >= previous_region.end_address, \ 353 assert region.start_address >= previous_region.end_address, \
243 'Regions {} and {} overlap.'.format(previous_region, region) 354 'Regions {} and {} overlap.'.format(previous_region, region)
244 previous_region = region 355 previous_region = region
245 self._regions.append(region) 356 self._regions.append(region)
246 357
247 @property 358 @property
248 def regions(self): 359 def regions(self):
249 return self._regions 360 return self._regions
250 361
251 def FindRegion(self, address): 362 def FindRegion(self, address):
252 """Finds region containing |address|. Returns None if none found.""" 363 """Finds region containing |address|. Returns None if none found."""
253 364
254 region_index = bisect.bisect_right(self._regions, address) - 1 365 region_index = bisect.bisect_right(self._regions, address) - 1
255 if region_index >= 0: 366 if region_index >= 0:
256 region = self._regions[region_index] 367 region = self._regions[region_index]
257 if address >= region.start_address and address < region.end_address: 368 if address >= region.start_address and address < region.end_address:
258 return region 369 return region
259 return None 370 return None
260 371
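FindRegion's bisect-based lookup can be illustrated as follows. The real code bisects Region objects directly (via their comparison operators); this sketch uses plain `(start, size, path)` tuples and a parallel list of start addresses:

```python
import bisect

def find_region(regions, address):
    """Binary-search sorted, non-overlapping (start, size, path) regions
    for the one containing address, mirroring MemoryMap.FindRegion()."""
    starts = [r[0] for r in regions]
    # bisect_right - 1 gives the last region starting at or before address.
    i = bisect.bisect_right(starts, address) - 1
    if i >= 0:
        start, size, _ = regions[i]
        if start <= address < start + size:
            return regions[i]
    return None
```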
261 372
262 class StackFrames(object): 373 class UnsupportedHeapDumpVersionError(Exception):
Primiano Tucci (use gerrit) 2017/05/03 17:25:05 No need to change it now, but for the future I hon
DmitrySkiba 2017/05/04 00:30:56 I wanted to surface the version that caused the er
263 """Represents 'stackFrames' trace file entry.""" 374 """Helper exception class to signal unsupported heap dump version."""
264 375
265 class PCFrame(object): 376 def __init__(self, version):
266 def __init__(self, pc, frame): 377 message = 'Unsupported heap dump version: {}'.format(version)
378 super(UnsupportedHeapDumpVersionError, self).__init__(message)
379
380
381 class StringMap(NodeWrapper):
382 """Wraps all 'strings' nodes for a process.
383
384 'strings' node contains incremental mappings between integer ids and strings.
385
386 "strings": [
387 {
388 "id": <string_id>,
389 "string": <string>
390 },
391 ...
392 ]
393 """
394
395 def __init__(self):
396 self._modified = False
397 self._strings_nodes = []
398 self._string_by_id = {}
399 self._id_by_string = {}
400 self._max_string_id = 0
401
402 @property
403 def modified(self):
404 """Returns True if the wrapper was modified (see NodeWrapper)."""
405 return self._modified
406
407 @property
408 def string_by_id(self):
409 return self._string_by_id
410
411 def ParseNext(self, heap_dump_version, strings_node):
412 """Parses and interns next node (see NodeWrapper)."""
413
414 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1:
Primiano Tucci (use gerrit) 2017/05/03 17:25:05 Here I would have just done assert(heap_dump_vers
DmitrySkiba 2017/05/04 00:30:55 Acknowledged.
415 raise UnsupportedHeapDumpVersionError(heap_dump_version)
416
417 self._strings_nodes.append(strings_node)
418 for string_node in strings_node:
419 self._Insert(string_node['id'], string_node['string'])
420
421 def Clear(self):
422 """Clears all string mappings."""
423 if self._string_by_id:
424 self._modified = True
425 self._string_by_id = {}
426 self._id_by_string = {}
427 self._Insert(0, '[null]')
Primiano Tucci (use gerrit) 2017/05/03 17:25:05 is it intentional that clear does this _Insert and
DmitrySkiba 2017/05/04 00:30:55 __init__() (or rather ParseNext) wraps existing no
428 self._max_string_id = 0
429
430 def AddString(self, string):
431 """Adds a string (if it doesn't exist) and returns its integer id."""
432 string_id = self._id_by_string.get(string)
433 if string_id is None:
434 string_id = self._max_string_id + 1
435 self._Insert(string_id, string)
436 self._modified = True
437 return string_id
438
439 def ApplyModifications(self):
440 """Propagates modifications back to nodes (see NodeWrapper)."""
441 if not self.modified:
442 return
443
444 assert self._strings_nodes, 'no nodes'
445
446 # Serialize into the first node, and clear all others.
447
448 for strings_node in self._strings_nodes:
Primiano Tucci (use gerrit) 2017/05/03 17:25:05 maybe when you do this add a comment explaining th
DmitrySkiba 2017/05/04 00:30:55 See comments at the top of the file. "Details" exp
449 del strings_node[:]
450 strings_node = self._strings_nodes[0]
451 for string_id, string in self._string_by_id.iteritems():
452 strings_node.append({'id': string_id, 'string': string})
453
454 self._modified = False
455
456 def _Insert(self, string_id, string):
457 self._id_by_string[string] = string_id
458 self._string_by_id[string_id] = string
459 self._max_string_id = max(self._max_string_id, string_id)
460
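The id/string bookkeeping that makes AddString() idempotent can be reduced to a minimal sketch. `StringInterner` is a simplified stand-in, not the StringMap class itself (it omits nodes, Clear(), and ApplyModifications()):

```python
class StringInterner(object):
    """Minimal sketch of StringMap's two-way id<->string bookkeeping;
    AddString() returns the existing id when the string is already known."""

    def __init__(self):
        # Id 0 is reserved for '[null]', as in StringMap.Clear().
        self._string_by_id = {0: '[null]'}
        self._id_by_string = {'[null]': 0}
        self._max_id = 0

    def AddString(self, string):
        string_id = self._id_by_string.get(string)
        if string_id is None:
            self._max_id += 1
            string_id = self._max_id
            self._id_by_string[string] = string_id
            self._string_by_id[string_id] = string
        return string_id
```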
461
462 class TypeNameMap(NodeWrapper):
463 """Wraps all 'types' nodes for a process.
464
465 'types' nodes encode mappings between integer type ids and integer
466 string ids (from 'strings' nodes).
467
468 "types": [
469 {
470 "id": <type_id>,
471 "name_sid": <name_string_id>
472 }
473 ...
474 ]
475
476 For simplicity string ids are translated into strings during parsing,
477 and then translated back to ids in ApplyModifications().
478 """
479 def __init__(self):
480 self._modified = False
481 self._type_name_nodes = []
482 self._name_by_id = {}
483 self._id_by_name = {}
484 self._max_type_id = 0
485
486 @property
487 def modified(self):
488 """Returns True if the wrapper was modified (see NodeWrapper)."""
489 return self._modified
490
491 @property
492 def name_by_id(self):
493 """Returns {id -> name} dict (must not be changed directly)."""
494 return self._name_by_id
495
496 def ParseNext(self, heap_dump_version, type_name_node, string_map):
497 """Parses and interns next node (see NodeWrapper).
498
499 |string_map| - A StringMap object to use to translate string ids
500 to strings.
501 """
502 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1:
503 raise UnsupportedHeapDumpVersionError(heap_dump_version)
504
505 self._type_name_nodes.append(type_name_node)
506 for type_node in type_name_node:
507 self._Insert(type_node['id'],
508 string_map.string_by_id[type_node['name_sid']])
509
510 def AddType(self, type_name):
511 """Adds a type name (if it doesn't exist) and returns its id."""
512 type_id = self._id_by_name.get(type_name)
513 if type_id is None:
514 type_id = self._max_type_id + 1
515 self._Insert(type_id, type_name)
516 self._modified = True
517 return type_id
518
519 def ApplyModifications(self, string_map, force=False):
520 """Propagates modifications back to nodes.
521
522 |string_map| - A StringMap object to use to translate strings to ids.
523 |force| - Whether to propagate changes regardless of 'modified' flag.
524 """
525 if not self.modified and not force:
526 return
527
528 assert self._type_name_nodes, 'no nodes'
529
530 # Serialize into the first node, and clear all others.
531
532 for types_node in self._type_name_nodes:
533 del types_node[:]
534 types_node = self._type_name_nodes[0]
535 for type_id, type_name in self._name_by_id.iteritems():
536 types_node.append({
537 'id': type_id,
538 'name_sid': string_map.AddString(type_name)})
539
540 self._modified = False
541
542 def _Insert(self, type_id, type_name):
543 self._id_by_name[type_name] = type_id
544 self._name_by_id[type_id] = type_name
545 self._max_type_id = max(self._max_type_id, type_id)
546
547
548 class StackFrameMap(NodeWrapper):
549 """Wraps stack frame tree nodes for a process.
550
551 For the legacy format this wrapper expects a single 'stackFrames' node
552 (which comes from metadata event):
553
554 "stackFrames": {
555 "<frame_id>": {
556 "name": "<frame_name>"
557 "parent": "<parent_frame_id>"
558 },
559 ...
560 }
561
562 For the modern format this wrapper expects several 'nodes' nodes:
563
564 "nodes": [
565 {
566 "id": <frame_id>,
567 "parent": <parent_frame_id>,
568 "name_sid": <name_string_id>
569 },
570 ...
571 ]
572
573 In both formats frame name is a string. Native heap profiler generates
574 specially formatted frame names (e.g. "pc:10eb78dba") for function
575 addresses (PCs). Inner Frame class below parses name and extracts PC,
576 if it's there.
577 """
578 class Frame(object):
579 def __init__(self, frame_id, name, parent_frame_id):
267 self._modified = False 580 self._modified = False
268 self._pc = pc 581 self._id = frame_id
269 self._frame = frame 582 self._name = name
583 self._pc = self._ParsePC(name)
584 self._parent_id = parent_frame_id
585 self._ext = None
270 586
271 @property 587 @property
272 def modified(self): 588 def modified(self):
589 """Returns True if the frame was modified.
590
591 For example, changing the frame's name sets this flag (since the change
592 needs to be propagated back to nodes).
593 """
273 return self._modified 594 return self._modified
274 595
275 @property 596 @property
597 def id(self):
598 """Frame id (integer)."""
599 return self._id
600
601 @property
276 def pc(self): 602 def pc(self):
603 """Parsed (integer) PC of the frame, or None."""
277 return self._pc 604 return self._pc
278 605
279 @property 606 @property
280 def name(self): 607 def name(self):
281 return self._frame['name'] 608 """Name of the frame (see above)."""
609 return self._name
282 610
283 @name.setter 611 @name.setter
284 def name(self, value): 612 def name(self, value):
613 """Changes the name. Doesn't affect value of |pc|."""
285 self._modified = True 614 self._modified = True
286 self._frame['name'] = value 615 self._name = value
287 616
288 def __init__(self, stack_frames): 617 @property
289 """Constructs object using 'stackFrames' dictionary.""" 618 def parent_id(self):
290 self._pc_frames = [] 619 """Parent frame id (integer)."""
291 for frame in stack_frames.itervalues(): 620 return self._parent_id
292 pc_frame = self._ParsePCFrame(frame) 621
293 if pc_frame: 622 _PC_TAG = 'pc:'
294 self._pc_frames.append(pc_frame) 623
295 624 def _ParsePC(self, name):
296 @property 625 if not name.startswith(self._PC_TAG):
297 def pc_frames(self): 626 return None
298 return self._pc_frames 627 return long(name[len(self._PC_TAG):], 16)
628
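The "pc:&lt;hex&gt;" parsing that Frame performs above can be shown as a standalone function; `parse_pc` is an illustrative name for the same logic:

```python
def parse_pc(name, tag='pc:'):
    """Extract the hex program counter from a 'pc:<hex>' frame name,
    as StackFrameMap.Frame._ParsePC does; returns None for ordinary names."""
    if not name.startswith(tag):
        return None
    return int(name[len(tag):], 16)
```

(The original uses Python 2's `long`; `int` covers arbitrary precision in Python 3.)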
629 def _ClearModified(self):
630 self._modified = False
631
632 def __init__(self):
633 self._modified = False
634 self._heap_dump_version = None
635 self._stack_frames_nodes = []
636 self._frame_by_id = {}
299 637
300 @property 638 @property
301 def modified(self): 639 def modified(self):
302 return any(f.modified for f in self._pc_frames) 640 """Returns True if the wrapper or any of its frames were modified."""
303 641 return (self._modified or
304 _PC_TAG = 'pc:' 642 any(f.modified for f in self._frame_by_id.itervalues()))
305 643
306 @classmethod 644 @property
307 def _ParsePCFrame(self, frame): 645 def frame_by_id(self):
308 name = frame['name'] 646 """Returns {id -> frame} dict (must not be modified directly)."""
309 if not name.startswith(self._PC_TAG): 647 return self._frame_by_id
310 return None 648
311 pc = long(name[len(self._PC_TAG):], 16) 649 def ParseNext(self, heap_dump_version, stack_frames_node, string_map):
312 return self.PCFrame(pc, frame) 650 """Parses the next stack frames node (see NodeWrapper).
313 651
314 652 For the modern format |string_map| is used to translate string ids
315 class Process(object): 653 to strings.
316 """Holds various bits of information about a process in a trace file.""" 654 """
317 655
318 def __init__(self, pid): 656 frame_by_id = {}
319 self.pid = pid 657 if heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY:
320 self.name = None 658 if self._stack_frames_nodes:
321 self.mmaps = None 659 raise Exception('Legacy stack frames node is expected only once.')
322 self.stack_frames = None 660 for frame_id, frame_node in stack_frames_node.iteritems():
323 661 frame = self.Frame(frame_id,
324 662 frame_node['name'],
325 def CollectProcesses(trace): 663 frame_node.get('parent'))
326 """Parses trace dictionary and returns pid->Process map of all processes 664 frame_by_id[frame.id] = frame
327 suitable for symbolization (which have both mmaps and stack_frames). 665 else:
666 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1:
667 raise UnsupportedHeapDumpVersionError(heap_dump_version)
668 for frame_node in stack_frames_node:
669 frame = self.Frame(frame_node['id'],
670 string_map.string_by_id[frame_node['name_sid']],
671 frame_node.get('parent'))
672 frame_by_id[frame.id] = frame
673
674 self._heap_dump_version = heap_dump_version
675 self._stack_frames_nodes.append(stack_frames_node)
676
677 self._frame_by_id = frame_by_id
678
679 def ApplyModifications(self, string_map, force=False):
680 """Applies modifications back to nodes (see NodeWrapper)."""
681
682 if not self.modified and not force:
683 return
684
685 assert self._stack_frames_nodes, 'no nodes'
686 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY:
687 assert string_map is None, \
688 'string_map should not be used with the legacy format'
689
690 # Serialize frames into the first node, clear all others.
691
692 for frames_node in self._stack_frames_nodes:
693 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY:
694 frames_node.clear()
695 else:
696 del frames_node[:]
697
698 frames_node = self._stack_frames_nodes[0]
699 for frame in self._frame_by_id.itervalues():
700 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY:
701 frame_node = {'name': frame.name}
702 frames_node[frame.id] = frame_node
703 else:
704 frame_node = {
705 'id': frame.id,
706 'name_sid': string_map.AddString(frame.name)
707 }
708 frames_node.append(frame_node)
709 if frame.parent_id is not None:
710 frame_node['parent'] = frame.parent_id
711 frame._ClearModified()
712
713 self._modified = False
714
715
716 class Trace(NodeWrapper):
717 """Wrapper for the root trace node (i.e. the trace JSON itself).
718
719 This wrapper parses select nodes from memory-infra events and groups
720 parsed data per-process (see inner Process class below).
328 """ 721 """
329 722
330 process_map = {} 723 # Indicates legacy heap dump format.
331 724 HEAP_DUMP_VERSION_LEGACY = 'Legacy'
332 # Android traces produced via 'chrome://inspect/?tracing#devices' are 725
333 # just list of events. 726 # Indicates variation of a modern heap dump format.
334 events = trace if isinstance(trace, list) else trace['traceEvents'] 727 HEAP_DUMP_VERSION_1 = 1
335 for event in events: 728
336 name = event.get('name') 729 class Process(object):
337 if not name: 730 """Collection of per-process data and wrappers."""
338 continue 731
339 732 def __init__(self, pid):
340 pid = event['pid'] 733 self._pid = pid
341 process = process_map.get(pid) 734 self._name = None
342 if process is None: 735 self._memory_map = None
343 process = Process(pid) 736 self._stack_frame_map = StackFrameMap()
344 process_map[pid] = process 737 self._type_name_map = TypeNameMap()
345 738 self._string_map = StringMap()
346 phase = event['ph'] 739 self._heap_dump_version = None
347 if phase == TRACE_EVENT_PHASE_METADATA: 740
348 if name == 'process_name': 741 @property
349 process.name = event['args']['name'] 742 def modified(self):
350 elif name == 'stackFrames': 743 return self._stack_frame_map.modified or self._type_name_map.modified
351 process.stack_frames = StackFrames(event['args']['stackFrames']) 744
352 elif phase == TRACE_EVENT_PHASE_MEMORY_DUMP: 745 @property
353 process_mmaps = event['args']['dumps'].get('process_mmaps') 746 def pid(self):
354 if process_mmaps: 747 return self._pid
355 # TODO(dskiba): this parses all process_mmaps, but retains only the 748
356 # last one. We need to parse only once (lazy parsing?). 749 @property
357 process.mmaps = ProcessMemoryMaps(process_mmaps) 750 def name(self):
358 751 return self._name
359 return [p for p in process_map.itervalues() if p.mmaps and p.stack_frames] 752
753 @property
754 def unique_name(self):
755 """Returns string that includes both process name and its pid."""
756 name = self._name if self._name else 'UnnamedProcess'
757 return '{}({})'.format(name, self._pid)
758
759 @property
760 def memory_map(self):
761 return self._memory_map
762
763 @property
764 def stack_frame_map(self):
765 return self._stack_frame_map
766
767 @property
768 def type_name_map(self):
769 return self._type_name_map
770
771 def ApplyModifications(self):
772 """Calls ApplyModifications() on contained wrappers."""
773 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY:
774 self._stack_frame_map.ApplyModifications(None)
775 else:
776 if self._stack_frame_map.modified or self._type_name_map.modified:
777 self._string_map.Clear()
778 self._stack_frame_map.ApplyModifications(self._string_map, force=True)
779 self._type_name_map.ApplyModifications(self._string_map, force=True)
780 self._string_map.ApplyModifications()
781
782 def __init__(self, trace_node):
783 self._trace_node = trace_node
784 self._processes = []
785 self._heap_dump_version = None
786
787 # Misc per-process information needed only during parsing.
788 class ProcessExt(object):
789 def __init__(self, pid):
790 self.process = Trace.Process(pid)
791 self.mapped_entry_names = set()
792 self.process_mmaps_node = None
793 self.seen_strings_node = False
794
795 process_ext_by_pid = {}
796
797 # Android traces produced via 'chrome://inspect/?tracing#devices' are
798 # just a list of events.
799 events = trace_node if isinstance(trace_node, list) \
800 else trace_node['traceEvents']
801 for event in events:
802 name = event.get('name')
803 if not name:
804 continue
805
806 pid = event['pid']
807 process_ext = process_ext_by_pid.get(pid)
808 if process_ext is None:
809 process_ext = ProcessExt(pid)
810 process_ext_by_pid[pid] = process_ext
811 process = process_ext.process
812
813 phase = event['ph']
814 if phase == self._EVENT_PHASE_METADATA:
815 if name == 'process_name':
816 process._name = event['args']['name']
817 elif name == 'stackFrames':
818 process._stack_frame_map.ParseNext(
819 self._UseHeapDumpVersion(self.HEAP_DUMP_VERSION_LEGACY),
820 event['args']['stackFrames'],
821 process._string_map)
822 elif phase == self._EVENT_PHASE_MEMORY_DUMP:
823 dumps = event['args']['dumps']
824 process_mmaps = dumps.get('process_mmaps')
825 if process_mmaps:
826 # We want the most recent memory map, so parsing happens later
827 # once we have finished reading all events.
828 process_ext.process_mmaps_node = process_mmaps
829 heaps = dumps.get('heaps_v2')
830 if heaps:
831 version = self._UseHeapDumpVersion(heaps['version'])
832 maps = heaps.get('maps')
833 if maps:
834 process_ext.mapped_entry_names.update(maps.iterkeys())
835 types = maps.get('types')
836 stack_frames = maps.get('nodes')
837 strings = maps.get('strings')
838 if (strings is None and (types or stack_frames)
839 and not process_ext.seen_strings_node):
840 # ApplyModifications() for TypeNameMap and StackFrameMap puts
841 # everything into the first node and depends on StringMap. So
842 # we need to make sure that the 'strings' node is there if either
843 # of the other two nodes is present.
844 strings = []
845 maps['strings'] = strings
846 if strings is not None:
847 process_ext.seen_strings_node = True
848 process._string_map.ParseNext(version, strings)
849 if types:
850 process._type_name_map.ParseNext(
851 version, types, process._string_map)
852 if stack_frames:
853 process._stack_frame_map.ParseNext(
854 version, stack_frames, process._string_map)
855
856 self._processes = []
857 for pe in process_ext_by_pid.itervalues():
858 pe.process._heap_dump_version = self._heap_dump_version
859 if pe.process_mmaps_node:
860 # Now parse the most recent memory map.
861 pe.process._memory_map = MemoryMap(pe.process_mmaps_node)
862 self._processes.append(pe.process)
863
864 @property
865 def node(self):
866 """Root node (that was passed to the __init__)."""
867 return self._trace_node
868
869 @property
870 def modified(self):
871 """Returns True if trace file needs to be updated.
872
873 Before writing the trace JSON back to a file, ApplyModifications()
874 needs to be called.
875 """
876 return any(p.modified for p in self._processes)
877
878 @property
879 def processes(self):
880 return self._processes
881
882 @property
883 def heap_dump_version(self):
884 return self._heap_dump_version
885
886 def ApplyModifications(self):
887 """Propagates modifications back to the trace JSON."""
888 for process in self._processes:
889 process.ApplyModifications()
890 assert not self.modified, 'still modified'
891
892 # Relevant trace event phases from Chromium's
893 # src/base/trace_event/common/trace_event_common.h.
894 _EVENT_PHASE_METADATA = 'M'
895 _EVENT_PHASE_MEMORY_DUMP = 'v'
896
897 def _UseHeapDumpVersion(self, version):
898 if self._heap_dump_version is None:
899 self._heap_dump_version = version
900 return version
901 elif self._heap_dump_version != version:
902 raise Exception(
903 ("Inconsistent trace file: first saw '{}' heap dump version, "
904 "then '{}'.").format(self._heap_dump_version, version))
905 else:
906 return version
360 907
361 908
362 class SymbolizableFile(object): 909 class SymbolizableFile(object):
363 """Holds file path, addresses to symbolize and stack frames to update. 910 """Holds file path, addresses to symbolize and stack frames to update.
364 911
365 This class is a link between ELFSymbolizer and a trace file: it specifies 912 This class is a link between ELFSymbolizer and a trace file: it specifies
366 what to symbolize (addresses) and what to update with the symbolization 913 what to symbolize (addresses) and what to update with the symbolization
367 result (frames). 914 result (frames).
368 """ 915 """
369 def __init__(self, file_path): 916 def __init__(self, file_path):
370 self.path = file_path 917 self.path = file_path
371 self.symbolizable_path = file_path # path to use for symbolization 918 self.symbolizable_path = file_path # path to use for symbolization
372 self.frames_by_address = collections.defaultdict(list) 919 self.frames_by_address = collections.defaultdict(list)
373 920
374 921
375 def ResolveSymbolizableFiles(processes): 922 def ResolveSymbolizableFiles(processes):
376 """Resolves and groups PCs into list of SymbolizableFiles. 923 """Resolves and groups PCs into list of SymbolizableFiles.
377 924
378 As part of the grouping process, this function resolves PC from each stack 925 As part of the grouping process, this function resolves PC from each stack
379 frame to the corresponding mmap region. Stack frames that failed to resolve 926 frame to the corresponding mmap region. Stack frames that failed to resolve
380 are symbolized with '<unresolved>'. 927 are symbolized with '<unresolved>'.
381 """ 928 """
382 symfile_by_path = {} 929 symfile_by_path = {}
383 for process in processes: 930 for process in processes:
384 for frame in process.stack_frames.pc_frames: 931 if not process.memory_map:
385 region = process.mmaps.FindRegion(frame.pc) 932 continue
933 for frame in process.stack_frame_map.frame_by_id.itervalues():
934 if frame.pc is None:
935 continue
936 region = process.memory_map.FindRegion(frame.pc)
386 if region is None: 937 if region is None:
387 frame.name = '<unresolved>' 938 frame.name = '<unresolved>'
388 continue 939 continue
389 940
390 symfile = symfile_by_path.get(region.file_path) 941 symfile = symfile_by_path.get(region.file_path)
391 if symfile is None: 942 if symfile is None:
392 symfile = SymbolizableFile(region.file_path) 943 symfile = SymbolizableFile(region.file_path)
393 symfile_by_path[symfile.path] = symfile 944 symfile_by_path[symfile.path] = symfile
394 945
395 relative_pc = frame.pc - region.start_address 946 relative_pc = frame.pc - region.start_address
396 symfile.frames_by_address[relative_pc].append(frame) 947 symfile.frames_by_address[relative_pc].append(frame)
397 return symfile_by_path.values() 948 return symfile_by_path.values()
398 949
399 950
951 def FindInSystemPath(binary_name):
952 paths = os.environ['PATH'].split(os.pathsep)
953 for path in paths:
954 binary_path = os.path.join(path, binary_name)
955 if os.path.isfile(binary_path):
956 return binary_path
957 return None
958
959
960 class Symbolizer(object):
961 """Encapsulates platform-specific symbolization logic."""
962
963 def __init__(self):
964 self.is_mac = sys.platform == 'darwin'
965 self.is_win = sys.platform == 'win32'
966 if self.is_mac:
967 self.binary = 'atos'
968 self._matcher = symbolize_trace_atos_regex.AtosRegexMatcher()
969 elif self.is_win:
970 self.binary = 'addr2line-pdb.exe'
971 else:
972 self.binary = 'addr2line'
973 self.symbolizer_path = FindInSystemPath(self.binary)
974
975 def _SymbolizeLinuxAndAndroid(self, symfile, unsymbolized_name):
976 def _SymbolizerCallback(sym_info, frames):
977 # Unwind inline chain to the top.
978 while sym_info.inlined_by:
979 sym_info = sym_info.inlined_by
980
981 symbolized_name = sym_info.name if sym_info.name else unsymbolized_name
982 for frame in frames:
983 frame.name = symbolized_name
984 frame.ext.source_path = sym_info.source_path
985
986 symbolizer = elf_symbolizer.ELFSymbolizer(symfile.symbolizable_path,
987 self.symbolizer_path,
988 _SymbolizerCallback,
989 inlines=True)
990
991 for address, frames in symfile.frames_by_address.iteritems():
992 # SymbolizeAsync() asserts that the type of address is int. We operate
993 # on longs (since they are raw pointers possibly from 64-bit processes).
994 # It's OK to cast here because we're passing relative PC, which should
995 # always fit into int.
996 symbolizer.SymbolizeAsync(int(address), frames)
997
998 symbolizer.Join()
999
1000
1001 def _SymbolizeMac(self, symfile):
1002 chars_max = int(subprocess.check_output("getconf ARG_MAX", shell=True))
1003
1004 # 16 for the address, 2 for "0x", 1 for the space
1005 chars_per_address = 19
1006
1007 load_address = (symbolize_trace_macho_reader.
1008 ReadMachOTextLoadAddress(symfile.symbolizable_path))
1009 assert load_address is not None
1010
1011 cmd_base = [self.symbolizer_path, '-arch', 'x86_64', '-l',
1012 '0x%x' % load_address, '-o',
1013 symfile.symbolizable_path]
1014 chars_for_other_arguments = len(' '.join(cmd_base)) + 1
1015
1016 # The maximum number of inputs that can be processed at once is limited by
1017 # ARG_MAX. This currently evaluates to ~13000 on macOS.
1018 max_inputs = (chars_max - chars_for_other_arguments) / chars_per_address
1019
1020 all_keys = symfile.frames_by_address.keys()
1021 processed_keys_count = 0
1022 while len(all_keys):
1023 input_count = min(len(all_keys), max_inputs)
1024 keys_to_process = all_keys[0:input_count]
1025 cmd = list(cmd_base)
1026 cmd.extend([hex(int(x) + load_address)
1027 for x in keys_to_process])
1028 output_array = subprocess.check_output(cmd).split('\n')
1029 for i in range(len(keys_to_process)):
1030 for frame in (symfile.frames_by_address.values()
1031 [i + processed_keys_count]):
1032 frame.name = self._matcher.Match(output_array[i])
1033 processed_keys_count += len(keys_to_process)
1034 all_keys = all_keys[input_count:]
1035
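The ARG_MAX-bounded batching in _SymbolizeMac() above can be sketched with toy numbers; all values below are invented (the real values come from `getconf ARG_MAX` and the actual atos command line):

```python
# Toy sketch of the ARG_MAX batching above; all numbers are invented.
chars_max = 100               # stand-in for getconf ARG_MAX
chars_per_address = 19        # '0x' + 16 hex digits + separating space
chars_for_other_arguments = 24
max_inputs = (chars_max - chars_for_other_arguments) // chars_per_address

all_keys = list(range(10))    # stand-in for frames_by_address.keys()
batches = []
while len(all_keys):
    input_count = min(len(all_keys), max_inputs)
    batches.append(all_keys[:input_count])
    all_keys = all_keys[input_count:]
```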
1036 def _SymbolizeWin(self, symfile):
1037 """Invoke symbolizer binary on windows and write all input in one go.
1038
1039 Unlike linux, on windows, symbolization talks through a shared system
1040 service that handles communication with the NT symbol servers. This
1041 creates an explicit serialization (and therefor lock contention) of
1042 any process using the symbol API for files do not have a local PDB.
1043
1044 Thus, even though the windows symbolizer binary can be make command line
1045 compatible with the POSIX addr2line interface, paralellizing the
1046 symbolization does not yield the same performance effects. Running
1047 just one symbolizer seems good enough for now. Can optimize later
1048 if this becomes a bottleneck.
1049 """
1050 cmd = [self.symbolizer_path, '--functions', '--demangle', '--exe',
1051 symfile.symbolizable_path]
1052
1053 proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE,
1054 stderr=sys.stderr)
1055 addrs = ["%x" % relative_pc for relative_pc in
1056 symfile.frames_by_address.keys()]
1057 (stdout_data, stderr_data) = proc.communicate('\n'.join(addrs))
1058 stdout_data = stdout_data.split('\n')
1059
1060 # stdout_data is known to be in the same order as addrs.
1061 for i, addr in enumerate(addrs):
1062 for frame in symfile.frames_by_address[int(addr, 16)]:
1063 # Output of addr2line with --functions is always 2 outputs per
1064 # symbol, function name followed by source line number. Only grab
1065 # the function name as line info is not always available.
1066 frame.name = stdout_data[i * 2]
1067
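The two-lines-per-address pairing used in the loop above can be illustrated with a small sketch; the sample addr2line output below is invented:

```python
# addr2line --functions prints two lines per input address: the function
# name, then 'file:line'. The loop above keeps only the name. The sample
# output below is invented for illustration.
sample_stdout = 'Foo::Bar()\nfoo.cc:10\nmain\nmain.cc:42\n'
stdout_lines = sample_stdout.split('\n')
addrs = ['1a2b', '3c4d']
names = [stdout_lines[i * 2] for i in range(len(addrs))]
```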
1068 def Symbolize(self, symfile, unsymbolized_name):
1069 if self.is_mac:
1070 self._SymbolizeMac(symfile)
1071 elif self.is_win:
1072 self._SymbolizeWin(symfile)
1073 else:
1074 self._SymbolizeLinuxAndAndroid(symfile, unsymbolized_name)
1075
1076 def IsSymbolizableFile(self, file_path):
1077 if self.is_win:
1078 extension = os.path.splitext(file_path)[1].lower()
1079 return extension in ['.dll', '.exe']
1080 else:
1081 result = subprocess.check_output(['file', '-0', file_path])
1082 type_string = result[result.find('\0') + 1:]
1083 return bool(re.match(r'.*(ELF|Mach-O) (32|64)-bit\b.*',
1084 type_string, re.DOTALL))
1085
1086
400 def SymbolizeFiles(symfiles, symbolizer): 1087 def SymbolizeFiles(symfiles, symbolizer):
401 """Symbolizes each file in the given list of SymbolizableFiles 1088 """Symbolizes each file in the given list of SymbolizableFiles
402 and updates stack frames with symbolization results.""" 1089 and updates stack frames with symbolization results."""
1090
1091 if not symfiles:
1092 print 'Nothing to symbolize.'
1093 return
1094
403 print 'Symbolizing...' 1095 print 'Symbolizing...'
404 1096
405 def _SubPrintf(message, *args): 1097 def _SubPrintf(message, *args):
406 print (' ' + message).format(*args) 1098 print (' ' + message).format(*args)
407 1099
408 symbolized = False
409 for symfile in symfiles: 1100 for symfile in symfiles:
410 unsymbolized_name = '<{}>'.format( 1101 unsymbolized_name = '<{}>'.format(
411 symfile.path if symfile.path else 'unnamed') 1102 symfile.path if symfile.path else 'unnamed')
412 1103
413 problem = None 1104 problem = None
414 if not os.path.isabs(symfile.symbolizable_path): 1105 if not os.path.isabs(symfile.symbolizable_path):
415 problem = 'not a file' 1106 problem = 'not a file'
416 elif not os.path.isfile(symfile.symbolizable_path): 1107 elif not os.path.isfile(symfile.symbolizable_path):
417 problem = "file doesn't exist" 1108 problem = "file doesn't exist"
418 elif not symbolizer.IsSymbolizableFile(symfile.symbolizable_path): 1109 elif not symbolizer.IsSymbolizableFile(symfile.symbolizable_path):
419 problem = 'file is not symbolizable' 1110 problem = 'file is not symbolizable'
420 if problem: 1111 if problem:
421 _SubPrintf("Won't symbolize {} PCs for '{}': {}.", 1112 _SubPrintf("Won't symbolize {} PCs for '{}': {}.",
422 len(symfile.frames_by_address), 1113 len(symfile.frames_by_address),
423 symfile.symbolizable_path, 1114 symfile.symbolizable_path,
424 problem) 1115 problem)
425 for frames in symfile.frames_by_address.itervalues(): 1116 for frames in symfile.frames_by_address.itervalues():
426 for frame in frames: 1117 for frame in frames:
427 frame.name = unsymbolized_name 1118 frame.name = unsymbolized_name
428 continue 1119 continue
429 1120
430 _SubPrintf('Symbolizing {} PCs from {}...', 1121 _SubPrintf('Symbolizing {} PCs from {}...',
431 len(symfile.frames_by_address), 1122 len(symfile.frames_by_address),
432 symfile.path) 1123 symfile.path)
433 1124
434 symbolizer.Symbolize(symfile, unsymbolized_name) 1125 symbolizer.Symbolize(symfile, unsymbolized_name)
435 symbolized = True
436 1126
437 return symbolized 1127
1128 # Matches Android library paths, supporting both K (/data/app-lib/<>/lib.so)
1129 # and L+ (/data/app/<>/lib/<>/lib.so). The library name is available
1130 # via the 'name' group.
1131 ANDROID_PATH_MATCHER = re.compile(
1132 r'^/data/(?:'
1133 r'app/[^/]+/lib/[^/]+/|'
1134 r'app-lib/[^/]+/|'
1135 r'data/[^/]+/incremental-install-files/lib/'
1136 r')(?P<name>.*\.so)')
1137
1138 # Subpath of output path where unstripped libraries are stored.
1139 ANDROID_UNSTRIPPED_SUBPATH = 'lib.unstripped'
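A quick check of what ANDROID_PATH_MATCHER accepts; the package and library names below are invented for illustration:

```python
import re

# Same pattern as ANDROID_PATH_MATCHER above; the package and library
# names used below are invented.
matcher = re.compile(
    r'^/data/(?:'
    r'app/[^/]+/lib/[^/]+/|'
    r'app-lib/[^/]+/|'
    r'data/[^/]+/incremental-install-files/lib/'
    r')(?P<name>.*\.so)')

# Android L+ layout: /data/app/<package>/lib/<abi>/lib.so
l_match = matcher.match('/data/app/com.example-1/lib/arm64/libchrome.so')
# Android K layout: /data/app-lib/<package>/lib.so
k_match = matcher.match('/data/app-lib/com.example-1/libchrome.so')
# Paths outside /data do not match.
no_match = matcher.match('/system/lib/libc.so')
```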
438 1140
439 1141
440 def HaveFilesFromAndroid(symfiles): 1142 def HaveFilesFromAndroid(symfiles):
441 return any(ANDROID_PATH_MATCHER.match(f.path) for f in symfiles) 1143 return any(ANDROID_PATH_MATCHER.match(f.path) for f in symfiles)
442 1144
443 1145
444 def RemapAndroidFiles(symfiles, output_path): 1146 def RemapAndroidFiles(symfiles, output_path):
445 for symfile in symfiles: 1147 for symfile in symfiles:
446 match = ANDROID_PATH_MATCHER.match(symfile.path) 1148 match = ANDROID_PATH_MATCHER.match(symfile.path)
447 if match: 1149 if match:
448 name = match.group('name') 1150 name = match.group('name')
449 symfile.symbolizable_path = os.path.join( 1151 symfile.symbolizable_path = os.path.join(
450 output_path, ANDROID_UNSTRIPPED_SUBPATH, name) 1152 output_path, ANDROID_UNSTRIPPED_SUBPATH, name)
451 else: 1153 else:
452 # Clobber file path to trigger "not a file" problem in SymbolizeFiles(). 1154 # Clobber file path to trigger "not a file" problem in SymbolizeFiles().
453 # Without this, such files would be reported with the "file doesn't 1155
454 # exist" problem, which is not accurate. 1156
455 symfile.symbolizable_path = 'android://{}'.format(symfile.path) 1157 symfile.symbolizable_path = 'android://{}'.format(symfile.path)
456 1158
457 1159
1160 def Symbolize(options, trace, symbolizer):
1161 symfiles = ResolveSymbolizableFiles(trace.processes)
1162
1163 # Android trace files don't have any indication they are from Android.
Primiano Tucci (use gerrit) 2017/05/03 17:25:04 As per discussion offline, maybe specify: traces c
fmeawad 2017/05/03 18:19:34 look for os-name in the metadata
1164 # So we're checking for Android-specific paths.
1165 if HaveFilesFromAndroid(symfiles):
1166 if not options.output_directory:
1167 sys.exit('The trace file appears to be from Android. Please '
1168 'specify output directory to properly symbolize it.')
1169 RemapAndroidFiles(symfiles, os.path.abspath(options.output_directory))
1170
1171 SymbolizeFiles(symfiles, symbolizer)
1172
1173
1174 def OpenTraceFile(file_path, mode):
1175 if file_path.endswith('.gz'):
1176 return gzip.open(file_path, mode + 'b')
1177 else:
1178 return open(file_path, mode + 't')
1179
1180
458 # Suffix used for backup files. 1181 # Suffix used for backup files.
459 BACKUP_FILE_TAG = '.BACKUP' 1182 BACKUP_FILE_TAG = '.BACKUP'
460 1183
461 def main(): 1184 def main():
462 parser = argparse.ArgumentParser() 1185 class MultilineHelpFormatter(argparse.HelpFormatter):
Primiano Tucci (use gerrit) 2017/05/03 17:25:05 For a one file python script, having a custom form
DmitrySkiba 2017/05/04 00:30:55 Hmm, actually this is a leftover from a version th
463 parser.add_argument('file', 1186 def _split_lines(self, text, width):
464 help='Trace file to symbolize (.json or .json.gz)') 1187 extra_lines = []
465 parser.add_argument('--no-backup', 1188 if '\n' in text:
466 dest='backup', default='true', action='store_false', 1189 lines = text.splitlines()
467 help="Don't create {} files".format(BACKUP_FILE_TAG)) 1190 text = lines[0]
468 parser.add_argument('--output-directory', 1191 extra_lines = lines[1:]
469 help='The path to the build output directory, such ' + 1192 return super(MultilineHelpFormatter, self)._split_lines(text, width) + \
470 'as out/Debug. Only needed for Android.') 1193 extra_lines
471 options = parser.parse_args()
472 1194
473 trace_file_path = options.file 1195 parser = argparse.ArgumentParser(formatter_class=MultilineHelpFormatter)
474 def _OpenTraceFile(mode): 1196 parser.add_argument(
475 if trace_file_path.endswith('.gz'): 1197 'file',
476 return gzip.open(trace_file_path, mode + 'b') 1198 help='Trace file to symbolize (.json or .json.gz)')
477 else: 1199
478 return open(trace_file_path, mode + 't') 1200 parser.add_argument(
1201 '--no-backup', dest='backup', default='true', action='store_false',
1202 help="Don't create {} files".format(BACKUP_FILE_TAG))
1203
1204 parser.add_argument(
1205 '--output-directory',
1206 help='The path to the build output directory, such as out/Debug.')
479 1207
480 symbolizer = Symbolizer() 1208 symbolizer = Symbolizer()
481 if symbolizer.symbolizer_path is None: 1209 if symbolizer.symbolizer_path is None:
482 sys.exit("Can't symbolize - no %s in PATH." % symbolizer.binary) 1210 sys.exit("Can't symbolize - no %s in PATH." % symbolizer.binary)
483 1211
1212 options = parser.parse_args()
1213
1214 trace_file_path = options.file
1215
484 print 'Reading trace file...' 1216 print 'Reading trace file...'
485 with _OpenTraceFile('r') as trace_file: 1217 with OpenTraceFile(trace_file_path, 'r') as trace_file:
486 trace = json.load(trace_file) 1218 trace = Trace(json.load(trace_file))
487 1219
488 processes = CollectProcesses(trace) 1220 Symbolize(options, trace, symbolizer)
489 symfiles = ResolveSymbolizableFiles(processes)
490 1221
491 # Android trace files don't have any indication they are from Android. 1222 if trace.modified:
492 # So we're checking for Android-specific paths. 1223 trace.ApplyModifications()
493 if HaveFilesFromAndroid(symfiles):
494 if not options.output_directory:
495 parser.error('The trace file appears to be from Android. Please '
496 "specify output directory (e.g. 'out/Debug') to properly "
497 'symbolize it.')
498 RemapAndroidFiles(symfiles, os.path.abspath(options.output_directory))
499 1224
500 if SymbolizeFiles(symfiles, symbolizer):
501 if options.backup: 1225 if options.backup:
502 backup_file_path = trace_file_path + BACKUP_FILE_TAG 1226 backup_file_path = trace_file_path + BACKUP_FILE_TAG
503 print 'Backing up trace file to {}...'.format(backup_file_path) 1227 if os.path.exists(backup_file_path):
Primiano Tucci (use gerrit) 2017/05/03 17:25:05 isn't this a bit too much and really worth the com
DmitrySkiba 2017/05/04 00:30:55 Also a leftover from a previous versions. Removed.
1228 for i in itertools.count(1):
1229 unique_file_path = '{}{}'.format(backup_file_path, i)
1230 if not os.path.exists(unique_file_path):
1231 backup_file_path = unique_file_path
1232 break
1233 print 'Backing up trace file to {}'.format(backup_file_path)
504 os.rename(trace_file_path, backup_file_path) 1234 os.rename(trace_file_path, backup_file_path)
505 1235
506 print 'Updating trace file...' 1236 print 'Updating the trace file...'
507 with _OpenTraceFile('w') as trace_file: 1237 with OpenTraceFile(trace_file_path, 'w') as trace_file:
508 json.dump(trace, trace_file) 1238 json.dump(trace.node, trace_file)
509 else: 1239 else:
510 print 'No PCs symbolized - not updating trace file.' 1240 print 'No modifications were made - not updating the trace file.'
511 1241
512 1242
513 if __name__ == '__main__': 1243 if __name__ == '__main__':
514 main() 1244 main()