Chromium Code Reviews

Side by Side Diff: tracing/bin/symbolize_trace

Issue 2810523002: symbolize_trace: support new heap dump format. (Closed)
Patch Set: ParseMore -> ParseNext Created 3 years, 7 months ago
1 #!/usr/bin/env python 1 #!/usr/bin/env python
2 # Copyright 2016 The Chromium Authors. All rights reserved. 2 # Copyright 2016 The Chromium Authors. All rights reserved.
3 # Use of this source code is governed by a BSD-style license that can be 3 # Use of this source code is governed by a BSD-style license that can be
4 # found in the LICENSE file. 4 # found in the LICENSE file.
5 5
6 """
7 This script processes trace files and symbolizes stack frames generated by
8 Chrome's native heap profiler.
9
10 === Overview ===
11
12 Trace file is essentially a giant JSON array of dictionaries (events).
13 Events have some predefined keys, but otherwise are free to have anything
14 inside. Trace file contains events from all Chrome processes that were
15 sampled during tracing period (and 'pid' is an example of a predefined key).
Wez 2017/04/29 00:41:21 nit: "(and 'pid' is an example..." reads oddly here ...
DmitrySkiba 2017/05/02 06:19:59 Done.
16
17 This script cares only about memory dump events generated by memory-infra
Wez 2017/04/29 00:41:21 nit: Suggest "...dump events in trace files generated ..."
DmitrySkiba 2017/05/02 06:19:59 Done.
18 component.
19
20 When Chrome native heap profiling is enabled, some memory dump events
21 include the following extra information:
22
23 * (Per allocator) Information about live allocations at the moment of the
24 memory dump (the information includes backtraces, types / categories,
25 sizes, and counts of allocations). There are several allocators in
26 Chrome: malloc, blink_gc, and partition_alloc.
Wez 2017/04/29 00:41:21 nit: If these are examples, not an exhaustive list
DmitrySkiba 2017/05/02 06:19:59 This is actually an exhaustive list.
Wez 2017/05/03 00:17:09 OK; in that case I would say "There are three allocators ..."
DmitrySkiba 2017/05/04 00:30:55 Acknowledged.
27
28 * (Per process) Stack frame tree of all functions that called allocators
29 above.
Wez 2017/04/29 00:41:21 nit: If we failed to trace all the way back to main ...
DmitrySkiba 2017/05/02 06:19:59 It's still a single tree, just with an implicit root ...
Wez 2017/05/03 00:17:10 OK; you could add a brief note to that effect here, ...
DmitrySkiba 2017/05/04 00:30:55 Acknowledged.
30
31 This script does the following:
32
33 1. Parses the given trace file.
34 2. Finds memory dump events and parses stack frame tree for each process.
35 3. Finds stack frames that have PC addresses instead of function names.
36 4. Symbolizes these PCs.
37 6. Rewrites stack frame names (this updates parts of memory dump events).
Wez 2017/04/29 00:41:21 nit: You're missing #5 ;) It's also not clear what ...
DmitrySkiba 2017/05/02 06:19:59 Done. Added note about script not coalescing such
Wez 2017/05/03 00:17:09 Acknowledged.
38 7. Updates the trace file.
39
40 === Details ===
41
42 There are two formats of heap profiler information: legacy and modern. The
43 main differences are:
44
45 * In the legacy format stack frame tree is not dumped in memory dump events,
46 but in metadata events (one per process). I.e. it's sufficient to parse
47 a single metadata event to get full stack frame tree for a process.
Wez 2017/04/29 00:41:21 IIUC the point here is that every "event" in a legacy ...
DmitrySkiba 2017/05/02 06:19:59 Both formats dump live objects per allocator in each ...
Wez 2017/05/03 00:17:10 Thanks for adding this detail, however it seems a
DmitrySkiba 2017/05/04 00:30:55 Well, the section is named "Details", and details
48
49 * In the modern format stack frame tree (also type name and string mappings)
50 are dumped incrementally. I.e. each memory dump event carries additions to
51 the stack frame tree that occurred since the previous memory dump event.
Wez 2017/04/29 00:41:21 You might express this as each memory-infra event
DmitrySkiba 2017/05/02 06:19:59 Done.
52 To get the full stack frame tree for a process the script needs to parse
53 all memory dump events. However, when wrappers update incremental nodes,
54 they put everything in the first node, and clear all others.
Wez 2017/04/29 00:41:21 Not sure what you mean about moving everything into ...
DmitrySkiba 2017/05/02 06:19:59 Explained more.
55
56 * In the modern format stack frame tree doesn't reference name strings
57 directly, but through a string mapping table.
58
59 See crbug.com/708930 for more information about the modern format.
60 """
61
6 import argparse 62 import argparse
7 import bisect 63 import bisect
8 import collections 64 import collections
9 import gzip 65 import gzip
66 import itertools
10 import json 67 import json
11 import os 68 import os
12 import re 69 import re
13 import subprocess 70 import subprocess
14 import sys 71 import sys
15 72
16 _SYMBOLS_PATH = os.path.abspath(os.path.join( 73 _SYMBOLS_PATH = os.path.abspath(os.path.join(
17 os.path.dirname(os.path.realpath(__file__)), 74 os.path.dirname(os.path.realpath(__file__)),
18 '..', 75 '..',
19 'third_party', 76 'third_party',
20 'symbols')) 77 'symbols'))
21 sys.path.append(_SYMBOLS_PATH) 78 sys.path.append(_SYMBOLS_PATH)
22 # pylint: disable=import-error 79 # pylint: disable=import-error
23 import symbols.elf_symbolizer as elf_symbolizer 80 import symbols.elf_symbolizer as elf_symbolizer
24 81
25 import symbolize_trace_atos_regex 82 import symbolize_trace_atos_regex
26 import symbolize_trace_macho_reader 83 import symbolize_trace_macho_reader
27 84
28 85
29 # Relevant trace event phases from Chromium's 86 class NodeWrapper(object):
30 # src/base/trace_event/common/trace_event_common.h. 87 """Wraps an event data node(s).
31 TRACE_EVENT_PHASE_METADATA = 'M' 88
32 TRACE_EVENT_PHASE_MEMORY_DUMP = 'v' 89 A node is a reference into a trace event JSON. Wrappers parse nodes to
90 provide convenient APIs and update nodes when asked to propagate changes
91 back (see ApplyModifications() below).
92
93 Here is an example of legacy metadata event that contains stack frame tree:
94
95 {
96 "args": {
97 "stackFrames": { ... }
98 },
99 "cat": "__metadata",
100 "name": "stackFrames",
101 "ph": "M",
102 ...
103 }
104
105 When this event is encountered, a reference to the "stackFrames" dictionary
106 is obtained and passed down to a specific wrapped class, which knows how to
107 parse / update the dictionary.
108
109 There are two parsing patterns depending on whether node is serialized
110 incrementally:
111
112 * If node is not incremental, then parsing is done by __init__(),
113 see MemoryMap for an example.
114
115 * If node is incremental, then __init__() does nothing, and ParseNext()
116 is called when next node (from a next event) is encountered.
117
118 Some wrappers can also modify nodes they parsed. In such cases they have
119 additional APIs:
120
121 * 'modified' flag, which indicates whether the wrapper was changed.
122
123 * 'ApplyModifications' method, which propagates changes made to the wrapper
124 back to nodes. Successful invocation of ApplyModifications() resets
125 'modified' flag.
126
127 """
128
129 # def __init__(self, node):
130 # ...
131
132 # def ParseNext(self, node, ...):
133 # ...
134
135 # @property
136 # def modified(self):
137 # ...
138
139 # def ApplyModifications(self, ...):
140 # ...
141
142 pass
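
For illustration (not part of the patch), a minimal hypothetical wrapper for an incremental list node could follow this protocol as sketched below; StringMap, TypeNameMap and StackFrameMap further down are the real implementations:

  class ExampleListWrapper(NodeWrapper):
    # Hypothetical wrapper for an incremental JSON list node.
    def __init__(self):
      self._modified = False
      self._nodes = []  # every node seen, so changes can be written back
      self._items = []  # accumulated contents of all nodes

    def ParseNext(self, items_node):
      self._nodes.append(items_node)
      self._items.extend(items_node)

    @property
    def modified(self):
      return self._modified

    def Add(self, item):
      self._items.append(item)
      self._modified = True

    def ApplyModifications(self):
      if not self._modified:
        return
      # Serialize everything into the first node and clear the others,
      # mirroring what the real wrappers below do.
      for node in self._nodes:
        del node[:]
      self._nodes[0].extend(self._items)
      self._modified = False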
33 143
34 144
35 # Matches Android library paths, supports both K (/data/app-lib/<>/lib.so) 145 class MemoryMap(NodeWrapper):
36 # as well as L+ (/data/app/<>/lib/<>/lib.so). Library name is available 146 """Wraps 'process_mmaps' node.
37 # via 'name' group.
38 ANDROID_PATH_MATCHER = re.compile(
39 r'^/data/(?:'
40 r'app/[^/]+/lib/[^/]+/|'
41 r'app-lib/[^/]+/|'
42 r'data/[^/]+/incremental-install-files/lib/'
43 r')(?P<name>.*\.so)')
44 147
45 # Subpath of output path where unstripped libraries are stored. 148 'process_mmaps' node contains information about file mappings.
46 ANDROID_UNSTRIPPED_SUBPATH = 'lib.unstripped'
47 149
48 150 "process_mmaps": {
49 def FindInSystemPath(binary_name): 151 "vm_regions": [
50 paths = os.environ['PATH'].split(os.pathsep) 152 {
51 for path in paths: 153 "mf": "<file_path>",
52 binary_path = os.path.join(path, binary_name) 154 "sa": "<start_address>",
53 if os.path.isfile(binary_path): 155 "sz": "<size>",
54 return binary_path 156 ...
55 return None 157 },
56 158 ...
57 159 ]
58 class Symbolizer(object): 160 }
59 # Encapsulates platform-specific symbolization logic. 161 """
60 def __init__(self):
61 self.is_mac = sys.platform == 'darwin'
62 self.is_win = sys.platform == 'win32'
63 if self.is_mac:
64 self.binary = 'atos'
65 self._matcher = symbolize_trace_atos_regex.AtosRegexMatcher()
66 elif self.is_win:
67 self.binary = 'addr2line-pdb.exe'
68 else:
69 self.binary = 'addr2line'
70 self.symbolizer_path = FindInSystemPath(self.binary)
71
72 def _SymbolizeLinuxAndAndroid(self, symfile, unsymbolized_name):
73 def _SymbolizerCallback(sym_info, frames):
74 # Unwind inline chain to the top.
75 while sym_info.inlined_by:
76 sym_info = sym_info.inlined_by
77
78 symbolized_name = sym_info.name if sym_info.name else unsymbolized_name
79 for frame in frames:
80 frame.name = symbolized_name
81
82 symbolizer = elf_symbolizer.ELFSymbolizer(symfile.symbolizable_path,
83 self.symbolizer_path,
84 _SymbolizerCallback,
85 inlines=True)
86
87 for address, frames in symfile.frames_by_address.iteritems():
88 # SymbolizeAsync() asserts that the type of address is int. We operate
89 # on longs (since they are raw pointers possibly from 64-bit processes).
90 # It's OK to cast here because we're passing relative PC, which should
91 # always fit into int.
92 symbolizer.SymbolizeAsync(int(address), frames)
93
94 symbolizer.Join()
95
96
97 def _SymbolizeMac(self, symfile):
98 chars_max = int(subprocess.check_output("getconf ARG_MAX", shell=True))
99
100 # 16 for the address, 2 for "0x", 1 for the space
101 chars_per_address = 19
102
103 load_address = (symbolize_trace_macho_reader.
104 ReadMachOTextLoadAddress(symfile.symbolizable_path))
105 assert load_address is not None
106
107 cmd_base = [self.symbolizer_path, '-arch', 'x86_64', '-l',
108 '0x%x' % load_address, '-o',
109 symfile.symbolizable_path]
110 chars_for_other_arguments = len(' '.join(cmd_base)) + 1
111
112 # The maximum number of inputs that can be processed at once is limited by
113 # ARG_MAX. This currently evaluates to ~13000 on macOS.
114 max_inputs = (chars_max - chars_for_other_arguments) / chars_per_address
115
116 all_keys = symfile.frames_by_address.keys()
117 processed_keys_count = 0
118 while len(all_keys):
119 input_count = min(len(all_keys), max_inputs)
120 keys_to_process = all_keys[0:input_count]
121
122 cmd = list(cmd_base)
123 cmd.extend([hex(int(x) + load_address)
124 for x in keys_to_process])
125 output_array = subprocess.check_output(cmd).split('\n')
126 for i in range(len(keys_to_process)):
127 for frame in (symfile.frames_by_address.values()
128 [i + processed_keys_count]):
129 frame.name = self._matcher.Match(output_array[i])
130 processed_keys_count += len(keys_to_process)
131 all_keys = all_keys[input_count:]
132
133
134 def _SymbolizeWin(self, symfile):
135 """Invoke symbolizer binary on windows and write all input in one go.
136
137 Unlike linux, on windows, symbolization talks through a shared system
138 service that handles communication with the NT symbol servers. This
139 creates an explicit serialization (and therefore lock contention) of
140 any process using the symbol API for files that do not have a local PDB.
141
142 Thus, even though the windows symbolizer binary can be made command-line
143 compatible with the POSIX addr2line interface, parallelizing the
144 symbolization does not yield the same performance effects. Running
145 just one symbolizer seems good enough for now. Can optimize later
146 if this becomes a bottleneck.
147 """
148 cmd = [self.symbolizer_path, '--functions', '--demangle', '--exe',
149 symfile.symbolizable_path]
150
151 proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE,
152 stderr=sys.stderr)
153 addrs = ["%x" % relative_pc for relative_pc in
154 symfile.frames_by_address.keys()]
155 (stdout_data, stderr_data) = proc.communicate('\n'.join(addrs))
156 stdout_data = stdout_data.split('\n')
157
158 # This is known to be in the same order as stderr_data.
159 for i, addr in enumerate(addrs):
160 for frame in symfile.frames_by_address[int(addr, 16)]:
161 # Output of addr2line with --functions is always 2 outputs per
162 # symbol, function name followed by source line number. Only grab
163 # the function name as line info is not always available.
164 frame.name = stdout_data[i * 2]
165
166
167 def Symbolize(self, symfile, unsymbolized_name):
168 if self.is_mac:
169 self._SymbolizeMac(symfile)
170 if self.is_win:
171 self._SymbolizeWin(symfile)
172 else:
173 self._SymbolizeLinuxAndAndroid(symfile, unsymbolized_name)
174
175
176 def IsSymbolizableFile(self, file_path):
177 if self.is_win:
178 extension = os.path.splitext(file_path)[1].lower()
179 return extension in ['.dll', '.exe']
180 else:
181 result = subprocess.check_output(['file', '-0', file_path])
182 type_string = result[result.find('\0') + 1:]
183 return bool(re.match(r'.*(ELF|Mach-O) (32|64)-bit\b.*',
184 type_string, re.DOTALL))
185
186
187 class ProcessMemoryMaps(object):
188 """Represents 'process_mmaps' trace file entry."""
189 162
190 class Region(object): 163 class Region(object):
191 def __init__(self, start_address, size, file_path): 164 def __init__(self, start_address, size, file_path):
192 self._start_address = start_address 165 self._start_address = start_address
193 self._size = size 166 self._size = size
194 self._file_path = file_path 167 self._file_path = file_path
195 168
196 @property 169 @property
197 def start_address(self): 170 def start_address(self):
198 return self._start_address 171 return self._start_address
(...skipping 15 matching lines...) Expand all
214 return long(self._start_address).__cmp__(long(other._start_address)) 187 return long(self._start_address).__cmp__(long(other._start_address))
215 elif isinstance(other, (long, int)): 188 elif isinstance(other, (long, int)):
216 return long(self._start_address).__cmp__(long(other)) 189 return long(self._start_address).__cmp__(long(other))
217 else: 190 else:
218 raise Exception('Cannot compare with %s' % type(other)) 191 raise Exception('Cannot compare with %s' % type(other))
219 192
220 def __repr__(self): 193 def __repr__(self):
221 return 'Region(0x{:X} - 0x{:X}, {})'.format( 194 return 'Region(0x{:X} - 0x{:X}, {})'.format(
222 self.start_address, self.end_address, self.file_path) 195 self.start_address, self.end_address, self.file_path)
223 196
224 def __init__(self, process_mmaps): 197 def __init__(self, process_mmaps_node):
225 """Parses 'process_mmaps' dictionary."""
226
227 regions = [] 198 regions = []
228 for region_value in process_mmaps['vm_regions']: 199 for region_node in process_mmaps_node['vm_regions']:
229 regions.append(self.Region( 200 regions.append(self.Region(
230 long(region_value['sa'], 16), 201 long(region_node['sa'], 16),
231 long(region_value['sz'], 16), 202 long(region_node['sz'], 16),
232 region_value['mf'])) 203 region_node['mf']))
233 regions.sort() 204 regions.sort()
234 205
235 # Copy regions without duplicates and check for overlaps. 206 # Copy regions without duplicates and check for overlaps.
236 self._regions = [] 207 self._regions = []
237 previous_region = None 208 previous_region = None
238 for region in regions: 209 for region in regions:
239 if previous_region is not None: 210 if previous_region is not None:
240 if region == previous_region: 211 if region == previous_region:
241 continue 212 continue
242 assert region.start_address >= previous_region.end_address, \ 213 assert region.start_address >= previous_region.end_address, \
243 'Regions {} and {} overlap.'.format(previous_region, region) 214 'Regions {} and {} overlap.'.format(previous_region, region)
244 previous_region = region 215 previous_region = region
245 self._regions.append(region) 216 self._regions.append(region)
246 217
247 @property 218 @property
248 def regions(self): 219 def regions(self):
249 return self._regions 220 return self._regions
250 221
251 def FindRegion(self, address): 222 def FindRegion(self, address):
252 """Finds region containing |address|. Returns None if none found.""" 223 """Finds region containing |address|. Returns None if none found."""
253 224
254 region_index = bisect.bisect_right(self._regions, address) - 1 225 region_index = bisect.bisect_right(self._regions, address) - 1
255 if region_index >= 0: 226 if region_index >= 0:
256 region = self._regions[region_index] 227 region = self._regions[region_index]
257 if address >= region.start_address and address < region.end_address: 228 if address >= region.start_address and address < region.end_address:
258 return region 229 return region
259 return None 230 return None
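
As a usage sketch (addresses and paths are made up), a parsed map resolves an absolute PC to the containing region via binary search:

  mmaps_node = {'vm_regions': [
      {'sa': '400000', 'sz': '1000', 'mf': '/usr/lib/libfoo.so'},
      {'sa': '402000', 'sz': '2000', 'mf': '/usr/lib/libbar.so'},
  ]}
  memory_map = MemoryMap(mmaps_node)
  region = memory_map.FindRegion(0x4004a0)
  # region.file_path == '/usr/lib/libfoo.so'
  # FindRegion(0x401500) returns None: that address falls into the gap
  # between the two regions.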
260 231
261 232
262 class StackFrames(object): 233 class UnsupportedHeapDumpVersionError(Exception):
263 """Represents 'stackFrames' trace file entry.""" 234 """Helper exception class to signal unsupported heap dump version."""
264 235
265 class PCFrame(object): 236 def __init__(self, version):
266 def __init__(self, pc, frame): 237 message = 'Unsupported heap dump version: {}'.format(version)
238 super(UnsupportedHeapDumpVersionError, self).__init__(message)
239
240
241 class StringMap(NodeWrapper):
242 """Wraps all 'strings' nodes for a process.
243
244 'strings' node contains incremental mappings between integer ids and strings.
245
246 "strings": [
247 {
248 "id": <string_id>,
249 "string": <string>
250 },
251 ...
252 ]
253 """
254
255 def __init__(self):
256 self._modified = False
257 self._strings_nodes = []
258 self._string_by_id = {}
259 self._id_by_string = {}
260 self._max_string_id = 0
261
262 @property
263 def modified(self):
264 """Returns True if the wrapper was modified (see NodeWrapper)."""
265 return self._modified
266
267 @property
268 def string_by_id(self):
269 return self._string_by_id
270
271 def ParseNext(self, heap_dump_version, strings_node):
272 """Parses and interns next node (see NodeWrapper)."""
273
274 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1:
275 raise UnsupportedHeapDumpVersionError(heap_dump_version)
276
277 self._strings_nodes.append(strings_node)
278 for string_node in strings_node:
279 self._Insert(string_node['id'], string_node['string'])
280
281 def Clear(self):
282 """Clears all string mappings."""
283 if self._string_by_id:
284 self._modified = True
285 self._string_by_id = {}
286 self._id_by_string = {}
287 self._Insert(0, '[null]')
288 self._max_string_id = 0
289
290 def AddString(self, string):
291 """Adds a string (if it doesn't exist) and returns its integer id."""
292 string_id = self._id_by_string.get(string)
293 if string_id is None:
294 string_id = self._max_string_id + 1
295 self._Insert(string_id, string)
296 self._modified = True
297 return string_id
298
299 def ApplyModifications(self):
300 """Propagates modifications back to nodes (see NodeWrapper)."""
301 if not self.modified:
302 return
303
304 assert self._strings_nodes, 'no nodes'
305
306 # Serialize into the first node, and clear all others.
307
308 for strings_node in self._strings_nodes:
309 del strings_node[:]
310 strings_node = self._strings_nodes[0]
311 for string_id, string in self._string_by_id.iteritems():
312 strings_node.append({'id': string_id, 'string': string})
313
314 self._modified = False
315
316 def _Insert(self, string_id, string):
317 self._id_by_string[string] = string_id
318 self._string_by_id[string_id] = string
319 self._max_string_id = max(self._max_string_id, string_id)
320
321
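A usage sketch of the incremental behaviour (ids and strings are made up; HEAP_DUMP_VERSION_1 is the constant defined on the Trace class further down): two 'strings' nodes from two memory dump events are interned into one table, and ApplyModifications() then writes the whole table into the first node while emptying the second:

  string_map = StringMap()
  first_node = [{'id': 1, 'string': 'alpha'}]
  second_node = [{'id': 2, 'string': 'beta'}]
  string_map.ParseNext(Trace.HEAP_DUMP_VERSION_1, first_node)
  string_map.ParseNext(Trace.HEAP_DUMP_VERSION_1, second_node)
  string_map.AddString('gamma')  # returns 3 and marks the map as modified
  string_map.ApplyModifications()
  # first_node now holds all three entries; second_node is empty.
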
322 class TypeNameMap(NodeWrapper):
323 """Wraps all 'types' nodes for a process.
324
325 'types' nodes encode mappings between integer type ids and integer
326 string ids (from 'strings' nodes).
327
328 "types": [
329 {
330 "id": <type_id>,
331 "name_sid": <name_string_id>
332 }
333 ...
334 ]
335
336 For simplicity string ids are translated into strings during parsing,
337 and then translated back to ids in ApplyModifications().
338 """
339 def __init__(self):
340 self._modified = False
341 self._type_name_nodes = []
342 self._name_by_id = {}
343 self._id_by_name = {}
344 self._max_type_id = 0
345
346 @property
347 def modified(self):
348 """Returns True if the wrapper was modified (see NodeWrapper)."""
349 return self._modified
350
351 @property
352 def name_by_id(self):
353 """Returns {id -> name} dict (must not be changed directly)."""
354 return self._name_by_id
355
356 def ParseNext(self, heap_dump_version, type_name_node, string_map):
357 """Parses and interns next node (see NodeWrapper).
358
359 |string_map| - A StringMap object to use to translate string ids
360 to strings.
361 """
362 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1:
363 raise UnsupportedHeapDumpVersionError(heap_dump_version)
364
365 self._type_name_nodes.append(type_name_node)
366 for type_node in type_name_node:
367 self._Insert(type_node['id'],
368 string_map.string_by_id[type_node['name_sid']])
369
370 def AddType(self, type_name):
371 """Adds a type name (if it doesn't exist) and returns its id."""
372 type_id = self._id_by_name.get(type_name)
373 if type_id is None:
374 type_id = self._max_type_id + 1
375 self._Insert(type_id, type_name)
376 self._modified = True
377 return type_id
378
379 def ApplyModifications(self, string_map, force=False):
380 """Propagates modifications back to nodes.
381
382 |string_map| - A StringMap object to use to translate strings to ids.
383 |force| - Whether to propagate changes regardless of 'modified' flag.
384 """
385 if not self.modified and not force:
386 return
387
388 assert self._type_name_nodes, 'no nodes'
389
390 # Serialize into the first node, and clear all others.
391
392 for types_node in self._type_name_nodes:
393 del types_node[:]
394 types_node = self._type_name_nodes[0]
395 for type_id, type_name in self._name_by_id.iteritems():
396 types_node.append({
397 'id': type_id,
398 'name_sid': string_map.AddString(type_name)})
399
400 self._modified = False
401
402 def _Insert(self, type_id, type_name):
403 self._id_by_name[type_name] = type_id
404 self._name_by_id[type_id] = type_name
405 self._max_type_id = max(self._max_type_id, type_id)
406
407
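A round-trip sketch of this indirection (ids and the type name are made up; HEAP_DUMP_VERSION_1 comes from the Trace class further down):

  strings = StringMap()
  strings.ParseNext(Trace.HEAP_DUMP_VERSION_1,
                    [{'id': 5, 'string': 'WTF::StringImpl'}])
  types = TypeNameMap()
  types.ParseNext(Trace.HEAP_DUMP_VERSION_1,
                  [{'id': 3, 'name_sid': 5}], strings)
  # types.name_by_id[3] == 'WTF::StringImpl'
  # ApplyModifications(strings) would later re-encode the name into a string
  # id via strings.AddString('WTF::StringImpl').
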
408 class StackFrameMap(NodeWrapper):
409 """ Wraps stack frame tree nodes for a process.
410
411 For the legacy format this wrapper expects a single 'stackFrames' node
412 (which comes from metadata event):
413
414 "stackFrames": {
415 "<frame_id>": {
416 "name": "<frame_name>"
417 "parent": "<parent_frame_id>"
418 },
419 ...
420 }
421
422 For the modern format this wrapper expects several 'nodes' nodes:
423
424 "nodes": [
425 {
426 "id": <frame_id>,
427 "parent": <parent_frame_id>,
428 "name_sid": <name_string_id>
429 },
430 ...
431 ]
432
433 In both formats frame name is a string. Native heap profiler generates
434 specially formatted frame names (e.g. "pc:10eb78dba") for function
435 addresses (PCs). Inner Frame class below parses name and extracts PC,
436 if it's there.
437 """
438 class Frame(object):
439 def __init__(self, frame_id, name, parent_frame_id):
267 self._modified = False 440 self._modified = False
268 self._pc = pc 441 self._id = frame_id
269 self._frame = frame 442 self._name = name
443 self._pc = self._ParsePC(name)
444 self._parent_id = parent_frame_id
445 self._ext = None
270 446
271 @property 447 @property
272 def modified(self): 448 def modified(self):
449 """Returns True if the frame was modified.
450
451 For example changing frame's name sets this flag (since the change
452 needs to be propagated back to nodes).
453 """
273 return self._modified 454 return self._modified
274 455
275 @property 456 @property
457 def id(self):
458 """Frame id (integer)."""
459 return self._id
460
461 @property
276 def pc(self): 462 def pc(self):
463 """Parsed (integer) PC of the frame, or None."""
277 return self._pc 464 return self._pc
278 465
279 @property 466 @property
280 def name(self): 467 def name(self):
281 return self._frame['name'] 468 """Name of the frame (see above)."""
469 return self._name
282 470
283 @name.setter 471 @name.setter
284 def name(self, value): 472 def name(self, value):
473 """Changes the name. Doesn't affect value of |pc|."""
285 self._modified = True 474 self._modified = True
286 self._frame['name'] = value 475 self._name = value
287 476
288 def __init__(self, stack_frames): 477 @property
289 """Constructs object using 'stackFrames' dictionary.""" 478 def parent_id(self):
290 self._pc_frames = [] 479 """Parent frame id (integer)."""
291 for frame in stack_frames.itervalues(): 480 return self._parent_id
292 pc_frame = self._ParsePCFrame(frame) 481
293 if pc_frame: 482 _PC_TAG = 'pc:'
294 self._pc_frames.append(pc_frame) 483
295 484 def _ParsePC(self, name):
296 @property 485 if not name.startswith(self._PC_TAG):
297 def pc_frames(self): 486 return None
298 return self._pc_frames 487 return long(name[len(self._PC_TAG):], 16)
488
489 def _ClearModified(self):
490 self._modified = False
491
492 def __init__(self):
493 self._modified = False
494 self._heap_dump_version = None
495 self._stack_frames_nodes = []
496 self._frame_by_id = {}
299 497
300 @property 498 @property
301 def modified(self): 499 def modified(self):
302 return any(f.modified for f in self._pc_frames) 500 """Returns True if the wrapper or any of its frames were modified."""
303 501 return (self._modified or
304 _PC_TAG = 'pc:' 502 any(f.modified for f in self._frame_by_id.itervalues()))
305 503
306 @classmethod 504 @property
307 def _ParsePCFrame(self, frame): 505 def frame_by_id(self):
308 name = frame['name'] 506 """Returns {id -> frame} dict (must not be modified directly)."""
309 if not name.startswith(self._PC_TAG): 507 return self._frame_by_id
310 return None 508
311 pc = long(name[len(self._PC_TAG):], 16) 509 def ParseNext(self, heap_dump_version, stack_frames_node, string_map):
312 return self.PCFrame(pc, frame) 510 """Parses the next stack frames node (see NodeWrapper).
313 511
314 512 For the modern format |string_map| is used to translate string ids
315 class Process(object): 513 to strings.
316 """Holds various bits of information about a process in a trace file.""" 514 """
317 515
318 def __init__(self, pid): 516 frame_by_id = {}
319 self.pid = pid 517 if heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY:
320 self.name = None 518 if self._stack_frames_nodes:
321 self.mmaps = None 519 raise Exception('Legacy stack frames node is expected only once.')
322 self.stack_frames = None 520 for frame_id, frame_node in stack_frames_node.iteritems():
323 521 frame = self.Frame(frame_id,
324 522 frame_node['name'],
325 def CollectProcesses(trace): 523 frame_node.get('parent'))
326 """Parses trace dictionary and returns pid->Process map of all processes 524 frame_by_id[frame.id] = frame
327 suitable for symbolization (which have both mmaps and stack_frames). 525 else:
526 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1:
527 raise UnsupportedHeapDumpVersionError(heap_dump_version)
528 for frame_node in stack_frames_node:
529 frame = self.Frame(frame_node['id'],
530 string_map.string_by_id[frame_node['name_sid']],
531 frame_node.get('parent'))
532 frame_by_id[frame.id] = frame
533
534 self._heap_dump_version = heap_dump_version
535 self._stack_frames_nodes.append(stack_frames_node)
536
537 self._frame_by_id = frame_by_id
538
539 def ApplyModifications(self, string_map, force=False):
540 """Applies modifications back to nodes (see NodeWrapper)."""
541
542 if not self.modified and not force:
543 return
544
545 assert self._stack_frames_nodes, 'no nodes'
546 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY:
547 assert string_map is None, \
548 'string_map should not be used with the legacy format'
549
550 # Serialize frames into the first node, clear all others.
551
552 for frames_node in self._stack_frames_nodes:
553 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY:
554 frames_node.clear()
555 else:
556 del frames_node[:]
557
558 frames_node = self._stack_frames_nodes[0]
559 for frame in self._frame_by_id.itervalues():
560 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY:
561 frame_node = {'name': frame.name}
562 frames_node[frame.id] = frame_node
563 else:
564 frame_node = {
565 'id': frame.id,
566 'name_sid': string_map.AddString(frame.name)
567 }
568 frames_node.append(frame_node)
569 if frame.parent_id is not None:
570 frame_node['parent'] = frame.parent_id
571 frame._ClearModified()
572
573 self._modified = False
574
575
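A sketch of how a PC frame flows through this wrapper (ids and the symbolized name are made up; HEAP_DUMP_VERSION_1 comes from the Trace class below): the 'pc:' prefix is parsed into an integer PC, symbolization later replaces the name, and ApplyModifications() writes the renamed frame back into the first node:

  strings = StringMap()
  strings.ParseNext(Trace.HEAP_DUMP_VERSION_1,
                    [{'id': 1, 'string': 'pc:10eb78dba'}])
  frames = StackFrameMap()
  nodes = [{'id': 42, 'parent': 7, 'name_sid': 1}]
  frames.ParseNext(Trace.HEAP_DUMP_VERSION_1, nodes, strings)
  frame = frames.frame_by_id[42]
  # frame.pc == 0x10eb78dba and frame.name == 'pc:10eb78dba'
  frame.name = 'SomeSymbolizedName'  # e.g. the symbolization result
  frames.ApplyModifications(strings)
  # nodes[0] now holds {'id': 42, 'parent': 7, 'name_sid': <new string id>}.
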
576 class Trace(NodeWrapper):
577 """Wrapper for the root trace node (i.e. the trace JSON itself).
578
579 This wrapper parses select nodes from memory-infra events and groups
580 parsed data per-process (see inner Process class below).
328 """ 581 """
329 582
330 process_map = {} 583 # Indicates legacy heap dump format.
331 584 HEAP_DUMP_VERSION_LEGACY = 'Legacy'
332 # Android traces produced via 'chrome://inspect/?tracing#devices' are 585
333 # just list of events. 586 # Indicates variation of a modern heap dump format.
334 events = trace if isinstance(trace, list) else trace['traceEvents'] 587 HEAP_DUMP_VERSION_1 = 1
335 for event in events: 588
336 name = event.get('name') 589 class Process(object):
337 if not name: 590 """Collection of per-process data and wrappers."""
338 continue 591
339 592 def __init__(self, pid):
340 pid = event['pid'] 593 self._pid = pid
341 process = process_map.get(pid) 594 self._name = None
342 if process is None: 595 self._memory_map = None
343 process = Process(pid) 596 self._stack_frame_map = StackFrameMap()
344 process_map[pid] = process 597 self._type_name_map = TypeNameMap()
345 598 self._string_map = StringMap()
346 phase = event['ph'] 599 self._heap_dump_version = None
347 if phase == TRACE_EVENT_PHASE_METADATA: 600
348 if name == 'process_name': 601 @property
349 process.name = event['args']['name'] 602 def modified(self):
350 elif name == 'stackFrames': 603 return self._stack_frame_map.modified or self._type_name_map.modified
351 process.stack_frames = StackFrames(event['args']['stackFrames']) 604
352 elif phase == TRACE_EVENT_PHASE_MEMORY_DUMP: 605 @property
353 process_mmaps = event['args']['dumps'].get('process_mmaps') 606 def pid(self):
354 if process_mmaps: 607 return self._pid
355 # TODO(dskiba): this parses all process_mmaps, but retains only the 608
356 # last one. We need to parse only once (lazy parsing?). 609 @property
357 process.mmaps = ProcessMemoryMaps(process_mmaps) 610 def name(self):
358 611 return self._name
359 return [p for p in process_map.itervalues() if p.mmaps and p.stack_frames] 612
613 @property
614 def unique_name(self):
615 """Returns string that includes both process name and its pid."""
616 name = self._name if self._name else 'UnnamedProcess'
617 return '{}({})'.format(name, self._pid)
618
619 @property
620 def memory_map(self):
621 return self._memory_map
622
623 @property
624 def stack_frame_map(self):
625 return self._stack_frame_map
626
627 @property
628 def type_name_map(self):
629 return self._type_name_map
630
631 def ApplyModifications(self):
632 """Calls ApplyModifications() on contained wrappers."""
633 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY:
634 self._stack_frame_map.ApplyModifications(None)
635 else:
636 if self._stack_frame_map.modified or self._type_name_map.modified:
637 self._string_map.Clear()
638 self._stack_frame_map.ApplyModifications(self._string_map, force=True)
639 self._type_name_map.ApplyModifications(self._string_map, force=True)
640 self._string_map.ApplyModifications()
641
642 def __init__(self, trace_node):
643 self._trace_node = trace_node
644 self._processes = []
645 self._heap_dump_version = None
646
647 # Misc per-process information needed only during parsing.
648 class ProcessExt(object):
649 def __init__(self, pid):
650 self.process = Trace.Process(pid)
651 self.mapped_entry_names = set()
652 self.process_mmaps_node = None
653 self.seen_strings_node = False
654
655 process_ext_by_pid = {}
656
657 # Android traces produced via 'chrome://inspect/?tracing#devices' are
658 # just list of events.
659 events = trace_node if isinstance(trace_node, list) \
660 else trace_node['traceEvents']
661 for event in events:
662 name = event.get('name')
663 if not name:
664 continue
665
666 pid = event['pid']
667 process_ext = process_ext_by_pid.get(pid)
668 if process_ext is None:
669 process_ext = ProcessExt(pid)
670 process_ext_by_pid[pid] = process_ext
671 process = process_ext.process
672
673 phase = event['ph']
674 if phase == self._EVENT_PHASE_METADATA:
675 if name == 'process_name':
676 process._name = event['args']['name']
677 elif name == 'stackFrames':
678 process._stack_frame_map.ParseNext(
679 self._UseHeapDumpVersion(self.HEAP_DUMP_VERSION_LEGACY),
680 event['args']['stackFrames'],
681 process._string_map)
682 elif phase == self._EVENT_PHASE_MEMORY_DUMP:
683 dumps = event['args']['dumps']
684 process_mmaps = dumps.get('process_mmaps')
685 if process_mmaps:
686 # We want the most recent memory map, so parsing happens later
687 # once we finished reading all events.
688 process_ext.process_mmaps_node = process_mmaps
689 heaps = dumps.get('heaps_v2')
690 if heaps:
691 version = self._UseHeapDumpVersion(heaps['version'])
692 maps = heaps.get('maps')
693 if maps:
694 process_ext.mapped_entry_names.update(maps.iterkeys())
695 types = maps.get('types')
696 stack_frames = maps.get('nodes')
697 strings = maps.get('strings')
698 if (strings is None and (types or stack_frames)
699 and not process_ext.seen_strings_node):
700 # ApplyModifications() for TypeNameMap and StackFrameMap puts
701 # everything into the first node and depends on StringMap. So
702 # we need to make sure that 'strings' node is there if any of
703 # other two nodes present.
704 strings = []
705 maps['strings'] = strings
706 if strings is not None:
707 process_ext.seen_strings_node = True
708 process._string_map.ParseNext(version, strings)
709 if types:
710 process._type_name_map.ParseNext(
711 version, types, process._string_map)
712 if stack_frames:
713 process._stack_frame_map.ParseNext(
714 version, stack_frames, process._string_map)
715
716 self._processes = []
717 for pe in process_ext_by_pid.itervalues():
718 pe.process._heap_dump_version = self._heap_dump_version
719 if pe.process_mmaps_node:
720 # Now parse the most recent memory map.
721 pe.process._memory_map = MemoryMap(pe.process_mmaps_node)
722 self._processes.append(pe.process)
723
724 @property
725 def node(self):
726 """Root node (that was passed to the __init__)."""
727 return self._trace_node
728
729 @property
730 def modified(self):
731 """Returns True if trace file needs to be updated.
732
733 Before writing trace JSON back to a file ApplyModifications() needs
734 to be called.
735 """
736 return any(p.modified for p in self._processes)
737
738 @property
739 def processes(self):
740 return self._processes
741
742 @property
743 def heap_dump_version(self):
744 return self._heap_dump_version
745
746 def ApplyModifications(self):
747 """Propagates modifications back to the trace JSON."""
748 for process in self._processes:
749 process.ApplyModifications()
750 assert not self.modified, 'still modified'
751
752 # Relevant trace event phases from Chromium's
753 # src/base/trace_event/common/trace_event_common.h.
754 _EVENT_PHASE_METADATA = 'M'
755 _EVENT_PHASE_MEMORY_DUMP = 'v'
756
757 def _UseHeapDumpVersion(self, version):
758 if self._heap_dump_version is None:
759 self._heap_dump_version = version
760 return version
761 elif self._heap_dump_version != version:
762 raise Exception(
763 ("Inconsistent trace file: first saw '{}' heap dump version, "
764 "then '{}'.").format(self._heap_dump_version, version))
765 else:
766 return version
360 767
361 768
362 class SymbolizableFile(object): 769 class SymbolizableFile(object):
363 """Holds file path, addresses to symbolize and stack frames to update. 770 """Holds file path, addresses to symbolize and stack frames to update.
364 771
365 This class is a link between ELFSymbolizer and a trace file: it specifies 772 This class is a link between ELFSymbolizer and a trace file: it specifies
366 what to symbolize (addresses) and what to update with the symbolization 773 what to symbolize (addresses) and what to update with the symbolization
367 result (frames). 774 result (frames).
368 """ 775 """
369 def __init__(self, file_path): 776 def __init__(self, file_path):
370 self.path = file_path 777 self.path = file_path
371 self.symbolizable_path = file_path # path to use for symbolization 778 self.symbolizable_path = file_path # path to use for symbolization
372 self.frames_by_address = collections.defaultdict(list) 779 self.frames_by_address = collections.defaultdict(list)
373 780
374 781
375 def ResolveSymbolizableFiles(processes): 782 def ResolveSymbolizableFiles(processes):
376 """Resolves and groups PCs into list of SymbolizableFiles. 783 """Resolves and groups PCs into list of SymbolizableFiles.
377 784
378 As part of the grouping process, this function resolves PC from each stack 785 As part of the grouping process, this function resolves PC from each stack
379 frame to the corresponding mmap region. Stack frames that failed to resolve 786 frame to the corresponding mmap region. Stack frames that failed to resolve
380 are symbolized with '<unresolved>'. 787 are symbolized with '<unresolved>'.
381 """ 788 """
382 symfile_by_path = {} 789 symfile_by_path = {}
383 for process in processes: 790 for process in processes:
384 for frame in process.stack_frames.pc_frames: 791 if not process.memory_map:
385 region = process.mmaps.FindRegion(frame.pc) 792 continue
793 for frame in process.stack_frame_map.frame_by_id.itervalues():
794 if frame.pc is None:
795 continue
796 region = process.memory_map.FindRegion(frame.pc)
386 if region is None: 797 if region is None:
387 frame.name = '<unresolved>' 798 frame.name = '<unresolved>'
388 continue 799 continue
389 800
390 symfile = symfile_by_path.get(region.file_path) 801 symfile = symfile_by_path.get(region.file_path)
391 if symfile is None: 802 if symfile is None:
392 symfile = SymbolizableFile(region.file_path) 803 symfile = SymbolizableFile(region.file_path)
393 symfile_by_path[symfile.path] = symfile 804 symfile_by_path[symfile.path] = symfile
394 805
395 relative_pc = frame.pc - region.start_address 806 relative_pc = frame.pc - region.start_address
396 symfile.frames_by_address[relative_pc].append(frame) 807 symfile.frames_by_address[relative_pc].append(frame)
397 return symfile_by_path.values() 808 return symfile_by_path.values()
398 809
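A worked example of the grouping with made-up numbers: two PC frames that resolve to the same mapped library end up in one SymbolizableFile, keyed by their PC relative to the region start (which is what the symbolizer is later asked about):

  # /usr/lib/libfoo.so mapped at 0x7f43a1a00000; two frames with absolute PCs
  # 0x7f43a1b2c340 and 0x7f43a1b2c9d0 produce:
  #   frames_by_address[0x12c340] -> [frame1]
  #   frames_by_address[0x12c9d0] -> [frame2]
  assert 0x7f43a1b2c340 - 0x7f43a1a00000 == 0x12c340
  assert 0x7f43a1b2c9d0 - 0x7f43a1a00000 == 0x12c9d0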
399 810
811 def FindInSystemPath(binary_name):
812 paths = os.environ['PATH'].split(os.pathsep)
813 for path in paths:
814 binary_path = os.path.join(path, binary_name)
815 if os.path.isfile(binary_path):
816 return binary_path
817 return None
818
819
820 class Symbolizer(object):
821 """Encapsulates platform-specific symbolization logic."""
822
823 def __init__(self):
824 self.is_mac = sys.platform == 'darwin'
825 self.is_win = sys.platform == 'win32'
826 if self.is_mac:
827 self.binary = 'atos'
828 self._matcher = symbolize_trace_atos_regex.AtosRegexMatcher()
829 elif self.is_win:
830 self.binary = 'addr2line-pdb.exe'
831 else:
832 self.binary = 'addr2line'
833 self.symbolizer_path = FindInSystemPath(self.binary)
834
835 def _SymbolizeLinuxAndAndroid(self, symfile, unsymbolized_name):
836 def _SymbolizerCallback(sym_info, frames):
837 # Unwind inline chain to the top.
838 while sym_info.inlined_by:
839 sym_info = sym_info.inlined_by
840
841 symbolized_name = sym_info.name if sym_info.name else unsymbolized_name
842 for frame in frames:
843 frame.name = symbolized_name
844 frame.ext.source_path = sym_info.source_path
845
846 symbolizer = elf_symbolizer.ELFSymbolizer(symfile.symbolizable_path,
847 self.symbolizer_path,
848 _SymbolizerCallback,
849 inlines=True)
850
851 for address, frames in symfile.frames_by_address.iteritems():
852 # SymbolizeAsync() asserts that the type of address is int. We operate
853 # on longs (since they are raw pointers possibly from 64-bit processes).
854 # It's OK to cast here because we're passing relative PC, which should
855 # always fit into int.
856 symbolizer.SymbolizeAsync(int(address), frames)
857
858 symbolizer.Join()
859
860
861 def _SymbolizeMac(self, symfile):
862 chars_max = int(subprocess.check_output("getconf ARG_MAX", shell=True))
863
864 # 16 for the address, 2 for "0x", 1 for the space
865 chars_per_address = 19
866
867 load_address = (symbolize_trace_macho_reader.
868 ReadMachOTextLoadAddress(symfile.symbolizable_path))
869 assert load_address is not None
870
871 cmd_base = [self.symbolizer_path, '-arch', 'x86_64', '-l',
872 '0x%x' % load_address, '-o',
873 symfile.symbolizable_path]
874 chars_for_other_arguments = len(' '.join(cmd_base)) + 1
875
876 # The maximum number of inputs that can be processed at once is limited by
877 # ARG_MAX. This currently evaluates to ~13000 on macOS.
878 max_inputs = (chars_max - chars_for_other_arguments) / chars_per_address
879
880 all_keys = symfile.frames_by_address.keys()
881 processed_keys_count = 0
882 while len(all_keys):
883 input_count = min(len(all_keys), max_inputs)
884 keys_to_process = all_keys[0:input_count]
885 cmd = list(cmd_base)
886 cmd.extend([hex(int(x) + load_address)
887 for x in keys_to_process])
888 output_array = subprocess.check_output(cmd).split('\n')
889 for i in range(len(keys_to_process)):
890 for frame in (symfile.frames_by_address.values()
891 [i + processed_keys_count]):
892 frame.name = self._matcher.Match(output_array[i])
893 processed_keys_count += len(keys_to_process)
894 all_keys = all_keys[input_count:]
895
896 def _SymbolizeWin(self, symfile):
897 """Invoke symbolizer binary on windows and write all input in one go.
898
899 Unlike linux, on windows, symbolization talks through a shared system
900 service that handles communication with the NT symbol servers. This
901 creates an explicit serialization (and therefore lock contention) of
902 any process using the symbol API for files that do not have a local PDB.
903
904 Thus, even though the windows symbolizer binary can be made command-line
905 compatible with the POSIX addr2line interface, parallelizing the
906 symbolization does not yield the same performance effects. Running
907 just one symbolizer seems good enough for now. Can optimize later
908 if this becomes a bottleneck.
909 """
910 cmd = [self.symbolizer_path, '--functions', '--demangle', '--exe',
911 symfile.symbolizable_path]
912
913 proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE,
914 stderr=sys.stderr)
915 addrs = ["%x" % relative_pc for relative_pc in
916 symfile.frames_by_address.keys()]
917 (stdout_data, stderr_data) = proc.communicate('\n'.join(addrs))
918 stdout_data = stdout_data.split('\n')
919
920 # This is known to be in the same order as stderr_data.
921 for i, addr in enumerate(addrs):
922 for frame in symfile.frames_by_address[int(addr, 16)]:
923 # Output of addr2line with --functions is always 2 outputs per
924 # symbol, function name followed by source line number. Only grab
925 # the function name as line info is not always available.
926 frame.name = stdout_data[i * 2]
927
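For reference, the pairing logic above can be pictured with invented output lines; --functions makes addr2line-style tools emit two lines per input address (function name, then source location), and only the even lines are used:

  stdout_lines = ['foo::Bar()', 'foo.cc:42', 'Baz()', '??:0']
  names = [stdout_lines[i * 2] for i in range(len(stdout_lines) // 2)]
  # names == ['foo::Bar()', 'Baz()'], mirroring frame.name = stdout_data[i * 2]
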
928 def Symbolize(self, symfile, unsymbolized_name):
929 if self.is_mac:
930 self._SymbolizeMac(symfile)
931 elif self.is_win:
932 self._SymbolizeWin(symfile)
933 else:
934 self._SymbolizeLinuxAndAndroid(symfile, unsymbolized_name)
935
936 def IsSymbolizableFile(self, file_path):
937 if self.is_win:
938 extension = os.path.splitext(file_path)[1].lower()
939 return extension in ['.dll', '.exe']
940 else:
941 result = subprocess.check_output(['file', '-0', file_path])
942 type_string = result[result.find('\0') + 1:]
943 return bool(re.match(r'.*(ELF|Mach-O) (32|64)-bit\b.*',
944 type_string, re.DOTALL))
945
946
400 def SymbolizeFiles(symfiles, symbolizer): 947 def SymbolizeFiles(symfiles, symbolizer):
401 """Symbolizes each file in the given list of SymbolizableFiles 948 """Symbolizes each file in the given list of SymbolizableFiles
402 and updates stack frames with symbolization results.""" 949 and updates stack frames with symbolization results."""
950
951 if not symfiles:
952 print 'Nothing to symbolize.'
953 return
954
403 print 'Symbolizing...' 955 print 'Symbolizing...'
404 956
405 def _SubPrintf(message, *args): 957 def _SubPrintf(message, *args):
406 print (' ' + message).format(*args) 958 print (' ' + message).format(*args)
407 959
408 symbolized = False
409 for symfile in symfiles: 960 for symfile in symfiles:
410 unsymbolized_name = '<{}>'.format( 961 unsymbolized_name = '<{}>'.format(
411 symfile.path if symfile.path else 'unnamed') 962 symfile.path if symfile.path else 'unnamed')
412 963
413 problem = None 964 problem = None
414 if not os.path.isabs(symfile.symbolizable_path): 965 if not os.path.isabs(symfile.symbolizable_path):
415 problem = 'not a file' 966 problem = 'not a file'
416 elif not os.path.isfile(symfile.symbolizable_path): 967 elif not os.path.isfile(symfile.symbolizable_path):
417 problem = "file doesn't exist" 968 problem = "file doesn't exist"
418 elif not symbolizer.IsSymbolizableFile(symfile.symbolizable_path): 969 elif not symbolizer.IsSymbolizableFile(symfile.symbolizable_path):
419 problem = 'file is not symbolizable' 970 problem = 'file is not symbolizable'
420 if problem: 971 if problem:
421 _SubPrintf("Won't symbolize {} PCs for '{}': {}.", 972 _SubPrintf("Won't symbolize {} PCs for '{}': {}.",
422 len(symfile.frames_by_address), 973 len(symfile.frames_by_address),
423 symfile.symbolizable_path, 974 symfile.symbolizable_path,
424 problem) 975 problem)
425 for frames in symfile.frames_by_address.itervalues(): 976 for frames in symfile.frames_by_address.itervalues():
426 for frame in frames: 977 for frame in frames:
427 frame.name = unsymbolized_name 978 frame.name = unsymbolized_name
428 continue 979 continue
429 980
430 _SubPrintf('Symbolizing {} PCs from {}...', 981 _SubPrintf('Symbolizing {} PCs from {}...',
431 len(symfile.frames_by_address), 982 len(symfile.frames_by_address),
432 symfile.path) 983 symfile.path)
433 984
434 symbolizer.Symbolize(symfile, unsymbolized_name) 985 symbolizer.Symbolize(symfile, unsymbolized_name)
435 symbolized = True
436 986
437 return symbolized 987
988 # Matches Android library paths, supports both K (/data/app-lib/<>/lib.so)
989 # as well as L+ (/data/app/<>/lib/<>/lib.so). Library name is available
990 # via 'name' group.
991 ANDROID_PATH_MATCHER = re.compile(
992 r'^/data/(?:'
993 r'app/[^/]+/lib/[^/]+/|'
994 r'app-lib/[^/]+/|'
995 r'data/[^/]+/incremental-install-files/lib/'
996 r')(?P<name>.*\.so)')
997
998 # Subpath of output path where unstripped libraries are stored.
999 ANDROID_UNSTRIPPED_SUBPATH = 'lib.unstripped'
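
Illustrative matches for ANDROID_PATH_MATCHER (paths are made up):

  # K-style path:
  m = ANDROID_PATH_MATCHER.match('/data/app-lib/com.example.app-1/libchrome.so')
  # m.group('name') == 'libchrome.so'
  # L+-style path:
  m = ANDROID_PATH_MATCHER.match(
      '/data/app/com.example.app-1/lib/arm64/libchrome.so')
  # m.group('name') == 'libchrome.so'
  # RemapAndroidFiles() below then looks the library up under
  # <output_directory>/lib.unstripped/libchrome.so.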
438 1000
439 1001
440 def HaveFilesFromAndroid(symfiles): 1002 def HaveFilesFromAndroid(symfiles):
441 return any(ANDROID_PATH_MATCHER.match(f.path) for f in symfiles) 1003 return any(ANDROID_PATH_MATCHER.match(f.path) for f in symfiles)
442 1004
443 1005
444 def RemapAndroidFiles(symfiles, output_path): 1006 def RemapAndroidFiles(symfiles, output_path):
445 for symfile in symfiles: 1007 for symfile in symfiles:
446 match = ANDROID_PATH_MATCHER.match(symfile.path) 1008 match = ANDROID_PATH_MATCHER.match(symfile.path)
447 if match: 1009 if match:
448 name = match.group('name') 1010 name = match.group('name')
449 symfile.symbolizable_path = os.path.join( 1011 symfile.symbolizable_path = os.path.join(
450 output_path, ANDROID_UNSTRIPPED_SUBPATH, name) 1012 output_path, ANDROID_UNSTRIPPED_SUBPATH, name)
451 else: 1013 else:
452 # Clobber file path to trigger "not a file" problem in SymbolizeFiles(). 1014 # Clobber file path to trigger "not a file" problem in SymbolizeFiles().
453 # Without this, files won't be symbolized with "file not found" problem, 1015 # Without this, files won't be symbolized with "file not found" problem,
454 # which is not accurate. 1016 # which is not accurate.
455 symfile.symbolizable_path = 'android://{}'.format(symfile.path) 1017 symfile.symbolizable_path = 'android://{}'.format(symfile.path)
456 1018
457 1019
1020 def Symbolize(options, trace, symbolizer):
1021 symfiles = ResolveSymbolizableFiles(trace.processes)
1022
1023 # Android trace files don't have any indication they are from Android.
1024 # So we're checking for Android-specific paths.
1025 if HaveFilesFromAndroid(symfiles):
1026 if not options.output_directory:
1027 sys.exit('The trace file appears to be from Android. Please '
1028 'specify output directory to properly symbolize it.')
1029 RemapAndroidFiles(symfiles, os.path.abspath(options.output_directory))
1030
1031 SymbolizeFiles(symfiles, symbolizer)
1032
1033
1034 def OpenTraceFile(file_path, mode):
1035 if file_path.endswith('.gz'):
1036 return gzip.open(file_path, mode + 'b')
1037 else:
1038 return open(file_path, mode + 't')
1039
1040
458 # Suffix used for backup files. 1041 # Suffix used for backup files.
459 BACKUP_FILE_TAG = '.BACKUP' 1042 BACKUP_FILE_TAG = '.BACKUP'
460 1043
461 def main(): 1044 def main():
462 parser = argparse.ArgumentParser() 1045 class MultilineHelpFormatter(argparse.HelpFormatter):
463 parser.add_argument('file', 1046 def _split_lines(self, text, width):
464 help='Trace file to symbolize (.json or .json.gz)') 1047 extra_lines = []
465 parser.add_argument('--no-backup', 1048 if '\n' in text:
466 dest='backup', default='true', action='store_false', 1049 lines = text.splitlines()
467 help="Don't create {} files".format(BACKUP_FILE_TAG)) 1050 text = lines[0]
468 parser.add_argument('--output-directory', 1051 extra_lines = lines[1:]
469 help='The path to the build output directory, such ' + 1052 return super(MultilineHelpFormatter, self)._split_lines(text, width) + \
470 'as out/Debug. Only needed for Android.') 1053 extra_lines
471 options = parser.parse_args()
472 1054
473 trace_file_path = options.file 1055 parser = argparse.ArgumentParser(formatter_class=MultilineHelpFormatter)
474 def _OpenTraceFile(mode): 1056 parser.add_argument(
475 if trace_file_path.endswith('.gz'): 1057 'file',
476 return gzip.open(trace_file_path, mode + 'b') 1058 help='Trace file to symbolize (.json or .json.gz)')
477 else: 1059
478 return open(trace_file_path, mode + 't') 1060 parser.add_argument(
1061 '--no-backup', dest='backup', default='true', action='store_false',
1062 help="Don't create {} files".format(BACKUP_FILE_TAG))
1063
1064 parser.add_argument(
1065 '--output-directory',
1066 help='The path to the build output directory, such as out/Debug.')
479 1067
480 symbolizer = Symbolizer() 1068 symbolizer = Symbolizer()
481 if symbolizer.symbolizer_path is None: 1069 if symbolizer.symbolizer_path is None:
482 sys.exit("Can't symbolize - no %s in PATH." % symbolizer.binary) 1070 sys.exit("Can't symbolize - no %s in PATH." % symbolizer.binary)
483 1071
1072 options = parser.parse_args()
1073
1074 trace_file_path = options.file
1075
484 print 'Reading trace file...' 1076 print 'Reading trace file...'
485 with _OpenTraceFile('r') as trace_file: 1077 with OpenTraceFile(trace_file_path, 'r') as trace_file:
486 trace = json.load(trace_file) 1078 trace = Trace(json.load(trace_file))
487 1079
488 processes = CollectProcesses(trace) 1080 Symbolize(options, trace, symbolizer)
489 symfiles = ResolveSymbolizableFiles(processes)
490 1081
491 # Android trace files don't have any indication they are from Android. 1082 if trace.modified:
492 # So we're checking for Android-specific paths. 1083 trace.ApplyModifications()
493 if HaveFilesFromAndroid(symfiles):
494 if not options.output_directory:
495 parser.error('The trace file appears to be from Android. Please '
496 "specify output directory (e.g. 'out/Debug') to properly "
497 'symbolize it.')
498 RemapAndroidFiles(symfiles, os.path.abspath(options.output_directory))
499 1084
500 if SymbolizeFiles(symfiles, symbolizer):
501 if options.backup: 1085 if options.backup:
502 backup_file_path = trace_file_path + BACKUP_FILE_TAG 1086 backup_file_path = trace_file_path + BACKUP_FILE_TAG
503 print 'Backing up trace file to {}...'.format(backup_file_path) 1087 if os.path.exists(backup_file_path):
1088 for i in itertools.count(1):
1089 unique_file_path = '{}{}'.format(backup_file_path, i)
1090 if not os.path.exists(unique_file_path):
1091 backup_file_path = unique_file_path
1092 break
1093 print 'Backing up trace file to {}'.format(backup_file_path)
504 os.rename(trace_file_path, backup_file_path) 1094 os.rename(trace_file_path, backup_file_path)
505 1095
506 print 'Updating trace file...' 1096 print 'Updating the trace file...'
507 with _OpenTraceFile('w') as trace_file: 1097 with OpenTraceFile(trace_file_path, 'w') as trace_file:
508 json.dump(trace, trace_file) 1098 json.dump(trace.node, trace_file)
509 else: 1099 else:
510 print 'No PCs symbolized - not updating trace file.' 1100 print 'No modifications were made - not updating the trace file.'
511 1101
512 1102
513 if __name__ == '__main__': 1103 if __name__ == '__main__':
514 main() 1104 main()