Chromium Code Reviews

| OLD | NEW |
|---|---|
| 1 #!/usr/bin/env python | 1 #!/usr/bin/env python |
| 2 # Copyright 2016 The Chromium Authors. All rights reserved. | 2 # Copyright 2016 The Chromium Authors. All rights reserved. |
| 3 # Use of this source code is governed by a BSD-style license that can be | 3 # Use of this source code is governed by a BSD-style license that can be |
| 4 # found in the LICENSE file. | 4 # found in the LICENSE file. |
| 5 | 5 |
| 6 """ | |
| 7 This script processes trace files and symbolizes stack frames generated by | |
| 8 Chrome's native heap profiler. | |
| 9 | |
| 10 === Overview === | |
| 11 | |
| 12 A trace file is essentially a giant JSON array of dictionaries (events). | |
| 13 Events have some predefined keys (e.g. 'pid'), but otherwise are free to | |
| 14 have anything inside. The trace file contains events from all Chrome | |
| 15 processes that were sampled during the tracing period. | |
| 16 | |
| 17 This script cares only about memory dump events generated with the | |
| 18 memory-infra category enabled. | |
| 19 | |
| 20 When Chrome native heap profiling is enabled, some memory dump events | |
| 21 include the following extra information: | |
| 22 | |
| 23 * (Per allocator) Information about live allocations at the moment of the | |
| 24 memory dump (the information includes backtraces, types / categories, | |
| 25 sizes, and counts of allocations). There are several allocators in | |
| 26 Chrome: malloc, blink_gc, and partition_alloc. | |
| 27 | |
| 28 * (Per process) Stack frame tree of all functions that called allocators | |
| 29 above. | |
| 30 | |
| 31 This script does the following: | |
| 32 | |
| 33 1. Parses the given trace file (loads JSON). | |
| 34 2. Finds memory dump events and parses stack frame tree for each process. | |
| 35 3. Finds stack frames that have PC addresses instead of function names. | |
| 36 4. Symbolizes PCs and modifies loaded JSON. | |
| 37 5. Writes modified JSON back to the file. | |
| 38 | |
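The load/save steps (1 and 5) can be sketched as below. This is a minimal illustration, not the script's actual I/O code; it assumes the trace may optionally be gzip-compressed (the script imports gzip), with the `.gz` suffix check being a simplification:

```python
import gzip
import json

def load_trace(path):
    # Step 1: parse the trace file (optionally gzip-compressed) into JSON.
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rb') as f:
        return json.loads(f.read().decode('utf-8'))

def save_trace(trace, path):
    # Step 5: write the modified JSON back to the file.
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'wb') as f:
        f.write(json.dumps(trace).encode('utf-8'))
```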
| 39 === Details === | |
| 40 | |
| 41 There are two formats of heap profiler information: legacy and modern. The | |
| 42 main differences relevant to this script are: | |
| 43 | |
| 44 * In the modern format the stack frame tree, type name mapping, and string | |
|
Wez
2017/05/03 00:17:10
nit: " ... modern format the stack frame ..."
DmitrySkiba
2017/05/04 00:30:55
Done.
| |
| 45 mapping nodes are dumped incrementally. These nodes are dumped in each | |
| 46 memory dump event and carry updates that occurred since the last event. | |
| 47 | |
| 48 For example, let's say that when the first memory dump event is generated | |
| 49 we only know about a function foo() (called from main()) allocating objects | |
| 50 of type "int": | |
| 51 | |
| 52 { | |
| 53 "args": { | |
| 54 "dumps": { | |
| 55 "heaps_v2": { | |
| 56 "maps": { | |
| 57 "nodes": [ | |
| 58 { "id": 1, "name_sid": 1 }, | |
| 59 { "id": 2, "parent": 1, "name_sid": 3 }, | |
| 60 ], | |
| 61 "types": [ | |
| 62 { "id": 1, "name_sid": 2 }, | |
| 63 ], | |
| 64 "strings": [ | |
| 65 { "id": 1, "string": "main()" }, | |
| 66 { "id": 2, "string": "int" }, | |
| 67 { "id": 3, "string": "foo()" }, | |
| 68 ] | |
| 69 }, | |
| 70 "allocators": { ...live allocations per allocator... }, | |
| 71 ... | |
| 72 }, | |
| 73 ... | |
| 74 } | |
| 75 }, | |
| 76 ... | |
| 77 } | |
| 78 | |
| 79 Here: | |
| 80 * 'nodes' node encodes stack frame tree | |
| 81 * 'types' node encodes type name mappings | |
| 82 * 'strings' node encodes string mapping (explained below) | |
| 83 | |
| 84 Then, by the time the second memory dump event is generated, we learn about | |
| 85 bar() (called from main()), which also allocated "int" objects. Only the | |
| 86 new information is dumped, i.e. bar() stack frame: | |
| 87 | |
| 88 { | |
| 89 "args": { | |
| 90 "dumps": { | |
| 91 "heaps_v2": { | |
| 92 "maps": { | |
| 93 "nodes": [ | |
| 94 { "id": 2, "parent": 1, "name_sid": 4 }, | |
| 95 ], | |
| 96 "types": [], | |
| 97 "strings": [ | |
| 98 { "id": 4, "string": "bar()" }, | |
| 99 ] | |
| 100 }, | |
| 101 "allocators": { ...live allocations per allocator... }, | |
| 102 ... | |
| 103 }, | |
| 104 ... | |
| 105 } | |
| 106 }, | |
| 107 ... | |
| 108 } | |
| 109 | |
| 110 Note that 'types' node is empty, since there were no updates. All three | |
| 111 nodes ('nodes', 'types', and 'strings') can be empty if there were no updates | |
| 112 to them. | |
| 113 | |
| 114 For simplicity, when the script updates incremental nodes, it puts updated | |
| 115 content in the first node, and clears all others. For example, the following | |
| 116 stack frame nodes: | |
| 117 | |
| 118 'nodes': [ | |
| 119 { "id": 1, "name_sid": 1 }, | |
| 120 { "id": 2, "parent": 1, "name_sid": 2 }, | |
| 121 ] | |
| 122 'nodes': [ | |
| 123 { "id": 3, "parent": 2, "name_sid": 3 }, | |
| 124 ] | |
| 125 'nodes': [ | |
| 126 { "id": 4, "parent": 3, "name_sid": 4 }, | |
| 127 { "id": 5, "parent": 1, "name_sid": 5 }, | |
| 128 ] | |
| 129 | |
| 130 After symbolization, they are written as: | |
| 131 | |
| 132 'nodes': [ | |
| 133 { "id": 1, "name_sid": 1 }, | |
| 134 { "id": 2, "parent": 1, "name_sid": 2 }, | |
| 135 { "id": 3, "parent": 2, "name_sid": 3 }, | |
| 136 { "id": 4, "parent": 3, "name_sid": 4 }, | |
| 137 { "id": 5, "parent": 1, "name_sid": 5 }, | |
| 138 ] | |
| 139 'nodes': [] | |
| 140 'nodes': [] | |
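The consolidation described above can be sketched as a small helper. This is an illustration of the merging rule (all entries land in the first list, later entries with the same id win, all other lists are cleared), not the script's actual implementation:

```python
def merge_incremental_nodes(nodes_lists):
    # Collect all entries by id; a later dump's entry updates an earlier one.
    merged = {}
    for nodes in nodes_lists:
        for node in nodes:
            merged[node['id']] = node
    # Serialize everything into the first list and clear all others.
    for nodes in nodes_lists:
        del nodes[:]
    nodes_lists[0].extend(sorted(merged.values(), key=lambda n: n['id']))
```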
| 141 | |
| 142 | |
| 143 * In contrast, in the legacy format the stack frame tree and type mappings | |
| 144 dumped separately from memory dump events, once per process. | |
| 145 | |
| 146 Here is how a trace file with two memory dump events looks in the | |
| 147 legacy format: | |
| 148 | |
| 149 { | |
| 150 "args": { | |
| 151 "dumps": { | |
| 152 "heaps": { ...live allocations per allocator... }, | |
| 153 ... | |
| 154 } | |
| 155 }, | |
| 156 ... | |
| 157 } | |
| 158 | |
| 159 { | |
| 160 "args": { | |
| 161 "dumps": { | |
| 162 "heaps": { ...live allocations per allocator... }, | |
| 163 ... | |
| 164 } | |
| 165 }, | |
| 166 ... | |
| 167 } | |
| 168 | |
| 169 { | |
| 170 "args": { | |
| 171 "typeNames": { | |
| 172 1: "int", | |
| 173 } | |
| 174 }, | |
| 175 "cat": "__metadata", | |
| 176 "name": "typeNames", | |
| 177 ... | |
| 178 } | |
| 179 | |
| 180 { | |
| 181 "args": { | |
| 182 "stackFrames": { | |
| 183 1: { "name": "main" }, | |
| 184 2: { "name": "foo", "parent": 1 }, | |
| 185 3: { "name": "bar", "parent": 1 }, | |
| 186 } | |
| 187 }, | |
| 188 "cat": "__metadata", | |
| 189 "name": "stackFrames", | |
| 190 ... | |
| 191 } | |
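Given the legacy 'stackFrames' mapping above, a full backtrace for any frame can be recovered by walking 'parent' links. A minimal sketch (not code from this script):

```python
def backtrace(stack_frames, frame_id):
    # Walk 'parent' links from the leaf frame to the root, collecting names.
    names = []
    while frame_id is not None:
        frame = stack_frames[frame_id]
        names.append(frame['name'])
        frame_id = frame.get('parent')
    # Reverse so the root frame (e.g. main) comes first.
    return list(reversed(names))
```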
| 192 | |
| 193 | |
| 194 * Another change in the modern format is the 'strings' node, which was added | |
| 195 to deduplicate stack frame names (mainly to reduce trace file size). | |
| 196 For consistency, the 'types' node also uses string mappings. | |
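This deduplication scheme amounts to string interning. A minimal sketch of the id-to-string bookkeeping (mirroring, but not taken from, the StringMap class in this patch):

```python
class StringInterner:
    """Assigns a stable integer id to each distinct string."""

    def __init__(self):
        self._id_by_string = {}
        self._string_by_id = {}

    def intern(self, string):
        # Return the existing id, or allocate the next one.
        string_id = self._id_by_string.get(string)
        if string_id is None:
            string_id = len(self._id_by_string) + 1
            self._id_by_string[string] = string_id
            self._string_by_id[string_id] = string
        return string_id
```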
| 197 | |
| 198 | |
| 199 See crbug.com/708930 for more information about the modern format. | |
| 200 """ | |
| 201 | |
| 6 import argparse | 202 import argparse |
| 7 import bisect | 203 import bisect |
| 8 import collections | 204 import collections |
| 9 import gzip | 205 import gzip |
| 206 import itertools | |
| 10 import json | 207 import json |
| 11 import os | 208 import os |
| 12 import re | 209 import re |
| 13 import subprocess | 210 import subprocess |
| 14 import sys | 211 import sys |
| 15 | 212 |
| 16 _SYMBOLS_PATH = os.path.abspath(os.path.join( | 213 _SYMBOLS_PATH = os.path.abspath(os.path.join( |
| 17 os.path.dirname(os.path.realpath(__file__)), | 214 os.path.dirname(os.path.realpath(__file__)), |
| 18 '..', | 215 '..', |
| 19 'third_party', | 216 'third_party', |
| 20 'symbols')) | 217 'symbols')) |
| 21 sys.path.append(_SYMBOLS_PATH) | 218 sys.path.append(_SYMBOLS_PATH) |
| 22 # pylint: disable=import-error | 219 # pylint: disable=import-error |
| 23 import symbols.elf_symbolizer as elf_symbolizer | 220 import symbols.elf_symbolizer as elf_symbolizer |
| 24 | 221 |
| 25 import symbolize_trace_atos_regex | 222 import symbolize_trace_atos_regex |
| 26 import symbolize_trace_macho_reader | 223 import symbolize_trace_macho_reader |
| 27 | 224 |
| 28 | 225 |
| 29 # Relevant trace event phases from Chromium's | 226 class NodeWrapper(object): |
| 30 # src/base/trace_event/common/trace_event_common.h. | 227 """Wraps one or more event data nodes. |
| 31 TRACE_EVENT_PHASE_METADATA = 'M' | 228 |
| 32 TRACE_EVENT_PHASE_MEMORY_DUMP = 'v' | 229 A node is a reference into a trace event JSON. Wrappers parse nodes to |
| 230 provide convenient APIs and update nodes when asked to propagate changes | |
| 231 back (see ApplyModifications() below). | |
| 232 | |
| 233 Here is an example of legacy metadata event that contains stack frame tree: | |
| 234 | |
| 235 { | |
| 236 "args": { | |
| 237 "stackFrames": { ... } | |
| 238 }, | |
| 239 "cat": "__metadata", | |
| 240 "name": "stackFrames", | |
| 241 "ph": "M", | |
| 242 ... | |
| 243 } | |
| 244 | |
| 245 When this event is encountered, a reference to the "stackFrames" dictionary | |
| 246 is obtained and passed down to a specific wrapped class, which knows how to | |
| 247 parse / update the dictionary. | |
| 248 | |
| 249 There are two parsing patterns depending on whether node is serialized | |
| 250 incrementally: | |
| 251 | |
| 252 * If node is not incremental, then parsing is done by __init__(), | |
| 253 see MemoryMap for an example. | |
| 254 | |
| 255 * If node is incremental, then __init__() does nothing, and ParseNext() | |
| 256 is called when next node (from a next event) is encountered. | |
| 257 | |
| 258 Some wrappers can also modify nodes they parsed. In such cases they have | |
| 259 additional APIs: | |
| 260 | |
| 261 * 'modified' flag, which indicates whether the wrapper was changed. | |
| 262 | |
| 263 * 'ApplyModifications' method, which propagates changes made to the wrapper | |
| 264 back to nodes. Successful invocation of ApplyModifications() resets | |
| 265 'modified' flag. | |
| 266 | |
| 267 """ | |
| 268 | |
| 269 # def __init__(self, node): | |
|
Primiano Tucci (use gerrit)
2017/05/03 17:25:05
Are these commented lines intentional ? I think th
DmitrySkiba
2017/05/04 00:30:56
The thing is that their exact shape is not determi
| |
| 270 # ... | |
| 271 | |
| 272 # def ParseNext(self, node, ...): | |
| 273 # ... | |
| 274 | |
| 275 # @property | |
| 276 # def modified(self): | |
| 277 # ... | |
| 278 | |
| 279 # def ApplyModifications(self, ...): | |
| 280 # ... | |
| 281 | |
| 282 pass | |
| 33 | 283 |
| 34 | 284 |
| 35 # Matches Android library paths, supports both K (/data/app-lib/<>/lib.so) | 285 class MemoryMap(NodeWrapper): |
| 36 # as well as L+ (/data/app/<>/lib/<>/lib.so). Library name is available | 286 """Wraps 'process_mmaps' node. |
| 37 # via 'name' group. | |
| 38 ANDROID_PATH_MATCHER = re.compile( | |
| 39 r'^/data/(?:' | |
| 40 r'app/[^/]+/lib/[^/]+/|' | |
| 41 r'app-lib/[^/]+/|' | |
| 42 r'data/[^/]+/incremental-install-files/lib/' | |
| 43 r')(?P<name>.*\.so)') | |
| 44 | 287 |
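The Android path matcher above can be exercised directly. A quick usage sketch with the same pattern; the example library paths are made up:

```python
import re

# Same pattern as ANDROID_PATH_MATCHER in the diff above.
matcher = re.compile(
    r'^/data/(?:'
    r'app/[^/]+/lib/[^/]+/|'
    r'app-lib/[^/]+/|'
    r'data/[^/]+/incremental-install-files/lib/'
    r')(?P<name>.*\.so)')

# L+ style path: /data/app/<>/lib/<>/lib.so
m_new = matcher.match('/data/app/com.example.app-1/lib/arm64/libchrome.so')
# K style path: /data/app-lib/<>/lib.so
m_old = matcher.match('/data/app-lib/com.example.app-1/libchrome.so')
```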
| 45 # Subpath of output path where unstripped libraries are stored. | 288 'process_mmaps' node contains information about file mappings. |
| 46 ANDROID_UNSTRIPPED_SUBPATH = 'lib.unstripped' | |
| 47 | 289 |
| 48 | 290 "process_mmaps": { |
| 49 def FindInSystemPath(binary_name): | 291 "vm_regions": [ |
| 50 paths = os.environ['PATH'].split(os.pathsep) | 292 { |
| 51 for path in paths: | 293 "mf": "<file_path>", |
| 52 binary_path = os.path.join(path, binary_name) | 294 "sa": "<start_address>", |
| 53 if os.path.isfile(binary_path): | 295 "sz": "<size>", |
| 54 return binary_path | 296 ... |
| 55 return None | 297 }, |
| 56 | 298 ... |
| 57 | 299 ] |
| 58 class Symbolizer(object): | 300 } |
| 59 # Encapsulates platform-specific symbolization logic. | 301 """ |
| 60 def __init__(self): | |
| 61 self.is_mac = sys.platform == 'darwin' | |
| 62 self.is_win = sys.platform == 'win32' | |
| 63 if self.is_mac: | |
| 64 self.binary = 'atos' | |
| 65 self._matcher = symbolize_trace_atos_regex.AtosRegexMatcher() | |
| 66 elif self.is_win: | |
| 67 self.binary = 'addr2line-pdb.exe' | |
| 68 else: | |
| 69 self.binary = 'addr2line' | |
| 70 self.symbolizer_path = FindInSystemPath(self.binary) | |
| 71 | |
| 72 def _SymbolizeLinuxAndAndroid(self, symfile, unsymbolized_name): | |
| 73 def _SymbolizerCallback(sym_info, frames): | |
| 74 # Unwind inline chain to the top. | |
| 75 while sym_info.inlined_by: | |
| 76 sym_info = sym_info.inlined_by | |
| 77 | |
| 78 symbolized_name = sym_info.name if sym_info.name else unsymbolized_name | |
| 79 for frame in frames: | |
| 80 frame.name = symbolized_name | |
| 81 | |
| 82 symbolizer = elf_symbolizer.ELFSymbolizer(symfile.symbolizable_path, | |
| 83 self.symbolizer_path, | |
| 84 _SymbolizerCallback, | |
| 85 inlines=True) | |
| 86 | |
| 87 for address, frames in symfile.frames_by_address.iteritems(): | |
| 88 # SymbolizeAsync() asserts that the type of address is int. We operate | |
| 89 # on longs (since they are raw pointers possibly from 64-bit processes). | |
| 90 # It's OK to cast here because we're passing relative PC, which should | |
| 91 # always fit into int. | |
| 92 symbolizer.SymbolizeAsync(int(address), frames) | |
| 93 | |
| 94 symbolizer.Join() | |
| 95 | |
| 96 | |
| 97 def _SymbolizeMac(self, symfile): | |
| 98 chars_max = int(subprocess.check_output("getconf ARG_MAX", shell=True)) | |
| 99 | |
| 100 # 16 for the address, 2 for "0x", 1 for the space | |
| 101 chars_per_address = 19 | |
| 102 | |
| 103 load_address = (symbolize_trace_macho_reader. | |
| 104 ReadMachOTextLoadAddress(symfile.symbolizable_path)) | |
| 105 assert load_address is not None | |
| 106 | |
| 107 cmd_base = [self.symbolizer_path, '-arch', 'x86_64', '-l', | |
| 108 '0x%x' % load_address, '-o', | |
| 109 symfile.symbolizable_path] | |
| 110 chars_for_other_arguments = len(' '.join(cmd_base)) + 1 | |
| 111 | |
| 112 # The maximum number of inputs that can be processed at once is limited by | |
| 113 # ARG_MAX. This currently evaluates to ~13000 on macOS. | |
| 114 max_inputs = (chars_max - chars_for_other_arguments) / chars_per_address | |
| 115 | |
| 116 all_keys = symfile.frames_by_address.keys() | |
| 117 processed_keys_count = 0 | |
| 118 while len(all_keys): | |
| 119 input_count = min(len(all_keys), max_inputs) | |
| 120 keys_to_process = all_keys[0:input_count] | |
| 121 | |
| 122 cmd = list(cmd_base) | |
| 123 cmd.extend([hex(int(x) + load_address) | |
| 124 for x in keys_to_process]) | |
| 125 output_array = subprocess.check_output(cmd).split('\n') | |
| 126 for i in range(len(keys_to_process)): | |
| 127 for frame in (symfile.frames_by_address.values() | |
| 128 [i + processed_keys_count]): | |
| 129 frame.name = self._matcher.Match(output_array[i]) | |
| 130 processed_keys_count += len(keys_to_process) | |
| 131 all_keys = all_keys[input_count:] | |
| 132 | |
| 133 | |
| 134 def _SymbolizeWin(self, symfile): | |
| 135 """Invoke symbolizer binary on windows and write all input in one go. | |
| 136 | |
| 137 Unlike linux, on windows, symbolization talks through a shared system | |
| 138 service that handles communication with the NT symbol servers. This | |
| 139 creates an explicit serialization (and therefore lock contention) of | |
| 140 any process using the symbol API for files that do not have a local PDB. | |
| 141 | |
| 142 Thus, even though the windows symbolizer binary can be made command line | |
| 143 compatible with the POSIX addr2line interface, parallelizing the | |
| 144 symbolization does not yield the same performance effects. Running | |
| 145 just one symbolizer seems good enough for now. Can optimize later | |
| 146 if this becomes a bottleneck. | |
| 147 """ | |
| 148 cmd = [self.symbolizer_path, '--functions', '--demangle', '--exe', | |
| 149 symfile.symbolizable_path] | |
| 150 | |
| 151 proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE, | |
| 152 stderr=sys.stderr) | |
| 153 addrs = ["%x" % relative_pc for relative_pc in | |
| 154 symfile.frames_by_address.keys()] | |
| 155 (stdout_data, stderr_data) = proc.communicate('\n'.join(addrs)) | |
| 156 stdout_data = stdout_data.split('\n') | |
| 157 | |
| 158 # This is known to be in the same order as stdout_data. | |
| 159 for i, addr in enumerate(addrs): | |
| 160 for frame in symfile.frames_by_address[int(addr, 16)]: | |
| 161 # Output of addr2line with --functions is always 2 outputs per | |
| 162 # symbol, function name followed by source line number. Only grab | |
| 163 # the function name as line info is not always available. | |
| 164 frame.name = stdout_data[i * 2] | |
| 165 | |
| 166 | |
| 167 def Symbolize(self, symfile, unsymbolized_name): | |
| 168 if self.is_mac: | |
| 169 self._SymbolizeMac(symfile) | |
| 170 elif self.is_win: | |
| 171 self._SymbolizeWin(symfile) | |
| 172 else: | |
| 173 self._SymbolizeLinuxAndAndroid(symfile, unsymbolized_name) | |
| 174 | |
| 175 | |
| 176 def IsSymbolizableFile(self, file_path): | |
| 177 if self.is_win: | |
| 178 extension = os.path.splitext(file_path)[1].lower() | |
| 179 return extension in ['.dll', '.exe'] | |
| 180 else: | |
| 181 result = subprocess.check_output(['file', '-0', file_path]) | |
| 182 type_string = result[result.find('\0') + 1:] | |
| 183 return bool(re.match(r'.*(ELF|Mach-O) (32|64)-bit\b.*', | |
| 184 type_string, re.DOTALL)) | |
| 185 | |
| 186 | |
| 187 class ProcessMemoryMaps(object): | |
| 188 """Represents 'process_mmaps' trace file entry.""" | |
| 189 | 302 |
| 190 class Region(object): | 303 class Region(object): |
| 191 def __init__(self, start_address, size, file_path): | 304 def __init__(self, start_address, size, file_path): |
| 192 self._start_address = start_address | 305 self._start_address = start_address |
| 193 self._size = size | 306 self._size = size |
| 194 self._file_path = file_path | 307 self._file_path = file_path |
| 195 | 308 |
| 196 @property | 309 @property |
| 197 def start_address(self): | 310 def start_address(self): |
| 198 return self._start_address | 311 return self._start_address |
| (...skipping 15 matching lines...) | |
| 214 return long(self._start_address).__cmp__(long(other._start_address)) | 327 return long(self._start_address).__cmp__(long(other._start_address)) |
| 215 elif isinstance(other, (long, int)): | 328 elif isinstance(other, (long, int)): |
| 216 return long(self._start_address).__cmp__(long(other)) | 329 return long(self._start_address).__cmp__(long(other)) |
| 217 else: | 330 else: |
| 218 raise Exception('Cannot compare with %s' % type(other)) | 331 raise Exception('Cannot compare with %s' % type(other)) |
| 219 | 332 |
| 220 def __repr__(self): | 333 def __repr__(self): |
| 221 return 'Region(0x{:X} - 0x{:X}, {})'.format( | 334 return 'Region(0x{:X} - 0x{:X}, {})'.format( |
| 222 self.start_address, self.end_address, self.file_path) | 335 self.start_address, self.end_address, self.file_path) |
| 223 | 336 |
| 224 def __init__(self, process_mmaps): | 337 def __init__(self, process_mmaps_node): |
| 225 """Parses 'process_mmaps' dictionary.""" | |
| 226 | |
| 227 regions = [] | 338 regions = [] |
| 228 for region_value in process_mmaps['vm_regions']: | 339 for region_node in process_mmaps_node['vm_regions']: |
| 229 regions.append(self.Region( | 340 regions.append(self.Region( |
| 230 long(region_value['sa'], 16), | 341 long(region_node['sa'], 16), |
| 231 long(region_value['sz'], 16), | 342 long(region_node['sz'], 16), |
| 232 region_value['mf'])) | 343 region_node['mf'])) |
| 233 regions.sort() | 344 regions.sort() |
| 234 | 345 |
| 235 # Copy regions without duplicates and check for overlaps. | 346 # Copy regions without duplicates and check for overlaps. |
| 236 self._regions = [] | 347 self._regions = [] |
| 237 previous_region = None | 348 previous_region = None |
| 238 for region in regions: | 349 for region in regions: |
| 239 if previous_region is not None: | 350 if previous_region is not None: |
| 240 if region == previous_region: | 351 if region == previous_region: |
| 241 continue | 352 continue |
| 242 assert region.start_address >= previous_region.end_address, \ | 353 assert region.start_address >= previous_region.end_address, \ |
| 243 'Regions {} and {} overlap.'.format(previous_region, region) | 354 'Regions {} and {} overlap.'.format(previous_region, region) |
| 244 previous_region = region | 355 previous_region = region |
| 245 self._regions.append(region) | 356 self._regions.append(region) |
| 246 | 357 |
| 247 @property | 358 @property |
| 248 def regions(self): | 359 def regions(self): |
| 249 return self._regions | 360 return self._regions |
| 250 | 361 |
| 251 def FindRegion(self, address): | 362 def FindRegion(self, address): |
| 252 """Finds region containing |address|. Returns None if none found.""" | 363 """Finds region containing |address|. Returns None if none found.""" |
| 253 | 364 |
| 254 region_index = bisect.bisect_right(self._regions, address) - 1 | 365 region_index = bisect.bisect_right(self._regions, address) - 1 |
| 255 if region_index >= 0: | 366 if region_index >= 0: |
| 256 region = self._regions[region_index] | 367 region = self._regions[region_index] |
| 257 if address >= region.start_address and address < region.end_address: | 368 if address >= region.start_address and address < region.end_address: |
| 258 return region | 369 return region |
| 259 return None | 370 return None |
| 260 | 371 |
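FindRegion above relies on the regions being sorted and comparable to raw addresses. The same bisect idiom can be shown standalone; this sketch uses made-up (start, size) tuples in place of the Region class:

```python
import bisect

def find_region(regions, address):
    # regions: sorted list of (start_address, size) tuples.
    starts = [start for start, _ in regions]
    # Index of the last region starting at or before |address|.
    index = bisect.bisect_right(starts, address) - 1
    if index >= 0:
        start, size = regions[index]
        if start <= address < start + size:
            return regions[index]
    return None  # Address falls outside every mapped region.
```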
| 261 | 372 |
| 262 class StackFrames(object): | 373 class UnsupportedHeapDumpVersionError(Exception): |
|
Primiano Tucci (use gerrit)
2017/05/03 17:25:05
No need to change it now, but for the future I hon
DmitrySkiba
2017/05/04 00:30:56
I wanted to surface the version that caused the er
| |
| 263 """Represents 'stackFrames' trace file entry.""" | 374 """Helper exception class to signal unsupported heap dump version.""" |
| 264 | 375 |
| 265 class PCFrame(object): | 376 def __init__(self, version): |
| 266 def __init__(self, pc, frame): | 377 message = 'Unsupported heap dump version: {}'.format(version) |
| 378 super(UnsupportedHeapDumpVersionError, self).__init__(message) | |
| 379 | |
| 380 | |
| 381 class StringMap(NodeWrapper): | |
| 382 """Wraps all 'strings' nodes for a process. | |
| 383 | |
| 384 'strings' node contains incremental mappings between integer ids and strings. | |
| 385 | |
| 386 "strings": [ | |
| 387 { | |
| 388 "id": <string_id>, | |
| 389 "string": <string> | |
| 390 }, | |
| 391 ... | |
| 392 ] | |
| 393 """ | |
| 394 | |
| 395 def __init__(self): | |
| 396 self._modified = False | |
| 397 self._strings_nodes = [] | |
| 398 self._string_by_id = {} | |
| 399 self._id_by_string = {} | |
| 400 self._max_string_id = 0 | |
| 401 | |
| 402 @property | |
| 403 def modified(self): | |
| 404 """Returns True if the wrapper was modified (see NodeWrapper).""" | |
| 405 return self._modified | |
| 406 | |
| 407 @property | |
| 408 def string_by_id(self): | |
| 409 return self._string_by_id | |
| 410 | |
| 411 def ParseNext(self, heap_dump_version, strings_node): | |
| 412 """Parses and interns next node (see NodeWrapper).""" | |
| 413 | |
| 414 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1: | |
|
Primiano Tucci (use gerrit)
2017/05/03 17:25:05
Here I would have just done
assert(heap_dump_vers
DmitrySkiba
2017/05/04 00:30:55
Acknowledged.
| |
| 415 raise UnsupportedHeapDumpVersionError(heap_dump_version) | |
| 416 | |
| 417 self._strings_nodes.append(strings_node) | |
| 418 for string_node in strings_node: | |
| 419 self._Insert(string_node['id'], string_node['string']) | |
| 420 | |
| 421 def Clear(self): | |
| 422 """Clears all string mappings.""" | |
| 423 if self._string_by_id: | |
| 424 self._modified = True | |
| 425 self._string_by_id = {} | |
| 426 self._id_by_string = {} | |
| 427 self._Insert(0, '[null]') | |
|
Primiano Tucci (use gerrit)
2017/05/03 17:25:05
is it intentional that clear does this _Insert and
DmitrySkiba
2017/05/04 00:30:55
__init__() (or rather ParseNext) wraps existing no
| |
| 428 self._max_string_id = 0 | |
| 429 | |
| 430 def AddString(self, string): | |
| 431 """Adds a string (if it doesn't exist) and returns its integer id.""" | |
| 432 string_id = self._id_by_string.get(string) | |
| 433 if string_id is None: | |
| 434 string_id = self._max_string_id + 1 | |
| 435 self._Insert(string_id, string) | |
| 436 self._modified = True | |
| 437 return string_id | |
| 438 | |
| 439 def ApplyModifications(self): | |
| 440 """Propagates modifications back to nodes (see NodeWrapper).""" | |
| 441 if not self.modified: | |
| 442 return | |
| 443 | |
| 444 assert self._strings_nodes, 'no nodes' | |
| 445 | |
| 446 # Serialize into the first node, and clear all others. | |
| 447 | |
| 448 for strings_node in self._strings_nodes: | |
|
Primiano Tucci (use gerrit)
2017/05/03 17:25:05
maybe when you do this add a comment explaining th
DmitrySkiba
2017/05/04 00:30:55
See comments at the top of the file. "Details" exp
| |
| 449 del strings_node[:] | |
| 450 strings_node = self._strings_nodes[0] | |
| 451 for string_id, string in self._string_by_id.iteritems(): | |
| 452 strings_node.append({'id': string_id, 'string': string}) | |
| 453 | |
| 454 self._modified = False | |
| 455 | |
| 456 def _Insert(self, string_id, string): | |
| 457 self._id_by_string[string] = string_id | |
| 458 self._string_by_id[string_id] = string | |
| 459 self._max_string_id = max(self._max_string_id, string_id) | |
| 460 | |
| 461 | |
| 462 class TypeNameMap(NodeWrapper): | |
| 463 """Wraps all 'types' nodes for a process. | |
| 464 | |
| 465 'types' nodes encode mappings between integer type ids and integer | |
| 466 string ids (from 'strings' nodes). | |
| 467 | |
| 468 "types": [ | |
| 469 { | |
| 470 "id": <type_id>, | |
| 471 "name_sid": <name_string_id> | |
| 472 } | |
| 473 ... | |
| 474 ] | |
| 475 | |
| 476 For simplicity string ids are translated into strings during parsing, | |
| 477 and then translated back to ids in ApplyModifications(). | |
| 478 """ | |
| 479 def __init__(self): | |
| 480 self._modified = False | |
| 481 self._type_name_nodes = [] | |
| 482 self._name_by_id = {} | |
| 483 self._id_by_name = {} | |
| 484 self._max_type_id = 0 | |
| 485 | |
| 486 @property | |
| 487 def modified(self): | |
| 488 """Returns True if the wrapper was modified (see NodeWrapper).""" | |
| 489 return self._modified | |
| 490 | |
| 491 @property | |
| 492 def name_by_id(self): | |
| 493 """Returns {id -> name} dict (must not be changed directly).""" | |
| 494 return self._name_by_id | |
| 495 | |
| 496 def ParseNext(self, heap_dump_version, type_name_node, string_map): | |
| 497 """Parses and interns next node (see NodeWrapper). | |
| 498 | |
| 499 |string_map| - A StringMap object to use to translate string ids | |
| 500 to strings. | |
| 501 """ | |
| 502 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1: | |
| 503 raise UnsupportedHeapDumpVersionError(heap_dump_version) | |
| 504 | |
| 505 self._type_name_nodes.append(type_name_node) | |
| 506 for type_node in type_name_node: | |
| 507 self._Insert(type_node['id'], | |
| 508 string_map.string_by_id[type_node['name_sid']]) | |
| 509 | |
| 510 def AddType(self, type_name): | |
| 511 """Adds a type name (if it doesn't exist) and returns its id.""" | |
| 512 type_id = self._id_by_name.get(type_name) | |
| 513 if type_id is None: | |
| 514 type_id = self._max_type_id + 1 | |
| 515 self._Insert(type_id, type_name) | |
| 516 self._modified = True | |
| 517 return type_id | |
| 518 | |
| 519 def ApplyModifications(self, string_map, force=False): | |
| 520 """Propagates modifications back to nodes. | |
| 521 | |
| 522 |string_map| - A StringMap object to use to translate strings to ids. | |
| 523 |force| - Whether to propagate changes regardless of 'modified' flag. | |
| 524 """ | |
| 525 if not self.modified and not force: | |
| 526 return | |
| 527 | |
| 528 assert self._type_name_nodes, 'no nodes' | |
| 529 | |
| 530 # Serialize into the first node, and clear all others. | |
| 531 | |
| 532 for types_node in self._type_name_nodes: | |
| 533 del types_node[:] | |
| 534 types_node = self._type_name_nodes[0] | |
| 535 for type_id, type_name in self._name_by_id.iteritems(): | |
| 536 types_node.append({ | |
| 537 'id': type_id, | |
| 538 'name_sid': string_map.AddString(type_name)}) | |
| 539 | |
| 540 self._modified = False | |
| 541 | |
| 542 def _Insert(self, type_id, type_name): | |
| 543 self._id_by_name[type_name] = type_id | |
| 544 self._name_by_id[type_id] = type_name | |
| 545 self._max_type_id = max(self._max_type_id, type_id) | |
| 546 | |
| 547 | |
| 548 class StackFrameMap(NodeWrapper): | |
| 549 """Wraps stack frame tree nodes for a process. | |
| 550 | |
| 551 For the legacy format this wrapper expects a single 'stackFrames' node | |
| 552 (which comes from metadata event): | |
| 553 | |
| 554 "stackFrames": { | |
| 555 "<frame_id>": { | |
| 556 "name": "<frame_name>", | |
| 557 "parent": "<parent_frame_id>" | |
| 558 }, | |
| 559 ... | |
| 560 } | |
| 561 | |
| 562 For the modern format this wrapper expects several 'nodes' nodes: | |
| 563 | |
| 564 "nodes": [ | |
| 565 { | |
| 566 "id": <frame_id>, | |
| 567 "parent": <parent_frame_id>, | |
| 568 "name_sid": <name_string_id> | |
| 569 }, | |
| 570 ... | |
| 571 ] | |
| 572 | |
| 573 In both formats the frame name is a string. The native heap profiler | |
| 574 generates specially formatted frame names (e.g. "pc:10eb78dba") for | |
| 575 function addresses (PCs). The inner Frame class below parses the name | |
| 576 and extracts the PC, if present. | |
| 577 """ | |
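The "pc:" convention takes a couple of lines to parse. A sketch matching the _ParsePC logic in the diff (Python 3, so int replaces Python 2's long):

```python
_PC_TAG = 'pc:'

def parse_pc(name):
    # Frame names like "pc:10eb78dba" carry a hex program counter;
    # anything else (e.g. "main()") is already symbolized.
    if not name.startswith(_PC_TAG):
        return None
    return int(name[len(_PC_TAG):], 16)
```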
| 578 class Frame(object): | |
| 579 def __init__(self, frame_id, name, parent_frame_id): | |
| 267 self._modified = False | 580 self._modified = False |
| 268 self._pc = pc | 581 self._id = frame_id |
| 269 self._frame = frame | 582 self._name = name |
| 583 self._pc = self._ParsePC(name) | |
| 584 self._parent_id = parent_frame_id | |
| 585 self._ext = None | |
| 270 | 586 |
| 271 @property | 587 @property |
| 272 def modified(self): | 588 def modified(self): |
| 589 """Returns True if the frame was modified. | |
| 590 | |
| 591 For example changing frame's name sets this flag (since the change | |
| 592 needs to be propagated back to nodes). | |
| 593 """ | |
| 273 return self._modified | 594 return self._modified |
| 274 | 595 |
| 275 @property | 596 @property |
| 597 def id(self): | |
| 598 """Frame id (integer).""" | |
| 599 return self._id | |
| 600 | |
| 601 @property | |
| 276 def pc(self): | 602 def pc(self): |
| 603 """Parsed (integer) PC of the frame, or None.""" | |
| 277 return self._pc | 604 return self._pc |
| 278 | 605 |
| 279 @property | 606 @property |
| 280 def name(self): | 607 def name(self): |
| 281 return self._frame['name'] | 608 """Name of the frame (see above).""" |
| 609 return self._name | |
| 282 | 610 |
| 283 @name.setter | 611 @name.setter |
| 284 def name(self, value): | 612 def name(self, value): |
| 613 """Changes the name. Doesn't affect value of |pc|.""" | |
| 285 self._modified = True | 614 self._modified = True |
| 286 self._frame['name'] = value | 615 self._name = value |
| 287 | 616 |
| 288 def __init__(self, stack_frames): | 617 @property |
| 289 """Constructs object using 'stackFrames' dictionary.""" | 618 def parent_id(self): |
| 290 self._pc_frames = [] | 619 """Parent frame id (integer).""" |
| 291 for frame in stack_frames.itervalues(): | 620 return self._parent_id |
| 292 pc_frame = self._ParsePCFrame(frame) | 621 |
| 293 if pc_frame: | 622 _PC_TAG = 'pc:' |
| 294 self._pc_frames.append(pc_frame) | 623 |
| 295 | 624 def _ParsePC(self, name): |
| 296 @property | 625 if not name.startswith(self._PC_TAG): |
| 297 def pc_frames(self): | 626 return None |
| 298 return self._pc_frames | 627 return long(name[len(self._PC_TAG):], 16) |
| 628 | |
| 629 def _ClearModified(self): | |
| 630 self._modified = False | |
| 631 | |
| 632 def __init__(self): | |
| 633 self._modified = False | |
| 634 self._heap_dump_version = None | |
| 635 self._stack_frames_nodes = [] | |
| 636 self._frame_by_id = {} | |
| 299 | 637 |
| 300 @property | 638 @property |
| 301 def modified(self): | 639 def modified(self): |
| 302 return any(f.modified for f in self._pc_frames) | 640 """Returns True if the wrapper or any of its frames were modified.""" |
| 303 | 641 return (self._modified or |
| 304 _PC_TAG = 'pc:' | 642 any(f.modified for f in self._frame_by_id.itervalues())) |
| 305 | 643 |
| 306 @classmethod | 644 @property |
| 307 def _ParsePCFrame(self, frame): | 645 def frame_by_id(self): |
| 308 name = frame['name'] | 646 """Returns {id -> frame} dict (must not be modified directly).""" |
| 309 if not name.startswith(self._PC_TAG): | 647 return self._frame_by_id |
| 310 return None | 648 |
| 311 pc = long(name[len(self._PC_TAG):], 16) | 649 def ParseNext(self, heap_dump_version, stack_frames_node, string_map): |
| 312 return self.PCFrame(pc, frame) | 650 """Parses the next stack frames node (see NodeWrapper). |
| 313 | 651 |
| 314 | 652 For the modern format |string_map| is used to translate string ids |
| 315 class Process(object): | 653 to strings. |
| 316 """Holds various bits of information about a process in a trace file.""" | 654 """ |
| 317 | 655 |
| 318 def __init__(self, pid): | 656 frame_by_id = {} |
| 319 self.pid = pid | 657 if heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: |
| 320 self.name = None | 658 if self._stack_frames_nodes: |
| 321 self.mmaps = None | 659 raise Exception('Legacy stack frames node is expected only once.') |
| 322 self.stack_frames = None | 660 for frame_id, frame_node in stack_frames_node.iteritems(): |
| 323 | 661 frame = self.Frame(frame_id, |
| 324 | 662 frame_node['name'], |
| 325 def CollectProcesses(trace): | 663 frame_node.get('parent')) |
| 326 """Parses trace dictionary and returns pid->Process map of all processes | 664 frame_by_id[frame.id] = frame |
| 327 suitable for symbolization (which have both mmaps and stack_frames). | 665 else: |
| 666 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1: | |
| 667 raise UnsupportedHeapDumpVersionError(heap_dump_version) | |
| 668 for frame_node in stack_frames_node: | |
| 669 frame = self.Frame(frame_node['id'], | |
| 670 string_map.string_by_id[frame_node['name_sid']], | |
| 671 frame_node.get('parent')) | |
| 672 frame_by_id[frame.id] = frame | |
| 673 | |
| 674 self._heap_dump_version = heap_dump_version | |
| 675 self._stack_frames_nodes.append(stack_frames_node) | |
| 676 | |
| 677 self._frame_by_id = frame_by_id | |
| 678 | |
| 679 def ApplyModifications(self, string_map, force=False): | |
| 680 """Applies modifications back to nodes (see NodeWrapper).""" | |
| 681 | |
| 682 if not self.modified and not force: | |
| 683 return | |
| 684 | |
| 685 assert self._stack_frames_nodes, 'no nodes' | |
| 686 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | |
| 687 assert string_map is None, \ | |
| 688 'string_map should not be used with the legacy format' | |
| 689 | |
| 690 # Serialize frames into the first node, clear all others. | |
| 691 | |
| 692 for frames_node in self._stack_frames_nodes: | |
| 693 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | |
| 694 frames_node.clear() | |
| 695 else: | |
| 696 del frames_node[:] | |
| 697 | |
| 698 frames_node = self._stack_frames_nodes[0] | |
| 699 for frame in self._frame_by_id.itervalues(): | |
| 700 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | |
| 701 frame_node = {'name': frame.name} | |
| 702 frames_node[frame.id] = frame_node | |
| 703 else: | |
| 704 frame_node = { | |
| 705 'id': frame.id, | |
| 706 'name_sid': string_map.AddString(frame.name) | |
| 707 } | |
| 708 frames_node.append(frame_node) | |
| 709 if frame.parent_id is not None: | |
| 710 frame_node['parent'] = frame.parent_id | |
| 711 frame._ClearModified() | |
| 712 | |
| 713 self._modified = False | |
| 714 | |
| 715 | |
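`ApplyModifications()` above serializes all frames back into the first stack-frames node, in one of two shapes: the legacy format (a dict keyed by frame id, with inline names) or the modern `heaps_v2` format (a flat list whose names are string ids resolved through a `strings` node). A sketch of both shapes using hypothetical frame data (`frames`, `add_string` are illustrative, not the script's API):

```python
# Hypothetical frames: id -> (name, parent_id)
frames = {1: ('main', None), 2: ('malloc', 1)}

# Legacy format: a dict keyed by frame id, names stored inline.
legacy_node = {}
for fid, (name, parent) in frames.items():
    node = {'name': name}
    if parent is not None:
        node['parent'] = parent
    legacy_node[fid] = node

# Modern (heaps_v2) format: a flat list; 'name_sid' refers into 'strings'.
strings, sid_by_name = [], {}
def add_string(s):
    if s not in sid_by_name:
        sid_by_name[s] = len(strings)
        strings.append({'id': len(strings), 'string': s})
    return sid_by_name[s]

modern_node = []
for fid, (name, parent) in frames.items():
    node = {'id': fid, 'name_sid': add_string(name)}
    if parent is not None:
        node['parent'] = parent
    modern_node.append(node)

print(legacy_node)
print(modern_node)
```

This is why the parser later has to guarantee a `strings` node exists whenever `types` or `nodes` are present: the modern serialization cannot express names without it.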
| 716 class Trace(NodeWrapper): | |
| 717 """Wrapper for the root trace node (i.e. the trace JSON itself). | |
| 718 | |
| 719 This wrapper parses select nodes from memory-infra events and groups | |
| 720 parsed data per-process (see inner Process class below). | |
| 328 """ | 721 """ |
| 329 | 722 |
| 330 process_map = {} | 723 # Indicates legacy heap dump format. |
| 331 | 724 HEAP_DUMP_VERSION_LEGACY = 'Legacy' |
| 332 # Android traces produced via 'chrome://inspect/?tracing#devices' are | 725 |
| 333 # just a list of events. | 726 # Indicates a variation of the modern heap dump format. |
| 334 events = trace if isinstance(trace, list) else trace['traceEvents'] | 727 HEAP_DUMP_VERSION_1 = 1 |
| 335 for event in events: | 728 |
| 336 name = event.get('name') | 729 class Process(object): |
| 337 if not name: | 730 """Collection of per-process data and wrappers.""" |
| 338 continue | 731 |
| 339 | 732 def __init__(self, pid): |
| 340 pid = event['pid'] | 733 self._pid = pid |
| 341 process = process_map.get(pid) | 734 self._name = None |
| 342 if process is None: | 735 self._memory_map = None |
| 343 process = Process(pid) | 736 self._stack_frame_map = StackFrameMap() |
| 344 process_map[pid] = process | 737 self._type_name_map = TypeNameMap() |
| 345 | 738 self._string_map = StringMap() |
| 346 phase = event['ph'] | 739 self._heap_dump_version = None |
| 347 if phase == TRACE_EVENT_PHASE_METADATA: | 740 |
| 348 if name == 'process_name': | 741 @property |
| 349 process.name = event['args']['name'] | 742 def modified(self): |
| 350 elif name == 'stackFrames': | 743 return self._stack_frame_map.modified or self._type_name_map.modified |
| 351 process.stack_frames = StackFrames(event['args']['stackFrames']) | 744 |
| 352 elif phase == TRACE_EVENT_PHASE_MEMORY_DUMP: | 745 @property |
| 353 process_mmaps = event['args']['dumps'].get('process_mmaps') | 746 def pid(self): |
| 354 if process_mmaps: | 747 return self._pid |
| 355 # TODO(dskiba): this parses all process_mmaps, but retains only the | 748 |
| 356 # last one. We need to parse only once (lazy parsing?). | 749 @property |
| 357 process.mmaps = ProcessMemoryMaps(process_mmaps) | 750 def name(self): |
| 358 | 751 return self._name |
| 359 return [p for p in process_map.itervalues() if p.mmaps and p.stack_frames] | 752 |
| 753 @property | |
| 754 def unique_name(self): | |
| 755 """Returns string that includes both process name and its pid.""" | |
| 756 name = self._name if self._name else 'UnnamedProcess' | |
| 757 return '{}({})'.format(name, self._pid) | |
| 758 | |
| 759 @property | |
| 760 def memory_map(self): | |
| 761 return self._memory_map | |
| 762 | |
| 763 @property | |
| 764 def stack_frame_map(self): | |
| 765 return self._stack_frame_map | |
| 766 | |
| 767 @property | |
| 768 def type_name_map(self): | |
| 769 return self._type_name_map | |
| 770 | |
| 771 def ApplyModifications(self): | |
| 772 """Calls ApplyModifications() on contained wrappers.""" | |
| 773 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | |
| 774 self._stack_frame_map.ApplyModifications(None) | |
| 775 else: | |
| 776 if self._stack_frame_map.modified or self._type_name_map.modified: | |
| 777 self._string_map.Clear() | |
| 778 self._stack_frame_map.ApplyModifications(self._string_map, force=True) | |
| 779 self._type_name_map.ApplyModifications(self._string_map, force=True) | |
| 780 self._string_map.ApplyModifications() | |
| 781 | |
| 782 def __init__(self, trace_node): | |
| 783 self._trace_node = trace_node | |
| 784 self._processes = [] | |
| 785 self._heap_dump_version = None | |
| 786 | |
| 787 # Misc per-process information needed only during parsing. | |
| 788 class ProcessExt(object): | |
| 789 def __init__(self, pid): | |
| 790 self.process = Trace.Process(pid) | |
| 791 self.mapped_entry_names = set() | |
| 792 self.process_mmaps_node = None | |
| 793 self.seen_strings_node = False | |
| 794 | |
| 795 process_ext_by_pid = {} | |
| 796 | |
| 797 # Android traces produced via 'chrome://inspect/?tracing#devices' are | |
| 798 # just a list of events. | |
| 799 events = trace_node if isinstance(trace_node, list) \ | |
| 800 else trace_node['traceEvents'] | |
| 801 for event in events: | |
| 802 name = event.get('name') | |
| 803 if not name: | |
| 804 continue | |
| 805 | |
| 806 pid = event['pid'] | |
| 807 process_ext = process_ext_by_pid.get(pid) | |
| 808 if process_ext is None: | |
| 809 process_ext = ProcessExt(pid) | |
| 810 process_ext_by_pid[pid] = process_ext | |
| 811 process = process_ext.process | |
| 812 | |
| 813 phase = event['ph'] | |
| 814 if phase == self._EVENT_PHASE_METADATA: | |
| 815 if name == 'process_name': | |
| 816 process._name = event['args']['name'] | |
| 817 elif name == 'stackFrames': | |
| 818 process._stack_frame_map.ParseNext( | |
| 819 self._UseHeapDumpVersion(self.HEAP_DUMP_VERSION_LEGACY), | |
| 820 event['args']['stackFrames'], | |
| 821 process._string_map) | |
| 822 elif phase == self._EVENT_PHASE_MEMORY_DUMP: | |
| 823 dumps = event['args']['dumps'] | |
| 824 process_mmaps = dumps.get('process_mmaps') | |
| 825 if process_mmaps: | |
| 826 # We want the most recent memory map, so parsing happens later, | |
| 827 # once we have finished reading all events. | |
| 828 process_ext.process_mmaps_node = process_mmaps | |
| 829 heaps = dumps.get('heaps_v2') | |
| 830 if heaps: | |
| 831 version = self._UseHeapDumpVersion(heaps['version']) | |
| 832 maps = heaps.get('maps') | |
| 833 if maps: | |
| 834 process_ext.mapped_entry_names.update(maps.iterkeys()) | |
| 835 types = maps.get('types') | |
| 836 stack_frames = maps.get('nodes') | |
| 837 strings = maps.get('strings') | |
| 838 if (strings is None and (types or stack_frames) | |
| 839 and not process_ext.seen_strings_node): | |
| 840 # ApplyModifications() for TypeNameMap and StackFrameMap puts | |
| 841 # everything into the first node and depends on StringMap. So | |
| 842 # we need to make sure that the 'strings' node is there if either | |
| 843 # of the other two nodes is present. | |
| 844 strings = [] | |
| 845 maps['strings'] = strings | |
| 846 if strings is not None: | |
| 847 process_ext.seen_strings_node = True | |
| 848 process._string_map.ParseNext(version, strings) | |
| 849 if types: | |
| 850 process._type_name_map.ParseNext( | |
| 851 version, types, process._string_map) | |
| 852 if stack_frames: | |
| 853 process._stack_frame_map.ParseNext( | |
| 854 version, stack_frames, process._string_map) | |
| 855 | |
| 856 self._processes = [] | |
| 857 for pe in process_ext_by_pid.itervalues(): | |
| 858 pe.process._heap_dump_version = self._heap_dump_version | |
| 859 if pe.process_mmaps_node: | |
| 860 # Now parse the most recent memory map. | |
| 861 pe.process._memory_map = MemoryMap(pe.process_mmaps_node) | |
| 862 self._processes.append(pe.process) | |
| 863 | |
| 864 @property | |
| 865 def node(self): | |
| 866 """Root node (that was passed to the __init__).""" | |
| 867 return self._trace_node | |
| 868 | |
| 869 @property | |
| 870 def modified(self): | |
| 871 """Returns True if trace file needs to be updated. | |
| 872 | |
| 873 Before writing trace JSON back to a file ApplyModifications() needs | |
| 874 to be called. | |
| 875 """ | |
| 876 return any(p.modified for p in self._processes) | |
| 877 | |
| 878 @property | |
| 879 def processes(self): | |
| 880 return self._processes | |
| 881 | |
| 882 @property | |
| 883 def heap_dump_version(self): | |
| 884 return self._heap_dump_version | |
| 885 | |
| 886 def ApplyModifications(self): | |
| 887 """Propagates modifications back to the trace JSON.""" | |
| 888 for process in self._processes: | |
| 889 process.ApplyModifications() | |
| 890 assert not self.modified, 'still modified' | |
| 891 | |
| 892 # Relevant trace event phases from Chromium's | |
| 893 # src/base/trace_event/common/trace_event_common.h. | |
| 894 _EVENT_PHASE_METADATA = 'M' | |
| 895 _EVENT_PHASE_MEMORY_DUMP = 'v' | |
| 896 | |
| 897 def _UseHeapDumpVersion(self, version): | |
| 898 if self._heap_dump_version is None: | |
| 899 self._heap_dump_version = version | |
| 900 return version | |
| 901 elif self._heap_dump_version != version: | |
| 902 raise Exception( | |
| 903 ("Inconsistent trace file: first saw '{}' heap dump version, " | |
| 904 "then '{}'.").format(self._heap_dump_version, version)) | |
| 905 else: | |
| 906 return version | |
| 360 | 907 |
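`_UseHeapDumpVersion()` above pins the heap dump version to whatever value is seen first and rejects any later mismatch. A standalone sketch of that consistency check (class and variable names here are illustrative):

```python
class VersionTracker:
    """Remembers the first heap dump version seen and rejects mixtures."""
    def __init__(self):
        self._version = None

    def use(self, version):
        if self._version is None:
            self._version = version  # first sighting pins the version
        elif self._version != version:
            raise Exception(
                "Inconsistent trace file: first saw '{}' heap dump version, "
                "then '{}'.".format(self._version, version))
        return self._version

tracker = VersionTracker()
tracker.use(1)
tracker.use(1)          # the same version is fine any number of times
try:
    tracker.use('Legacy')
except Exception as e:
    print(e)
```

A trace mixing the legacy `stackFrames` metadata event with modern `heaps_v2` dumps would trip this check rather than being silently half-parsed.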
| 361 | 908 |
| 362 class SymbolizableFile(object): | 909 class SymbolizableFile(object): |
| 363 """Holds file path, addresses to symbolize and stack frames to update. | 910 """Holds file path, addresses to symbolize and stack frames to update. |
| 364 | 911 |
| 365 This class is a link between ELFSymbolizer and a trace file: it specifies | 912 This class is a link between ELFSymbolizer and a trace file: it specifies |
| 366 what to symbolize (addresses) and what to update with the symbolization | 913 what to symbolize (addresses) and what to update with the symbolization |
| 367 result (frames). | 914 result (frames). |
| 368 """ | 915 """ |
| 369 def __init__(self, file_path): | 916 def __init__(self, file_path): |
| 370 self.path = file_path | 917 self.path = file_path |
| 371 self.symbolizable_path = file_path # path to use for symbolization | 918 self.symbolizable_path = file_path # path to use for symbolization |
| 372 self.frames_by_address = collections.defaultdict(list) | 919 self.frames_by_address = collections.defaultdict(list) |
| 373 | 920 |
| 374 | 921 |
| 375 def ResolveSymbolizableFiles(processes): | 922 def ResolveSymbolizableFiles(processes): |
| 376 """Resolves and groups PCs into list of SymbolizableFiles. | 923 """Resolves and groups PCs into list of SymbolizableFiles. |
| 377 | 924 |
| 378 As part of the grouping process, this function resolves the PC of each stack | 925 As part of the grouping process, this function resolves the PC of each stack |
| 379 frame to its corresponding mmap region. Stack frames that fail to resolve | 926 frame to its corresponding mmap region. Stack frames that fail to resolve |
| 380 are symbolized with '<unresolved>'. | 927 are symbolized with '<unresolved>'. |
| 381 """ | 928 """ |
| 382 symfile_by_path = {} | 929 symfile_by_path = {} |
| 383 for process in processes: | 930 for process in processes: |
| 384 for frame in process.stack_frames.pc_frames: | 931 if not process.memory_map: |
| 385 region = process.mmaps.FindRegion(frame.pc) | 932 continue |
| 933 for frame in process.stack_frame_map.frame_by_id.itervalues(): | |
| 934 if frame.pc is None: | |
| 935 continue | |
| 936 region = process.memory_map.FindRegion(frame.pc) | |
| 386 if region is None: | 937 if region is None: |
| 387 frame.name = '<unresolved>' | 938 frame.name = '<unresolved>' |
| 388 continue | 939 continue |
| 389 | 940 |
| 390 symfile = symfile_by_path.get(region.file_path) | 941 symfile = symfile_by_path.get(region.file_path) |
| 391 if symfile is None: | 942 if symfile is None: |
| 392 symfile = SymbolizableFile(region.file_path) | 943 symfile = SymbolizableFile(region.file_path) |
| 393 symfile_by_path[symfile.path] = symfile | 944 symfile_by_path[symfile.path] = symfile |
| 394 | 945 |
| 395 relative_pc = frame.pc - region.start_address | 946 relative_pc = frame.pc - region.start_address |
| 396 symfile.frames_by_address[relative_pc].append(frame) | 947 symfile.frames_by_address[relative_pc].append(frame) |
| 397 return symfile_by_path.values() | 948 return symfile_by_path.values() |
| 398 | 949 |
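`ResolveSymbolizableFiles()` maps each absolute PC into an mmap region and keys the symbolization work by PC relative to the region start (which is what a symbolizer wants for a shared library). A sketch of that grouping with hypothetical regions and frames:

```python
import collections

# Hypothetical mmap regions: (start, end, file_path).
regions = [(0x7f0000000000, 0x7f0000100000, '/lib/libfoo.so'),
           (0x7f0000200000, 0x7f0000300000, '/lib/libbar.so')]

def find_region(pc):
    for start, end, path in regions:
        if start <= pc < end:
            return start, path
    return None

# file_path -> {relative_pc -> [frame ids]}
frames_by_file = collections.defaultdict(lambda: collections.defaultdict(list))
for frame_id, pc in [(1, 0x7f0000000420), (2, 0x7f0000200100), (3, 0xdead)]:
    region = find_region(pc)
    if region is None:
        continue  # the real script names such frames '<unresolved>'
    start, path = region
    frames_by_file[path][pc - start].append(frame_id)

print(dict(frames_by_file['/lib/libfoo.so']))
```

Grouping by file lets the script spawn one symbolizer per binary and symbolize each distinct relative PC once, even if many frames share it.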
| 399 | 950 |
| 951 def FindInSystemPath(binary_name): | |
| 952 paths = os.environ['PATH'].split(os.pathsep) | |
| 953 for path in paths: | |
| 954 binary_path = os.path.join(path, binary_name) | |
| 955 if os.path.isfile(binary_path): | |
| 956 return binary_path | |
| 957 return None | |
| 958 | |
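`FindInSystemPath()` is a plain first-match scan over `PATH`. A Python 3 sketch, demonstrated against a throwaway directory (the `mytool` name is purely illustrative):

```python
import os
import tempfile

def find_in_system_path(binary_name):
    """Returns the first PATH entry containing binary_name, or None."""
    for path in os.environ.get('PATH', '').split(os.pathsep):
        candidate = os.path.join(path, binary_name)
        if os.path.isfile(candidate):
            return candidate
    return None

# Demonstrate by prepending a temporary directory to PATH.
demo_dir = tempfile.mkdtemp()
demo_tool = os.path.join(demo_dir, 'mytool')
open(demo_tool, 'w').close()
os.environ['PATH'] = demo_dir + os.pathsep + os.environ.get('PATH', '')
found = find_in_system_path('mytool')
print(found)
```

In Python 3.3+ the stdlib `shutil.which()` offers the same lookup and additionally checks the executable bit.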
| 959 | |
| 960 class Symbolizer(object): | |
| 961 """Encapsulates platform-specific symbolization logic.""" | |
| 962 | |
| 963 def __init__(self): | |
| 964 self.is_mac = sys.platform == 'darwin' | |
| 965 self.is_win = sys.platform == 'win32' | |
| 966 if self.is_mac: | |
| 967 self.binary = 'atos' | |
| 968 self._matcher = symbolize_trace_atos_regex.AtosRegexMatcher() | |
| 969 elif self.is_win: | |
| 970 self.binary = 'addr2line-pdb.exe' | |
| 971 else: | |
| 972 self.binary = 'addr2line' | |
| 973 self.symbolizer_path = FindInSystemPath(self.binary) | |
| 974 | |
| 975 def _SymbolizeLinuxAndAndroid(self, symfile, unsymbolized_name): | |
| 976 def _SymbolizerCallback(sym_info, frames): | |
| 977 # Unwind inline chain to the top. | |
| 978 while sym_info.inlined_by: | |
| 979 sym_info = sym_info.inlined_by | |
| 980 | |
| 981 symbolized_name = sym_info.name if sym_info.name else unsymbolized_name | |
| 982 for frame in frames: | |
| 983 frame.name = symbolized_name | |
| 984 frame.ext.source_path = sym_info.source_path | |
| 985 | |
| 986 symbolizer = elf_symbolizer.ELFSymbolizer(symfile.symbolizable_path, | |
| 987 self.symbolizer_path, | |
| 988 _SymbolizerCallback, | |
| 989 inlines=True) | |
| 990 | |
| 991 for address, frames in symfile.frames_by_address.iteritems(): | |
| 992 # SymbolizeAsync() asserts that the type of address is int. We operate | |
| 993 # on longs (since they are raw pointers possibly from 64-bit processes). | |
| 994 # It's OK to cast here because we're passing relative PC, which should | |
| 995 # always fit into int. | |
| 996 symbolizer.SymbolizeAsync(int(address), frames) | |
| 997 | |
| 998 symbolizer.Join() | |
| 999 | |
| 1000 | |
| 1001 def _SymbolizeMac(self, symfile): | |
| 1002 chars_max = int(subprocess.check_output("getconf ARG_MAX", shell=True)) | |
| 1003 | |
| 1004 # 16 for the address, 2 for "0x", 1 for the space | |
| 1005 chars_per_address = 19 | |
| 1006 | |
| 1007 load_address = (symbolize_trace_macho_reader. | |
| 1008 ReadMachOTextLoadAddress(symfile.symbolizable_path)) | |
| 1009 assert load_address is not None | |
| 1010 | |
| 1011 cmd_base = [self.symbolizer_path, '-arch', 'x86_64', '-l', | |
| 1012 '0x%x' % load_address, '-o', | |
| 1013 symfile.symbolizable_path] | |
| 1014 chars_for_other_arguments = len(' '.join(cmd_base)) + 1 | |
| 1015 | |
| 1016 # The maximum number of inputs that can be processed at once is limited by | |
| 1017 # ARG_MAX. This currently evaluates to ~13000 on macOS. | |
| 1018 max_inputs = (chars_max - chars_for_other_arguments) / chars_per_address | |
| 1019 | |
| 1020 all_keys = symfile.frames_by_address.keys() | |
| 1021 processed_keys_count = 0 | |
| 1022 while len(all_keys): | |
| 1023 input_count = min(len(all_keys), max_inputs) | |
| 1024 keys_to_process = all_keys[0:input_count] | |
| 1025 cmd = list(cmd_base) | |
| 1026 cmd.extend([hex(int(x) + load_address) | |
| 1027 for x in keys_to_process]) | |
| 1028 output_array = subprocess.check_output(cmd).split('\n') | |
| 1029 for i in range(len(keys_to_process)): | |
| 1030 for frame in (symfile.frames_by_address.values() | |
| 1031 [i + processed_keys_count]): | |
| 1032 frame.name = self._matcher.Match(output_array[i]) | |
| 1033 processed_keys_count += len(keys_to_process) | |
| 1034 all_keys = all_keys[input_count:] | |
| 1035 | |
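The Mac path above batches addresses because every address is passed to `atos` on the command line, which is capped by ARG_MAX. A sketch of the batching arithmetic (the ARG_MAX value and the fixed-argument length are assumed, not measured):

```python
# Hypothetical numbers: ARG_MAX on macOS is typically 262144 bytes.
chars_max = 262144
chars_per_address = 19          # 16 hex digits + '0x' + separating space
chars_for_other_arguments = 80  # assumed length of the fixed atos arguments

# Integer division caps how many addresses fit in one command line.
max_inputs = (chars_max - chars_for_other_arguments) // chars_per_address

addresses = list(range(30000))  # pretend workload
batches = [addresses[i:i + max_inputs]
           for i in range(0, len(addresses), max_inputs)]
print(max_inputs, len(batches))
```

With these numbers `max_inputs` lands near the ~13000 the comment mentions, so a large trace needs only a handful of `atos` invocations.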
| 1036 def _SymbolizeWin(self, symfile): | |
| 1037 """Invoke symbolizer binary on windows and write all input in one go. | |
| 1038 | |
| 1039 Unlike linux, on windows, symbolization talks through a shared system | |
| 1040 service that handles communication with the NT symbol servers. This | |
| 1041 creates an explicit serialization (and therefor lock contention) of | |
| 1042 any process using the symbol API for files do not have a local PDB. | |
| 1043 | |
| 1044 Thus, even though the windows symbolizer binary can be make command line | |
| 1045 compatible with the POSIX addr2line interface, paralellizing the | |
| 1046 symbolization does not yield the same performance effects. Running | |
| 1047 just one symbolizer seems good enough for now. Can optimize later | |
| 1048 if this becomes a bottleneck. | |
| 1049 """ | |
| 1050 cmd = [self.symbolizer_path, '--functions', '--demangle', '--exe', | |
| 1051 symfile.symbolizable_path] | |
| 1052 | |
| 1053 proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE, | |
| 1054 stderr=sys.stderr) | |
| 1055 addrs = ["%x" % relative_pc for relative_pc in | |
| 1056 symfile.frames_by_address.keys()] | |
| 1057 (stdout_data, stderr_data) = proc.communicate('\n'.join(addrs)) | |
| 1058 stdout_data = stdout_data.split('\n') | |
| 1059 | |
| 1060 # This is known to be in the same order as stdout_data. | |
| 1061 for i, addr in enumerate(addrs): | |
| 1062 for frame in symfile.frames_by_address[int(addr, 16)]: | |
| 1063 # Output of addr2line with --functions is always 2 outputs per | |
| 1064 # symbol, function name followed by source line number. Only grab | |
| 1065 # the function name as line info is not always available. | |
| 1066 frame.name = stdout_data[i * 2] | |
| 1067 | |
| 1068 def Symbolize(self, symfile, unsymbolized_name): | |
| 1069 if self.is_mac: | |
| 1070 self._SymbolizeMac(symfile) | |
| 1071 elif self.is_win: | |
| 1072 self._SymbolizeWin(symfile) | |
| 1073 else: | |
| 1074 self._SymbolizeLinuxAndAndroid(symfile, unsymbolized_name) | |
| 1075 | |
| 1076 def IsSymbolizableFile(self, file_path): | |
| 1077 if self.is_win: | |
| 1078 extension = os.path.splitext(file_path)[1].lower() | |
| 1079 return extension in ['.dll', '.exe'] | |
| 1080 else: | |
| 1081 result = subprocess.check_output(['file', '-0', file_path]) | |
| 1082 type_string = result[result.find('\0') + 1:] | |
| 1083 return bool(re.match(r'.*(ELF|Mach-O) (32|64)-bit\b.*', | |
| 1084 type_string, re.DOTALL)) | |
| 1085 | |
| 1086 | |
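The Windows path relies on addr2line-style output: with `--functions`, every input address yields exactly two output lines (function name, then `file:line`), so pairing is purely positional via `i * 2`. A sketch with made-up addresses and output:

```python
# addr2line with --functions emits two lines per input address:
# the function name, then 'file:line'. Pairing is by position.
addrs = ['1a2b', '3c4d']
stdout_lines = ['Foo::Bar()', 'foo.cc:42', 'Baz()', '??:0']

# Keep only the function name; line info is not always available.
name_by_addr = {addr: stdout_lines[i * 2] for i, addr in enumerate(addrs)}
print(name_by_addr)
```

Any tool substituted for `addr2line-pdb.exe` must preserve this two-lines-per-address contract, or the positional pairing silently mislabels frames.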
| 400 def SymbolizeFiles(symfiles, symbolizer): | 1087 def SymbolizeFiles(symfiles, symbolizer): |
| 401 """Symbolizes each file in the given list of SymbolizableFiles | 1088 """Symbolizes each file in the given list of SymbolizableFiles |
| 402 and updates stack frames with symbolization results.""" | 1089 and updates stack frames with symbolization results.""" |
| 1090 | |
| 1091 if not symfiles: | |
| 1092 print 'Nothing to symbolize.' | |
| 1093 return | |
| 1094 | |
| 403 print 'Symbolizing...' | 1095 print 'Symbolizing...' |
| 404 | 1096 |
| 405 def _SubPrintf(message, *args): | 1097 def _SubPrintf(message, *args): |
| 406 print (' ' + message).format(*args) | 1098 print (' ' + message).format(*args) |
| 407 | 1099 |
| 408 symbolized = False | |
| 409 for symfile in symfiles: | 1100 for symfile in symfiles: |
| 410 unsymbolized_name = '<{}>'.format( | 1101 unsymbolized_name = '<{}>'.format( |
| 411 symfile.path if symfile.path else 'unnamed') | 1102 symfile.path if symfile.path else 'unnamed') |
| 412 | 1103 |
| 413 problem = None | 1104 problem = None |
| 414 if not os.path.isabs(symfile.symbolizable_path): | 1105 if not os.path.isabs(symfile.symbolizable_path): |
| 415 problem = 'not a file' | 1106 problem = 'not a file' |
| 416 elif not os.path.isfile(symfile.symbolizable_path): | 1107 elif not os.path.isfile(symfile.symbolizable_path): |
| 417 problem = "file doesn't exist" | 1108 problem = "file doesn't exist" |
| 418 elif not symbolizer.IsSymbolizableFile(symfile.symbolizable_path): | 1109 elif not symbolizer.IsSymbolizableFile(symfile.symbolizable_path): |
| 419 problem = 'file is not symbolizable' | 1110 problem = 'file is not symbolizable' |
| 420 if problem: | 1111 if problem: |
| 421 _SubPrintf("Won't symbolize {} PCs for '{}': {}.", | 1112 _SubPrintf("Won't symbolize {} PCs for '{}': {}.", |
| 422 len(symfile.frames_by_address), | 1113 len(symfile.frames_by_address), |
| 423 symfile.symbolizable_path, | 1114 symfile.symbolizable_path, |
| 424 problem) | 1115 problem) |
| 425 for frames in symfile.frames_by_address.itervalues(): | 1116 for frames in symfile.frames_by_address.itervalues(): |
| 426 for frame in frames: | 1117 for frame in frames: |
| 427 frame.name = unsymbolized_name | 1118 frame.name = unsymbolized_name |
| 428 continue | 1119 continue |
| 429 | 1120 |
| 430 _SubPrintf('Symbolizing {} PCs from {}...', | 1121 _SubPrintf('Symbolizing {} PCs from {}...', |
| 431 len(symfile.frames_by_address), | 1122 len(symfile.frames_by_address), |
| 432 symfile.path) | 1123 symfile.path) |
| 433 | 1124 |
| 434 symbolizer.Symbolize(symfile, unsymbolized_name) | 1125 symbolizer.Symbolize(symfile, unsymbolized_name) |
| 435 symbolized = True | |
| 436 | 1126 |
| 437 return symbolized | 1127 |
| 1128 # Matches Android library paths, supporting both K (/data/app-lib/<>/lib.so) | |
| 1129 # and L+ (/data/app/<>/lib/<>/lib.so) layouts. The library name is available | |
| 1130 # via the 'name' group. | |
| 1131 ANDROID_PATH_MATCHER = re.compile( | |
| 1132 r'^/data/(?:' | |
| 1133 r'app/[^/]+/lib/[^/]+/|' | |
| 1134 r'app-lib/[^/]+/|' | |
| 1135 r'data/[^/]+/incremental-install-files/lib/' | |
| 1136 r')(?P<name>.*\.so)') | |
| 1137 | |
| 1138 # Subpath of output path where unstripped libraries are stored. | |
| 1139 ANDROID_UNSTRIPPED_SUBPATH = 'lib.unstripped' | |
| 438 | 1140 |
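The regex above extracts the library file name from the on-device path so it can be remapped to the unstripped copy under the build output directory. Exercising it with hypothetical package paths (`com.chrome-1` is illustrative):

```python
import re

ANDROID_PATH_MATCHER = re.compile(
    r'^/data/(?:'
    r'app/[^/]+/lib/[^/]+/|'
    r'app-lib/[^/]+/|'
    r'data/[^/]+/incremental-install-files/lib/'
    r')(?P<name>.*\.so)')

# L+ layout: /data/app/<package>/lib/<abi>/lib.so
m = ANDROID_PATH_MATCHER.match('/data/app/com.chrome-1/lib/arm/libchrome.so')
print(m.group('name'))

# K layout: /data/app-lib/<package>/lib.so
m = ANDROID_PATH_MATCHER.match('/data/app-lib/com.chrome-1/libchrome.so')
print(m.group('name'))
```

A match means `RemapAndroidFiles()` can point the symbolizer at `<output_path>/lib.unstripped/<name>`; non-matching paths are clobbered so they surface as "not a file".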
| 439 | 1141 |
| 440 def HaveFilesFromAndroid(symfiles): | 1142 def HaveFilesFromAndroid(symfiles): |
| 441 return any(ANDROID_PATH_MATCHER.match(f.path) for f in symfiles) | 1143 return any(ANDROID_PATH_MATCHER.match(f.path) for f in symfiles) |
| 442 | 1144 |
| 443 | 1145 |
| 444 def RemapAndroidFiles(symfiles, output_path): | 1146 def RemapAndroidFiles(symfiles, output_path): |
| 445 for symfile in symfiles: | 1147 for symfile in symfiles: |
| 446 match = ANDROID_PATH_MATCHER.match(symfile.path) | 1148 match = ANDROID_PATH_MATCHER.match(symfile.path) |
| 447 if match: | 1149 if match: |
| 448 name = match.group('name') | 1150 name = match.group('name') |
| 449 symfile.symbolizable_path = os.path.join( | 1151 symfile.symbolizable_path = os.path.join( |
| 450 output_path, ANDROID_UNSTRIPPED_SUBPATH, name) | 1152 output_path, ANDROID_UNSTRIPPED_SUBPATH, name) |
| 451 else: | 1153 else: |
| 452 # Clobber file path to trigger "not a file" problem in SymbolizeFiles(). | 1154 # Clobber file path to trigger "not a file" problem in SymbolizeFiles(). |
| 453 # Without this, files would be reported with a "file not found" problem, | 1155 # Without this, files would be reported with a "file not found" problem, |
| 454 # which is not accurate. | 1156 # which is not accurate. |
| 455 symfile.symbolizable_path = 'android://{}'.format(symfile.path) | 1157 symfile.symbolizable_path = 'android://{}'.format(symfile.path) |
| 456 | 1158 |
| 457 | 1159 |
| 1160 def Symbolize(options, trace, symbolizer): | |
| 1161 symfiles = ResolveSymbolizableFiles(trace.processes) | |
| 1162 | |
| 1163 # Android trace files don't have any indication they are from Android. | |
> Primiano Tucci (use gerrit), 2017/05/03 17:25:04: "As per discussion offline, maybe specify: traces c…"
> fmeawad, 2017/05/03 18:19:34: "look for os-name in the metadata"
| 1164 # So we're checking for Android-specific paths. | |
| 1165 if HaveFilesFromAndroid(symfiles): | |
| 1166 if not options.output_directory: | |
| 1167 sys.exit('The trace file appears to be from Android. Please ' | |
| 1168 'specify output directory to properly symbolize it.') | |
| 1169 RemapAndroidFiles(symfiles, os.path.abspath(options.output_directory)) | |
| 1170 | |
| 1171 SymbolizeFiles(symfiles, symbolizer) | |
| 1172 | |
| 1173 | |
| 1174 def OpenTraceFile(file_path, mode): | |
| 1175 if file_path.endswith('.gz'): | |
| 1176 return gzip.open(file_path, mode + 'b') | |
| 1177 else: | |
| 1178 return open(file_path, mode + 't') | |
| 1179 | |
| 1180 | |
| 458 # Suffix used for backup files. | 1181 # Suffix used for backup files. |
| 459 BACKUP_FILE_TAG = '.BACKUP' | 1182 BACKUP_FILE_TAG = '.BACKUP' |
| 460 | 1183 |
| 461 def main(): | 1184 def main(): |
| 462 parser = argparse.ArgumentParser() | 1185 class MultilineHelpFormatter(argparse.HelpFormatter): |
> Primiano Tucci (use gerrit), 2017/05/03 17:25:05: "For a one file python script, having a custom form…"
> DmitrySkiba, 2017/05/04 00:30:55: "Hmm, actually this is a leftover from a version th…"
| 463 parser.add_argument('file', | 1186 def _split_lines(self, text, width): |
| 464 help='Trace file to symbolize (.json or .json.gz)') | 1187 extra_lines = [] |
| 465 parser.add_argument('--no-backup', | 1188 if '\n' in text: |
| 466 dest='backup', default='true', action='store_false', | 1189 lines = text.splitlines() |
| 467 help="Don't create {} files".format(BACKUP_FILE_TAG)) | 1190 text = lines[0] |
| 468 parser.add_argument('--output-directory', | 1191 extra_lines = lines[1:] |
| 469 help='The path to the build output directory, such ' + | 1192 return super(MultilineHelpFormatter, self)._split_lines(text, width) + \ |
| 470 'as out/Debug. Only needed for Android.') | 1193 extra_lines |
| 471 options = parser.parse_args() | |
| 472 | 1194 |
| 473 trace_file_path = options.file | 1195 parser = argparse.ArgumentParser(formatter_class=MultilineHelpFormatter) |
| 474 def _OpenTraceFile(mode): | 1196 parser.add_argument( |
| 475 if trace_file_path.endswith('.gz'): | 1197 'file', |
| 476 return gzip.open(trace_file_path, mode + 'b') | 1198 help='Trace file to symbolize (.json or .json.gz)') |
| 477 else: | 1199 |
| 478 return open(trace_file_path, mode + 't') | 1200 parser.add_argument( |
| 1201 '--no-backup', dest='backup', default='true', action='store_false', | |
| 1202 help="Don't create {} files".format(BACKUP_FILE_TAG)) | |
| 1203 | |
| 1204 parser.add_argument( | |
| 1205 '--output-directory', | |
| 1206 help='The path to the build output directory, such as out/Debug.') | |
| 479 | 1207 |
| 480 symbolizer = Symbolizer() | 1208 symbolizer = Symbolizer() |
| 481 if symbolizer.symbolizer_path is None: | 1209 if symbolizer.symbolizer_path is None: |
| 482 sys.exit("Can't symbolize - no %s in PATH." % symbolizer.binary) | 1210 sys.exit("Can't symbolize - no %s in PATH." % symbolizer.binary) |
| 483 | 1211 |
| 1212 options = parser.parse_args() | |
| 1213 | |
| 1214 trace_file_path = options.file | |
| 1215 | |
| 484 print 'Reading trace file...' | 1216 print 'Reading trace file...' |
| 485 with _OpenTraceFile('r') as trace_file: | 1217 with OpenTraceFile(trace_file_path, 'r') as trace_file: |
| 486 trace = json.load(trace_file) | 1218 trace = Trace(json.load(trace_file)) |
| 487 | 1219 |
| 488 processes = CollectProcesses(trace) | 1220 Symbolize(options, trace, symbolizer) |
| 489 symfiles = ResolveSymbolizableFiles(processes) | |
| 490 | 1221 |
| 491 # Android trace files don't have any indication they are from Android. | 1222 if trace.modified: |
| 492 # So we're checking for Android-specific paths. | 1223 trace.ApplyModifications() |
| 493 if HaveFilesFromAndroid(symfiles): | |
| 494 if not options.output_directory: | |
| 495 parser.error('The trace file appears to be from Android. Please ' | |
| 496 "specify output directory (e.g. 'out/Debug') to properly " | |
| 497 'symbolize it.') | |
| 498 RemapAndroidFiles(symfiles, os.path.abspath(options.output_directory)) | |
| 499 | 1224 |
| 500 if SymbolizeFiles(symfiles, symbolizer): | |
| 501 if options.backup: | 1225 if options.backup: |
| 502 backup_file_path = trace_file_path + BACKUP_FILE_TAG | 1226 backup_file_path = trace_file_path + BACKUP_FILE_TAG |
| 503 print 'Backing up trace file to {}...'.format(backup_file_path) | 1227 if os.path.exists(backup_file_path): |
> Primiano Tucci (use gerrit), 2017/05/03 17:25:05: "isn't this a bit too much and really worth the com…"
> DmitrySkiba, 2017/05/04 00:30:55: "Also a leftover from a previous versions. Removed."
| 1228 for i in itertools.count(1): | |
| 1229 unique_file_path = '{}{}'.format(backup_file_path, i) | |
| 1230 if not os.path.exists(unique_file_path): | |
| 1231 backup_file_path = unique_file_path | |
| 1232 break | |
| 1233 print 'Backing up trace file to {}'.format(backup_file_path) | |
| 504 os.rename(trace_file_path, backup_file_path) | 1234 os.rename(trace_file_path, backup_file_path) |
| 505 | 1235 |
| 506 print 'Updating trace file...' | 1236 print 'Updating the trace file...' |
| 507 with _OpenTraceFile('w') as trace_file: | 1237 with OpenTraceFile(trace_file_path, 'w') as trace_file: |
| 508 json.dump(trace, trace_file) | 1238 json.dump(trace.node, trace_file) |
| 509 else: | 1239 else: |
| 510 print 'No PCs symbolized - not updating trace file.' | 1240 print 'No modifications were made - not updating the trace file.' |
| 511 | 1241 |
| 512 | 1242 |
| 513 if __name__ == '__main__': | 1243 if __name__ == '__main__': |
| 514 main() | 1244 main() |