Chromium Code Reviews

| OLD | NEW |
|---|---|
| 1 #!/usr/bin/env python | 1 #!/usr/bin/env python |
| 2 # Copyright 2016 The Chromium Authors. All rights reserved. | 2 # Copyright 2016 The Chromium Authors. All rights reserved. |
| 3 # Use of this source code is governed by a BSD-style license that can be | 3 # Use of this source code is governed by a BSD-style license that can be |
| 4 # found in the LICENSE file. | 4 # found in the LICENSE file. |
| 5 | 5 |
| 6 """ | |
| 7 This script processes trace files and symbolizes stack frames generated by | |
| 8 Chrome's native heap profiler. | |
| 9 | |
| 10 === Overview === | |
| 11 | |
| 12 Trace file is essentially a giant JSON array of dictionaries (events). | |
| 13 Events have some predefined keys, but otherwise are free to have anything | |
| 14 inside. Trace file contains events from all Chrome processes that were | |
| 15 sampled during tracing period (and 'pid' is an example of a predefined key). | |
|
Wez
2017/04/29 00:41:21
nit: "(and 'pid' is an example..." reads oddly here.
DmitrySkiba
2017/05/02 06:19:59
Done.
| |
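The event-array structure described above can be sketched with a minimal, hand-written trace. The event contents here are made up for illustration; only 'pid' and 'ph' are real predefined keys.

```python
import json

# A minimal, hand-written trace: a JSON array of event dictionaries.
# 'pid' (process id) and 'ph' (phase) are predefined keys; the rest of an
# event's content is free-form and depends on the event type.
trace_text = '''
[
  {"pid": 1234, "ph": "M", "name": "stackFrames", "args": {"stackFrames": {}}},
  {"pid": 1234, "ph": "v", "name": "periodic_interval", "args": {}},
  {"pid": 5678, "ph": "v", "name": "periodic_interval", "args": {}}
]
'''

events = json.loads(trace_text)
pids = sorted({event['pid'] for event in events})
print(pids)  # every process sampled during the tracing period
```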
| 16 | |
| 17 This script cares only about memory dump events generated by memory-infra | |
|
Wez
2017/04/29 00:41:21
nit: Suggest "...dump events in trace files generated by the memory-infra component."
DmitrySkiba
2017/05/02 06:19:59
Done.
| |
| 18 component. | |
| 19 | |
| 20 When Chrome native heap profiling is enabled, some memory dump events | |
| 21 include the following extra information: | |
| 22 | |
| 23 * (Per allocator) Information about live allocations at the moment of the | |
| 24 memory dump (the information includes backtraces, types / categories, | |
| 25 sizes, and counts of allocations). There are several allocators in | |
| 26 Chrome: malloc, blink_gc, and partition_alloc. | |
|
Wez
2017/04/29 00:41:21
nit: If these are examples, not an exhaustive list, say so.
DmitrySkiba
2017/05/02 06:19:59
This is actually an exhaustive list.
Wez
2017/05/03 00:17:09
OK; in that case I would say "There are three allocators...".
DmitrySkiba
2017/05/04 00:30:55
Acknowledged.
| |
| 27 | |
| 28 * (Per process) Stack frame tree of all functions that called allocators | |
| 29 above. | |
|
Wez
2017/04/29 00:41:21
nit: If we failed to trace all the way back to main, is this still a single tree?
DmitrySkiba
2017/05/02 06:19:59
It's still a single tree, just with an implicit root.
Wez
2017/05/03 00:17:10
OK; you could add a brief note to that effect here, for clarity.
DmitrySkiba
2017/05/04 00:30:55
Acknowledged.
| |
| 30 | |
| 31 This script does the following: | |
| 32 | |
| 33 1. Parses the given trace file. | |
| 34 2. Finds memory dump events and parses stack frame tree for each process. | |
| 35 3. Finds stack frames that have PC addresses instead of function names. | |
| 36 4. Symbolizes these PCs. | |
| 37 6. Rewrites stack frame names (this updates parts of memory dump events). | |
|
Wez
2017/04/29 00:41:21
nit: You're missing #5 ;)
It's also not clear what happens when several PCs symbolize to the same name.
DmitrySkiba
2017/05/02 06:19:59
Done. Added note about script not coalescing such frames.
Wez
2017/05/03 00:17:09
Acknowledged.
| |
| 38 7. Updates the trace file. | |
| 39 | |
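The numbered steps above can be sketched end-to-end on an in-memory stack frame tree. The symbol table below is a stand-in for the real platform symbolizer (addr2line / atos), which the script drives against the binaries found in the process memory map.

```python
# A self-contained sketch of the pipeline above. FAKE_SYMBOLS is a made-up
# lookup table standing in for the platform symbolizer.
FAKE_SYMBOLS = {0x10eb78dba: 'MallocInner'}

def symbolize_frames(stack_frames):
    """Rewrites 'pc:<hex>' frame names in place (steps 3-6 above)."""
    for frame in stack_frames.values():
        name = frame['name']
        if name.startswith('pc:'):
            pc = int(name[len('pc:'):], 16)
            # Keep the original name if the PC cannot be resolved.
            frame['name'] = FAKE_SYMBOLS.get(pc, name)
    return stack_frames

frames = {
    '0': {'name': 'main'},
    '1': {'name': 'pc:10eb78dba', 'parent': '0'},
}
print(symbolize_frames(frames)['1']['name'])  # MallocInner
```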
| 40 === Details === | |
| 41 | |
| 42 There are two formats of heap profiler information: legacy and modern. The | |
| 43 main differences are: | |
| 44 | |
| 45 * In the legacy format stack frame tree is not dumped in memory dump events, | |
| 46 but in metadata events (one per process). I.e. it's sufficient to parse | |
| 47 a single metadata event to get full stack frame tree for a process. | |
|
Wez
2017/04/29 00:41:21
IIUC the point here is that every "event" in a legacy trace contains the full stack frame tree?
DmitrySkiba
2017/05/02 06:19:59
Both formats dump live objects per allocator in each memory dump event.
Wez
2017/05/03 00:17:10
Thanks for adding this detail, however it seems a bit out of place here.
DmitrySkiba
2017/05/04 00:30:55
Well, the section is named "Details", and details belong here.
| |
| 48 | |
| 49 * In the modern format stack frame tree (also type name and string mappings) | |
| 50 are dumped incrementally. I.e. each memory dump event carries additions to | |
| 51 the stack frame tree that occurred since the previous memory dump event. | |
|
Wez
2017/04/29 00:41:21
You might express this as each memory-infra event carrying only the additions since the previous event.
DmitrySkiba
2017/05/02 06:19:59
Done.
| |
| 52 To get the full stack frame tree for a process the script needs to parse | |
| 53 all memory dump events. However, when wrappers update incremental nodes, | |
| 54 they put everything in the first node, and clear all others. | |
|
Wez
2017/04/29 00:41:21
Not sure what you mean about moving everything into the first node.
DmitrySkiba
2017/05/02 06:19:59
Explained more.
| |
| 55 | |
| 56 * In the modern format stack frame tree doesn't reference name strings | |
| 57 directly, but through a string mapping table. | |
| 58 | |
| 59 See crbug.com/708930 for more information about the modern format. | |
| 60 """ | |
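As a small illustration of the incremental (modern) format described above, the full string mapping for a process is recovered by folding the 'strings' nodes of successive memory dump events. The node contents below are made up; the node shape matches the format the docstring describes.

```python
# Each memory dump event carries only the string-table additions since the
# previous event, so the full mapping is the union of all 'strings' nodes.
incremental_strings_nodes = [
    [{'id': 1, 'string': 'main'}],          # from memory dump event #1
    [{'id': 2, 'string': 'pc:10eb78dba'}],  # from memory dump event #2
]

string_by_id = {}
for strings_node in incremental_strings_nodes:
    for entry in strings_node:
        string_by_id[entry['id']] = entry['string']

print(sorted(string_by_id.items()))
```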
| 61 | |
| 6 import argparse | 62 import argparse |
| 7 import bisect | 63 import bisect |
| 8 import collections | 64 import collections |
| 9 import gzip | 65 import gzip |
| 66 import itertools | |
| 10 import json | 67 import json |
| 11 import os | 68 import os |
| 12 import re | 69 import re |
| 13 import subprocess | 70 import subprocess |
| 14 import sys | 71 import sys |
| 15 | 72 |
| 16 _SYMBOLS_PATH = os.path.abspath(os.path.join( | 73 _SYMBOLS_PATH = os.path.abspath(os.path.join( |
| 17 os.path.dirname(os.path.realpath(__file__)), | 74 os.path.dirname(os.path.realpath(__file__)), |
| 18 '..', | 75 '..', |
| 19 'third_party', | 76 'third_party', |
| 20 'symbols')) | 77 'symbols')) |
| 21 sys.path.append(_SYMBOLS_PATH) | 78 sys.path.append(_SYMBOLS_PATH) |
| 22 # pylint: disable=import-error | 79 # pylint: disable=import-error |
| 23 import symbols.elf_symbolizer as elf_symbolizer | 80 import symbols.elf_symbolizer as elf_symbolizer |
| 24 | 81 |
| 25 import symbolize_trace_atos_regex | 82 import symbolize_trace_atos_regex |
| 26 import symbolize_trace_macho_reader | 83 import symbolize_trace_macho_reader |
| 27 | 84 |
| 28 | 85 |
| 29 # Relevant trace event phases from Chromium's | 86 class NodeWrapper(object): |
| 30 # src/base/trace_event/common/trace_event_common.h. | 87 """Wraps an event data node(s). |
| 31 TRACE_EVENT_PHASE_METADATA = 'M' | 88 |
| 32 TRACE_EVENT_PHASE_MEMORY_DUMP = 'v' | 89 A node is a reference into a trace event JSON. Wrappers parse nodes to |
| 90 provide convenient APIs and update nodes when asked to propagate changes | |
| 91 back (see ApplyModifications() below). | |
| 92 | |
| 93 Here is an example of legacy metadata event that contains stack frame tree: | |
| 94 | |
| 95 { | |
| 96 "args": { | |
| 97 "stackFrames": { ... } | |
| 98 }, | |
| 99 "cat": "__metadata", | |
| 100 "name": "stackFrames", | |
| 101 "ph": "M", | |
| 102 ... | |
| 103 } | |
| 104 | |
| 105 When this event is encountered, a reference to the "stackFrames" dictionary | |
| 106 is obtained and passed down to a specific wrapped class, which knows how to | |
| 107 parse / update the dictionary. | |
| 108 | |
| 109 There are two parsing patterns depending on whether node is serialized | |
| 110 incrementally: | |
| 111 | |
| 112 * If node is not incremental, then parsing is done by __init__(), | |
| 113 see MemoryMap for an example. | |
| 114 | |
| 115 * If node is incremental, then __init__() does nothing, and ParseNext() | |
| 116 is called when next node (from a next event) is encountered. | |
| 117 | |
| 118 Some wrappers can also modify nodes they parsed. In such cases they have | |
| 119 additional APIs: | |
| 120 | |
| 121 * 'modified' flag, which indicates whether the wrapper was changed. | |
| 122 | |
| 123 * 'ApplyModifications' method, which propagates changes made to the wrapper | |
| 124 back to nodes. Successful invocation of ApplyModifications() resets | |
| 125 'modified' flag. | |
| 126 | |
| 127 """ | |
| 128 | |
| 129 # def __init__(self, node): | |
| 130 # ... | |
| 131 | |
| 132 # def ParseNext(self, node, ...): | |
| 133 # ... | |
| 134 | |
| 135 # @property | |
| 136 # def modified(self): | |
| 137 # ... | |
| 138 | |
| 139 # def ApplyModifications(self, ...): | |
| 140 # ... | |
| 141 | |
| 142 pass | |
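The contract sketched in the docstring and commented-out methods above can be made concrete with a toy wrapper (the class and node below are hypothetical, not from the script): parse in `__init__`, track `modified`, and write back in `ApplyModifications()`.

```python
# A minimal concrete wrapper following the NodeWrapper contract. The node
# is a live reference into the trace JSON, so writing back to it updates
# the trace itself.
class NameWrapper(object):
    def __init__(self, node):
        self._node = node
        self._name = node['name']
        self._modified = False

    @property
    def modified(self):
        return self._modified

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        self._name = value
        self._modified = True  # change must be propagated back to the node

    def ApplyModifications(self):
        if self._modified:
            self._node['name'] = self._name
            self._modified = False  # successful propagation resets the flag

node = {'name': 'pc:10eb78dba'}
wrapper = NameWrapper(node)
wrapper.name = 'MallocInner'
wrapper.ApplyModifications()
print(node['name'], wrapper.modified)  # MallocInner False
```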
| 33 | 143 |
| 34 | 144 |
| 35 # Matches Android library paths, supports both K (/data/app-lib/<>/lib.so) | 145 class MemoryMap(NodeWrapper): |
| 36 # as well as L+ (/data/app/<>/lib/<>/lib.so). Library name is available | 146 """Wraps 'process_mmaps' node. |
| 37 # via 'name' group. | |
| 38 ANDROID_PATH_MATCHER = re.compile( | |
| 39 r'^/data/(?:' | |
| 40 r'app/[^/]+/lib/[^/]+/|' | |
| 41 r'app-lib/[^/]+/|' | |
| 42 r'data/[^/]+/incremental-install-files/lib/' | |
| 43 r')(?P<name>.*\.so)') | |
| 44 | 147 |
| 45 # Subpath of output path where unstripped libraries are stored. | 148 'process_mmaps' node contains information about file mappings. |
| 46 ANDROID_UNSTRIPPED_SUBPATH = 'lib.unstripped' | |
| 47 | 149 |
| 48 | 150 "process_mmaps": { |
| 49 def FindInSystemPath(binary_name): | 151 "vm_regions": [ |
| 50 paths = os.environ['PATH'].split(os.pathsep) | 152 { |
| 51 for path in paths: | 153 "mf": "<file_path>", |
| 52 binary_path = os.path.join(path, binary_name) | 154 "sa": "<start_address>", |
| 53 if os.path.isfile(binary_path): | 155 "sz": "<size>", |
| 54 return binary_path | 156 ... |
| 55 return None | 157 }, |
| 56 | 158 ... |
| 57 | 159 ] |
| 58 class Symbolizer(object): | 160 } |
| 59 # Encapsulates platform-specific symbolization logic. | 161 """ |
| 60 def __init__(self): | |
| 61 self.is_mac = sys.platform == 'darwin' | |
| 62 self.is_win = sys.platform == 'win32' | |
| 63 if self.is_mac: | |
| 64 self.binary = 'atos' | |
| 65 self._matcher = symbolize_trace_atos_regex.AtosRegexMatcher() | |
| 66 elif self.is_win: | |
| 67 self.binary = 'addr2line-pdb.exe' | |
| 68 else: | |
| 69 self.binary = 'addr2line' | |
| 70 self.symbolizer_path = FindInSystemPath(self.binary) | |
| 71 | |
| 72 def _SymbolizeLinuxAndAndroid(self, symfile, unsymbolized_name): | |
| 73 def _SymbolizerCallback(sym_info, frames): | |
| 74 # Unwind inline chain to the top. | |
| 75 while sym_info.inlined_by: | |
| 76 sym_info = sym_info.inlined_by | |
| 77 | |
| 78 symbolized_name = sym_info.name if sym_info.name else unsymbolized_name | |
| 79 for frame in frames: | |
| 80 frame.name = symbolized_name | |
| 81 | |
| 82 symbolizer = elf_symbolizer.ELFSymbolizer(symfile.symbolizable_path, | |
| 83 self.symbolizer_path, | |
| 84 _SymbolizerCallback, | |
| 85 inlines=True) | |
| 86 | |
| 87 for address, frames in symfile.frames_by_address.iteritems(): | |
| 88 # SymbolizeAsync() asserts that the type of address is int. We operate | |
| 89 # on longs (since they are raw pointers possibly from 64-bit processes). | |
| 90 # It's OK to cast here because we're passing relative PC, which should | |
| 91 # always fit into int. | |
| 92 symbolizer.SymbolizeAsync(int(address), frames) | |
| 93 | |
| 94 symbolizer.Join() | |
| 95 | |
| 96 | |
| 97 def _SymbolizeMac(self, symfile): | |
| 98 chars_max = int(subprocess.check_output("getconf ARG_MAX", shell=True)) | |
| 99 | |
| 100 # 16 for the address, 2 for "0x", 1 for the space | |
| 101 chars_per_address = 19 | |
| 102 | |
| 103 load_address = (symbolize_trace_macho_reader. | |
| 104 ReadMachOTextLoadAddress(symfile.symbolizable_path)) | |
| 105 assert load_address is not None | |
| 106 | |
| 107 cmd_base = [self.symbolizer_path, '-arch', 'x86_64', '-l', | |
| 108 '0x%x' % load_address, '-o', | |
| 109 symfile.symbolizable_path] | |
| 110 chars_for_other_arguments = len(' '.join(cmd_base)) + 1 | |
| 111 | |
| 112 # The maximum number of inputs that can be processed at once is limited by | |
| 113 # ARG_MAX. This currently evaluates to ~13000 on macOS. | |
| 114 max_inputs = (chars_max - chars_for_other_arguments) / chars_per_address | |
| 115 | |
| 116 all_keys = symfile.frames_by_address.keys() | |
| 117 processed_keys_count = 0 | |
| 118 while len(all_keys): | |
| 119 input_count = min(len(all_keys), max_inputs) | |
| 120 keys_to_process = all_keys[0:input_count] | |
| 121 | |
| 122 cmd = list(cmd_base) | |
| 123 cmd.extend([hex(int(x) + load_address) | |
| 124 for x in keys_to_process]) | |
| 125 output_array = subprocess.check_output(cmd).split('\n') | |
| 126 for i in range(len(keys_to_process)): | |
| 127 for frame in (symfile.frames_by_address.values() | |
| 128 [i + processed_keys_count]): | |
| 129 frame.name = self._matcher.Match(output_array[i]) | |
| 130 processed_keys_count += len(keys_to_process) | |
| 131 all_keys = all_keys[input_count:] | |
| 132 | |
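The ARG_MAX-driven batching in `_SymbolizeMac` above can be isolated into a small sketch (the helper name is hypothetical): addresses are split into chunks whose combined command line stays under the limit.

```python
# Split addresses into batches that fit one atos invocation: each address
# costs a fixed number of characters on the command line, and the rest of
# the command consumes a fixed overhead.
def chunk_addresses(addresses, chars_max, chars_for_other_arguments,
                    chars_per_address=19):  # 16 hex digits + "0x" + space
    max_inputs = (chars_max - chars_for_other_arguments) // chars_per_address
    for i in range(0, len(addresses), max_inputs):
        yield addresses[i:i + max_inputs]

batches = list(chunk_addresses(list(range(10)), chars_max=100,
                               chars_for_other_arguments=24))
# (100 - 24) // 19 == 4 addresses per invocation
print([len(batch) for batch in batches])  # [4, 4, 2]
```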
| 133 | |
| 134 def _SymbolizeWin(self, symfile): | |
| 135 """Invoke symbolizer binary on windows and write all input in one go. | |
| 136 | |
| 137 Unlike linux, on windows, symbolization talks through a shared system | |
| 138 service that handles communication with the NT symbol servers. This | |
| 139 creates an explicit serialization (and therefore lock contention) of | |
| 140 any process using the symbol API for files that do not have a local PDB. | |
| 141 | |
| 142 Thus, even though the windows symbolizer binary can be made command-line | |
| 143 compatible with the POSIX addr2line interface, parallelizing the | |
| 144 symbolization does not yield the same performance effects. Running | |
| 145 just one symbolizer seems good enough for now. Can optimize later | |
| 146 if this becomes a bottleneck. | |
| 147 """ | |
| 148 cmd = [self.symbolizer_path, '--functions', '--demangle', '--exe', | |
| 149 symfile.symbolizable_path] | |
| 150 | |
| 151 proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE, | |
| 152 stderr=sys.stderr) | |
| 153 addrs = ["%x" % relative_pc for relative_pc in | |
| 154 symfile.frames_by_address.keys()] | |
| 155 (stdout_data, stderr_data) = proc.communicate('\n'.join(addrs)) | |
| 156 stdout_data = stdout_data.split('\n') | |
| 157 | |
| 158 # This is known to be in the same order as stderr_data. | |
| 159 for i, addr in enumerate(addrs): | |
| 160 for frame in symfile.frames_by_address[int(addr, 16)]: | |
| 161 # Output of addr2line with --functions is always 2 outputs per | |
| 162 # symbol, function name followed by source line number. Only grab | |
| 163 # the function name as line info is not always available. | |
| 164 frame.name = stdout_data[i * 2] | |
| 165 | |
| 166 | |
| 167 def Symbolize(self, symfile, unsymbolized_name): | |
| 168 if self.is_mac: | |
| 169 self._SymbolizeMac(symfile) | |
| 170 elif self.is_win: | |
| 171 self._SymbolizeWin(symfile) | |
| 172 else: | |
| 173 self._SymbolizeLinuxAndAndroid(symfile, unsymbolized_name) | |
| 174 | |
| 175 | |
| 176 def IsSymbolizableFile(self, file_path): | |
| 177 if self.is_win: | |
| 178 extension = os.path.splitext(file_path)[1].lower() | |
| 179 return extension in ['.dll', '.exe'] | |
| 180 else: | |
| 181 result = subprocess.check_output(['file', '-0', file_path]) | |
| 182 type_string = result[result.find('\0') + 1:] | |
| 183 return bool(re.match(r'.*(ELF|Mach-O) (32|64)-bit\b.*', | |
| 184 type_string, re.DOTALL)) | |
| 185 | |
| 186 | |
| 187 class ProcessMemoryMaps(object): | |
| 188 """Represents 'process_mmaps' trace file entry.""" | |
| 189 | 162 |
| 190 class Region(object): | 163 class Region(object): |
| 191 def __init__(self, start_address, size, file_path): | 164 def __init__(self, start_address, size, file_path): |
| 192 self._start_address = start_address | 165 self._start_address = start_address |
| 193 self._size = size | 166 self._size = size |
| 194 self._file_path = file_path | 167 self._file_path = file_path |
| 195 | 168 |
| 196 @property | 169 @property |
| 197 def start_address(self): | 170 def start_address(self): |
| 198 return self._start_address | 171 return self._start_address |
| (...skipping 15 matching lines...) | |
| 214 return long(self._start_address).__cmp__(long(other._start_address)) | 187 return long(self._start_address).__cmp__(long(other._start_address)) |
| 215 elif isinstance(other, (long, int)): | 188 elif isinstance(other, (long, int)): |
| 216 return long(self._start_address).__cmp__(long(other)) | 189 return long(self._start_address).__cmp__(long(other)) |
| 217 else: | 190 else: |
| 218 raise Exception('Cannot compare with %s' % type(other)) | 191 raise Exception('Cannot compare with %s' % type(other)) |
| 219 | 192 |
| 220 def __repr__(self): | 193 def __repr__(self): |
| 221 return 'Region(0x{:X} - 0x{:X}, {})'.format( | 194 return 'Region(0x{:X} - 0x{:X}, {})'.format( |
| 222 self.start_address, self.end_address, self.file_path) | 195 self.start_address, self.end_address, self.file_path) |
| 223 | 196 |
| 224 def __init__(self, process_mmaps): | 197 def __init__(self, process_mmaps_node): |
| 225 """Parses 'process_mmaps' dictionary.""" | |
| 226 | |
| 227 regions = [] | 198 regions = [] |
| 228 for region_value in process_mmaps['vm_regions']: | 199 for region_node in process_mmaps_node['vm_regions']: |
| 229 regions.append(self.Region( | 200 regions.append(self.Region( |
| 230 long(region_value['sa'], 16), | 201 long(region_node['sa'], 16), |
| 231 long(region_value['sz'], 16), | 202 long(region_node['sz'], 16), |
| 232 region_value['mf'])) | 203 region_node['mf'])) |
| 233 regions.sort() | 204 regions.sort() |
| 234 | 205 |
| 235 # Copy regions without duplicates and check for overlaps. | 206 # Copy regions without duplicates and check for overlaps. |
| 236 self._regions = [] | 207 self._regions = [] |
| 237 previous_region = None | 208 previous_region = None |
| 238 for region in regions: | 209 for region in regions: |
| 239 if previous_region is not None: | 210 if previous_region is not None: |
| 240 if region == previous_region: | 211 if region == previous_region: |
| 241 continue | 212 continue |
| 242 assert region.start_address >= previous_region.end_address, \ | 213 assert region.start_address >= previous_region.end_address, \ |
| 243 'Regions {} and {} overlap.'.format(previous_region, region) | 214 'Regions {} and {} overlap.'.format(previous_region, region) |
| 244 previous_region = region | 215 previous_region = region |
| 245 self._regions.append(region) | 216 self._regions.append(region) |
| 246 | 217 |
| 247 @property | 218 @property |
| 248 def regions(self): | 219 def regions(self): |
| 249 return self._regions | 220 return self._regions |
| 250 | 221 |
| 251 def FindRegion(self, address): | 222 def FindRegion(self, address): |
| 252 """Finds region containing |address|. Returns None if none found.""" | 223 """Finds region containing |address|. Returns None if none found.""" |
| 253 | 224 |
| 254 region_index = bisect.bisect_right(self._regions, address) - 1 | 225 region_index = bisect.bisect_right(self._regions, address) - 1 |
| 255 if region_index >= 0: | 226 if region_index >= 0: |
| 256 region = self._regions[region_index] | 227 region = self._regions[region_index] |
| 257 if address >= region.start_address and address < region.end_address: | 228 if address >= region.start_address and address < region.end_address: |
| 258 return region | 229 return region |
| 259 return None | 230 return None |
| 260 | 231 |
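The `FindRegion` lookup above relies on the regions being sorted by start address: `bisect_right` locates the last region starting at or before the address, and a range check confirms the address actually falls inside it. A standalone sketch with made-up region values:

```python
import bisect

# Regions sorted by start address, as (start, end) pairs.
regions = [(0x1000, 0x2000), (0x4000, 0x5000)]
starts = [start for start, _ in regions]

def find_region(address):
    # Index of the last region whose start address is <= |address|.
    index = bisect.bisect_right(starts, address) - 1
    if index >= 0:
        start, end = regions[index]
        if start <= address < end:
            return regions[index]
    return None  # address falls in a gap between mappings

print(find_region(0x1800))  # inside the first region
print(find_region(0x3000))  # None
```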
| 261 | 232 |
| 262 class StackFrames(object): | 233 class UnsupportedHeapDumpVersionError(Exception): |
| 263 """Represents 'stackFrames' trace file entry.""" | 234 """Helper exception class to signal unsupported heap dump version.""" |
| 264 | 235 |
| 265 class PCFrame(object): | 236 def __init__(self, version): |
| 266 def __init__(self, pc, frame): | 237 message = 'Unsupported heap dump version: {}'.format(version) |
| 238 super(UnsupportedHeapDumpVersionError, self).__init__(message) | |
| 239 | |
| 240 | |
| 241 class StringMap(NodeWrapper): | |
| 242 """Wraps all 'strings' nodes for a process. | |
| 243 | |
| 244 'strings' node contains incremental mappings between integer ids and strings. | |
| 245 | |
| 246 "strings": [ | |
| 247 { | |
| 248 "id": <string_id>, | |
| 249 "string": <string> | |
| 250 }, | |
| 251 ... | |
| 252 ] | |
| 253 """ | |
| 254 | |
| 255 def __init__(self): | |
| 256 self._modified = False | |
| 257 self._strings_nodes = [] | |
| 258 self._string_by_id = {} | |
| 259 self._id_by_string = {} | |
| 260 self._max_string_id = 0 | |
| 261 | |
| 262 @property | |
| 263 def modified(self): | |
| 264 """Returns True if the wrapper was modified (see NodeWrapper).""" | |
| 265 return self._modified | |
| 266 | |
| 267 @property | |
| 268 def string_by_id(self): | |
| 269 return self._string_by_id | |
| 270 | |
| 271 def ParseNext(self, heap_dump_version, strings_node): | |
| 272 """Parses and interns next node (see NodeWrapper).""" | |
| 273 | |
| 274 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1: | |
| 275 raise UnsupportedHeapDumpVersionError(heap_dump_version) | |
| 276 | |
| 277 self._strings_nodes.append(strings_node) | |
| 278 for string_node in strings_node: | |
| 279 self._Insert(string_node['id'], string_node['string']) | |
| 280 | |
| 281 def Clear(self): | |
| 282 """Clears all string mappings.""" | |
| 283 if self._string_by_id: | |
| 284 self._modified = True | |
| 285 self._string_by_id = {} | |
| 286 self._id_by_string = {} | |
| 287 self._Insert(0, '[null]') | |
| 288 self._max_string_id = 0 | |
| 289 | |
| 290 def AddString(self, string): | |
| 291 """Adds a string (if it doesn't exist) and returns its integer id.""" | |
| 292 string_id = self._id_by_string.get(string) | |
| 293 if string_id is None: | |
| 294 string_id = self._max_string_id + 1 | |
| 295 self._Insert(string_id, string) | |
| 296 self._modified = True | |
| 297 return string_id | |
| 298 | |
| 299 def ApplyModifications(self): | |
| 300 """Propagates modifications back to nodes (see NodeWrapper).""" | |
| 301 if not self.modified: | |
| 302 return | |
| 303 | |
| 304 assert self._strings_nodes, 'no nodes' | |
| 305 | |
| 306 # Serialize into the first node, and clear all others. | |
| 307 | |
| 308 for strings_node in self._strings_nodes: | |
| 309 del strings_node[:] | |
| 310 strings_node = self._strings_nodes[0] | |
| 311 for string_id, string in self._string_by_id.iteritems(): | |
| 312 strings_node.append({'id': string_id, 'string': string}) | |
| 313 | |
| 314 self._modified = False | |
| 315 | |
| 316 def _Insert(self, string_id, string): | |
| 317 self._id_by_string[string] = string_id | |
| 318 self._string_by_id[string_id] = string | |
| 319 self._max_string_id = max(self._max_string_id, string_id) | |
| 320 | |
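The "serialize into the first node, and clear all others" pattern used by `ApplyModifications()` above can be shown in isolation. Because the incremental nodes are live references into the trace JSON, clearing them in place and rewriting the first one is what updates the trace; the merged mapping below is made up.

```python
# Incremental 'strings' nodes as they appear across memory dump events.
nodes = [
    [{'id': 1, 'string': 'main'}],
    [{'id': 2, 'string': 'free'}],
]
# The wrapper's merged view, including a newly added string.
merged = {1: 'main', 2: 'free', 3: 'MallocInner'}

for node in nodes:
    del node[:]  # clear every incremental node in place...
first_node = nodes[0]
for string_id, string in sorted(merged.items()):
    first_node.append({'id': string_id, 'string': string})  # ...refill the first

print(len(nodes[0]), len(nodes[1]))  # 3 0
```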
| 321 | |
| 322 class TypeNameMap(NodeWrapper): | |
| 323 """Wraps all 'types' nodes for a process. | |
| 324 | |
| 325 'types' nodes encode mappings between integer type ids and integer | |
| 326 string ids (from 'strings' nodes). | |
| 327 | |
| 328 "types": [ | |
| 329 { | |
| 330 "id": <type_id>, | |
| 331 "name_sid": <name_string_id> | |
| 332 } | |
| 333 ... | |
| 334 ] | |
| 335 | |
| 336 For simplicity string ids are translated into strings during parsing, | |
| 337 and then translated back to ids in ApplyModifications(). | |
| 338 """ | |
| 339 def __init__(self): | |
| 340 self._modified = False | |
| 341 self._type_name_nodes = [] | |
| 342 self._name_by_id = {} | |
| 343 self._id_by_name = {} | |
| 344 self._max_type_id = 0 | |
| 345 | |
| 346 @property | |
| 347 def modified(self): | |
| 348 """Returns True if the wrapper was modified (see NodeWrapper).""" | |
| 349 return self._modified | |
| 350 | |
| 351 @property | |
| 352 def name_by_id(self): | |
| 353 """Returns {id -> name} dict (must not be changed directly).""" | |
| 354 return self._name_by_id | |
| 355 | |
| 356 def ParseNext(self, heap_dump_version, type_name_node, string_map): | |
| 357 """Parses and interns next node (see NodeWrapper). | |
| 358 | |
| 359 |string_map| - A StringMap object to use to translate string ids | |
| 360 to strings. | |
| 361 """ | |
| 362 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1: | |
| 363 raise UnsupportedHeapDumpVersionError(heap_dump_version) | |
| 364 | |
| 365 self._type_name_nodes.append(type_name_node) | |
| 366 for type_node in type_name_node: | |
| 367 self._Insert(type_node['id'], | |
| 368 string_map.string_by_id[type_node['name_sid']]) | |
| 369 | |
| 370 def AddType(self, type_name): | |
| 371 """Adds a type name (if it doesn't exist) and returns its id.""" | |
| 372 type_id = self._id_by_name.get(type_name) | |
| 373 if type_id is None: | |
| 374 type_id = self._max_type_id + 1 | |
| 375 self._Insert(type_id, type_name) | |
| 376 self._modified = True | |
| 377 return type_id | |
| 378 | |
| 379 def ApplyModifications(self, string_map, force=False): | |
| 380 """Propagates modifications back to nodes. | |
| 381 | |
| 382 |string_map| - A StringMap object to use to translate strings to ids. | |
| 383 |force| - Whether to propagate changes regardless of 'modified' flag. | |
| 384 """ | |
| 385 if not self.modified and not force: | |
| 386 return | |
| 387 | |
| 388 assert self._type_name_nodes, 'no nodes' | |
| 389 | |
| 390 # Serialize into the first node, and clear all others. | |
| 391 | |
| 392 for types_node in self._type_name_nodes: | |
| 393 del types_node[:] | |
| 394 types_node = self._type_name_nodes[0] | |
| 395 for type_id, type_name in self._name_by_id.iteritems(): | |
| 396 types_node.append({ | |
| 397 'id': type_id, | |
| 398 'name_sid': string_map.AddString(type_name)}) | |
| 399 | |
| 400 self._modified = False | |
| 401 | |
| 402 def _Insert(self, type_id, type_name): | |
| 403 self._id_by_name[type_name] = type_id | |
| 404 self._name_by_id[type_id] = type_name | |
| 405 self._max_type_id = max(self._max_type_id, type_id) | |
| 406 | |
| 407 | |
| 408 class StackFrameMap(NodeWrapper): | |
| 409 """ Wraps stack frame tree nodes for a process. | |
| 410 | |
| 411 For the legacy format this wrapper expects a single 'stackFrames' node | |
| 412 (which comes from metadata event): | |
| 413 | |
| 414 "stackFrames": { | |
| 415 "<frame_id>": { | |
| 416 "name": "<frame_name>" | |
| 417 "parent": "<parent_frame_id>" | |
| 418 }, | |
| 419 ... | |
| 420 } | |
| 421 | |
| 422 For the modern format this wrapper expects several 'nodes' nodes: | |
| 423 | |
| 424 "nodes": [ | |
| 425 { | |
| 426 "id": <frame_id>, | |
| 427 "parent": <parent_frame_id>, | |
| 428 "name_sid": <name_string_id> | |
| 429 }, | |
| 430 ... | |
| 431 ] | |
| 432 | |
| 433 In both formats frame name is a string. Native heap profiler generates | |
| 434 specially formatted frame names (e.g. "pc:10eb78dba") for function | |
| 435 addresses (PCs). Inner Frame class below parses name and extracts PC, | |
| 436 if it's there. | |
| 437 """ | |
| 438 class Frame(object): | |
| 439 def __init__(self, frame_id, name, parent_frame_id): | |
| 267 self._modified = False | 440 self._modified = False |
| 268 self._pc = pc | 441 self._id = frame_id |
| 269 self._frame = frame | 442 self._name = name |
| 443 self._pc = self._ParsePC(name) | |
| 444 self._parent_id = parent_frame_id | |
| 445 self._ext = None | |
| 270 | 446 |
| 271 @property | 447 @property |
| 272 def modified(self): | 448 def modified(self): |
| 449 """Returns True if the frame was modified. | |
| 450 | |
| 451 For example changing frame's name sets this flag (since the change | |
| 452 needs to be propagated back to nodes). | |
| 453 """ | |
| 273 return self._modified | 454 return self._modified |
| 274 | 455 |
| 275 @property | 456 @property |
| 457 def id(self): | |
| 458 """Frame id (integer).""" | |
| 459 return self._id | |
| 460 | |
| 461 @property | |
| 276 def pc(self): | 462 def pc(self): |
| 463 """Parsed (integer) PC of the frame, or None.""" | |
| 277 return self._pc | 464 return self._pc |
| 278 | 465 |
| 279 @property | 466 @property |
| 280 def name(self): | 467 def name(self): |
| 281 return self._frame['name'] | 468 """Name of the frame (see above).""" |
| 469 return self._name | |
| 282 | 470 |
| 283 @name.setter | 471 @name.setter |
| 284 def name(self, value): | 472 def name(self, value): |
| 473 """Changes the name. Doesn't affect value of |pc|.""" | |
| 285 self._modified = True | 474 self._modified = True |
| 286 self._frame['name'] = value | 475 self._name = value |
| 287 | 476 |
| 288 def __init__(self, stack_frames): | 477 @property |
| 289 """Constructs object using 'stackFrames' dictionary.""" | 478 def parent_id(self): |
| 290 self._pc_frames = [] | 479 """Parent frame id (integer).""" |
| 291 for frame in stack_frames.itervalues(): | 480 return self._parent_id |
| 292 pc_frame = self._ParsePCFrame(frame) | 481 |
| 293 if pc_frame: | 482 _PC_TAG = 'pc:' |
| 294 self._pc_frames.append(pc_frame) | 483 |
| 295 | 484 def _ParsePC(self, name): |
| 296 @property | 485 if not name.startswith(self._PC_TAG): |
| 297 def pc_frames(self): | 486 return None |
| 298 return self._pc_frames | 487 return long(name[len(self._PC_TAG):], 16) |
| 488 | |
| 489 def _ClearModified(self): | |
| 490 self._modified = False | |
| 491 | |
| 492 def __init__(self): | |
| 493 self._modified = False | |
| 494 self._heap_dump_version = None | |
| 495 self._stack_frames_nodes = [] | |
| 496 self._frame_by_id = {} | |
| 299 | 497 |
| 300 @property | 498 @property |
| 301 def modified(self): | 499 def modified(self): |
| 302 return any(f.modified for f in self._pc_frames) | 500 """Returns True if the wrapper or any of its frames were modified.""" |
| 303 | 501 return (self._modified or |
| 304 _PC_TAG = 'pc:' | 502 any(f.modified for f in self._frame_by_id.itervalues())) |
| 305 | 503 |
| 306 @classmethod | 504 @property |
| 307 def _ParsePCFrame(self, frame): | 505 def frame_by_id(self): |
| 308 name = frame['name'] | 506 """Returns {id -> frame} dict (must not be modified directly).""" |
| 309 if not name.startswith(self._PC_TAG): | 507 return self._frame_by_id |
| 310 return None | 508 |
| 311 pc = long(name[len(self._PC_TAG):], 16) | 509 def ParseNext(self, heap_dump_version, stack_frames_node, string_map): |
| 312 return self.PCFrame(pc, frame) | 510 """Parses the next stack frames node (see NodeWrapper). |
| 313 | 511 |
| 314 | 512 For the modern format |string_map| is used to translate string ids |
| 315 class Process(object): | 513 to strings. |
| 316 """Holds various bits of information about a process in a trace file.""" | 514 """ |
| 317 | 515 |
| 318 def __init__(self, pid): | 516 frame_by_id = {} |
| 319 self.pid = pid | 517 if heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: |
| 320 self.name = None | 518 if self._stack_frames_nodes: |
| 321 self.mmaps = None | 519 raise Exception('Legacy stack frames node is expected only once.') |
| 322 self.stack_frames = None | 520 for frame_id, frame_node in stack_frames_node.iteritems(): |
| 323 | 521 frame = self.Frame(frame_id, |
| 324 | 522 frame_node['name'], |
| 325 def CollectProcesses(trace): | 523 frame_node.get('parent')) |
| 326 """Parses trace dictionary and returns pid->Process map of all processes | 524 frame_by_id[frame.id] = frame |
| 327 suitable for symbolization (which have both mmaps and stack_frames). | 525 else: |
| 526 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1: | |
| 527 raise UnsupportedHeapDumpVersionError(heap_dump_version) | |
| 528 for frame_node in stack_frames_node: | |
| 529 frame = self.Frame(frame_node['id'], | |
| 530 string_map.string_by_id[frame_node['name_sid']], | |
| 531 frame_node.get('parent')) | |
| 532 frame_by_id[frame.id] = frame | |
| 533 | |
| 534 self._heap_dump_version = heap_dump_version | |
| 535 self._stack_frames_nodes.append(stack_frames_node) | |
| 536 | |
| 537 self._frame_by_id = frame_by_id | |
| 538 | |
| 539 def ApplyModifications(self, string_map, force=False): | |
| 540 """Applies modifications back to nodes (see NodeWrapper).""" | |
| 541 | |
| 542 if not self.modified and not force: | |
| 543 return | |
| 544 | |
| 545 assert self._stack_frames_nodes, 'no nodes' | |
| 546 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | |
| 547 assert string_map is None, \ | |
| 548 'string_map should not be used with the legacy format' | |
| 549 | |
| 550 # Serialize frames into the first node, clear all others. | |
| 551 | |
| 552 for frames_node in self._stack_frames_nodes: | |
| 553 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | |
| 554 frames_node.clear() | |
| 555 else: | |
| 556 del frames_node[:] | |
| 557 | |
| 558 frames_node = self._stack_frames_nodes[0] | |
| 559 for frame in self._frame_by_id.itervalues(): | |
| 560 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | |
| 561 frame_node = {'name': frame.name} | |
| 562 frames_node[frame.id] = frame_node | |
| 563 else: | |
| 564 frame_node = { | |
| 565 'id': frame.id, | |
| 566 'name_sid': string_map.AddString(frame.name) | |
| 567 } | |
| 568 frames_node.append(frame_node) | |
| 569 if frame.parent_id is not None: | |
| 570 frame_node['parent'] = frame.parent_id | |
| 571 frame._ClearModified() | |
| 572 | |
| 573 self._modified = False | |
| 574 | |
| 575 | |
| 576 class Trace(NodeWrapper): | |
| 577 """Wrapper for the root trace node (i.e. the trace JSON itself). | |
| 578 | |
| 579 This wrapper parses select nodes from memory-infra events and groups | |
| 580 parsed data per-process (see inner Process class below). | |
| 328 """ | 581 """ |
| 329 | 582 |
| 330 process_map = {} | 583 # Indicates legacy heap dump format. |
| 331 | 584 HEAP_DUMP_VERSION_LEGACY = 'Legacy' |
| 332 # Android traces produced via 'chrome://inspect/?tracing#devices' are | 585 |
| 333 # just list of events. | 586 # Indicates variation of a modern heap dump format. |
| 334 events = trace if isinstance(trace, list) else trace['traceEvents'] | 587 HEAP_DUMP_VERSION_1 = 1 |
| 335 for event in events: | 588 |
| 336 name = event.get('name') | 589 class Process(object): |
| 337 if not name: | 590 """Collection of per-process data and wrappers.""" |
| 338 continue | 591 |
| 339 | 592 def __init__(self, pid): |
| 340 pid = event['pid'] | 593 self._pid = pid |
| 341 process = process_map.get(pid) | 594 self._name = None |
| 342 if process is None: | 595 self._memory_map = None |
| 343 process = Process(pid) | 596 self._stack_frame_map = StackFrameMap() |
| 344 process_map[pid] = process | 597 self._type_name_map = TypeNameMap() |
| 345 | 598 self._string_map = StringMap() |
| 346 phase = event['ph'] | 599 self._heap_dump_version = None |
| 347 if phase == TRACE_EVENT_PHASE_METADATA: | 600 |
| 348 if name == 'process_name': | 601 @property |
| 349 process.name = event['args']['name'] | 602 def modified(self): |
| 350 elif name == 'stackFrames': | 603 return self._stack_frame_map.modified or self._type_name_map.modified |
| 351 process.stack_frames = StackFrames(event['args']['stackFrames']) | 604 |
| 352 elif phase == TRACE_EVENT_PHASE_MEMORY_DUMP: | 605 @property |
| 353 process_mmaps = event['args']['dumps'].get('process_mmaps') | 606 def pid(self): |
| 354 if process_mmaps: | 607 return self._pid |
| 355 # TODO(dskiba): this parses all process_mmaps, but retains only the | 608 |
| 356 # last one. We need to parse only once (lazy parsing?). | 609 @property |
| 357 process.mmaps = ProcessMemoryMaps(process_mmaps) | 610 def name(self): |
| 358 | 611 return self._name |
| 359 return [p for p in process_map.itervalues() if p.mmaps and p.stack_frames] | 612 |
| 613 @property | |
| 614 def unique_name(self): | |
| | 615 """Returns a string that includes both the process name and its pid.""" | 
| 616 name = self._name if self._name else 'UnnamedProcess' | |
| 617 return '{}({})'.format(name, self._pid) | |
| 618 | |
| 619 @property | |
| 620 def memory_map(self): | |
| 621 return self._memory_map | |
| 622 | |
| 623 @property | |
| 624 def stack_frame_map(self): | |
| 625 return self._stack_frame_map | |
| 626 | |
| 627 @property | |
| 628 def type_name_map(self): | |
| 629 return self._type_name_map | |
| 630 | |
| 631 def ApplyModifications(self): | |
| 632 """Calls ApplyModifications() on contained wrappers.""" | |
| 633 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | |
| 634 self._stack_frame_map.ApplyModifications(None) | |
| 635 else: | |
| 636 if self._stack_frame_map.modified or self._type_name_map.modified: | |
| 637 self._string_map.Clear() | |
| 638 self._stack_frame_map.ApplyModifications(self._string_map, force=True) | |
| 639 self._type_name_map.ApplyModifications(self._string_map, force=True) | |
| 640 self._string_map.ApplyModifications() | |
| 641 | |
| 642 def __init__(self, trace_node): | |
| 643 self._trace_node = trace_node | |
| 644 self._processes = [] | |
| 645 self._heap_dump_version = None | |
| 646 | |
| 647 # Misc per-process information needed only during parsing. | |
| 648 class ProcessExt(object): | |
| 649 def __init__(self, pid): | |
| 650 self.process = Trace.Process(pid) | |
| 651 self.mapped_entry_names = set() | |
| 652 self.process_mmaps_node = None | |
| 653 self.seen_strings_node = False | |
| 654 | |
| 655 process_ext_by_pid = {} | |
| 656 | |
| 657 # Android traces produced via 'chrome://inspect/?tracing#devices' are | |
| | 658 # just a list of events. | 
| 659 events = trace_node if isinstance(trace_node, list) \ | |
| 660 else trace_node['traceEvents'] | |
| 661 for event in events: | |
| 662 name = event.get('name') | |
| 663 if not name: | |
| 664 continue | |
| 665 | |
| 666 pid = event['pid'] | |
| 667 process_ext = process_ext_by_pid.get(pid) | |
| 668 if process_ext is None: | |
| 669 process_ext = ProcessExt(pid) | |
| 670 process_ext_by_pid[pid] = process_ext | |
| 671 process = process_ext.process | |
| 672 | |
| 673 phase = event['ph'] | |
| 674 if phase == self._EVENT_PHASE_METADATA: | |
| 675 if name == 'process_name': | |
| 676 process._name = event['args']['name'] | |
| 677 elif name == 'stackFrames': | |
| 678 process._stack_frame_map.ParseNext( | |
| 679 self._UseHeapDumpVersion(self.HEAP_DUMP_VERSION_LEGACY), | |
| 680 event['args']['stackFrames'], | |
| 681 process._string_map) | |
| 682 elif phase == self._EVENT_PHASE_MEMORY_DUMP: | |
| 683 dumps = event['args']['dumps'] | |
| 684 process_mmaps = dumps.get('process_mmaps') | |
| 685 if process_mmaps: | |
| 686 # We want the most recent memory map, so parsing happens later | |
| | 687 # once we have finished reading all events. | 
| 688 process_ext.process_mmaps_node = process_mmaps | |
| 689 heaps = dumps.get('heaps_v2') | |
| 690 if heaps: | |
| 691 version = self._UseHeapDumpVersion(heaps['version']) | |
| 692 maps = heaps.get('maps') | |
| 693 if maps: | |
| 694 process_ext.mapped_entry_names.update(maps.iterkeys()) | |
| 695 types = maps.get('types') | |
| 696 stack_frames = maps.get('nodes') | |
| 697 strings = maps.get('strings') | |
| 698 if (strings is None and (types or stack_frames) | |
| 699 and not process_ext.seen_strings_node): | |
| 700 # ApplyModifications() for TypeNameMap and StackFrameMap puts | |
| 701 # everything into the first node and depends on StringMap. So | |
| | 702 # we need to make sure that the 'strings' node is there if either | 
| | 703 # of the other two nodes is present. | 
| 704 strings = [] | |
| 705 maps['strings'] = strings | |
| 706 if strings is not None: | |
| 707 process_ext.seen_strings_node = True | |
| 708 process._string_map.ParseNext(version, strings) | |
| 709 if types: | |
| 710 process._type_name_map.ParseNext( | |
| 711 version, types, process._string_map) | |
| 712 if stack_frames: | |
| 713 process._stack_frame_map.ParseNext( | |
| 714 version, stack_frames, process._string_map) | |
| 715 | |
| 716 self._processes = [] | |
| 717 for pe in process_ext_by_pid.itervalues(): | |
| 718 pe.process._heap_dump_version = self._heap_dump_version | |
| 719 if pe.process_mmaps_node: | |
| 720 # Now parse the most recent memory map. | |
| 721 pe.process._memory_map = MemoryMap(pe.process_mmaps_node) | |
| 722 self._processes.append(pe.process) | |
| 723 | |
| 724 @property | |
| 725 def node(self): | |
| 726 """Root node (that was passed to the __init__).""" | |
| 727 return self._trace_node | |
| 728 | |
| 729 @property | |
| 730 def modified(self): | |
| | 731 """Returns True if the trace file needs to be updated. | 
| | 732 | 
| | 733 Before writing the trace JSON back to a file, ApplyModifications() | 
| | 734 needs to be called. | 
| 735 """ | |
| 736 return any(p.modified for p in self._processes) | |
| 737 | |
| 738 @property | |
| 739 def processes(self): | |
| 740 return self._processes | |
| 741 | |
| 742 @property | |
| 743 def heap_dump_version(self): | |
| 744 return self._heap_dump_version | |
| 745 | |
| 746 def ApplyModifications(self): | |
| 747 """Propagates modifications back to the trace JSON.""" | |
| 748 for process in self._processes: | |
| 749 process.ApplyModifications() | |
| 750 assert not self.modified, 'still modified' | |
| 751 | |
| 752 # Relevant trace event phases from Chromium's | |
| 753 # src/base/trace_event/common/trace_event_common.h. | |
| 754 _EVENT_PHASE_METADATA = 'M' | |
| 755 _EVENT_PHASE_MEMORY_DUMP = 'v' | |
| 756 | |
| 757 def _UseHeapDumpVersion(self, version): | |
| 758 if self._heap_dump_version is None: | |
| 759 self._heap_dump_version = version | |
| 760 return version | |
| 761 elif self._heap_dump_version != version: | |
| 762 raise Exception( | |
| 763 ("Inconsistent trace file: first saw '{}' heap dump version, " | |
| 764 "then '{}'.").format(self._heap_dump_version, version)) | |
| 765 else: | |
| 766 return version | |
| 360 | 767 |
| 361 | 768 |
| 362 class SymbolizableFile(object): | 769 class SymbolizableFile(object): |
| 363 """Holds file path, addresses to symbolize and stack frames to update. | 770 """Holds file path, addresses to symbolize and stack frames to update. |
| 364 | 771 |
| 365 This class is a link between ELFSymbolizer and a trace file: it specifies | 772 This class is a link between ELFSymbolizer and a trace file: it specifies |
| 366 what to symbolize (addresses) and what to update with the symbolization | 773 what to symbolize (addresses) and what to update with the symbolization |
| 367 result (frames). | 774 result (frames). |
| 368 """ | 775 """ |
| 369 def __init__(self, file_path): | 776 def __init__(self, file_path): |
| 370 self.path = file_path | 777 self.path = file_path |
| 371 self.symbolizable_path = file_path # path to use for symbolization | 778 self.symbolizable_path = file_path # path to use for symbolization |
| 372 self.frames_by_address = collections.defaultdict(list) | 779 self.frames_by_address = collections.defaultdict(list) |
| 373 | 780 |
| 374 | 781 |
| 375 def ResolveSymbolizableFiles(processes): | 782 def ResolveSymbolizableFiles(processes): |
| 376 """Resolves and groups PCs into list of SymbolizableFiles. | 783 """Resolves and groups PCs into list of SymbolizableFiles. |
| 377 | 784 |
| 378 As part of the grouping process, this function resolves PC from each stack | 785 As part of the grouping process, this function resolves PC from each stack |
| 379 frame to the corresponding mmap region. Stack frames that failed to resolve | 786 frame to the corresponding mmap region. Stack frames that failed to resolve |
| 380 are symbolized with '<unresolved>'. | 787 are symbolized with '<unresolved>'. |
| 381 """ | 788 """ |
| 382 symfile_by_path = {} | 789 symfile_by_path = {} |
| 383 for process in processes: | 790 for process in processes: |
| 384 for frame in process.stack_frames.pc_frames: | 791 if not process.memory_map: |
| 385 region = process.mmaps.FindRegion(frame.pc) | 792 continue |
| 793 for frame in process.stack_frame_map.frame_by_id.itervalues(): | |
| 794 if frame.pc is None: | |
| 795 continue | |
| 796 region = process.memory_map.FindRegion(frame.pc) | |
| 386 if region is None: | 797 if region is None: |
| 387 frame.name = '<unresolved>' | 798 frame.name = '<unresolved>' |
| 388 continue | 799 continue |
| 389 | 800 |
| 390 symfile = symfile_by_path.get(region.file_path) | 801 symfile = symfile_by_path.get(region.file_path) |
| 391 if symfile is None: | 802 if symfile is None: |
| 392 symfile = SymbolizableFile(region.file_path) | 803 symfile = SymbolizableFile(region.file_path) |
| 393 symfile_by_path[symfile.path] = symfile | 804 symfile_by_path[symfile.path] = symfile |
| 394 | 805 |
| 395 relative_pc = frame.pc - region.start_address | 806 relative_pc = frame.pc - region.start_address |
| 396 symfile.frames_by_address[relative_pc].append(frame) | 807 symfile.frames_by_address[relative_pc].append(frame) |
| 397 return symfile_by_path.values() | 808 return symfile_by_path.values() |
| 398 | 809 |
| 399 | 810 |
| 811 def FindInSystemPath(binary_name): | |
| 812 paths = os.environ['PATH'].split(os.pathsep) | |
| 813 for path in paths: | |
| 814 binary_path = os.path.join(path, binary_name) | |
| 815 if os.path.isfile(binary_path): | |
| 816 return binary_path | |
| 817 return None | |
| 818 | |
| 819 | |
| 820 class Symbolizer(object): | |
| 821 """Encapsulates platform-specific symbolization logic.""" | |
| 822 | |
| 823 def __init__(self): | |
| 824 self.is_mac = sys.platform == 'darwin' | |
| 825 self.is_win = sys.platform == 'win32' | |
| 826 if self.is_mac: | |
| 827 self.binary = 'atos' | |
| 828 self._matcher = symbolize_trace_atos_regex.AtosRegexMatcher() | |
| 829 elif self.is_win: | |
| 830 self.binary = 'addr2line-pdb.exe' | |
| 831 else: | |
| 832 self.binary = 'addr2line' | |
| 833 self.symbolizer_path = FindInSystemPath(self.binary) | |
| 834 | |
| 835 def _SymbolizeLinuxAndAndroid(self, symfile, unsymbolized_name): | |
| 836 def _SymbolizerCallback(sym_info, frames): | |
| 837 # Unwind inline chain to the top. | |
| 838 while sym_info.inlined_by: | |
| 839 sym_info = sym_info.inlined_by | |
| 840 | |
| 841 symbolized_name = sym_info.name if sym_info.name else unsymbolized_name | |
| 842 for frame in frames: | |
| 843 frame.name = symbolized_name | |
| 844 frame.ext.source_path = sym_info.source_path | |
| 845 | |
| 846 symbolizer = elf_symbolizer.ELFSymbolizer(symfile.symbolizable_path, | |
| 847 self.symbolizer_path, | |
| 848 _SymbolizerCallback, | |
| 849 inlines=True) | |
| 850 | |
| 851 for address, frames in symfile.frames_by_address.iteritems(): | |
| 852 # SymbolizeAsync() asserts that the type of address is int. We operate | |
| 853 # on longs (since they are raw pointers possibly from 64-bit processes). | |
| 854 # It's OK to cast here because we're passing relative PC, which should | |
| 855 # always fit into int. | |
| 856 symbolizer.SymbolizeAsync(int(address), frames) | |
| 857 | |
| 858 symbolizer.Join() | |
| 859 | |
| 860 | |
| 861 def _SymbolizeMac(self, symfile): | |
| 862 chars_max = int(subprocess.check_output("getconf ARG_MAX", shell=True)) | |
| 863 | |
| 864 # 16 for the address, 2 for "0x", 1 for the space | |
| 865 chars_per_address = 19 | |
| 866 | |
| 867 load_address = (symbolize_trace_macho_reader. | |
| 868 ReadMachOTextLoadAddress(symfile.symbolizable_path)) | |
| 869 assert load_address is not None | |
| 870 | |
| 871 cmd_base = [self.symbolizer_path, '-arch', 'x86_64', '-l', | |
| 872 '0x%x' % load_address, '-o', | |
| 873 symfile.symbolizable_path] | |
| 874 chars_for_other_arguments = len(' '.join(cmd_base)) + 1 | |
| 875 | |
| 876 # The maximum number of inputs that can be processed at once is limited by | |
| | 877 # ARG_MAX. This currently evaluates to ~13000 on macOS. | 
| 878 max_inputs = (chars_max - chars_for_other_arguments) / chars_per_address | |
| 879 | |
| 880 all_keys = symfile.frames_by_address.keys() | |
| 881 processed_keys_count = 0 | |
| 882 while len(all_keys): | |
| 883 input_count = min(len(all_keys), max_inputs) | |
| 884 keys_to_process = all_keys[0:input_count] | |
| 885 cmd = list(cmd_base) | |
| 886 cmd.extend([hex(int(x) + load_address) | |
| 887 for x in keys_to_process]) | |
| 888 output_array = subprocess.check_output(cmd).split('\n') | |
| 889 for i in range(len(keys_to_process)): | |
| 890 for frame in (symfile.frames_by_address.values() | |
| 891 [i + processed_keys_count]): | |
| 892 frame.name = self._matcher.Match(output_array[i]) | |
| 893 processed_keys_count += len(keys_to_process) | |
| 894 all_keys = all_keys[input_count:] | |
| 895 | |
| 896 def _SymbolizeWin(self, symfile): | |
| | 897 """Invoke symbolizer binary on Windows and write all input in one go. | 
| 898 | |
| | 899 Unlike Linux, on Windows symbolization talks through a shared system | 
| | 900 service that handles communication with the NT symbol servers. This | 
| | 901 creates an explicit serialization (and therefore lock contention) of | 
| | 902 any process using the symbol API for files that do not have a local PDB. | 
| | 903 | 
| | 904 Thus, even though the Windows symbolizer binary can be made command-line | 
| | 905 compatible with the POSIX addr2line interface, parallelizing the | 
| 906 symbolization does not yield the same performance effects. Running | |
| 907 just one symbolizer seems good enough for now. Can optimize later | |
| 908 if this becomes a bottleneck. | |
| 909 """ | |
| 910 cmd = [self.symbolizer_path, '--functions', '--demangle', '--exe', | |
| 911 symfile.symbolizable_path] | |
| 912 | |
| 913 proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE, | |
| 914 stderr=sys.stderr) | |
| 915 addrs = ["%x" % relative_pc for relative_pc in | |
| 916 symfile.frames_by_address.keys()] | |
| 917 (stdout_data, stderr_data) = proc.communicate('\n'.join(addrs)) | |
| 918 stdout_data = stdout_data.split('\n') | |
| 919 | |
| | 920 # The output is known to be in the same order as the input addresses. | 
| 921 for i, addr in enumerate(addrs): | |
| 922 for frame in symfile.frames_by_address[int(addr, 16)]: | |
| | 923 # Output of addr2line with --functions is always two lines per | 
| | 924 # symbol: the function name followed by the source line number. Only | 
| | 925 # grab the function name, as line info is not always available. | 
| 926 frame.name = stdout_data[i * 2] | |
| 927 | |
| 928 def Symbolize(self, symfile, unsymbolized_name): | |
| 929 if self.is_mac: | |
| 930 self._SymbolizeMac(symfile) | |
| 931 elif self.is_win: | |
| 932 self._SymbolizeWin(symfile) | |
| 933 else: | |
| 934 self._SymbolizeLinuxAndAndroid(symfile, unsymbolized_name) | |
| 935 | |
| 936 def IsSymbolizableFile(self, file_path): | |
| 937 if self.is_win: | |
| 938 extension = os.path.splitext(file_path)[1].lower() | |
| 939 return extension in ['.dll', '.exe'] | |
| 940 else: | |
| 941 result = subprocess.check_output(['file', '-0', file_path]) | |
| 942 type_string = result[result.find('\0') + 1:] | |
| 943 return bool(re.match(r'.*(ELF|Mach-O) (32|64)-bit\b.*', | |
| 944 type_string, re.DOTALL)) | |
| 945 | |
| 946 | |
| 400 def SymbolizeFiles(symfiles, symbolizer): | 947 def SymbolizeFiles(symfiles, symbolizer): |
| 401 """Symbolizes each file in the given list of SymbolizableFiles | 948 """Symbolizes each file in the given list of SymbolizableFiles |
| 402 and updates stack frames with symbolization results.""" | 949 and updates stack frames with symbolization results.""" |
| 950 | |
| 951 if not symfiles: | |
| 952 print 'Nothing to symbolize.' | |
| 953 return | |
| 954 | |
| 403 print 'Symbolizing...' | 955 print 'Symbolizing...' |
| 404 | 956 |
| 405 def _SubPrintf(message, *args): | 957 def _SubPrintf(message, *args): |
| 406 print (' ' + message).format(*args) | 958 print (' ' + message).format(*args) |
| 407 | 959 |
| 408 symbolized = False | |
| 409 for symfile in symfiles: | 960 for symfile in symfiles: |
| 410 unsymbolized_name = '<{}>'.format( | 961 unsymbolized_name = '<{}>'.format( |
| 411 symfile.path if symfile.path else 'unnamed') | 962 symfile.path if symfile.path else 'unnamed') |
| 412 | 963 |
| 413 problem = None | 964 problem = None |
| 414 if not os.path.isabs(symfile.symbolizable_path): | 965 if not os.path.isabs(symfile.symbolizable_path): |
| 415 problem = 'not a file' | 966 problem = 'not a file' |
| 416 elif not os.path.isfile(symfile.symbolizable_path): | 967 elif not os.path.isfile(symfile.symbolizable_path): |
| 417 problem = "file doesn't exist" | 968 problem = "file doesn't exist" |
| 418 elif not symbolizer.IsSymbolizableFile(symfile.symbolizable_path): | 969 elif not symbolizer.IsSymbolizableFile(symfile.symbolizable_path): |
| 419 problem = 'file is not symbolizable' | 970 problem = 'file is not symbolizable' |
| 420 if problem: | 971 if problem: |
| 421 _SubPrintf("Won't symbolize {} PCs for '{}': {}.", | 972 _SubPrintf("Won't symbolize {} PCs for '{}': {}.", |
| 422 len(symfile.frames_by_address), | 973 len(symfile.frames_by_address), |
| 423 symfile.symbolizable_path, | 974 symfile.symbolizable_path, |
| 424 problem) | 975 problem) |
| 425 for frames in symfile.frames_by_address.itervalues(): | 976 for frames in symfile.frames_by_address.itervalues(): |
| 426 for frame in frames: | 977 for frame in frames: |
| 427 frame.name = unsymbolized_name | 978 frame.name = unsymbolized_name |
| 428 continue | 979 continue |
| 429 | 980 |
| 430 _SubPrintf('Symbolizing {} PCs from {}...', | 981 _SubPrintf('Symbolizing {} PCs from {}...', |
| 431 len(symfile.frames_by_address), | 982 len(symfile.frames_by_address), |
| 432 symfile.path) | 983 symfile.path) |
| 433 | 984 |
| 434 symbolizer.Symbolize(symfile, unsymbolized_name) | 985 symbolizer.Symbolize(symfile, unsymbolized_name) |
| 435 symbolized = True | |
| 436 | 986 |
| 437 return symbolized | 987 |
| | 988 # Matches Android library paths; supports both K (/data/app-lib/<>/lib.so) | 
| | 989 # and L+ (/data/app/<>/lib/<>/lib.so). The library name is available | 
| 990 # via 'name' group. | |
| 991 ANDROID_PATH_MATCHER = re.compile( | |
| 992 r'^/data/(?:' | |
| 993 r'app/[^/]+/lib/[^/]+/|' | |
| 994 r'app-lib/[^/]+/|' | |
| 995 r'data/[^/]+/incremental-install-files/lib/' | |
| 996 r')(?P<name>.*\.so)') | |
| 997 | |
| 998 # Subpath of output path where unstripped libraries are stored. | |
| 999 ANDROID_UNSTRIPPED_SUBPATH = 'lib.unstripped' | |
| 438 | 1000 |
| 439 | 1001 |
| 440 def HaveFilesFromAndroid(symfiles): | 1002 def HaveFilesFromAndroid(symfiles): |
| 441 return any(ANDROID_PATH_MATCHER.match(f.path) for f in symfiles) | 1003 return any(ANDROID_PATH_MATCHER.match(f.path) for f in symfiles) |
| 442 | 1004 |
| 443 | 1005 |
| 444 def RemapAndroidFiles(symfiles, output_path): | 1006 def RemapAndroidFiles(symfiles, output_path): |
| 445 for symfile in symfiles: | 1007 for symfile in symfiles: |
| 446 match = ANDROID_PATH_MATCHER.match(symfile.path) | 1008 match = ANDROID_PATH_MATCHER.match(symfile.path) |
| 447 if match: | 1009 if match: |
| 448 name = match.group('name') | 1010 name = match.group('name') |
| 449 symfile.symbolizable_path = os.path.join( | 1011 symfile.symbolizable_path = os.path.join( |
| 450 output_path, ANDROID_UNSTRIPPED_SUBPATH, name) | 1012 output_path, ANDROID_UNSTRIPPED_SUBPATH, name) |
| 451 else: | 1013 else: |
| 452 # Clobber file path to trigger "not a file" problem in SymbolizeFiles(). | 1014 # Clobber file path to trigger "not a file" problem in SymbolizeFiles(). |
| 453 # Without this, such files would fail with a "file doesn't exist" problem, | 1015 # Without this, such files would fail with a "file doesn't exist" problem, | 
| 454 # which is not accurate. | 1016 # which is not accurate. |
| 455 symfile.symbolizable_path = 'android://{}'.format(symfile.path) | 1017 symfile.symbolizable_path = 'android://{}'.format(symfile.path) |
| 456 | 1018 |
| 457 | 1019 |
| 1020 def Symbolize(options, trace, symbolizer): | |
| 1021 symfiles = ResolveSymbolizableFiles(trace.processes) | |
| 1022 | |
| 1023 # Android trace files don't have any indication they are from Android. | |
| 1024 # So we're checking for Android-specific paths. | |
| 1025 if HaveFilesFromAndroid(symfiles): | |
| 1026 if not options.output_directory: | |
| 1027 sys.exit('The trace file appears to be from Android. Please ' | |
| 1028 'specify output directory to properly symbolize it.') | |
| 1029 RemapAndroidFiles(symfiles, os.path.abspath(options.output_directory)) | |
| 1030 | |
| 1031 SymbolizeFiles(symfiles, symbolizer) | |
| 1032 | |
| 1033 | |
| 1034 def OpenTraceFile(file_path, mode): | |
| 1035 if file_path.endswith('.gz'): | |
| 1036 return gzip.open(file_path, mode + 'b') | |
| 1037 else: | |
| 1038 return open(file_path, mode + 't') | |
| 1039 | |
| 1040 | |
| 458 # Suffix used for backup files. | 1041 # Suffix used for backup files. |
| 459 BACKUP_FILE_TAG = '.BACKUP' | 1042 BACKUP_FILE_TAG = '.BACKUP' |
| 460 | 1043 |
| 461 def main(): | 1044 def main(): |
| 462 parser = argparse.ArgumentParser() | 1045 class MultilineHelpFormatter(argparse.HelpFormatter): |
| 463 parser.add_argument('file', | 1046 def _split_lines(self, text, width): |
| 464 help='Trace file to symbolize (.json or .json.gz)') | 1047 extra_lines = [] |
| 465 parser.add_argument('--no-backup', | 1048 if '\n' in text: |
| 466 dest='backup', default='true', action='store_false', | 1049 lines = text.splitlines() |
| 467 help="Don't create {} files".format(BACKUP_FILE_TAG)) | 1050 text = lines[0] |
| 468 parser.add_argument('--output-directory', | 1051 extra_lines = lines[1:] |
| 469 help='The path to the build output directory, such ' + | 1052 return super(MultilineHelpFormatter, self)._split_lines(text, width) + \ |
| 470 'as out/Debug. Only needed for Android.') | 1053 extra_lines |
| 471 options = parser.parse_args() | |
| 472 | 1054 |
| 473 trace_file_path = options.file | 1055 parser = argparse.ArgumentParser(formatter_class=MultilineHelpFormatter) |
| 474 def _OpenTraceFile(mode): | 1056 parser.add_argument( |
| 475 if trace_file_path.endswith('.gz'): | 1057 'file', |
| 476 return gzip.open(trace_file_path, mode + 'b') | 1058 help='Trace file to symbolize (.json or .json.gz)') |
| 477 else: | 1059 |
| 478 return open(trace_file_path, mode + 't') | 1060 parser.add_argument( |
| 1061 '--no-backup', dest='backup', default='true', action='store_false', | |
| 1062 help="Don't create {} files".format(BACKUP_FILE_TAG)) | |
| 1063 | |
| 1064 parser.add_argument( | |
| 1065 '--output-directory', | |
| 1066 help='The path to the build output directory, such as out/Debug.') | |
| 479 | 1067 |
| 480 symbolizer = Symbolizer() | 1068 symbolizer = Symbolizer() |
| 481 if symbolizer.symbolizer_path is None: | 1069 if symbolizer.symbolizer_path is None: |
| 482 sys.exit("Can't symbolize - no %s in PATH." % symbolizer.binary) | 1070 sys.exit("Can't symbolize - no %s in PATH." % symbolizer.binary) |
| 483 | 1071 |
| 1072 options = parser.parse_args() | |
| 1073 | |
| 1074 trace_file_path = options.file | |
| 1075 | |
| 484 print 'Reading trace file...' | 1076 print 'Reading trace file...' |
| 485 with _OpenTraceFile('r') as trace_file: | 1077 with OpenTraceFile(trace_file_path, 'r') as trace_file: |
| 486 trace = json.load(trace_file) | 1078 trace = Trace(json.load(trace_file)) |
| 487 | 1079 |
| 488 processes = CollectProcesses(trace) | 1080 Symbolize(options, trace, symbolizer) |
| 489 symfiles = ResolveSymbolizableFiles(processes) | |
| 490 | 1081 |
| 491 # Android trace files don't have any indication they are from Android. | 1082 if trace.modified: |
| 492 # So we're checking for Android-specific paths. | 1083 trace.ApplyModifications() |
| 493 if HaveFilesFromAndroid(symfiles): | |
| 494 if not options.output_directory: | |
| 495 parser.error('The trace file appears to be from Android. Please ' | |
| 496 "specify output directory (e.g. 'out/Debug') to properly " | |
| 497 'symbolize it.') | |
| 498 RemapAndroidFiles(symfiles, os.path.abspath(options.output_directory)) | |
| 499 | 1084 |
| 500 if SymbolizeFiles(symfiles, symbolizer): | |
| 501 if options.backup: | 1085 if options.backup: |
| 502 backup_file_path = trace_file_path + BACKUP_FILE_TAG | 1086 backup_file_path = trace_file_path + BACKUP_FILE_TAG |
| 503 print 'Backing up trace file to {}...'.format(backup_file_path) | 1087 if os.path.exists(backup_file_path): |
| 1088 for i in itertools.count(1): | |
| 1089 unique_file_path = '{}{}'.format(backup_file_path, i) | |
| 1090 if not os.path.exists(unique_file_path): | |
| 1091 backup_file_path = unique_file_path | |
| 1092 break | |
| 1093 print 'Backing up trace file to {}'.format(backup_file_path) | |
| 504 os.rename(trace_file_path, backup_file_path) | 1094 os.rename(trace_file_path, backup_file_path) |
| 505 | 1095 |
| 506 print 'Updating trace file...' | 1096 print 'Updating the trace file...' |
| 507 with _OpenTraceFile('w') as trace_file: | 1097 with OpenTraceFile(trace_file_path, 'w') as trace_file: |
| 508 json.dump(trace, trace_file) | 1098 json.dump(trace.node, trace_file) |
| 509 else: | 1099 else: |
| 510 print 'No PCs symbolized - not updating trace file.' | 1100 print 'No modifications were made - not updating the trace file.' |
| 511 | 1101 |
| 512 | 1102 |
| 513 if __name__ == '__main__': | 1103 if __name__ == '__main__': |
| 514 main() | 1104 main() |
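The core address math that `ResolveSymbolizableFiles()` performs (find the mmap region containing a frame's PC, then pass the region-relative offset to the symbolizer) can be sketched in isolation. This is an illustrative sketch, not code from the patch; the `Region`/`find_region` names and the region layout below are hypothetical stand-ins for `MemoryMap.FindRegion()` and real mmap data.

```python
# Sketch of the PC -> module-relative-address step done by
# ResolveSymbolizableFiles(). Names and region values are hypothetical.
import collections

Region = collections.namedtuple('Region', ['start_address', 'size', 'file_path'])


def find_region(regions, pc):
    # Returns the mapped region containing |pc|, or None
    # (cf. MemoryMap.FindRegion in the script).
    for region in regions:
        if region.start_address <= pc < region.start_address + region.size:
            return region
    return None


regions = [Region(0x7f0000000000, 0x200000, '/opt/chrome/libchrome.so')]
pc = 0x7f0000001234

region = find_region(regions, pc)
# The symbolizer gets the PC relative to the module's load address,
# which always fits in an int even for 64-bit absolute PCs.
relative_pc = pc - region.start_address
print('%s+0x%x' % (region.file_path, relative_pc))
# prints '/opt/chrome/libchrome.so+0x1234'
```

In the script itself, frames whose PC falls outside every region are instead given the name `<unresolved>`.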