 Chromium Code Reviews
Issue 2810523002: symbolize_trace: support new heap dump format. (Closed)

  | OLD | NEW | 
|---|---|
| 1 #!/usr/bin/env python | 1 #!/usr/bin/env python | 
| 2 # Copyright 2016 The Chromium Authors. All rights reserved. | 2 # Copyright 2016 The Chromium Authors. All rights reserved. | 
| 3 # Use of this source code is governed by a BSD-style license that can be | 3 # Use of this source code is governed by a BSD-style license that can be | 
| 4 # found in the LICENSE file. | 4 # found in the LICENSE file. | 
| 5 | 5 | 
| 6 """ | |
| 7 This script processes trace files and symbolizes stack frames generated by | |
| 8 Chrome's native heap profiler. | |
| 9 | |
| 10 === Overview === | |
| 11 | |
| 12 Trace file is essentially a giant JSON array of dictionaries (events). | |
| 13 Events have some predefined keys, but otherwise are free to have anything | |
| 14 inside. Trace file contains events from all Chrome processes that were | |
| 15 sampled during tracing period (and 'pid' is an example of a predefined key). | |
| 
Wez
2017/04/29 00:41:21
nit: "(and 'pid' is an example..." reads oddly her
 
DmitrySkiba
2017/05/02 06:19:59
Done.
 | |
| 16 | |
| 17 This script cares only about memory dump events generated by memory-infra | |
| 
Wez
2017/04/29 00:41:21
nit: Suggest "...dump events in trace files genera
 
DmitrySkiba
2017/05/02 06:19:59
Done.
 | |
| 18 component. | |
| 19 | |
| 20 When Chrome native heap profiling is enabled, some memory dump events | |
| 21 include the following extra information: | |
| 22 | |
| 23 * (Per allocator) Information about live allocations at the moment of the | |
| 24 memory dump (the information includes backtraces, types / categories, | |
| 25 sizes, and counts of allocations). There are several allocators in | |
| 26 Chrome: malloc, blink_gc, and partition_alloc. | |
| 
Wez
2017/04/29 00:41:21
nit: If these are examples, not an exhaustive list
 
DmitrySkiba
2017/05/02 06:19:59
This is actually an exhaustive list.
 
Wez
2017/05/03 00:17:09
OK; in that case I would say "There are three allo
 
DmitrySkiba
2017/05/04 00:30:55
Acknowledged.
 | |
| 27 | |
| 28 * (Per process) Stack frame tree of all functions that called allocators | |
| 29 above. | |
| 
Wez
2017/04/29 00:41:21
nit: If we failed to trace all the way back to mai
 
DmitrySkiba
2017/05/02 06:19:59
It's still a single tree, just with an implicit ro
 
Wez
2017/05/03 00:17:10
OK; you could add a brief note that effect here, f
 
DmitrySkiba
2017/05/04 00:30:55
Acknowledged.
 | |
| 30 | |
| 31 This script does the following: | |
| 32 | |
| 33 1. Parses the given trace file. | |
| 34 2. Finds memory dump events and parses stack frame tree for each process. | |
| 35 3. Finds stack frames that have PC addresses instead of function names. | |
| 36 4. Symbolizes these PCs. | |
| 37 6. Rewrites stack frame names (this updates parts of memory dump events). | |
| 
Wez
2017/04/29 00:41:21
nit: You're missing #5 ;)
It's also not clear wha
 
DmitrySkiba
2017/05/02 06:19:59
Done. Added note about script not coalescing such
 
Wez
2017/05/03 00:17:09
Acknowledged.
 | |
| 38 7. Updates the trace file. | |
| 39 | |
| 40 === Details === | |
| 41 | |
| 42 There are two formats of heap profiler information: legacy and modern. The | |
| 43 main differences are: | |
| 44 | |
| 45 * In the legacy format stack frame tree is not dumped in memory dump events, | |
| 46 but in metadata events (one per process). I.e. it's sufficient to parse | |
| 47 a single metadata event to get full stack frame tree for a process. | |
| 
Wez
2017/04/29 00:41:21
IIUC the point here is that every "event" in a leg
 
DmitrySkiba
2017/05/02 06:19:59
Both formats dump live objects per allocator in ea
 
Wez
2017/05/03 00:17:10
Thanks for adding this detail, however it seems a
 
DmitrySkiba
2017/05/04 00:30:55
Well, the section is named "Details", and details
 | |
| 48 | |
| 49 * In the modern format stack frame tree (also type name and string mappings) | |
| 50 are dumped incrementally. I.e. each memory dump event carries additions to | |
| 51 the stack frame tree that occurred since the previous memory dump event. | |
| 
Wez
2017/04/29 00:41:21
You might express this as each memory-infra event
 
DmitrySkiba
2017/05/02 06:19:59
Done.
 | |
| 52 To get the full stack frame tree for a process the script needs to parse | |
| 53 all memory dump events. However, when wrappers update incremental nodes, | |
| 54 they put everything in the first node, and clear all others. | |
| 
Wez
2017/04/29 00:41:21
Not sure what you mean about moving everything int
 
DmitrySkiba
2017/05/02 06:19:59
Explained more.
 | |
| 55 | |
| 56 * In the modern format stack frame tree doesn't reference name strings | |
| 57 directly, but through a string mapping table. | |
| 58 | |
| 59 See crbug.com/708930 for more information about the modern format. | |
| 60 """ | |
| 61 | |
| 6 import argparse | 62 import argparse | 
| 7 import bisect | 63 import bisect | 
| 8 import collections | 64 import collections | 
| 9 import gzip | 65 import gzip | 
| 66 import itertools | |
| 10 import json | 67 import json | 
| 11 import os | 68 import os | 
| 12 import re | 69 import re | 
| 13 import subprocess | 70 import subprocess | 
| 14 import sys | 71 import sys | 
| 15 | 72 | 
| 16 _SYMBOLS_PATH = os.path.abspath(os.path.join( | 73 _SYMBOLS_PATH = os.path.abspath(os.path.join( | 
| 17 os.path.dirname(os.path.realpath(__file__)), | 74 os.path.dirname(os.path.realpath(__file__)), | 
| 18 '..', | 75 '..', | 
| 19 'third_party', | 76 'third_party', | 
| 20 'symbols')) | 77 'symbols')) | 
| 21 sys.path.append(_SYMBOLS_PATH) | 78 sys.path.append(_SYMBOLS_PATH) | 
| 22 # pylint: disable=import-error | 79 # pylint: disable=import-error | 
| 23 import symbols.elf_symbolizer as elf_symbolizer | 80 import symbols.elf_symbolizer as elf_symbolizer | 
| 24 | 81 | 
| 25 import symbolize_trace_atos_regex | 82 import symbolize_trace_atos_regex | 
| 26 import symbolize_trace_macho_reader | 83 import symbolize_trace_macho_reader | 
| 27 | 84 | 
| 28 | 85 | 
| 29 # Relevant trace event phases from Chromium's | 86 class NodeWrapper(object): | 
| 30 # src/base/trace_event/common/trace_event_common.h. | 87 """Wraps an event data node(s). | 
| 31 TRACE_EVENT_PHASE_METADATA = 'M' | 88 | 
| 32 TRACE_EVENT_PHASE_MEMORY_DUMP = 'v' | 89 A node is a reference into a trace event JSON. Wrappers parse nodes to | 
| 90 provide convenient APIs and update nodes when asked to propagate changes | |
| 91 back (see ApplyModifications() below). | |
| 92 | |
| 93 Here is an example of legacy metadata event that contains stack frame tree: | |
| 94 | |
| 95 { | |
| 96 "args": { | |
| 97 "stackFrames": { ... } | |
| 98 }, | |
| 99 "cat": "__metadata", | |
| 100 "name": "stackFrames", | |
| 101 "ph": "M", | |
| 102 ... | |
| 103 } | |
| 104 | |
| 105 When this event is encountered, a reference to the "stackFrames" dictionary | |
| 106 is obtained and passed down to a specific wrapper class, which knows how to | |
| 107 parse / update the dictionary. | |
| 108 | |
| 109 There are two parsing patterns depending on whether node is serialized | |
| 110 incrementally: | |
| 111 | |
| 112 * If node is not incremental, then parsing is done by __init__(), | |
| 113 see MemoryMap for an example. | |
| 114 | |
| 115 * If node is incremental, then __init__() does nothing, and ParseNext() | |
| 116 is called when next node (from a next event) is encountered. | |
| 117 | |
| 118 Some wrappers can also modify nodes they parsed. In such cases they have | |
| 119 additional APIs: | |
| 120 | |
| 121 * 'modified' flag, which indicates whether the wrapper was changed. | |
| 122 | |
| 123 * 'ApplyModifications' method, which propagates changes made to the wrapper | |
| 124 back to nodes. Successful invocation of ApplyModifications() resets | |
| 125 'modified' flag. | |
| 126 | |
| 127 """ | |
| 128 | |
| 129 # def __init__(self, node): | |
| 130 # ... | |
| 131 | |
| 132 # def ParseNext(self, node, ...): | |
| 133 # ... | |
| 134 | |
| 135 # @property | |
| 136 # def modified(self): | |
| 137 # ... | |
| 138 | |
| 139 # def ApplyModifications(self, ...): | |
| 140 # ... | |
| 141 | |
| 142 pass | |
| 33 | 143 | 
| 34 | 144 | 
| 35 # Matches Android library paths, supports both K (/data/app-lib/<>/lib.so) | 145 class MemoryMap(NodeWrapper): | 
| 36 # as well as L+ (/data/app/<>/lib/<>/lib.so). Library name is available | 146 """Wraps 'process_mmaps' node. | 
| 37 # via 'name' group. | |
| 38 ANDROID_PATH_MATCHER = re.compile( | |
| 39 r'^/data/(?:' | |
| 40 r'app/[^/]+/lib/[^/]+/|' | |
| 41 r'app-lib/[^/]+/|' | |
| 42 r'data/[^/]+/incremental-install-files/lib/' | |
| 43 r')(?P<name>.*\.so)') | |
| 44 | 147 | 
| 45 # Subpath of output path where unstripped libraries are stored. | 148 'process_mmaps' node contains information about file mappings. | 
| 46 ANDROID_UNSTRIPPED_SUBPATH = 'lib.unstripped' | |
| 47 | 149 | 
| 48 | 150 "process_mmaps": { | 
| 49 def FindInSystemPath(binary_name): | 151 "vm_regions": [ | 
| 50 paths = os.environ['PATH'].split(os.pathsep) | 152 { | 
| 51 for path in paths: | 153 "mf": "<file_path>", | 
| 52 binary_path = os.path.join(path, binary_name) | 154 "sa": "<start_address>", | 
| 53 if os.path.isfile(binary_path): | 155 "sz": "<size>", | 
| 54 return binary_path | 156 ... | 
| 55 return None | 157 }, | 
| 56 | 158 ... | 
| 57 | 159 ] | 
| 58 class Symbolizer(object): | 160 } | 
| 59 # Encapsulates platform-specific symbolization logic. | 161 """ | 
| 60 def __init__(self): | |
| 61 self.is_mac = sys.platform == 'darwin' | |
| 62 self.is_win = sys.platform == 'win32' | |
| 63 if self.is_mac: | |
| 64 self.binary = 'atos' | |
| 65 self._matcher = symbolize_trace_atos_regex.AtosRegexMatcher() | |
| 66 elif self.is_win: | |
| 67 self.binary = 'addr2line-pdb.exe' | |
| 68 else: | |
| 69 self.binary = 'addr2line' | |
| 70 self.symbolizer_path = FindInSystemPath(self.binary) | |
| 71 | |
| 72 def _SymbolizeLinuxAndAndroid(self, symfile, unsymbolized_name): | |
| 73 def _SymbolizerCallback(sym_info, frames): | |
| 74 # Unwind inline chain to the top. | |
| 75 while sym_info.inlined_by: | |
| 76 sym_info = sym_info.inlined_by | |
| 77 | |
| 78 symbolized_name = sym_info.name if sym_info.name else unsymbolized_name | |
| 79 for frame in frames: | |
| 80 frame.name = symbolized_name | |
| 81 | |
| 82 symbolizer = elf_symbolizer.ELFSymbolizer(symfile.symbolizable_path, | |
| 83 self.symbolizer_path, | |
| 84 _SymbolizerCallback, | |
| 85 inlines=True) | |
| 86 | |
| 87 for address, frames in symfile.frames_by_address.iteritems(): | |
| 88 # SymbolizeAsync() asserts that the type of address is int. We operate | |
| 89 # on longs (since they are raw pointers possibly from 64-bit processes). | |
| 90 # It's OK to cast here because we're passing relative PC, which should | |
| 91 # always fit into int. | |
| 92 symbolizer.SymbolizeAsync(int(address), frames) | |
| 93 | |
| 94 symbolizer.Join() | |
| 95 | |
| 96 | |
| 97 def _SymbolizeMac(self, symfile): | |
| 98 chars_max = int(subprocess.check_output("getconf ARG_MAX", shell=True)) | |
| 99 | |
| 100 # 16 for the address, 2 for "0x", 1 for the space | |
| 101 chars_per_address = 19 | |
| 102 | |
| 103 load_address = (symbolize_trace_macho_reader. | |
| 104 ReadMachOTextLoadAddress(symfile.symbolizable_path)) | |
| 105 assert load_address is not None | |
| 106 | |
| 107 cmd_base = [self.symbolizer_path, '-arch', 'x86_64', '-l', | |
| 108 '0x%x' % load_address, '-o', | |
| 109 symfile.symbolizable_path] | |
| 110 chars_for_other_arguments = len(' '.join(cmd_base)) + 1 | |
| 111 | |
| 112 # The maximum number of inputs that can be processed at once is limited by | |
| 113 # ARG_MAX. This currently evaluates to ~13000 on macOS. | |
| 114 max_inputs = (chars_max - chars_for_other_arguments) / chars_per_address | |
| 115 | |
| 116 all_keys = symfile.frames_by_address.keys() | |
| 117 processed_keys_count = 0 | |
| 118 while len(all_keys): | |
| 119 input_count = min(len(all_keys), max_inputs) | |
| 120 keys_to_process = all_keys[0:input_count] | |
| 121 | |
| 122 cmd = list(cmd_base) | |
| 123 cmd.extend([hex(int(x) + load_address) | |
| 124 for x in keys_to_process]) | |
| 125 output_array = subprocess.check_output(cmd).split('\n') | |
| 126 for i in range(len(keys_to_process)): | |
| 127 for frame in (symfile.frames_by_address.values() | |
| 128 [i + processed_keys_count]): | |
| 129 frame.name = self._matcher.Match(output_array[i]) | |
| 130 processed_keys_count += len(keys_to_process) | |
| 131 all_keys = all_keys[input_count:] | |
| 132 | |
| 133 | |
| 134 def _SymbolizeWin(self, symfile): | |
| 135 """Invoke symbolizer binary on Windows and write all input in one go. | |
| 136 | |
| 137 Unlike Linux, on Windows, symbolization talks through a shared system | |
| 138 service that handles communication with the NT symbol servers. This | |
| 139 creates an explicit serialization (and therefore lock contention) of | |
| 140 any process using the symbol API for files that do not have a local PDB. | |
| 141 | |
| 142 Thus, even though the Windows symbolizer binary can be made command line | |
| 143 compatible with the POSIX addr2line interface, parallelizing the | |
| 144 symbolization does not yield the same performance effects. Running | |
| 145 just one symbolizer seems good enough for now. Can optimize later | |
| 146 if this becomes a bottleneck. | |
| 147 """ | |
| 148 cmd = [self.symbolizer_path, '--functions', '--demangle', '--exe', | |
| 149 symfile.symbolizable_path] | |
| 150 | |
| 151 proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE, | |
| 152 stderr=sys.stderr) | |
| 153 addrs = ["%x" % relative_pc for relative_pc in | |
| 154 symfile.frames_by_address.keys()] | |
| 155 (stdout_data, stderr_data) = proc.communicate('\n'.join(addrs)) | |
| 156 stdout_data = stdout_data.split('\n') | |
| 157 | |
| 158 # This is known to be in the same order as stderr_data. | |
| 159 for i, addr in enumerate(addrs): | |
| 160 for frame in symfile.frames_by_address[int(addr, 16)]: | |
| 161 # Output of addr2line with --functions is always 2 outputs per | |
| 162 # symbol, function name followed by source line number. Only grab | |
| 163 # the function name as line info is not always available. | |
| 164 frame.name = stdout_data[i * 2] | |
| 165 | |
| 166 | |
| 167 def Symbolize(self, symfile, unsymbolized_name): | |
| 168 if self.is_mac: | |
| 169 self._SymbolizeMac(symfile) | |
| 170 elif self.is_win: | |
| 171 self._SymbolizeWin(symfile) | |
| 172 else: | |
| 173 self._SymbolizeLinuxAndAndroid(symfile, unsymbolized_name) | |
| 174 | |
| 175 | |
| 176 def IsSymbolizableFile(self, file_path): | |
| 177 if self.is_win: | |
| 178 extension = os.path.splitext(file_path)[1].lower() | |
| 179 return extension in ['.dll', '.exe'] | |
| 180 else: | |
| 181 result = subprocess.check_output(['file', '-0', file_path]) | |
| 182 type_string = result[result.find('\0') + 1:] | |
| 183 return bool(re.match(r'.*(ELF|Mach-O) (32|64)-bit\b.*', | |
| 184 type_string, re.DOTALL)) | |
| 185 | |
| 186 | |
| 187 class ProcessMemoryMaps(object): | |
| 188 """Represents 'process_mmaps' trace file entry.""" | |
| 189 | 162 | 
| 190 class Region(object): | 163 class Region(object): | 
| 191 def __init__(self, start_address, size, file_path): | 164 def __init__(self, start_address, size, file_path): | 
| 192 self._start_address = start_address | 165 self._start_address = start_address | 
| 193 self._size = size | 166 self._size = size | 
| 194 self._file_path = file_path | 167 self._file_path = file_path | 
| 195 | 168 | 
| 196 @property | 169 @property | 
| 197 def start_address(self): | 170 def start_address(self): | 
| 198 return self._start_address | 171 return self._start_address | 
| (...skipping 15 matching lines...) | |
| 214 return long(self._start_address).__cmp__(long(other._start_address)) | 187 return long(self._start_address).__cmp__(long(other._start_address)) | 
| 215 elif isinstance(other, (long, int)): | 188 elif isinstance(other, (long, int)): | 
| 216 return long(self._start_address).__cmp__(long(other)) | 189 return long(self._start_address).__cmp__(long(other)) | 
| 217 else: | 190 else: | 
| 218 raise Exception('Cannot compare with %s' % type(other)) | 191 raise Exception('Cannot compare with %s' % type(other)) | 
| 219 | 192 | 
| 220 def __repr__(self): | 193 def __repr__(self): | 
| 221 return 'Region(0x{:X} - 0x{:X}, {})'.format( | 194 return 'Region(0x{:X} - 0x{:X}, {})'.format( | 
| 222 self.start_address, self.end_address, self.file_path) | 195 self.start_address, self.end_address, self.file_path) | 
| 223 | 196 | 
| 224 def __init__(self, process_mmaps): | 197 def __init__(self, process_mmaps_node): | 
| 225 """Parses 'process_mmaps' dictionary.""" | |
| 226 | |
| 227 regions = [] | 198 regions = [] | 
| 228 for region_value in process_mmaps['vm_regions']: | 199 for region_node in process_mmaps_node['vm_regions']: | 
| 229 regions.append(self.Region( | 200 regions.append(self.Region( | 
| 230 long(region_value['sa'], 16), | 201 long(region_node['sa'], 16), | 
| 231 long(region_value['sz'], 16), | 202 long(region_node['sz'], 16), | 
| 232 region_value['mf'])) | 203 region_node['mf'])) | 
| 233 regions.sort() | 204 regions.sort() | 
| 234 | 205 | 
| 235 # Copy regions without duplicates and check for overlaps. | 206 # Copy regions without duplicates and check for overlaps. | 
| 236 self._regions = [] | 207 self._regions = [] | 
| 237 previous_region = None | 208 previous_region = None | 
| 238 for region in regions: | 209 for region in regions: | 
| 239 if previous_region is not None: | 210 if previous_region is not None: | 
| 240 if region == previous_region: | 211 if region == previous_region: | 
| 241 continue | 212 continue | 
| 242 assert region.start_address >= previous_region.end_address, \ | 213 assert region.start_address >= previous_region.end_address, \ | 
| 243 'Regions {} and {} overlap.'.format(previous_region, region) | 214 'Regions {} and {} overlap.'.format(previous_region, region) | 
| 244 previous_region = region | 215 previous_region = region | 
| 245 self._regions.append(region) | 216 self._regions.append(region) | 
| 246 | 217 | 
| 247 @property | 218 @property | 
| 248 def regions(self): | 219 def regions(self): | 
| 249 return self._regions | 220 return self._regions | 
| 250 | 221 | 
| 251 def FindRegion(self, address): | 222 def FindRegion(self, address): | 
| 252 """Finds region containing |address|. Returns None if none found.""" | 223 """Finds region containing |address|. Returns None if none found.""" | 
| 253 | 224 | 
| 254 region_index = bisect.bisect_right(self._regions, address) - 1 | 225 region_index = bisect.bisect_right(self._regions, address) - 1 | 
| 255 if region_index >= 0: | 226 if region_index >= 0: | 
| 256 region = self._regions[region_index] | 227 region = self._regions[region_index] | 
| 257 if address >= region.start_address and address < region.end_address: | 228 if address >= region.start_address and address < region.end_address: | 
| 258 return region | 229 return region | 
| 259 return None | 230 return None | 
| 260 | 231 | 
| 261 | 232 | 
| 262 class StackFrames(object): | 233 class UnsupportedHeapDumpVersionError(Exception): | 
| 263 """Represents 'stackFrames' trace file entry.""" | 234 """Helper exception class to signal unsupported heap dump version.""" | 
| 264 | 235 | 
| 265 class PCFrame(object): | 236 def __init__(self, version): | 
| 266 def __init__(self, pc, frame): | 237 message = 'Unsupported heap dump version: {}'.format(version) | 
| 238 super(UnsupportedHeapDumpVersionError, self).__init__(message) | |
| 239 | |
| 240 | |
| 241 class StringMap(NodeWrapper): | |
| 242 """Wraps all 'strings' nodes for a process. | |
| 243 | |
| 244 'strings' node contains incremental mappings between integer ids and strings. | |
| 245 | |
| 246 "strings": [ | |
| 247 { | |
| 248 "id": <string_id>, | |
| 249 "string": <string> | |
| 250 }, | |
| 251 ... | |
| 252 ] | |
| 253 """ | |
| 254 | |
| 255 def __init__(self): | |
| 256 self._modified = False | |
| 257 self._strings_nodes = [] | |
| 258 self._string_by_id = {} | |
| 259 self._id_by_string = {} | |
| 260 self._max_string_id = 0 | |
| 261 | |
| 262 @property | |
| 263 def modified(self): | |
| 264 """Returns True if the wrapper was modified (see NodeWrapper).""" | |
| 265 return self._modified | |
| 266 | |
| 267 @property | |
| 268 def string_by_id(self): | |
| 269 return self._string_by_id | |
| 270 | |
| 271 def ParseNext(self, heap_dump_version, strings_node): | |
| 272 """Parses and interns next node (see NodeWrapper).""" | |
| 273 | |
| 274 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1: | |
| 275 raise UnsupportedHeapDumpVersionError(heap_dump_version) | |
| 276 | |
| 277 self._strings_nodes.append(strings_node) | |
| 278 for string_node in strings_node: | |
| 279 self._Insert(string_node['id'], string_node['string']) | |
| 280 | |
| 281 def Clear(self): | |
| 282 """Clears all string mappings.""" | |
| 283 if self._string_by_id: | |
| 284 self._modified = True | |
| 285 self._string_by_id = {} | |
| 286 self._id_by_string = {} | |
| 287 self._Insert(0, '[null]') | |
| 288 self._max_string_id = 0 | |
| 289 | |
| 290 def AddString(self, string): | |
| 291 """Adds a string (if it doesn't exist) and returns its integer id.""" | |
| 292 string_id = self._id_by_string.get(string) | |
| 293 if string_id is None: | |
| 294 string_id = self._max_string_id + 1 | |
| 295 self._Insert(string_id, string) | |
| 296 self._modified = True | |
| 297 return string_id | |
| 298 | |
| 299 def ApplyModifications(self): | |
| 300 """Propagates modifications back to nodes (see NodeWrapper).""" | |
| 301 if not self.modified: | |
| 302 return | |
| 303 | |
| 304 assert self._strings_nodes, 'no nodes' | |
| 305 | |
| 306 # Serialize into the first node, and clear all others. | |
| 307 | |
| 308 for strings_node in self._strings_nodes: | |
| 309 del strings_node[:] | |
| 310 strings_node = self._strings_nodes[0] | |
| 311 for string_id, string in self._string_by_id.iteritems(): | |
| 312 strings_node.append({'id': string_id, 'string': string}) | |
| 313 | |
| 314 self._modified = False | |
| 315 | |
| 316 def _Insert(self, string_id, string): | |
| 317 self._id_by_string[string] = string_id | |
| 318 self._string_by_id[string_id] = string | |
| 319 self._max_string_id = max(self._max_string_id, string_id) | |
| 320 | |
| 321 | |
| 322 class TypeNameMap(NodeWrapper): | |
| 323 """Wraps all 'types' nodes for a process. | |
| 324 | |
| 325 'types' nodes encode mappings between integer type ids and integer | |
| 326 string ids (from 'strings' nodes). | |
| 327 | |
| 328 "types": [ | |
| 329 { | |
| 330 "id": <type_id>, | |
| 331 "name_sid": <name_string_id> | |
| 332 }, | |
| 333 ... | |
| 334 ] | |
| 335 | |
| 336 For simplicity string ids are translated into strings during parsing, | |
| 337 and then translated back to ids in ApplyModifications(). | |
| 338 """ | |
| 339 def __init__(self): | |
| 340 self._modified = False | |
| 341 self._type_name_nodes = [] | |
| 342 self._name_by_id = {} | |
| 343 self._id_by_name = {} | |
| 344 self._max_type_id = 0 | |
| 345 | |
| 346 @property | |
| 347 def modified(self): | |
| 348 """Returns True if the wrapper was modified (see NodeWrapper).""" | |
| 349 return self._modified | |
| 350 | |
| 351 @property | |
| 352 def name_by_id(self): | |
| 353 """Returns {id -> name} dict (must not be changed directly).""" | |
| 354 return self._name_by_id | |
| 355 | |
| 356 def ParseNext(self, heap_dump_version, type_name_node, string_map): | |
| 357 """Parses and interns next node (see NodeWrapper). | |
| 358 | |
| 359 |string_map| - A StringMap object to use to translate string ids | |
| 360 to strings. | |
| 361 """ | |
| 362 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1: | |
| 363 raise UnsupportedHeapDumpVersionError(heap_dump_version) | |
| 364 | |
| 365 self._type_name_nodes.append(type_name_node) | |
| 366 for type_node in type_name_node: | |
| 367 self._Insert(type_node['id'], | |
| 368 string_map.string_by_id[type_node['name_sid']]) | |
| 369 | |
| 370 def AddType(self, type_name): | |
| 371 """Adds a type name (if it doesn't exist) and returns its id.""" | |
| 372 type_id = self._id_by_name.get(type_name) | |
| 373 if type_id is None: | |
| 374 type_id = self._max_type_id + 1 | |
| 375 self._Insert(type_id, type_name) | |
| 376 self._modified = True | |
| 377 return type_id | |
| 378 | |
| 379 def ApplyModifications(self, string_map, force=False): | |
| 380 """Propagates modifications back to nodes. | |
| 381 | |
| 382 |string_map| - A StringMap object to use to translate strings to ids. | |
| 383 |force| - Whether to propagate changes regardless of 'modified' flag. | |
| 384 """ | |
| 385 if not self.modified and not force: | |
| 386 return | |
| 387 | |
| 388 assert self._type_name_nodes, 'no nodes' | |
| 389 | |
| 390 # Serialize into the first node, and clear all others. | |
| 391 | |
| 392 for types_node in self._type_name_nodes: | |
| 393 del types_node[:] | |
| 394 types_node = self._type_name_nodes[0] | |
| 395 for type_id, type_name in self._name_by_id.iteritems(): | |
| 396 types_node.append({ | |
| 397 'id': type_id, | |
| 398 'name_sid': string_map.AddString(type_name)}) | |
| 399 | |
| 400 self._modified = False | |
| 401 | |
| 402 def _Insert(self, type_id, type_name): | |
| 403 self._id_by_name[type_name] = type_id | |
| 404 self._name_by_id[type_id] = type_name | |
| 405 self._max_type_id = max(self._max_type_id, type_id) | |
| 406 | |
| 407 | |
| 408 class StackFrameMap(NodeWrapper): | |
| 409 """ Wraps stack frame tree nodes for a process. | |
| 410 | |
| 411 For the legacy format this wrapper expects a single 'stackFrames' node | |
| 412 (which comes from metadata event): | |
| 413 | |
| 414 "stackFrames": { | |
| 415 "<frame_id>": { | |
| 416 "name": "<frame_name>", | |
| 417 "parent": "<parent_frame_id>" | |
| 418 }, | |
| 419 ... | |
| 420 } | |
| 421 | |
| 422 For the modern format this wrapper expects several 'nodes' nodes: | |
| 423 | |
| 424 "nodes": [ | |
| 425 { | |
| 426 "id": <frame_id>, | |
| 427 "parent": <parent_frame_id>, | |
| 428 "name_sid": <name_string_id> | |
| 429 }, | |
| 430 ... | |
| 431 ] | |
| 432 | |
| 433 In both formats frame name is a string. Native heap profiler generates | |
| 434 specially formatted frame names (e.g. "pc:10eb78dba") for function | |
| 435 addresses (PCs). Inner Frame class below parses name and extracts PC, | |
| 436 if it's there. | |
| 437 """ | |
| 438 class Frame(object): | |
| 439 def __init__(self, frame_id, name, parent_frame_id): | |
| 267 self._modified = False | 440 self._modified = False | 
| 268 self._pc = pc | 441 self._id = frame_id | 
| 269 self._frame = frame | 442 self._name = name | 
| 443 self._pc = self._ParsePC(name) | |
| 444 self._parent_id = parent_frame_id | |
| 445 self._ext = None | |
| 270 | 446 | 
| 271 @property | 447 @property | 
| 272 def modified(self): | 448 def modified(self): | 
| 449 """Returns True if the frame was modified. | |
| 450 | |
| 451 For example changing frame's name sets this flag (since the change | |
| 452 needs to be propagated back to nodes). | |
| 453 """ | |
| 273 return self._modified | 454 return self._modified | 
| 274 | 455 | 
| 275 @property | 456 @property | 
| 457 def id(self): | |
| 458 """Frame id (integer).""" | |
| 459 return self._id | |
| 460 | |
| 461 @property | |
| 276 def pc(self): | 462 def pc(self): | 
| 463 """Parsed (integer) PC of the frame, or None.""" | |
| 277 return self._pc | 464 return self._pc | 
| 278 | 465 | 
| 279 @property | 466 @property | 
| 280 def name(self): | 467 def name(self): | 
| 281 return self._frame['name'] | 468 """Name of the frame (see above).""" | 
| 469 return self._name | |
| 282 | 470 | 
| 283 @name.setter | 471 @name.setter | 
| 284 def name(self, value): | 472 def name(self, value): | 
| 473 """Changes the name. Doesn't affect value of |pc|.""" | |
| 285 self._modified = True | 474 self._modified = True | 
| 286 self._frame['name'] = value | 475 self._name = value | 
| 287 | 476 | 
| 288 def __init__(self, stack_frames): | 477 @property | 
| 289 """Constructs object using 'stackFrames' dictionary.""" | 478 def parent_id(self): | 
| 290 self._pc_frames = [] | 479 """Parent frame id (integer).""" | 
| 291 for frame in stack_frames.itervalues(): | 480 return self._parent_id | 
| 292 pc_frame = self._ParsePCFrame(frame) | 481 | 
| 293 if pc_frame: | 482 _PC_TAG = 'pc:' | 
| 294 self._pc_frames.append(pc_frame) | 483 | 
| 295 | 484 def _ParsePC(self, name): | 
| 296 @property | 485 if not name.startswith(self._PC_TAG): | 
| 297 def pc_frames(self): | 486 return None | 
| 298 return self._pc_frames | 487 return long(name[len(self._PC_TAG):], 16) | 
| 488 | |
| 489 def _ClearModified(self): | |
| 490 self._modified = False | |
| 491 | |
| 492 def __init__(self): | |
| 493 self._modified = False | |
| 494 self._heap_dump_version = None | |
| 495 self._stack_frames_nodes = [] | |
| 496 self._frame_by_id = {} | |
| 299 | 497 | 
| 300 @property | 498 @property | 
| 301 def modified(self): | 499 def modified(self): | 
| 302 return any(f.modified for f in self._pc_frames) | 500 """Returns True if the wrapper or any of its frames were modified.""" | 
| 303 | 501 return (self._modified or | 
| 304 _PC_TAG = 'pc:' | 502 any(f.modified for f in self._frame_by_id.itervalues())) | 
| 305 | 503 | 
| 306 @classmethod | 504 @property | 
| 307 def _ParsePCFrame(self, frame): | 505 def frame_by_id(self): | 
| 308 name = frame['name'] | 506 """Returns {id -> frame} dict (must not be modified directly).""" | 
| 309 if not name.startswith(self._PC_TAG): | 507 return self._frame_by_id | 
| 310 return None | 508 | 
| 311 pc = long(name[len(self._PC_TAG):], 16) | 509 def ParseNext(self, heap_dump_version, stack_frames_node, string_map): | 
| 312 return self.PCFrame(pc, frame) | 510 """Parses the next stack frames node (see NodeWrapper). | 
| 313 | 511 | 
| 314 | 512 For the modern format |string_map| is used to translate string ids | 
| 315 class Process(object): | 513 to strings. | 
| 316 """Holds various bits of information about a process in a trace file.""" | 514 """ | 
| 317 | 515 | 
| 318 def __init__(self, pid): | 516 frame_by_id = {} | 
| 319 self.pid = pid | 517 if heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | 
| 320 self.name = None | 518 if self._stack_frames_nodes: | 
| 321 self.mmaps = None | 519 raise Exception('Legacy stack frames node is expected only once.') | 
| 322 self.stack_frames = None | 520 for frame_id, frame_node in stack_frames_node.iteritems(): | 
| 323 | 521 frame = self.Frame(frame_id, | 
| 324 | 522 frame_node['name'], | 
| 325 def CollectProcesses(trace): | 523 frame_node.get('parent')) | 
| 326 """Parses trace dictionary and returns pid->Process map of all processes | 524 frame_by_id[frame.id] = frame | 
| 327 suitable for symbolization (which have both mmaps and stack_frames). | 525 else: | 
| 526 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1: | |
| 527 raise UnsupportedHeapDumpVersionError(heap_dump_version) | |
| 528 for frame_node in stack_frames_node: | |
| 529 frame = self.Frame(frame_node['id'], | |
| 530 string_map.string_by_id[frame_node['name_sid']], | |
| 531 frame_node.get('parent')) | |
| 532 frame_by_id[frame.id] = frame | |
| 533 | |
| 534 self._heap_dump_version = heap_dump_version | |
| 535 self._stack_frames_nodes.append(stack_frames_node) | |
| 536 | |
| 537 self._frame_by_id = frame_by_id | |
| 538 | |
| 539 def ApplyModifications(self, string_map, force=False): | |
| 540 """Applies modifications back to nodes (see NodeWrapper).""" | |
| 541 | |
| 542 if not self.modified and not force: | |
| 543 return | |
| 544 | |
| 545 assert self._stack_frames_nodes, 'no nodes' | |
| 546 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | |
| 547 assert string_map is None, \ | |
| 548 'string_map should not be used with the legacy format' | |
| 549 | |
| 550 # Serialize frames into the first node, clear all others. | |
| 551 | |
| 552 for frames_node in self._stack_frames_nodes: | |
| 553 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | |
| 554 frames_node.clear() | |
| 555 else: | |
| 556 del frames_node[:] | |
| 557 | |
| 558 frames_node = self._stack_frames_nodes[0] | |
| 559 for frame in self._frame_by_id.itervalues(): | |
| 560 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | |
| 561 frame_node = {'name': frame.name} | |
| 562 frames_node[frame.id] = frame_node | |
| 563 else: | |
| 564 frame_node = { | |
| 565 'id': frame.id, | |
| 566 'name_sid': string_map.AddString(frame.name) | |
| 567 } | |
| 568 frames_node.append(frame_node) | |
| 569 if frame.parent_id is not None: | |
| 570 frame_node['parent'] = frame.parent_id | |
| 571 frame._ClearModified() | |
| 572 | |
| 573 self._modified = False | |
| 574 | |
| 575 | |
| 576 class Trace(NodeWrapper): | |
| 577 """Wrapper for the root trace node (i.e. the trace JSON itself). | |
| 578 | |
| 579 This wrapper parses select nodes from memory-infra events and groups | |
| 580 parsed data per-process (see inner Process class below). | |
| 328 """ | 581 """ | 
| 329 | 582 | 
| 330 process_map = {} | 583 # Indicates legacy heap dump format. | 
| 331 | 584 HEAP_DUMP_VERSION_LEGACY = 'Legacy' | 
| 332 # Android traces produced via 'chrome://inspect/?tracing#devices' are | 585 | 
| 333 # just list of events. | 586 # Indicates variation of a modern heap dump format. | 
| 334 events = trace if isinstance(trace, list) else trace['traceEvents'] | 587 HEAP_DUMP_VERSION_1 = 1 | 
| 335 for event in events: | 588 | 
| 336 name = event.get('name') | 589 class Process(object): | 
| 337 if not name: | 590 """Collection of per-process data and wrappers.""" | 
| 338 continue | 591 | 
| 339 | 592 def __init__(self, pid): | 
| 340 pid = event['pid'] | 593 self._pid = pid | 
| 341 process = process_map.get(pid) | 594 self._name = None | 
| 342 if process is None: | 595 self._memory_map = None | 
| 343 process = Process(pid) | 596 self._stack_frame_map = StackFrameMap() | 
| 344 process_map[pid] = process | 597 self._type_name_map = TypeNameMap() | 
| 345 | 598 self._string_map = StringMap() | 
| 346 phase = event['ph'] | 599 self._heap_dump_version = None | 
| 347 if phase == TRACE_EVENT_PHASE_METADATA: | 600 | 
| 348 if name == 'process_name': | 601 @property | 
| 349 process.name = event['args']['name'] | 602 def modified(self): | 
| 350 elif name == 'stackFrames': | 603 return self._stack_frame_map.modified or self._type_name_map.modified | 
| 351 process.stack_frames = StackFrames(event['args']['stackFrames']) | 604 | 
| 352 elif phase == TRACE_EVENT_PHASE_MEMORY_DUMP: | 605 @property | 
| 353 process_mmaps = event['args']['dumps'].get('process_mmaps') | 606 def pid(self): | 
| 354 if process_mmaps: | 607 return self._pid | 
| 355 # TODO(dskiba): this parses all process_mmaps, but retains only the | 608 | 
| 356 # last one. We need to parse only once (lazy parsing?). | 609 @property | 
| 357 process.mmaps = ProcessMemoryMaps(process_mmaps) | 610 def name(self): | 
| 358 | 611 return self._name | 
| 359 return [p for p in process_map.itervalues() if p.mmaps and p.stack_frames] | 612 | 
| 613 @property | |
| 614 def unique_name(self): | |
| 615 """Returns string that includes both process name and its pid.""" | |
| 616 name = self._name if self._name else 'UnnamedProcess' | |
| 617 return '{}({})'.format(name, self._pid) | |
| 618 | |
| 619 @property | |
| 620 def memory_map(self): | |
| 621 return self._memory_map | |
| 622 | |
| 623 @property | |
| 624 def stack_frame_map(self): | |
| 625 return self._stack_frame_map | |
| 626 | |
| 627 @property | |
| 628 def type_name_map(self): | |
| 629 return self._type_name_map | |
| 630 | |
| 631 def ApplyModifications(self): | |
| 632 """Calls ApplyModifications() on contained wrappers.""" | |
| 633 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | |
| 634 self._stack_frame_map.ApplyModifications(None) | |
| 635 else: | |
| 636 if self._stack_frame_map.modified or self._type_name_map.modified: | |
| 637 self._string_map.Clear() | |
| 638 self._stack_frame_map.ApplyModifications(self._string_map, force=True) | |
| 639 self._type_name_map.ApplyModifications(self._string_map, force=True) | |
| 640 self._string_map.ApplyModifications() | |
| 641 | |
| 642 def __init__(self, trace_node): | |
| 643 self._trace_node = trace_node | |
| 644 self._processes = [] | |
| 645 self._heap_dump_version = None | |
| 646 | |
| 647 # Misc per-process information needed only during parsing. | |
| 648 class ProcessExt(object): | |
| 649 def __init__(self, pid): | |
| 650 self.process = Trace.Process(pid) | |
| 651 self.mapped_entry_names = set() | |
| 652 self.process_mmaps_node = None | |
| 653 self.seen_strings_node = False | |
| 654 | |
| 655 process_ext_by_pid = {} | |
| 656 | |
| 657 # Android traces produced via 'chrome://inspect/?tracing#devices' are | |
| 658 # just list of events. | |
| 659 events = trace_node if isinstance(trace_node, list) \ | |
| 660 else trace_node['traceEvents'] | |
| 661 for event in events: | |
| 662 name = event.get('name') | |
| 663 if not name: | |
| 664 continue | |
| 665 | |
| 666 pid = event['pid'] | |
| 667 process_ext = process_ext_by_pid.get(pid) | |
| 668 if process_ext is None: | |
| 669 process_ext = ProcessExt(pid) | |
| 670 process_ext_by_pid[pid] = process_ext | |
| 671 process = process_ext.process | |
| 672 | |
| 673 phase = event['ph'] | |
| 674 if phase == self._EVENT_PHASE_METADATA: | |
| 675 if name == 'process_name': | |
| 676 process._name = event['args']['name'] | |
| 677 elif name == 'stackFrames': | |
| 678 process._stack_frame_map.ParseNext( | |
| 679 self._UseHeapDumpVersion(self.HEAP_DUMP_VERSION_LEGACY), | |
| 680 event['args']['stackFrames'], | |
| 681 process._string_map) | |
| 682 elif phase == self._EVENT_PHASE_MEMORY_DUMP: | |
| 683 dumps = event['args']['dumps'] | |
| 684 process_mmaps = dumps.get('process_mmaps') | |
| 685 if process_mmaps: | |
| 686 # We want the most recent memory map, so parsing happens later | |
| 687 # once we finished reading all events. | |
| 688 process_ext.process_mmaps_node = process_mmaps | |
| 689 heaps = dumps.get('heaps_v2') | |
| 690 if heaps: | |
| 691 version = self._UseHeapDumpVersion(heaps['version']) | |
| 692 maps = heaps.get('maps') | |
| 693 if maps: | |
| 694 process_ext.mapped_entry_names.update(maps.iterkeys()) | |
| 695 types = maps.get('types') | |
| 696 stack_frames = maps.get('nodes') | |
| 697 strings = maps.get('strings') | |
| 698 if (strings is None and (types or stack_frames) | |
| 699 and not process_ext.seen_strings_node): | |
| 700 # ApplyModifications() for TypeNameMap and StackFrameMap puts | |
| 701 # everything into the first node and depends on StringMap. So | |
| 702 # we need to make sure that the 'strings' node is there if either of | |
| 703 # the other two nodes is present. | |
| 704 strings = [] | |
| 705 maps['strings'] = strings | |
| 706 if strings is not None: | |
| 707 process_ext.seen_strings_node = True | |
| 708 process._string_map.ParseNext(version, strings) | |
| 709 if types: | |
| 710 process._type_name_map.ParseNext( | |
| 711 version, types, process._string_map) | |
| 712 if stack_frames: | |
| 713 process._stack_frame_map.ParseNext( | |
| 714 version, stack_frames, process._string_map) | |
| 715 | |
| 716 self._processes = [] | |
| 717 for pe in process_ext_by_pid.itervalues(): | |
| 718 pe.process._heap_dump_version = self._heap_dump_version | |
| 719 if pe.process_mmaps_node: | |
| 720 # Now parse the most recent memory map. | |
| 721 pe.process._memory_map = MemoryMap(pe.process_mmaps_node) | |
| 722 self._processes.append(pe.process) | |
| 723 | |
| 724 @property | |
| 725 def node(self): | |
| 726 """Root node (that was passed to the __init__).""" | |
| 727 return self._trace_node | |
| 728 | |
| 729 @property | |
| 730 def modified(self): | |
| 731 """Returns True if trace file needs to be updated. | |
| 732 | |
| 733 Before writing trace JSON back to a file ApplyModifications() needs | |
| 734 to be called. | |
| 735 """ | |
| 736 return any(p.modified for p in self._processes) | |
| 737 | |
| 738 @property | |
| 739 def processes(self): | |
| 740 return self._processes | |
| 741 | |
| 742 @property | |
| 743 def heap_dump_version(self): | |
| 744 return self._heap_dump_version | |
| 745 | |
| 746 def ApplyModifications(self): | |
| 747 """Propagates modifications back to the trace JSON.""" | |
| 748 for process in self._processes: | |
| 749 process.ApplyModifications() | |
| 750 assert not self.modified, 'still modified' | |
| 751 | |
| 752 # Relevant trace event phases from Chromium's | |
| 753 # src/base/trace_event/common/trace_event_common.h. | |
| 754 _EVENT_PHASE_METADATA = 'M' | |
| 755 _EVENT_PHASE_MEMORY_DUMP = 'v' | |
| 756 | |
| 757 def _UseHeapDumpVersion(self, version): | |
| 758 if self._heap_dump_version is None: | |
| 759 self._heap_dump_version = version | |
| 760 return version | |
| 761 elif self._heap_dump_version != version: | |
| 762 raise Exception( | |
| 763 ("Inconsistent trace file: first saw '{}' heap dump version, " | |
| 764 "then '{}'.").format(self._heap_dump_version, version)) | |
| 765 else: | |
| 766 return version | |
| 360 | 767 | 
| 361 | 768 | 
| 362 class SymbolizableFile(object): | 769 class SymbolizableFile(object): | 
| 363 """Holds file path, addresses to symbolize and stack frames to update. | 770 """Holds file path, addresses to symbolize and stack frames to update. | 
| 364 | 771 | 
| 365 This class is a link between ELFSymbolizer and a trace file: it specifies | 772 This class is a link between ELFSymbolizer and a trace file: it specifies | 
| 366 what to symbolize (addresses) and what to update with the symbolization | 773 what to symbolize (addresses) and what to update with the symbolization | 
| 367 result (frames). | 774 result (frames). | 
| 368 """ | 775 """ | 
| 369 def __init__(self, file_path): | 776 def __init__(self, file_path): | 
| 370 self.path = file_path | 777 self.path = file_path | 
| 371 self.symbolizable_path = file_path # path to use for symbolization | 778 self.symbolizable_path = file_path # path to use for symbolization | 
| 372 self.frames_by_address = collections.defaultdict(list) | 779 self.frames_by_address = collections.defaultdict(list) | 
| 373 | 780 | 
| 374 | 781 | 
| 375 def ResolveSymbolizableFiles(processes): | 782 def ResolveSymbolizableFiles(processes): | 
| 376 """Resolves and groups PCs into list of SymbolizableFiles. | 783 """Resolves and groups PCs into list of SymbolizableFiles. | 
| 377 | 784 | 
| 378 As part of the grouping process, this function resolves PC from each stack | 785 As part of the grouping process, this function resolves PC from each stack | 
| 379 frame to the corresponding mmap region. Stack frames that failed to resolve | 786 frame to the corresponding mmap region. Stack frames that failed to resolve | 
| 380 are symbolized with '<unresolved>'. | 787 are symbolized with '<unresolved>'. | 
| 381 """ | 788 """ | 
| 382 symfile_by_path = {} | 789 symfile_by_path = {} | 
| 383 for process in processes: | 790 for process in processes: | 
| 384 for frame in process.stack_frames.pc_frames: | 791 if not process.memory_map: | 
| 385 region = process.mmaps.FindRegion(frame.pc) | 792 continue | 
| 793 for frame in process.stack_frame_map.frame_by_id.itervalues(): | |
| 794 if frame.pc is None: | |
| 795 continue | |
| 796 region = process.memory_map.FindRegion(frame.pc) | |
| 386 if region is None: | 797 if region is None: | 
| 387 frame.name = '<unresolved>' | 798 frame.name = '<unresolved>' | 
| 388 continue | 799 continue | 
| 389 | 800 | 
| 390 symfile = symfile_by_path.get(region.file_path) | 801 symfile = symfile_by_path.get(region.file_path) | 
| 391 if symfile is None: | 802 if symfile is None: | 
| 392 symfile = SymbolizableFile(region.file_path) | 803 symfile = SymbolizableFile(region.file_path) | 
| 393 symfile_by_path[symfile.path] = symfile | 804 symfile_by_path[symfile.path] = symfile | 
| 394 | 805 | 
| 395 relative_pc = frame.pc - region.start_address | 806 relative_pc = frame.pc - region.start_address | 
| 396 symfile.frames_by_address[relative_pc].append(frame) | 807 symfile.frames_by_address[relative_pc].append(frame) | 
| 397 return symfile_by_path.values() | 808 return symfile_by_path.values() | 
| 398 | 809 | 
| 399 | 810 | 
| 811 def FindInSystemPath(binary_name): | |
| 812 paths = os.environ['PATH'].split(os.pathsep) | |
| 813 for path in paths: | |
| 814 binary_path = os.path.join(path, binary_name) | |
| 815 if os.path.isfile(binary_path): | |
| 816 return binary_path | |
| 817 return None | |
| 818 | |
| 819 | |
| 820 class Symbolizer(object): | |
| 821 """Encapsulates platform-specific symbolization logic.""" | |
| 822 | |
| 823 def __init__(self): | |
| 824 self.is_mac = sys.platform == 'darwin' | |
| 825 self.is_win = sys.platform == 'win32' | |
| 826 if self.is_mac: | |
| 827 self.binary = 'atos' | |
| 828 self._matcher = symbolize_trace_atos_regex.AtosRegexMatcher() | |
| 829 elif self.is_win: | |
| 830 self.binary = 'addr2line-pdb.exe' | |
| 831 else: | |
| 832 self.binary = 'addr2line' | |
| 833 self.symbolizer_path = FindInSystemPath(self.binary) | |
| 834 | |
| 835 def _SymbolizeLinuxAndAndroid(self, symfile, unsymbolized_name): | |
| 836 def _SymbolizerCallback(sym_info, frames): | |
| 837 # Unwind inline chain to the top. | |
| 838 while sym_info.inlined_by: | |
| 839 sym_info = sym_info.inlined_by | |
| 840 | |
| 841 symbolized_name = sym_info.name if sym_info.name else unsymbolized_name | |
| 842 for frame in frames: | |
| 843 frame.name = symbolized_name | |
| 844 frame.ext.source_path = sym_info.source_path | |
| 845 | |
| 846 symbolizer = elf_symbolizer.ELFSymbolizer(symfile.symbolizable_path, | |
| 847 self.symbolizer_path, | |
| 848 _SymbolizerCallback, | |
| 849 inlines=True) | |
| 850 | |
| 851 for address, frames in symfile.frames_by_address.iteritems(): | |
| 852 # SymbolizeAsync() asserts that the type of address is int. We operate | |
| 853 # on longs (since they are raw pointers possibly from 64-bit processes). | |
| 854 # It's OK to cast here because we're passing relative PC, which should | |
| 855 # always fit into int. | |
| 856 symbolizer.SymbolizeAsync(int(address), frames) | |
| 857 | |
| 858 symbolizer.Join() | |
| 859 | |
| 860 | |
| 861 def _SymbolizeMac(self, symfile): | |
| 862 chars_max = int(subprocess.check_output("getconf ARG_MAX", shell=True)) | |
| 863 | |
| 864 # 16 for the address, 2 for "0x", 1 for the space | |
| 865 chars_per_address = 19 | |
| 866 | |
| 867 load_address = (symbolize_trace_macho_reader. | |
| 868 ReadMachOTextLoadAddress(symfile.symbolizable_path)) | |
| 869 assert load_address is not None | |
| 870 | |
| 871 cmd_base = [self.symbolizer_path, '-arch', 'x86_64', '-l', | |
| 872 '0x%x' % load_address, '-o', | |
| 873 symfile.symbolizable_path] | |
| 874 chars_for_other_arguments = len(' '.join(cmd_base)) + 1 | |
| 875 | |
| 876 # The maximum number of inputs that can be processed at once is limited by | |
| 877 # ARG_MAX. This currently evaluates to ~13000 on macOS. | |
| 878 max_inputs = (chars_max - chars_for_other_arguments) / chars_per_address | |
| 879 | |
| 880 all_keys = symfile.frames_by_address.keys() | |
| 881 processed_keys_count = 0 | |
| 882 while len(all_keys): | |
| 883 input_count = min(len(all_keys), max_inputs) | |
| 884 keys_to_process = all_keys[0:input_count] | |
| 885 cmd = list(cmd_base) | |
| 886 cmd.extend([hex(int(x) + load_address) | |
| 887 for x in keys_to_process]) | |
| 888 output_array = subprocess.check_output(cmd).split('\n') | |
| 889 for i in range(len(keys_to_process)): | |
| 890 for frame in (symfile.frames_by_address.values() | |
| 891 [i + processed_keys_count]): | |
| 892 frame.name = self._matcher.Match(output_array[i]) | |
| 893 processed_keys_count += len(keys_to_process) | |
| 894 all_keys = all_keys[input_count:] | |
| 895 | |
| 896 def _SymbolizeWin(self, symfile): | |
| 897 """Invoke symbolizer binary on windows and write all input in one go. | |
| 898 | |
| 899 Unlike linux, on windows, symbolization talks through a shared system | |
| 900 service that handles communication with the NT symbol servers. This | |
| 901 creates an explicit serialization (and therefore lock contention) of | |
| 902 any process using the symbol API for files that do not have a local PDB. | |
| 903 | |
| 904 Thus, even though the windows symbolizer binary can be made command line | |
| 905 compatible with the POSIX addr2line interface, parallelizing the | |
| 906 symbolization does not yield the same performance effects. Running | |
| 907 just one symbolizer seems good enough for now. Can optimize later | |
| 908 if this becomes a bottleneck. | |
| 909 """ | |
| 910 cmd = [self.symbolizer_path, '--functions', '--demangle', '--exe', | |
| 911 symfile.symbolizable_path] | |
| 912 | |
| 913 proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE, | |
| 914 stderr=sys.stderr) | |
| 915 addrs = ["%x" % relative_pc for relative_pc in | |
| 916 symfile.frames_by_address.keys()] | |
| 917 (stdout_data, stderr_data) = proc.communicate('\n'.join(addrs)) | |
| 918 stdout_data = stdout_data.split('\n') | |
| 919 | |
| 920 # This is known to be in the same order as stderr_data. | |
| 921 for i, addr in enumerate(addrs): | |
| 922 for frame in symfile.frames_by_address[int(addr, 16)]: | |
| 923 # Output of addr2line with --functions is always 2 outputs per | |
| 924 # symbol, function name followed by source line number. Only grab | |
| 925 # the function name as line info is not always available. | |
| 926 frame.name = stdout_data[i * 2] | |
| 927 | |
| 928 def Symbolize(self, symfile, unsymbolized_name): | |
| 929 if self.is_mac: | |
| 930 self._SymbolizeMac(symfile) | |
| 931 elif self.is_win: | |
| 932 self._SymbolizeWin(symfile) | |
| 933 else: | |
| 934 self._SymbolizeLinuxAndAndroid(symfile, unsymbolized_name) | |
| 935 | |
| 936 def IsSymbolizableFile(self, file_path): | |
| 937 if self.is_win: | |
| 938 extension = os.path.splitext(file_path)[1].lower() | |
| 939 return extension in ['.dll', '.exe'] | |
| 940 else: | |
| 941 result = subprocess.check_output(['file', '-0', file_path]) | |
| 942 type_string = result[result.find('\0') + 1:] | |
| 943 return bool(re.match(r'.*(ELF|Mach-O) (32|64)-bit\b.*', | |
| 944 type_string, re.DOTALL)) | |
| 945 | |
| 946 | |
| 400 def SymbolizeFiles(symfiles, symbolizer): | 947 def SymbolizeFiles(symfiles, symbolizer): | 
| 401 """Symbolizes each file in the given list of SymbolizableFiles | 948 """Symbolizes each file in the given list of SymbolizableFiles | 
| 402 and updates stack frames with symbolization results.""" | 949 and updates stack frames with symbolization results.""" | 
| 950 | |
| 951 if not symfiles: | |
| 952 print 'Nothing to symbolize.' | |
| 953 return | |
| 954 | |
| 403 print 'Symbolizing...' | 955 print 'Symbolizing...' | 
| 404 | 956 | 
| 405 def _SubPrintf(message, *args): | 957 def _SubPrintf(message, *args): | 
| 406 print (' ' + message).format(*args) | 958 print (' ' + message).format(*args) | 
| 407 | 959 | 
| 408 symbolized = False | |
| 409 for symfile in symfiles: | 960 for symfile in symfiles: | 
| 410 unsymbolized_name = '<{}>'.format( | 961 unsymbolized_name = '<{}>'.format( | 
| 411 symfile.path if symfile.path else 'unnamed') | 962 symfile.path if symfile.path else 'unnamed') | 
| 412 | 963 | 
| 413 problem = None | 964 problem = None | 
| 414 if not os.path.isabs(symfile.symbolizable_path): | 965 if not os.path.isabs(symfile.symbolizable_path): | 
| 415 problem = 'not a file' | 966 problem = 'not a file' | 
| 416 elif not os.path.isfile(symfile.symbolizable_path): | 967 elif not os.path.isfile(symfile.symbolizable_path): | 
| 417 problem = "file doesn't exist" | 968 problem = "file doesn't exist" | 
| 418 elif not symbolizer.IsSymbolizableFile(symfile.symbolizable_path): | 969 elif not symbolizer.IsSymbolizableFile(symfile.symbolizable_path): | 
| 419 problem = 'file is not symbolizable' | 970 problem = 'file is not symbolizable' | 
| 420 if problem: | 971 if problem: | 
| 421 _SubPrintf("Won't symbolize {} PCs for '{}': {}.", | 972 _SubPrintf("Won't symbolize {} PCs for '{}': {}.", | 
| 422 len(symfile.frames_by_address), | 973 len(symfile.frames_by_address), | 
| 423 symfile.symbolizable_path, | 974 symfile.symbolizable_path, | 
| 424 problem) | 975 problem) | 
| 425 for frames in symfile.frames_by_address.itervalues(): | 976 for frames in symfile.frames_by_address.itervalues(): | 
| 426 for frame in frames: | 977 for frame in frames: | 
| 427 frame.name = unsymbolized_name | 978 frame.name = unsymbolized_name | 
| 428 continue | 979 continue | 
| 429 | 980 | 
| 430 _SubPrintf('Symbolizing {} PCs from {}...', | 981 _SubPrintf('Symbolizing {} PCs from {}...', | 
| 431 len(symfile.frames_by_address), | 982 len(symfile.frames_by_address), | 
| 432 symfile.path) | 983 symfile.path) | 
| 433 | 984 | 
| 434 symbolizer.Symbolize(symfile, unsymbolized_name) | 985 symbolizer.Symbolize(symfile, unsymbolized_name) | 
| 435 symbolized = True | |
| 436 | 986 | 
| 437 return symbolized | 987 | 
| 988 # Matches Android library paths, supports both K (/data/app-lib/<>/lib.so) | |
| 989 # as well as L+ (/data/app/<>/lib/<>/lib.so). Library name is available | |
| 990 # via 'name' group. | |
| 991 ANDROID_PATH_MATCHER = re.compile( | |
| 992 r'^/data/(?:' | |
| 993 r'app/[^/]+/lib/[^/]+/|' | |
| 994 r'app-lib/[^/]+/|' | |
| 995 r'data/[^/]+/incremental-install-files/lib/' | |
| 996 r')(?P<name>.*\.so)') | |
| 997 | |
| 998 # Subpath of output path where unstripped libraries are stored. | |
| 999 ANDROID_UNSTRIPPED_SUBPATH = 'lib.unstripped' | |
| 438 | 1000 | 
| 439 | 1001 | 
| 440 def HaveFilesFromAndroid(symfiles): | 1002 def HaveFilesFromAndroid(symfiles): | 
| 441 return any(ANDROID_PATH_MATCHER.match(f.path) for f in symfiles) | 1003 return any(ANDROID_PATH_MATCHER.match(f.path) for f in symfiles) | 
| 442 | 1004 | 
| 443 | 1005 | 
| 444 def RemapAndroidFiles(symfiles, output_path): | 1006 def RemapAndroidFiles(symfiles, output_path): | 
| 445 for symfile in symfiles: | 1007 for symfile in symfiles: | 
| 446 match = ANDROID_PATH_MATCHER.match(symfile.path) | 1008 match = ANDROID_PATH_MATCHER.match(symfile.path) | 
| 447 if match: | 1009 if match: | 
| 448 name = match.group('name') | 1010 name = match.group('name') | 
| 449 symfile.symbolizable_path = os.path.join( | 1011 symfile.symbolizable_path = os.path.join( | 
| 450 output_path, ANDROID_UNSTRIPPED_SUBPATH, name) | 1012 output_path, ANDROID_UNSTRIPPED_SUBPATH, name) | 
| 451 else: | 1013 else: | 
| 452 # Clobber file path to trigger "not a file" problem in SymbolizeFiles(). | 1014 # Clobber file path to trigger "not a file" problem in SymbolizeFiles(). | 
| 453 # Without this, files won't be symbolized with "file not found" problem, | 1015 # Without this, files won't be symbolized with "file not found" problem, | 
| 454 # which is not accurate. | 1016 # which is not accurate. | 
| 455 symfile.symbolizable_path = 'android://{}'.format(symfile.path) | 1017 symfile.symbolizable_path = 'android://{}'.format(symfile.path) | 
| 456 | 1018 | 
| 457 | 1019 | 
| 1020 def Symbolize(options, trace, symbolizer): | |
| 1021 symfiles = ResolveSymbolizableFiles(trace.processes) | |
| 1022 | |
| 1023 # Android trace files don't have any indication they are from Android. | |
| 1024 # So we're checking for Android-specific paths. | |
| 1025 if HaveFilesFromAndroid(symfiles): | |
| 1026 if not options.output_directory: | |
| 1027 sys.exit('The trace file appears to be from Android. Please ' | |
| 1028 'specify output directory to properly symbolize it.') | |
| 1029 RemapAndroidFiles(symfiles, os.path.abspath(options.output_directory)) | |
| 1030 | |
| 1031 SymbolizeFiles(symfiles, symbolizer) | |
| 1032 | |
| 1033 | |
| 1034 def OpenTraceFile(file_path, mode): | |
| 1035 if file_path.endswith('.gz'): | |
| 1036 return gzip.open(file_path, mode + 'b') | |
| 1037 else: | |
| 1038 return open(file_path, mode + 't') | |
| 1039 | |
| 1040 | |
| 458 # Suffix used for backup files. | 1041 # Suffix used for backup files. | 
| 459 BACKUP_FILE_TAG = '.BACKUP' | 1042 BACKUP_FILE_TAG = '.BACKUP' | 
| 460 | 1043 | 
| 461 def main(): | 1044 def main(): | 
| 462 parser = argparse.ArgumentParser() | 1045 class MultilineHelpFormatter(argparse.HelpFormatter): | 
| 463 parser.add_argument('file', | 1046 def _split_lines(self, text, width): | 
| 464 help='Trace file to symbolize (.json or .json.gz)') | 1047 extra_lines = [] | 
| 465 parser.add_argument('--no-backup', | 1048 if '\n' in text: | 
| 466 dest='backup', default='true', action='store_false', | 1049 lines = text.splitlines() | 
| 467 help="Don't create {} files".format(BACKUP_FILE_TAG)) | 1050 text = lines[0] | 
| 468 parser.add_argument('--output-directory', | 1051 extra_lines = lines[1:] | 
| 469 help='The path to the build output directory, such ' + | 1052 return super(MultilineHelpFormatter, self)._split_lines(text, width) + \ | 
| 470 'as out/Debug. Only needed for Android.') | 1053 extra_lines | 
| 471 options = parser.parse_args() | |
| 472 | 1054 | 
| 473 trace_file_path = options.file | 1055 parser = argparse.ArgumentParser(formatter_class=MultilineHelpFormatter) | 
| 474 def _OpenTraceFile(mode): | 1056 parser.add_argument( | 
| 475 if trace_file_path.endswith('.gz'): | 1057 'file', | 
| 476 return gzip.open(trace_file_path, mode + 'b') | 1058 help='Trace file to symbolize (.json or .json.gz)') | 
| 477 else: | 1059 | 
| 478 return open(trace_file_path, mode + 't') | 1060 parser.add_argument( | 
| 1061 '--no-backup', dest='backup', default='true', action='store_false', | |
| 1062 help="Don't create {} files".format(BACKUP_FILE_TAG)) | |
| 1063 | |
| 1064 parser.add_argument( | |
| 1065 '--output-directory', | |
| 1066 help='The path to the build output directory, such as out/Debug.') | |
| 479 | 1067 | 
| 480 symbolizer = Symbolizer() | 1068 symbolizer = Symbolizer() | 
| 481 if symbolizer.symbolizer_path is None: | 1069 if symbolizer.symbolizer_path is None: | 
| 482 sys.exit("Can't symbolize - no %s in PATH." % symbolizer.binary) | 1070 sys.exit("Can't symbolize - no %s in PATH." % symbolizer.binary) | 
| 483 | 1071 | 
| 1072 options = parser.parse_args() | |
| 1073 | |
| 1074 trace_file_path = options.file | |
| 1075 | |
| 484 print 'Reading trace file...' | 1076 print 'Reading trace file...' | 
| 485 with _OpenTraceFile('r') as trace_file: | 1077 with OpenTraceFile(trace_file_path, 'r') as trace_file: | 
| 486 trace = json.load(trace_file) | 1078 trace = Trace(json.load(trace_file)) | 
| 487 | 1079 | 
| 488 processes = CollectProcesses(trace) | 1080 Symbolize(options, trace, symbolizer) | 
| 489 symfiles = ResolveSymbolizableFiles(processes) | |
| 490 | 1081 | 
| 491 # Android trace files don't have any indication they are from Android. | 1082 if trace.modified: | 
| 492 # So we're checking for Android-specific paths. | 1083 trace.ApplyModifications() | 
| 493 if HaveFilesFromAndroid(symfiles): | |
| 494 if not options.output_directory: | |
| 495 parser.error('The trace file appears to be from Android. Please ' | |
| 496 "specify output directory (e.g. 'out/Debug') to properly " | |
| 497 'symbolize it.') | |
| 498 RemapAndroidFiles(symfiles, os.path.abspath(options.output_directory)) | |
| 499 | 1084 | 
| 500 if SymbolizeFiles(symfiles, symbolizer): | |
| 501 if options.backup: | 1085 if options.backup: | 
| 502 backup_file_path = trace_file_path + BACKUP_FILE_TAG | 1086 backup_file_path = trace_file_path + BACKUP_FILE_TAG | 
| 503 print 'Backing up trace file to {}...'.format(backup_file_path) | 1087 if os.path.exists(backup_file_path): | 
| 1088 for i in itertools.count(1): | |
| 1089 unique_file_path = '{}{}'.format(backup_file_path, i) | |
| 1090 if not os.path.exists(unique_file_path): | |
| 1091 backup_file_path = unique_file_path | |
| 1092 break | |
| 1093 print 'Backing up trace file to {}'.format(backup_file_path) | |
| 504 os.rename(trace_file_path, backup_file_path) | 1094 os.rename(trace_file_path, backup_file_path) | 
| 505 | 1095 | 
| 506 print 'Updating trace file...' | 1096 print 'Updating the trace file...' | 
| 507 with _OpenTraceFile('w') as trace_file: | 1097 with OpenTraceFile(trace_file_path, 'w') as trace_file: | 
| 508 json.dump(trace, trace_file) | 1098 json.dump(trace.node, trace_file) | 
| 509 else: | 1099 else: | 
| 510 print 'No PCs symbolized - not updating trace file.' | 1100 print 'No modifications were made - not updating the trace file.' | 
| 511 | 1101 | 
| 512 | 1102 | 
| 513 if __name__ == '__main__': | 1103 if __name__ == '__main__': | 
| 514 main() | 1104 main() | 
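For readers following the Android path handling in the NEW column, here is a minimal sketch of how ANDROID_PATH_MATCHER and RemapAndroidFiles() appear to interact. The regex and ANDROID_UNSTRIPPED_SUBPATH are copied from the diff. The helper `remap_android_path` and the sample package name in the usage note are hypothetical and only illustrate the mapping for a single path; they are not part of the patch.

```python
import os
import re

# Pattern copied from the NEW side of the diff.
ANDROID_PATH_MATCHER = re.compile(
    r'^/data/(?:'
    r'app/[^/]+/lib/[^/]+/|'
    r'app-lib/[^/]+/|'
    r'data/[^/]+/incremental-install-files/lib/'
    r')(?P<name>.*\.so)')

# Subpath of output path where unstripped libraries are stored (from the diff).
ANDROID_UNSTRIPPED_SUBPATH = 'lib.unstripped'


def remap_android_path(device_path, output_path):
    """Hypothetical single-path version of RemapAndroidFiles().

    Returns the host path of the unstripped library when device_path looks
    like an Android library path; otherwise returns an 'android://' pseudo
    path, which is not absolute and so triggers the "not a file" branch in
    SymbolizeFiles().
    """
    match = ANDROID_PATH_MATCHER.match(device_path)
    if match:
        return os.path.join(
            output_path, ANDROID_UNSTRIPPED_SUBPATH, match.group('name'))
    return 'android://{}'.format(device_path)
```

Under these assumptions, `remap_android_path('/data/app/com.example-1/lib/arm/libchrome.so', 'out/Debug')` resolves to the `libchrome.so` under `out/Debug/lib.unstripped`, while a system path such as `/system/lib/libc.so` is clobbered to `android:///system/lib/libc.so`.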