Chromium Code Reviews

| OLD | NEW |
|---|---|
| 1 #!/usr/bin/env python | 1 #!/usr/bin/env python |
| 2 # Copyright 2016 The Chromium Authors. All rights reserved. | 2 # Copyright 2016 The Chromium Authors. All rights reserved. |
| 3 # Use of this source code is governed by a BSD-style license that can be | 3 # Use of this source code is governed by a BSD-style license that can be |
| 4 # found in the LICENSE file. | 4 # found in the LICENSE file. |
| 5 | 5 |
| 6 """ | |
| 7 This script processes trace files and symbolizes stack frames generated by | |
| 8 Chrome's native heap profiler. | |
| 9 | |
| 10 === Overview === | |
| 11 | |
| 12 A trace file is essentially a giant JSON array of dictionaries (events). | |
| 13 Events have some predefined keys (e.g. 'pid'), but otherwise are free to | |
| 14 have anything inside. The trace file contains events from all Chrome | |
| 15 processes that were sampled during the tracing period. | |
| 16 | |
| 17 This script cares only about memory dump events generated with the | |
| 18 memory-infra category enabled. | |
| 19 | |
| 20 When Chrome native heap profiling is enabled, some memory dump events | |
| 21 include the following extra information: | |
| 22 | |
| 23 * (Per allocator) Information about live allocations at the moment of the | |
| 24 memory dump (the information includes backtraces, types / categories, | |
| 25 sizes, and counts of allocations). There are several allocators in | |
| 26 Chrome: malloc, blink_gc, and partition_alloc. | |
| 27 | |
| 28 * (Per process) Stack frame tree of all functions that called allocators | |
| 29 above. | |
| 30 | |
| 31 This script does the following: | |
| 32 | |
| 33 1. Parses the given trace file (loads JSON). | |
| 34 2. Finds memory dump events and parses stack frame tree for each process. | |
| 35 3. Finds stack frames that have PC addresses instead of function names. | |
| 36 4. Symbolizes PCs and modifies loaded JSON. | |
| 37 5. Writes modified JSON back to the file. | |
| 38 | |
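The load/save steps (1 and 5) can be sketched as below. This is a minimal illustration, not the script's actual I/O code; it assumes the trace may optionally be gzip-compressed (the script imports gzip), with the `.gz` suffix check being a simplification:

```python
import gzip
import json

def load_trace(path):
    # Step 1: parse the trace file (optionally gzip-compressed) into JSON.
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rb') as f:
        return json.loads(f.read().decode('utf-8'))

def save_trace(trace, path):
    # Step 5: write the modified JSON back to the file.
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'wb') as f:
        f.write(json.dumps(trace).encode('utf-8'))
```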
| 39 === Details === | |
| 40 | |
| 41 There are two formats of heap profiler information: legacy and modern. The | |
| 42 main differences relevant to this script are: | |
| 43 | |
| 44 * In the modern format the stack frame tree, type name mapping, and string | |
|
Wez
2017/05/03 00:17:10
nit: " ... modern format the stack frame ..."
DmitrySkiba
2017/05/04 00:30:55
Done.
| |
| 45 mapping nodes are dumped incrementally. These nodes are dumped in each | |
| 46 memory dump event and carry updates that occurred since the last event. | |
| 47 | |
| 48 For example, let's say that when the first memory dump event is generated | |
| 49 we only know about a function foo() (called from main()) allocating objects | |
| 50 of type "int": | |
| 51 | |
| 52 { | |
| 53 "args": { | |
| 54 "dumps": { | |
| 55 "heaps_v2": { | |
| 56 "maps": { | |
| 57 "nodes": [ | |
| 58 { "id": 1, "name_sid": 1 }, | |
| 59 { "id": 2, "parent": 1, "name_sid": 3 }, | |
| 60 ], | |
| 61 "types": [ | |
| 62 { "id": 1, "name_sid": 2 }, | |
| 63 ], | |
| 64 "strings": [ | |
| 65 { "id": 1, "string": "main()" }, | |
| 66 { "id": 2, "string": "int" }, | |
| 67 { "id": 3, "string": "foo()" }, | |
| 68 ] | |
| 69 }, | |
| 70 "allocators": { ...live allocations per allocator... }, | |
| 71 ... | |
| 72 }, | |
| 73 ... | |
| 74 } | |
| 75 }, | |
| 76 ... | |
| 77 } | |
| 78 | |
| 79 Here: | |
| 80 * 'nodes' node encodes stack frame tree | |
| 81 * 'types' node encodes type name mappings | |
| 82 * 'strings' node encodes string mapping (explained below) | |
| 83 | |
| 84 Then, by the time the second memory dump event is generated, we learn about | |
| 85 bar() (called from main()), which also allocated "int" objects. Only the | |
| 86 new information is dumped, i.e. bar() stack frame: | |
| 87 | |
| 88 { | |
| 89 "args": { | |
| 90 "dumps": { | |
| 91 "heaps_v2": { | |
| 92 "maps": { | |
| 93 "nodes": [ | |
| 94 { "id": 2, "parent": 1, "name_sid": 4 }, | |
| 95 ], | |
| 96 "types": [], | |
| 97 "strings": [ | |
| 98 { "id": 4, "string": "bar()" }, | |
| 99 ] | |
| 100 }, | |
| 101 "allocators": { ...live allocations per allocator... }, | |
| 102 ... | |
| 103 }, | |
| 104 ... | |
| 105 } | |
| 106 }, | |
| 107 ... | |
| 108 } | |
| 109 | |
| 110 Note that 'types' node is empty, since there were no updates. All three | |
| 111 nodes ('nodes', 'types', and 'strings') can be empty if there were no updates | |
| 112 to them. | |
| 113 | |
| 114 For simplicity, when the script updates incremental nodes, it puts updated | |
| 115 content in the first node, and clears all others. For example, the following | |
| 116 stack frame nodes: | |
| 117 | |
| 118 'nodes': [ | |
| 119 { "id": 1, "name_sid": 1 }, | |
| 120 { "id": 2, "parent": 1, "name_sid": 2 }, | |
| 121 ] | |
| 122 'nodes': [ | |
| 123 { "id": 3, "parent": 2, "name_sid": 3 }, | |
| 124 ] | |
| 125 'nodes': [ | |
| 126 { "id": 4, "parent": 3, "name_sid": 4 }, | |
| 127 { "id": 5, "parent": 1, "name_sid": 5 }, | |
| 128 ] | |
| 129 | |
| 130 After symbolization, they are written as: | |
| 131 | |
| 132 'nodes': [ | |
| 133 { "id": 1, "name_sid": 1 }, | |
| 134 { "id": 2, "parent": 1, "name_sid": 2 }, | |
| 135 { "id": 3, "parent": 2, "name_sid": 3 }, | |
| 136 { "id": 4, "parent": 3, "name_sid": 4 }, | |
| 137 { "id": 5, "parent": 1, "name_sid": 5 }, | |
| 138 ] | |
| 139 'nodes': [] | |
| 140 'nodes': [] | |
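The consolidation described above can be sketched as a small helper. This is an illustration of the merging rule (all entries land in the first list, later entries with the same id win, all other lists are cleared), not the script's actual implementation:

```python
def merge_incremental_nodes(nodes_lists):
    # Collect all entries by id; a later dump's entry updates an earlier one.
    merged = {}
    for nodes in nodes_lists:
        for node in nodes:
            merged[node['id']] = node
    # Serialize everything into the first list and clear all others.
    for nodes in nodes_lists:
        del nodes[:]
    nodes_lists[0].extend(sorted(merged.values(), key=lambda n: n['id']))
```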
| 141 | |
| 142 | |
| 143 * In contrast, in the legacy format the stack frame tree and type mappings | |
| 144 dumped separately from memory dump events, once per process. | |
| 145 | |
| 146 Here is how a trace file with two memory dump events looks in the | |
| 147 legacy format: | |
| 148 | |
| 149 { | |
| 150 "args": { | |
| 151 "dumps": { | |
| 152 "heaps": { ...live allocations per allocator... }, | |
| 153 ... | |
| 154 } | |
| 155 }, | |
| 156 ... | |
| 157 } | |
| 158 | |
| 159 { | |
| 160 "args": { | |
| 161 "dumps": { | |
| 162 "heaps": { ...live allocations per allocator... }, | |
| 163 ... | |
| 164 } | |
| 165 }, | |
| 166 ... | |
| 167 } | |
| 168 | |
| 169 { | |
| 170 "args": { | |
| 171 "typeNames": { | |
| 172 1: "int", | |
| 173 } | |
| 174 }, | |
| 175 "cat": "__metadata", | |
| 176 "name": "typeNames", | |
| 177 ... | |
| 178 } | |
| 179 | |
| 180 { | |
| 181 "args": { | |
| 182 "stackFrames": { | |
| 183 1: { "name": "main" }, | |
| 184 2: { "name": "foo", "parent": 1 }, | |
| 185 3: { "name": "bar", "parent": 1 }, | |
| 186 } | |
| 187 }, | |
| 188 "cat": "__metadata", | |
| 189 "name": "stackFrames", | |
| 190 ... | |
| 191 } | |
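Given the legacy 'stackFrames' mapping above, a full backtrace for any frame can be recovered by walking 'parent' links. A minimal sketch (not code from this script):

```python
def backtrace(stack_frames, frame_id):
    # Walk 'parent' links from the leaf frame to the root, collecting names.
    names = []
    while frame_id is not None:
        frame = stack_frames[frame_id]
        names.append(frame['name'])
        frame_id = frame.get('parent')
    # Reverse so the root frame (e.g. main) comes first.
    return list(reversed(names))
```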
| 192 | |
| 193 | |
| 194 * Another change in the modern format is the 'strings' node, which was added | |
| 195 to deduplicate stack frame names (mainly to reduce trace file size). | |
| 196 For consistency, the 'types' node also uses string mappings. | |
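This deduplication scheme amounts to string interning. A minimal sketch of the id-to-string bookkeeping (mirroring, but not taken from, the StringMap class in this patch):

```python
class StringInterner:
    """Assigns a stable integer id to each distinct string."""

    def __init__(self):
        self._id_by_string = {}
        self._string_by_id = {}

    def intern(self, string):
        # Return the existing id, or allocate the next one.
        string_id = self._id_by_string.get(string)
        if string_id is None:
            string_id = len(self._id_by_string) + 1
            self._id_by_string[string] = string_id
            self._string_by_id[string_id] = string
        return string_id
```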
| 197 | |
| 198 | |
| 199 See crbug.com/708930 for more information about the modern format. | |
| 200 """ | |
| 201 | |
| 6 import argparse | 202 import argparse |
| 7 import bisect | 203 import bisect |
| 8 import collections | 204 import collections |
| 9 import gzip | 205 import gzip |
| 206 import itertools | |
| 10 import json | 207 import json |
| 11 import os | 208 import os |
| 12 import re | 209 import re |
| 13 import subprocess | 210 import subprocess |
| 14 import sys | 211 import sys |
| 15 | 212 |
| 16 _SYMBOLS_PATH = os.path.abspath(os.path.join( | 213 _SYMBOLS_PATH = os.path.abspath(os.path.join( |
| 17 os.path.dirname(os.path.realpath(__file__)), | 214 os.path.dirname(os.path.realpath(__file__)), |
| 18 '..', | 215 '..', |
| 19 'third_party', | 216 'third_party', |
| 20 'symbols')) | 217 'symbols')) |
| 21 sys.path.append(_SYMBOLS_PATH) | 218 sys.path.append(_SYMBOLS_PATH) |
| 22 # pylint: disable=import-error | 219 # pylint: disable=import-error |
| 23 import symbols.elf_symbolizer as elf_symbolizer | 220 import symbols.elf_symbolizer as elf_symbolizer |
| 24 | 221 |
| 25 import symbolize_trace_atos_regex | 222 import symbolize_trace_atos_regex |
| 26 import symbolize_trace_macho_reader | 223 import symbolize_trace_macho_reader |
| 27 | 224 |
| 28 | 225 |
| 29 # Relevant trace event phases from Chromium's | 226 class NodeWrapper(object): |
| 30 # src/base/trace_event/common/trace_event_common.h. | 227 """Wraps one or more event data nodes. |
| 31 TRACE_EVENT_PHASE_METADATA = 'M' | 228 |
| 32 TRACE_EVENT_PHASE_MEMORY_DUMP = 'v' | 229 A node is a reference into a trace event JSON. Wrappers parse nodes to |
| 230 provide convenient APIs and update nodes when asked to propagate changes | |
| 231 back (see ApplyModifications() below). | |
| 232 | |
| 233 Here is an example of legacy metadata event that contains stack frame tree: | |
| 234 | |
| 235 { | |
| 236 "args": { | |
| 237 "stackFrames": { ... } | |
| 238 }, | |
| 239 "cat": "__metadata", | |
| 240 "name": "stackFrames", | |
| 241 "ph": "M", | |
| 242 ... | |
| 243 } | |
| 244 | |
| 245 When this event is encountered, a reference to the "stackFrames" dictionary | |
| 246 is obtained and passed down to a specific wrapped class, which knows how to | |
| 247 parse / update the dictionary. | |
| 248 | |
| 249 There are two parsing patterns depending on whether node is serialized | |
| 250 incrementally: | |
| 251 | |
| 252 * If node is not incremental, then parsing is done by __init__(), | |
| 253 see MemoryMap for an example. | |
| 254 | |
| 255 * If node is incremental, then __init__() does nothing, and ParseNext() | |
| 256 is called when next node (from a next event) is encountered. | |
| 257 | |
| 258 Some wrappers can also modify nodes they parsed. In such cases they have | |
| 259 additional APIs: | |
| 260 | |
| 261 * 'modified' flag, which indicates whether the wrapper was changed. | |
| 262 | |
| 263 * 'ApplyModifications' method, which propagates changes made to the wrapper | |
| 264 back to nodes. Successful invocation of ApplyModifications() resets | |
| 265 'modified' flag. | |
| 266 | |
| 267 """ | |
| 268 | |
| 269 # def __init__(self, node): | |
|
Primiano Tucci (use gerrit)
2017/05/03 17:25:05
Are these commented lines intentional ? I think th
DmitrySkiba
2017/05/04 00:30:56
The thing is that their exact shape is not determi
| |
| 270 # ... | |
| 271 | |
| 272 # def ParseNext(self, node, ...): | |
| 273 # ... | |
| 274 | |
| 275 # @property | |
| 276 # def modified(self): | |
| 277 # ... | |
| 278 | |
| 279 # def ApplyModifications(self, ...): | |
| 280 # ... | |
| 281 | |
| 282 pass | |
| 33 | 283 |
| 34 | 284 |
| 35 # Matches Android library paths, supports both K (/data/app-lib/<>/lib.so) | 285 class MemoryMap(NodeWrapper): |
| 36 # as well as L+ (/data/app/<>/lib/<>/lib.so). Library name is available | 286 """Wraps 'process_mmaps' node. |
| 37 # via 'name' group. | |
| 38 ANDROID_PATH_MATCHER = re.compile( | |
| 39 r'^/data/(?:' | |
| 40 r'app/[^/]+/lib/[^/]+/|' | |
| 41 r'app-lib/[^/]+/|' | |
| 42 r'data/[^/]+/incremental-install-files/lib/' | |
| 43 r')(?P<name>.*\.so)') | |
| 44 | 287 |
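The Android path matcher above can be exercised directly. A quick usage sketch with the same pattern; the example library paths are made up:

```python
import re

# Same pattern as ANDROID_PATH_MATCHER in the diff above.
matcher = re.compile(
    r'^/data/(?:'
    r'app/[^/]+/lib/[^/]+/|'
    r'app-lib/[^/]+/|'
    r'data/[^/]+/incremental-install-files/lib/'
    r')(?P<name>.*\.so)')

# L+ style path: /data/app/<>/lib/<>/lib.so
m_new = matcher.match('/data/app/com.example.app-1/lib/arm64/libchrome.so')
# K style path: /data/app-lib/<>/lib.so
m_old = matcher.match('/data/app-lib/com.example.app-1/libchrome.so')
```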
| 45 # Subpath of output path where unstripped libraries are stored. | 288 'process_mmaps' node contains information about file mappings. |
| 46 ANDROID_UNSTRIPPED_SUBPATH = 'lib.unstripped' | |
| 47 | 289 |
| 48 | 290 "process_mmaps": { |
| 49 def FindInSystemPath(binary_name): | 291 "vm_regions": [ |
| 50 paths = os.environ['PATH'].split(os.pathsep) | 292 { |
| 51 for path in paths: | 293 "mf": "<file_path>", |
| 52 binary_path = os.path.join(path, binary_name) | 294 "sa": "<start_address>", |
| 53 if os.path.isfile(binary_path): | 295 "sz": "<size>", |
| 54 return binary_path | 296 ... |
| 55 return None | 297 }, |
| 56 | 298 ... |
| 57 | 299 ] |
| 58 class Symbolizer(object): | 300 } |
| 59 # Encapsulates platform-specific symbolization logic. | 301 """ |
| 60 def __init__(self): | |
| 61 self.is_mac = sys.platform == 'darwin' | |
| 62 self.is_win = sys.platform == 'win32' | |
| 63 if self.is_mac: | |
| 64 self.binary = 'atos' | |
| 65 self._matcher = symbolize_trace_atos_regex.AtosRegexMatcher() | |
| 66 elif self.is_win: | |
| 67 self.binary = 'addr2line-pdb.exe' | |
| 68 else: | |
| 69 self.binary = 'addr2line' | |
| 70 self.symbolizer_path = FindInSystemPath(self.binary) | |
| 71 | |
| 72 def _SymbolizeLinuxAndAndroid(self, symfile, unsymbolized_name): | |
| 73 def _SymbolizerCallback(sym_info, frames): | |
| 74 # Unwind inline chain to the top. | |
| 75 while sym_info.inlined_by: | |
| 76 sym_info = sym_info.inlined_by | |
| 77 | |
| 78 symbolized_name = sym_info.name if sym_info.name else unsymbolized_name | |
| 79 for frame in frames: | |
| 80 frame.name = symbolized_name | |
| 81 | |
| 82 symbolizer = elf_symbolizer.ELFSymbolizer(symfile.symbolizable_path, | |
| 83 self.symbolizer_path, | |
| 84 _SymbolizerCallback, | |
| 85 inlines=True) | |
| 86 | |
| 87 for address, frames in symfile.frames_by_address.iteritems(): | |
| 88 # SymbolizeAsync() asserts that the type of address is int. We operate | |
| 89 # on longs (since they are raw pointers possibly from 64-bit processes). | |
| 90 # It's OK to cast here because we're passing relative PC, which should | |
| 91 # always fit into int. | |
| 92 symbolizer.SymbolizeAsync(int(address), frames) | |
| 93 | |
| 94 symbolizer.Join() | |
| 95 | |
| 96 | |
| 97 def _SymbolizeMac(self, symfile): | |
| 98 chars_max = int(subprocess.check_output("getconf ARG_MAX", shell=True)) | |
| 99 | |
| 100 # 16 for the address, 2 for "0x", 1 for the space | |
| 101 chars_per_address = 19 | |
| 102 | |
| 103 load_address = (symbolize_trace_macho_reader. | |
| 104 ReadMachOTextLoadAddress(symfile.symbolizable_path)) | |
| 105 assert load_address is not None | |
| 106 | |
| 107 cmd_base = [self.symbolizer_path, '-arch', 'x86_64', '-l', | |
| 108 '0x%x' % load_address, '-o', | |
| 109 symfile.symbolizable_path] | |
| 110 chars_for_other_arguments = len(' '.join(cmd_base)) + 1 | |
| 111 | |
| 112 # The maximum number of inputs that can be processed at once is limited by | |
| 113 # ARG_MAX. This currently evaluates to ~13000 on macOS. | |
| 114 max_inputs = (chars_max - chars_for_other_arguments) / chars_per_address | |
| 115 | |
| 116 all_keys = symfile.frames_by_address.keys() | |
| 117 processed_keys_count = 0 | |
| 118 while len(all_keys): | |
| 119 input_count = min(len(all_keys), max_inputs) | |
| 120 keys_to_process = all_keys[0:input_count] | |
| 121 | |
| 122 cmd = list(cmd_base) | |
| 123 cmd.extend([hex(int(x) + load_address) | |
| 124 for x in keys_to_process]) | |
| 125 output_array = subprocess.check_output(cmd).split('\n') | |
| 126 for i in range(len(keys_to_process)): | |
| 127 for frame in (symfile.frames_by_address.values() | |
| 128 [i + processed_keys_count]): | |
| 129 frame.name = self._matcher.Match(output_array[i]) | |
| 130 processed_keys_count += len(keys_to_process) | |
| 131 all_keys = all_keys[input_count:] | |
| 132 | |
| 133 | |
| 134 def _SymbolizeWin(self, symfile): | |
| 135 """Invoke symbolizer binary on windows and write all input in one go. | |
| 136 | |
| 137 Unlike linux, on windows, symbolization talks through a shared system | |
| 138 service that handles communication with the NT symbol servers. This | |
| 139 creates an explicit serialization (and therefore lock contention) of | |
| 140 any process using the symbol API for files that do not have a local PDB. | |
| 141 | |
| 142 Thus, even though the windows symbolizer binary can be made command line | |
| 143 compatible with the POSIX addr2line interface, parallelizing the | |
| 144 symbolization does not yield the same performance effects. Running | |
| 145 just one symbolizer seems good enough for now. Can optimize later | |
| 146 if this becomes a bottleneck. | |
| 147 """ | |
| 148 cmd = [self.symbolizer_path, '--functions', '--demangle', '--exe', | |
| 149 symfile.symbolizable_path] | |
| 150 | |
| 151 proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE, | |
| 152 stderr=sys.stderr) | |
| 153 addrs = ["%x" % relative_pc for relative_pc in | |
| 154 symfile.frames_by_address.keys()] | |
| 155 (stdout_data, stderr_data) = proc.communicate('\n'.join(addrs)) | |
| 156 stdout_data = stdout_data.split('\n') | |
| 157 | |
| 158 # This is known to be in the same order as stdout_data. | |
| 159 for i, addr in enumerate(addrs): | |
| 160 for frame in symfile.frames_by_address[int(addr, 16)]: | |
| 161 # Output of addr2line with --functions is always 2 outputs per | |
| 162 # symbol, function name followed by source line number. Only grab | |
| 163 # the function name as line info is not always available. | |
| 164 frame.name = stdout_data[i * 2] | |
| 165 | |
| 166 | |
| 167 def Symbolize(self, symfile, unsymbolized_name): | |
| 168 if self.is_mac: | |
| 169 self._SymbolizeMac(symfile) | |
| 170 elif self.is_win: | |
| 171 self._SymbolizeWin(symfile) | |
| 172 else: | |
| 173 self._SymbolizeLinuxAndAndroid(symfile, unsymbolized_name) | |
| 174 | |
| 175 | |
| 176 def IsSymbolizableFile(self, file_path): | |
| 177 if self.is_win: | |
| 178 extension = os.path.splitext(file_path)[1].lower() | |
| 179 return extension in ['.dll', '.exe'] | |
| 180 else: | |
| 181 result = subprocess.check_output(['file', '-0', file_path]) | |
| 182 type_string = result[result.find('\0') + 1:] | |
| 183 return bool(re.match(r'.*(ELF|Mach-O) (32|64)-bit\b.*', | |
| 184 type_string, re.DOTALL)) | |
| 185 | |
| 186 | |
| 187 class ProcessMemoryMaps(object): | |
| 188 """Represents 'process_mmaps' trace file entry.""" | |
| 189 | 302 |
| 190 class Region(object): | 303 class Region(object): |
| 191 def __init__(self, start_address, size, file_path): | 304 def __init__(self, start_address, size, file_path): |
| 192 self._start_address = start_address | 305 self._start_address = start_address |
| 193 self._size = size | 306 self._size = size |
| 194 self._file_path = file_path | 307 self._file_path = file_path |
| 195 | 308 |
| 196 @property | 309 @property |
| 197 def start_address(self): | 310 def start_address(self): |
| 198 return self._start_address | 311 return self._start_address |
| (...skipping 15 matching lines...) | |
| 214 return long(self._start_address).__cmp__(long(other._start_address)) | 327 return long(self._start_address).__cmp__(long(other._start_address)) |
| 215 elif isinstance(other, (long, int)): | 328 elif isinstance(other, (long, int)): |
| 216 return long(self._start_address).__cmp__(long(other)) | 329 return long(self._start_address).__cmp__(long(other)) |
| 217 else: | 330 else: |
| 218 raise Exception('Cannot compare with %s' % type(other)) | 331 raise Exception('Cannot compare with %s' % type(other)) |
| 219 | 332 |
| 220 def __repr__(self): | 333 def __repr__(self): |
| 221 return 'Region(0x{:X} - 0x{:X}, {})'.format( | 334 return 'Region(0x{:X} - 0x{:X}, {})'.format( |
| 222 self.start_address, self.end_address, self.file_path) | 335 self.start_address, self.end_address, self.file_path) |
| 223 | 336 |
| 224 def __init__(self, process_mmaps): | 337 def __init__(self, process_mmaps_node): |
| 225 """Parses 'process_mmaps' dictionary.""" | |
| 226 | |
| 227 regions = [] | 338 regions = [] |
| 228 for region_value in process_mmaps['vm_regions']: | 339 for region_node in process_mmaps_node['vm_regions']: |
| 229 regions.append(self.Region( | 340 regions.append(self.Region( |
| 230 long(region_value['sa'], 16), | 341 long(region_node['sa'], 16), |
| 231 long(region_value['sz'], 16), | 342 long(region_node['sz'], 16), |
| 232 region_value['mf'])) | 343 region_node['mf'])) |
| 233 regions.sort() | 344 regions.sort() |
| 234 | 345 |
| 235 # Copy regions without duplicates and check for overlaps. | 346 # Copy regions without duplicates and check for overlaps. |
| 236 self._regions = [] | 347 self._regions = [] |
| 237 previous_region = None | 348 previous_region = None |
| 238 for region in regions: | 349 for region in regions: |
| 239 if previous_region is not None: | 350 if previous_region is not None: |
| 240 if region == previous_region: | 351 if region == previous_region: |
| 241 continue | 352 continue |
| 242 assert region.start_address >= previous_region.end_address, \ | 353 assert region.start_address >= previous_region.end_address, \ |
| 243 'Regions {} and {} overlap.'.format(previous_region, region) | 354 'Regions {} and {} overlap.'.format(previous_region, region) |
| 244 previous_region = region | 355 previous_region = region |
| 245 self._regions.append(region) | 356 self._regions.append(region) |
| 246 | 357 |
| 247 @property | 358 @property |
| 248 def regions(self): | 359 def regions(self): |
| 249 return self._regions | 360 return self._regions |
| 250 | 361 |
| 251 def FindRegion(self, address): | 362 def FindRegion(self, address): |
| 252 """Finds region containing |address|. Returns None if none found.""" | 363 """Finds region containing |address|. Returns None if none found.""" |
| 253 | 364 |
| 254 region_index = bisect.bisect_right(self._regions, address) - 1 | 365 region_index = bisect.bisect_right(self._regions, address) - 1 |
| 255 if region_index >= 0: | 366 if region_index >= 0: |
| 256 region = self._regions[region_index] | 367 region = self._regions[region_index] |
| 257 if address >= region.start_address and address < region.end_address: | 368 if address >= region.start_address and address < region.end_address: |
| 258 return region | 369 return region |
| 259 return None | 370 return None |
| 260 | 371 |
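FindRegion above relies on the regions being sorted and comparable to raw addresses. The same bisect idiom can be shown standalone; this sketch uses made-up (start, size) tuples in place of the Region class:

```python
import bisect

def find_region(regions, address):
    # regions: sorted list of (start_address, size) tuples.
    starts = [start for start, _ in regions]
    # Index of the last region starting at or before |address|.
    index = bisect.bisect_right(starts, address) - 1
    if index >= 0:
        start, size = regions[index]
        if start <= address < start + size:
            return regions[index]
    return None  # Address falls outside every mapped region.
```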
| 261 | 372 |
| 262 class StackFrames(object): | 373 class UnsupportedHeapDumpVersionError(Exception): |
|
Primiano Tucci (use gerrit)
2017/05/03 17:25:05
No need to change it now, but for the future I hon
DmitrySkiba
2017/05/04 00:30:56
I wanted to surface the version that caused the er
| |
| 263 """Represents 'stackFrames' trace file entry.""" | 374 """Helper exception class to signal unsupported heap dump version.""" |
| 264 | 375 |
| 265 class PCFrame(object): | 376 def __init__(self, version): |
| 266 def __init__(self, pc, frame): | 377 message = 'Unsupported heap dump version: {}'.format(version) |
| 378 super(UnsupportedHeapDumpVersionError, self).__init__(message) | |
| 379 | |
| 380 | |
| 381 class StringMap(NodeWrapper): | |
| 382 """Wraps all 'strings' nodes for a process. | |
| 383 | |
| 384 'strings' node contains incremental mappings between integer ids and strings. | |
| 385 | |
| 386 "strings": [ | |
| 387 { | |
| 388 "id": <string_id>, | |
| 389 "string": <string> | |
| 390 }, | |
| 391 ... | |
| 392 ] | |
| 393 """ | |
| 394 | |
| 395 def __init__(self): | |
| 396 self._modified = False | |
| 397 self._strings_nodes = [] | |
| 398 self._string_by_id = {} | |
| 399 self._id_by_string = {} | |
| 400 self._max_string_id = 0 | |
| 401 | |
| 402 @property | |
| 403 def modified(self): | |
| 404 """Returns True if the wrapper was modified (see NodeWrapper).""" | |
| 405 return self._modified | |
| 406 | |
| 407 @property | |
| 408 def string_by_id(self): | |
| 409 return self._string_by_id | |
| 410 | |
| 411 def ParseNext(self, heap_dump_version, strings_node): | |
| 412 """Parses and interns next node (see NodeWrapper).""" | |
| 413 | |
| 414 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1: | |
|
Primiano Tucci (use gerrit)
2017/05/03 17:25:05
Here I would have just done
assert(heap_dump_vers
DmitrySkiba
2017/05/04 00:30:55
Acknowledged.
| |
| 415 raise UnsupportedHeapDumpVersionError(heap_dump_version) | |
| 416 | |
| 417 self._strings_nodes.append(strings_node) | |
| 418 for string_node in strings_node: | |
| 419 self._Insert(string_node['id'], string_node['string']) | |
| 420 | |
| 421 def Clear(self): | |
| 422 """Clears all string mappings.""" | |
| 423 if self._string_by_id: | |
| 424 self._modified = True | |
| 425 self._string_by_id = {} | |
| 426 self._id_by_string = {} | |
| 427 self._Insert(0, '[null]') | |
|
Primiano Tucci (use gerrit)
2017/05/03 17:25:05
is it intentional that clear does this _Insert and
DmitrySkiba
2017/05/04 00:30:55
__init__() (or rather ParseNext) wraps existing no
| |
| 428 self._max_string_id = 0 | |
| 429 | |
| 430 def AddString(self, string): | |
| 431 """Adds a string (if it doesn't exist) and returns its integer id.""" | |
| 432 string_id = self._id_by_string.get(string) | |
| 433 if string_id is None: | |
| 434 string_id = self._max_string_id + 1 | |
| 435 self._Insert(string_id, string) | |
| 436 self._modified = True | |
| 437 return string_id | |
| 438 | |
| 439 def ApplyModifications(self): | |
| 440 """Propagates modifications back to nodes (see NodeWrapper).""" | |
| 441 if not self.modified: | |
| 442 return | |
| 443 | |
| 444 assert self._strings_nodes, 'no nodes' | |
| 445 | |
| 446 # Serialize into the first node, and clear all others. | |
| 447 | |
| 448 for strings_node in self._strings_nodes: | |
|
Primiano Tucci (use gerrit)
2017/05/03 17:25:05
maybe when you do this add a comment explaining th
DmitrySkiba
2017/05/04 00:30:55
See comments at the top of the file. "Details" exp
| |
| 449 del strings_node[:] | |
| 450 strings_node = self._strings_nodes[0] | |
| 451 for string_id, string in self._string_by_id.iteritems(): | |
| 452 strings_node.append({'id': string_id, 'string': string}) | |
| 453 | |
| 454 self._modified = False | |
| 455 | |
| 456 def _Insert(self, string_id, string): | |
| 457 self._id_by_string[string] = string_id | |
| 458 self._string_by_id[string_id] = string | |
| 459 self._max_string_id = max(self._max_string_id, string_id) | |
| 460 | |
| 461 | |
| 462 class TypeNameMap(NodeWrapper): | |
| 463 """Wraps all 'types' nodes for a process. | |
| 464 | |
| 465 'types' nodes encode mappings between integer type ids and integer | |
| 466 string ids (from 'strings' nodes). | |
| 467 | |
| 468 "types": [ | |
| 469 { | |
| 470 "id": <type_id>, | |
| 471 "name_sid": <name_string_id> | |
| 472 } | |
| 473 ... | |
| 474 ] | |
| 475 | |
| 476 For simplicity string ids are translated into strings during parsing, | |
| 477 and then translated back to ids in ApplyModifications(). | |
| 478 """ | |
| 479 def __init__(self): | |
| 480 self._modified = False | |
| 481 self._type_name_nodes = [] | |
| 482 self._name_by_id = {} | |
| 483 self._id_by_name = {} | |
| 484 self._max_type_id = 0 | |
| 485 | |
| 486 @property | |
| 487 def modified(self): | |
| 488 """Returns True if the wrapper was modified (see NodeWrapper).""" | |
| 489 return self._modified | |
| 490 | |
| 491 @property | |
| 492 def name_by_id(self): | |
| 493 """Returns {id -> name} dict (must not be changed directly).""" | |
| 494 return self._name_by_id | |
| 495 | |
| 496 def ParseNext(self, heap_dump_version, type_name_node, string_map): | |
| 497 """Parses and interns next node (see NodeWrapper). | |
| 498 | |
| 499 |string_map| - A StringMap object to use to translate string ids | |
| 500 to strings. | |
| 501 """ | |
| 502 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1: | |
| 503 raise UnsupportedHeapDumpVersionError(heap_dump_version) | |
| 504 | |
| 505 self._type_name_nodes.append(type_name_node) | |
| 506 for type_node in type_name_node: | |
| 507 self._Insert(type_node['id'], | |
| 508 string_map.string_by_id[type_node['name_sid']]) | |
| 509 | |
| 510 def AddType(self, type_name): | |
| 511 """Adds a type name (if it doesn't exist) and returns its id.""" | |
| 512 type_id = self._id_by_name.get(type_name) | |
| 513 if type_id is None: | |
| 514 type_id = self._max_type_id + 1 | |
| 515 self._Insert(type_id, type_name) | |
| 516 self._modified = True | |
| 517 return type_id | |
| 518 | |
| 519 def ApplyModifications(self, string_map, force=False): | |
| 520 """Propagates modifications back to nodes. | |
| 521 | |
| 522 |string_map| - A StringMap object to use to translate strings to ids. | |
| 523 |force| - Whether to propagate changes regardless of 'modified' flag. | |
| 524 """ | |
| 525 if not self.modified and not force: | |
| 526 return | |
| 527 | |
| 528 assert self._type_name_nodes, 'no nodes' | |
| 529 | |
| 530 # Serialize into the first node, and clear all others. | |
| 531 | |
| 532 for types_node in self._type_name_nodes: | |
| 533 del types_node[:] | |
| 534 types_node = self._type_name_nodes[0] | |
| 535 for type_id, type_name in self._name_by_id.iteritems(): | |
| 536 types_node.append({ | |
| 537 'id': type_id, | |
| 538 'name_sid': string_map.AddString(type_name)}) | |
| 539 | |
| 540 self._modified = False | |
| 541 | |
| 542 def _Insert(self, type_id, type_name): | |
| 543 self._id_by_name[type_name] = type_id | |
| 544 self._name_by_id[type_id] = type_name | |
| 545 self._max_type_id = max(self._max_type_id, type_id) | |
| 546 | |
| 547 | |
| 548 class StackFrameMap(NodeWrapper): | |
| 549 """Wraps stack frame tree nodes for a process. | |
| 550 | |
| 551 For the legacy format this wrapper expects a single 'stackFrames' node | |
| 552 (which comes from metadata event): | |
| 553 | |
| 554 "stackFrames": { | |
| 555 "<frame_id>": { | |
| 556 "name": "<frame_name>", | |
| 557 "parent": "<parent_frame_id>" | |
| 558 }, | |
| 559 ... | |
| 560 } | |
| 561 | |
| 562 For the modern format this wrapper expects several 'nodes' nodes: | |
| 563 | |
| 564 "nodes": [ | |
| 565 { | |
| 566 "id": <frame_id>, | |
| 567 "parent": <parent_frame_id>, | |
| 568 "name_sid": <name_string_id> | |
| 569 }, | |
| 570 ... | |
| 571 ] | |
| 572 | |
| 573 In both formats the frame name is a string. The native heap profiler | |
| 574 generates specially formatted frame names (e.g. "pc:10eb78dba") for | |
| 575 function addresses (PCs). The inner Frame class below parses the name | |
| 576 and extracts the PC, if present. | |
| 577 """ | |
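The "pc:" convention takes a couple of lines to parse. A sketch matching the _ParsePC logic in the diff (Python 3, so int replaces Python 2's long):

```python
_PC_TAG = 'pc:'

def parse_pc(name):
    # Frame names like "pc:10eb78dba" carry a hex program counter;
    # anything else (e.g. "main()") is already symbolized.
    if not name.startswith(_PC_TAG):
        return None
    return int(name[len(_PC_TAG):], 16)
```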
| 578 class Frame(object): | |
| 579 def __init__(self, frame_id, name, parent_frame_id): | |
| 267 self._modified = False | 580 self._modified = False |
| 268 self._pc = pc | 581 self._id = frame_id |
| 269 self._frame = frame | 582 self._name = name |
| 583 self._pc = self._ParsePC(name) | |
| 584 self._parent_id = parent_frame_id | |
| 585 self._ext = None | |
| 270 | 586 |
| 271 @property | 587 @property |
| 272 def modified(self): | 588 def modified(self): |
| 589 """Returns True if the frame was modified. | |
| 590 | |
| 591 For example changing frame's name sets this flag (since the change | |
| 592 needs to be propagated back to nodes). | |
| 593 """ | |
| 273 return self._modified | 594 return self._modified |
| 274 | 595 |
| 275 @property | 596 @property |
| 597 def id(self): | |
| 598 """Frame id (integer).""" | |
| 599 return self._id | |
| 600 | |
| 601 @property | |
| 276 def pc(self): | 602 def pc(self): |
| 603 """Parsed (integer) PC of the frame, or None.""" | |
| 277 return self._pc | 604 return self._pc |
| 278 | 605 |
| 279 @property | 606 @property |
| 280 def name(self): | 607 def name(self): |
| 281 return self._frame['name'] | 608 """Name of the frame (see above).""" |
| 609 return self._name | |
| 282 | 610 |
| 283 @name.setter | 611 @name.setter |
| 284 def name(self, value): | 612 def name(self, value): |
| 613 """Changes the name. Doesn't affect value of |pc|.""" | |
| 285 self._modified = True | 614 self._modified = True |
| 286 self._frame['name'] = value | 615 self._name = value |
| 287 | 616 |
| 288 def __init__(self, stack_frames): | 617 @property |
| 289 """Constructs object using 'stackFrames' dictionary.""" | 618 def parent_id(self): |
| 290 self._pc_frames = [] | 619 """Parent frame id (integer).""" |
| 291 for frame in stack_frames.itervalues(): | 620 return self._parent_id |
| 292 pc_frame = self._ParsePCFrame(frame) | 621 |
| 293 if pc_frame: | 622 _PC_TAG = 'pc:' |
| 294 self._pc_frames.append(pc_frame) | 623 |
| 295 | 624 def _ParsePC(self, name): |
| 296 @property | 625 if not name.startswith(self._PC_TAG): |
| 297 def pc_frames(self): | 626 return None |
| 298 return self._pc_frames | 627 return long(name[len(self._PC_TAG):], 16) |
| 628 | |
| 629 def _ClearModified(self): | |
| 630 self._modified = False | |
| 631 | |
| 632 def __init__(self): | |
| 633 self._modified = False | |
| 634 self._heap_dump_version = None | |
| 635 self._stack_frames_nodes = [] | |
| 636 self._frame_by_id = {} | |
| 299 | 637 |
| 300 @property | 638 @property |
| 301 def modified(self): | 639 def modified(self): |
| 302 return any(f.modified for f in self._pc_frames) | 640 """Returns True if the wrapper or any of its frames were modified.""" |
| 303 | 641 return (self._modified or |
| 304 _PC_TAG = 'pc:' | 642 any(f.modified for f in self._frame_by_id.itervalues())) |
| 305 | 643 |
| 306 @classmethod | 644 @property |
| 307 def _ParsePCFrame(self, frame): | 645 def frame_by_id(self): |
| 308 name = frame['name'] | 646 """Returns {id -> frame} dict (must not be modified directly).""" |
| 309 if not name.startswith(self._PC_TAG): | 647 return self._frame_by_id |
| 310 return None | 648 |
| 311 pc = long(name[len(self._PC_TAG):], 16) | 649 def ParseNext(self, heap_dump_version, stack_frames_node, string_map): |
| 312 return self.PCFrame(pc, frame) | 650 """Parses the next stack frames node (see NodeWrapper). |
| 313 | 651 |
| 314 | 652 For the modern format |string_map| is used to translate string ids |
| 315 class Process(object): | 653 to strings. |
| 316 """Holds various bits of information about a process in a trace file.""" | 654 """ |
| 317 | 655 |
| 318 def __init__(self, pid): | 656 frame_by_id = {} |
| 319 self.pid = pid | 657 if heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: |
| 320 self.name = None | 658 if self._stack_frames_nodes: |
| 321 self.mmaps = None | 659 raise Exception('Legacy stack frames node is expected only once.') |
| 322 self.stack_frames = None | 660 for frame_id, frame_node in stack_frames_node.iteritems(): |
| 323 | 661 frame = self.Frame(frame_id, |
| 324 | 662 frame_node['name'], |
| 325 def CollectProcesses(trace): | 663 frame_node.get('parent')) |
| 326 """Parses trace dictionary and returns pid->Process map of all processes | 664 frame_by_id[frame.id] = frame |
| 327 suitable for symbolization (which have both mmaps and stack_frames). | 665 else: |
| 666 if heap_dump_version != Trace.HEAP_DUMP_VERSION_1: | |
| 667 raise UnsupportedHeapDumpVersionError(heap_dump_version) | |
| 668 for frame_node in stack_frames_node: | |
| 669 frame = self.Frame(frame_node['id'], | |
| 670 string_map.string_by_id[frame_node['name_sid']], | |
| 671 frame_node.get('parent')) | |
| 672 frame_by_id[frame.id] = frame | |
| 673 | |
| 674 self._heap_dump_version = heap_dump_version | |
| 675 self._stack_frames_nodes.append(stack_frames_node) | |
| 676 | |
| 677 self._frame_by_id = frame_by_id | |
| 678 | |
| 679 def ApplyModifications(self, string_map, force=False): | |
| 680 """Applies modifications back to nodes (see NodeWrapper).""" | |
| 681 | |
| 682 if not self.modified and not force: | |
| 683 return | |
| 684 | |
| 685 assert self._stack_frames_nodes, 'no nodes' | |
| 686 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | |
| 687 assert string_map is None, \ | |
| 688 'string_map should not be used with the legacy format' | |
| 689 | |
| 690 # Serialize frames into the first node, clear all others. | |
| 691 | |
| 692 for frames_node in self._stack_frames_nodes: | |
| 693 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | |
| 694 frames_node.clear() | |
| 695 else: | |
| 696 del frames_node[:] | |
| 697 | |
| 698 frames_node = self._stack_frames_nodes[0] | |
| 699 for frame in self._frame_by_id.itervalues(): | |
| 700 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | |
| 701 frame_node = {'name': frame.name} | |
| 702 frames_node[frame.id] = frame_node | |
| 703 else: | |
| 704 frame_node = { | |
| 705 'id': frame.id, | |
| 706 'name_sid': string_map.AddString(frame.name) | |
| 707 } | |
| 708 frames_node.append(frame_node) | |
| 709 if frame.parent_id is not None: | |
| 710 frame_node['parent'] = frame.parent_id | |
| 711 frame._ClearModified() | |
| 712 | |
| 713 self._modified = False | |
| 714 | |
| 715 | |
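`ApplyModifications()` above serializes all frames back into the first stack-frames node, in one of two shapes: the legacy format (a dict keyed by frame id, with inline names) or the modern `heaps_v2` format (a flat list whose names are string ids resolved through a `strings` node). A sketch of both shapes using hypothetical frame data (`frames`, `add_string` are illustrative, not the script's API):

```python
# Hypothetical frames: id -> (name, parent_id)
frames = {1: ('main', None), 2: ('malloc', 1)}

# Legacy format: a dict keyed by frame id, names stored inline.
legacy_node = {}
for fid, (name, parent) in frames.items():
    node = {'name': name}
    if parent is not None:
        node['parent'] = parent
    legacy_node[fid] = node

# Modern (heaps_v2) format: a flat list; 'name_sid' refers into 'strings'.
strings, sid_by_name = [], {}
def add_string(s):
    if s not in sid_by_name:
        sid_by_name[s] = len(strings)
        strings.append({'id': len(strings), 'string': s})
    return sid_by_name[s]

modern_node = []
for fid, (name, parent) in frames.items():
    node = {'id': fid, 'name_sid': add_string(name)}
    if parent is not None:
        node['parent'] = parent
    modern_node.append(node)

print(legacy_node)
print(modern_node)
```

This is why the parser later has to guarantee a `strings` node exists whenever `types` or `nodes` are present: the modern serialization cannot express names without it.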
| 716 class Trace(NodeWrapper): | |
| 717 """Wrapper for the root trace node (i.e. the trace JSON itself). | |
| 718 | |
| 719 This wrapper parses select nodes from memory-infra events and groups | |
| 720 parsed data per-process (see inner Process class below). | |
| 328 """ | 721 """ |
| 329 | 722 |
| 330 process_map = {} | 723 # Indicates legacy heap dump format. |
| 331 | 724 HEAP_DUMP_VERSION_LEGACY = 'Legacy' |
| 332 # Android traces produced via 'chrome://inspect/?tracing#devices' are | 725 |
| 333 # just a list of events. | 726 # Indicates a variation of the modern heap dump format. |
| 334 events = trace if isinstance(trace, list) else trace['traceEvents'] | 727 HEAP_DUMP_VERSION_1 = 1 |
| 335 for event in events: | 728 |
| 336 name = event.get('name') | 729 class Process(object): |
| 337 if not name: | 730 """Collection of per-process data and wrappers.""" |
| 338 continue | 731 |
| 339 | 732 def __init__(self, pid): |
| 340 pid = event['pid'] | 733 self._pid = pid |
| 341 process = process_map.get(pid) | 734 self._name = None |
| 342 if process is None: | 735 self._memory_map = None |
| 343 process = Process(pid) | 736 self._stack_frame_map = StackFrameMap() |
| 344 process_map[pid] = process | 737 self._type_name_map = TypeNameMap() |
| 345 | 738 self._string_map = StringMap() |
| 346 phase = event['ph'] | 739 self._heap_dump_version = None |
| 347 if phase == TRACE_EVENT_PHASE_METADATA: | 740 |
| 348 if name == 'process_name': | 741 @property |
| 349 process.name = event['args']['name'] | 742 def modified(self): |
| 350 elif name == 'stackFrames': | 743 return self._stack_frame_map.modified or self._type_name_map.modified |
| 351 process.stack_frames = StackFrames(event['args']['stackFrames']) | 744 |
| 352 elif phase == TRACE_EVENT_PHASE_MEMORY_DUMP: | 745 @property |
| 353 process_mmaps = event['args']['dumps'].get('process_mmaps') | 746 def pid(self): |
| 354 if process_mmaps: | 747 return self._pid |
| 355 # TODO(dskiba): this parses all process_mmaps, but retains only the | 748 |
| 356 # last one. We need to parse only once (lazy parsing?). | 749 @property |
| 357 process.mmaps = ProcessMemoryMaps(process_mmaps) | 750 def name(self): |
| 358 | 751 return self._name |
| 359 return [p for p in process_map.itervalues() if p.mmaps and p.stack_frames] | 752 |
| 753 @property | |
| 754 def unique_name(self): | |
| 755 """Returns string that includes both process name and its pid.""" | |
| 756 name = self._name if self._name else 'UnnamedProcess' | |
| 757 return '{}({})'.format(name, self._pid) | |
| 758 | |
| 759 @property | |
| 760 def memory_map(self): | |
| 761 return self._memory_map | |
| 762 | |
| 763 @property | |
| 764 def stack_frame_map(self): | |
| 765 return self._stack_frame_map | |
| 766 | |
| 767 @property | |
| 768 def type_name_map(self): | |
| 769 return self._type_name_map | |
| 770 | |
| 771 def ApplyModifications(self): | |
| 772 """Calls ApplyModifications() on contained wrappers.""" | |
| 773 if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: | |
| 774 self._stack_frame_map.ApplyModifications(None) | |
| 775 else: | |
| 776 if self._stack_frame_map.modified or self._type_name_map.modified: | |
| 777 self._string_map.Clear() | |
| 778 self._stack_frame_map.ApplyModifications(self._string_map, force=True) | |
| 779 self._type_name_map.ApplyModifications(self._string_map, force=True) | |
| 780 self._string_map.ApplyModifications() | |
| 781 | |
| 782 def __init__(self, trace_node): | |
| 783 self._trace_node = trace_node | |
| 784 self._processes = [] | |
| 785 self._heap_dump_version = None | |
| 786 | |
| 787 # Misc per-process information needed only during parsing. | |
| 788 class ProcessExt(object): | |
| 789 def __init__(self, pid): | |
| 790 self.process = Trace.Process(pid) | |
| 791 self.mapped_entry_names = set() | |
| 792 self.process_mmaps_node = None | |
| 793 self.seen_strings_node = False | |
| 794 | |
| 795 process_ext_by_pid = {} | |
| 796 | |
| 797 # Android traces produced via 'chrome://inspect/?tracing#devices' are | |
| 798 # just a list of events. | |
| 799 events = trace_node if isinstance(trace_node, list) \ | |
| 800 else trace_node['traceEvents'] | |
| 801 for event in events: | |
| 802 name = event.get('name') | |
| 803 if not name: | |
| 804 continue | |
| 805 | |
| 806 pid = event['pid'] | |
| 807 process_ext = process_ext_by_pid.get(pid) | |
| 808 if process_ext is None: | |
| 809 process_ext = ProcessExt(pid) | |
| 810 process_ext_by_pid[pid] = process_ext | |
| 811 process = process_ext.process | |
| 812 | |
| 813 phase = event['ph'] | |
| 814 if phase == self._EVENT_PHASE_METADATA: | |
| 815 if name == 'process_name': | |
| 816 process._name = event['args']['name'] | |
| 817 elif name == 'stackFrames': | |
| 818 process._stack_frame_map.ParseNext( | |
| 819 self._UseHeapDumpVersion(self.HEAP_DUMP_VERSION_LEGACY), | |
| 820 event['args']['stackFrames'], | |
| 821 process._string_map) | |
| 822 elif phase == self._EVENT_PHASE_MEMORY_DUMP: | |
| 823 dumps = event['args']['dumps'] | |
| 824 process_mmaps = dumps.get('process_mmaps') | |
| 825 if process_mmaps: | |
| 826 # We want the most recent memory map, so parsing happens later, | |
| 827 # once we have finished reading all events. | |
| 828 process_ext.process_mmaps_node = process_mmaps | |
| 829 heaps = dumps.get('heaps_v2') | |
| 830 if heaps: | |
| 831 version = self._UseHeapDumpVersion(heaps['version']) | |
| 832 maps = heaps.get('maps') | |
| 833 if maps: | |
| 834 process_ext.mapped_entry_names.update(maps.iterkeys()) | |
| 835 types = maps.get('types') | |
| 836 stack_frames = maps.get('nodes') | |
| 837 strings = maps.get('strings') | |
| 838 if (strings is None and (types or stack_frames) | |
| 839 and not process_ext.seen_strings_node): | |
| 840 # ApplyModifications() for TypeNameMap and StackFrameMap puts | |
| 841 # everything into the first node and depends on StringMap. So | |
| 842 # we need to make sure that the 'strings' node is there if either | |
| 843 # of the other two nodes is present. | |
| 844 strings = [] | |
| 845 maps['strings'] = strings | |
| 846 if strings is not None: | |
| 847 process_ext.seen_strings_node = True | |
| 848 process._string_map.ParseNext(version, strings) | |
| 849 if types: | |
| 850 process._type_name_map.ParseNext( | |
| 851 version, types, process._string_map) | |
| 852 if stack_frames: | |
| 853 process._stack_frame_map.ParseNext( | |
| 854 version, stack_frames, process._string_map) | |
| 855 | |
| 856 self._processes = [] | |
| 857 for pe in process_ext_by_pid.itervalues(): | |
| 858 pe.process._heap_dump_version = self._heap_dump_version | |
| 859 if pe.process_mmaps_node: | |
| 860 # Now parse the most recent memory map. | |
| 861 pe.process._memory_map = MemoryMap(pe.process_mmaps_node) | |
| 862 self._processes.append(pe.process) | |
| 863 | |
| 864 @property | |
| 865 def node(self): | |
| 866 """Root node (that was passed to the __init__).""" | |
| 867 return self._trace_node | |
| 868 | |
| 869 @property | |
| 870 def modified(self): | |
| 871 """Returns True if trace file needs to be updated. | |
| 872 | |
| 873 Before writing trace JSON back to a file ApplyModifications() needs | |
| 874 to be called. | |
| 875 """ | |
| 876 return any(p.modified for p in self._processes) | |
| 877 | |
| 878 @property | |
| 879 def processes(self): | |
| 880 return self._processes | |
| 881 | |
| 882 @property | |
| 883 def heap_dump_version(self): | |
| 884 return self._heap_dump_version | |
| 885 | |
| 886 def ApplyModifications(self): | |
| 887 """Propagates modifications back to the trace JSON.""" | |
| 888 for process in self._processes: | |
| 889 process.ApplyModifications() | |
| 890 assert not self.modified, 'still modified' | |
| 891 | |
| 892 # Relevant trace event phases from Chromium's | |
| 893 # src/base/trace_event/common/trace_event_common.h. | |
| 894 _EVENT_PHASE_METADATA = 'M' | |
| 895 _EVENT_PHASE_MEMORY_DUMP = 'v' | |
| 896 | |
| 897 def _UseHeapDumpVersion(self, version): | |
| 898 if self._heap_dump_version is None: | |
| 899 self._heap_dump_version = version | |
| 900 return version | |
| 901 elif self._heap_dump_version != version: | |
| 902 raise Exception( | |
| 903 ("Inconsistent trace file: first saw '{}' heap dump version, " | |
| 904 "then '{}'.").format(self._heap_dump_version, version)) | |
| 905 else: | |
| 906 return version | |
| 360 | 907 |
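`_UseHeapDumpVersion()` above pins the heap dump version to whatever value is seen first and rejects any later mismatch. A standalone sketch of that consistency check (class and variable names here are illustrative):

```python
class VersionTracker:
    """Remembers the first heap dump version seen and rejects mixtures."""
    def __init__(self):
        self._version = None

    def use(self, version):
        if self._version is None:
            self._version = version  # first sighting pins the version
        elif self._version != version:
            raise Exception(
                "Inconsistent trace file: first saw '{}' heap dump version, "
                "then '{}'.".format(self._version, version))
        return self._version

tracker = VersionTracker()
tracker.use(1)
tracker.use(1)          # the same version is fine any number of times
try:
    tracker.use('Legacy')
except Exception as e:
    print(e)
```

A trace mixing the legacy `stackFrames` metadata event with modern `heaps_v2` dumps would trip this check rather than being silently half-parsed.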
| 361 | 908 |
| 362 class SymbolizableFile(object): | 909 class SymbolizableFile(object): |
| 363 """Holds file path, addresses to symbolize and stack frames to update. | 910 """Holds file path, addresses to symbolize and stack frames to update. |
| 364 | 911 |
| 365 This class is a link between ELFSymbolizer and a trace file: it specifies | 912 This class is a link between ELFSymbolizer and a trace file: it specifies |
| 366 what to symbolize (addresses) and what to update with the symbolization | 913 what to symbolize (addresses) and what to update with the symbolization |
| 367 result (frames). | 914 result (frames). |
| 368 """ | 915 """ |
| 369 def __init__(self, file_path): | 916 def __init__(self, file_path): |
| 370 self.path = file_path | 917 self.path = file_path |
| 371 self.symbolizable_path = file_path # path to use for symbolization | 918 self.symbolizable_path = file_path # path to use for symbolization |
| 372 self.frames_by_address = collections.defaultdict(list) | 919 self.frames_by_address = collections.defaultdict(list) |
| 373 | 920 |
| 374 | 921 |
| 375 def ResolveSymbolizableFiles(processes): | 922 def ResolveSymbolizableFiles(processes): |
| 376 """Resolves and groups PCs into list of SymbolizableFiles. | 923 """Resolves and groups PCs into list of SymbolizableFiles. |
| 377 | 924 |
| 378 As part of the grouping process, this function resolves the PC of each stack | 925 As part of the grouping process, this function resolves the PC of each stack |
| 379 frame to its corresponding mmap region. Stack frames that fail to resolve | 926 frame to its corresponding mmap region. Stack frames that fail to resolve |
| 380 are symbolized with '<unresolved>'. | 927 are symbolized with '<unresolved>'. |
| 381 """ | 928 """ |
| 382 symfile_by_path = {} | 929 symfile_by_path = {} |
| 383 for process in processes: | 930 for process in processes: |
| 384 for frame in process.stack_frames.pc_frames: | 931 if not process.memory_map: |
| 385 region = process.mmaps.FindRegion(frame.pc) | 932 continue |
| 933 for frame in process.stack_frame_map.frame_by_id.itervalues(): | |
| 934 if frame.pc is None: | |
| 935 continue | |
| 936 region = process.memory_map.FindRegion(frame.pc) | |
| 386 if region is None: | 937 if region is None: |
| 387 frame.name = '<unresolved>' | 938 frame.name = '<unresolved>' |
| 388 continue | 939 continue |
| 389 | 940 |
| 390 symfile = symfile_by_path.get(region.file_path) | 941 symfile = symfile_by_path.get(region.file_path) |
| 391 if symfile is None: | 942 if symfile is None: |
| 392 symfile = SymbolizableFile(region.file_path) | 943 symfile = SymbolizableFile(region.file_path) |
| 393 symfile_by_path[symfile.path] = symfile | 944 symfile_by_path[symfile.path] = symfile |
| 394 | 945 |
| 395 relative_pc = frame.pc - region.start_address | 946 relative_pc = frame.pc - region.start_address |
| 396 symfile.frames_by_address[relative_pc].append(frame) | 947 symfile.frames_by_address[relative_pc].append(frame) |
| 397 return symfile_by_path.values() | 948 return symfile_by_path.values() |
| 398 | 949 |
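`ResolveSymbolizableFiles()` maps each absolute PC into an mmap region and keys the symbolization work by PC relative to the region start (which is what a symbolizer wants for a shared library). A sketch of that grouping with hypothetical regions and frames:

```python
import collections

# Hypothetical mmap regions: (start, end, file_path).
regions = [(0x7f0000000000, 0x7f0000100000, '/lib/libfoo.so'),
           (0x7f0000200000, 0x7f0000300000, '/lib/libbar.so')]

def find_region(pc):
    for start, end, path in regions:
        if start <= pc < end:
            return start, path
    return None

# file_path -> {relative_pc -> [frame ids]}
frames_by_file = collections.defaultdict(lambda: collections.defaultdict(list))
for frame_id, pc in [(1, 0x7f0000000420), (2, 0x7f0000200100), (3, 0xdead)]:
    region = find_region(pc)
    if region is None:
        continue  # the real script names such frames '<unresolved>'
    start, path = region
    frames_by_file[path][pc - start].append(frame_id)

print(dict(frames_by_file['/lib/libfoo.so']))
```

Grouping by file lets the script spawn one symbolizer per binary and symbolize each distinct relative PC once, even if many frames share it.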
| 399 | 950 |
| 951 def FindInSystemPath(binary_name): | |
| 952 paths = os.environ['PATH'].split(os.pathsep) | |
| 953 for path in paths: | |
| 954 binary_path = os.path.join(path, binary_name) | |
| 955 if os.path.isfile(binary_path): | |
| 956 return binary_path | |
| 957 return None | |
| 958 | |
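`FindInSystemPath()` is a plain first-match scan over `PATH`. A Python 3 sketch, demonstrated against a throwaway directory (the `mytool` name is purely illustrative):

```python
import os
import tempfile

def find_in_system_path(binary_name):
    """Returns the first PATH entry containing binary_name, or None."""
    for path in os.environ.get('PATH', '').split(os.pathsep):
        candidate = os.path.join(path, binary_name)
        if os.path.isfile(candidate):
            return candidate
    return None

# Demonstrate by prepending a temporary directory to PATH.
demo_dir = tempfile.mkdtemp()
demo_tool = os.path.join(demo_dir, 'mytool')
open(demo_tool, 'w').close()
os.environ['PATH'] = demo_dir + os.pathsep + os.environ.get('PATH', '')
found = find_in_system_path('mytool')
print(found)
```

In Python 3.3+ the stdlib `shutil.which()` offers the same lookup and additionally checks the executable bit.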
| 959 | |
| 960 class Symbolizer(object): | |
| 961 """Encapsulates platform-specific symbolization logic.""" | |
| 962 | |
| 963 def __init__(self): | |
| 964 self.is_mac = sys.platform == 'darwin' | |
| 965 self.is_win = sys.platform == 'win32' | |
| 966 if self.is_mac: | |
| 967 self.binary = 'atos' | |
| 968 self._matcher = symbolize_trace_atos_regex.AtosRegexMatcher() | |
| 969 elif self.is_win: | |
| 970 self.binary = 'addr2line-pdb.exe' | |
| 971 else: | |
| 972 self.binary = 'addr2line' | |
| 973 self.symbolizer_path = FindInSystemPath(self.binary) | |
| 974 | |
| 975 def _SymbolizeLinuxAndAndroid(self, symfile, unsymbolized_name): | |
| 976 def _SymbolizerCallback(sym_info, frames): | |
| 977 # Unwind inline chain to the top. | |
| 978 while sym_info.inlined_by: | |
| 979 sym_info = sym_info.inlined_by | |
| 980 | |
| 981 symbolized_name = sym_info.name if sym_info.name else unsymbolized_name | |
| 982 for frame in frames: | |
| 983 frame.name = symbolized_name | |
| 984 frame.ext.source_path = sym_info.source_path | |
| 985 | |
| 986 symbolizer = elf_symbolizer.ELFSymbolizer(symfile.symbolizable_path, | |
| 987 self.symbolizer_path, | |
| 988 _SymbolizerCallback, | |
| 989 inlines=True) | |
| 990 | |
| 991 for address, frames in symfile.frames_by_address.iteritems(): | |
| 992 # SymbolizeAsync() asserts that the type of address is int. We operate | |
| 993 # on longs (since they are raw pointers possibly from 64-bit processes). | |
| 994 # It's OK to cast here because we're passing relative PC, which should | |
| 995 # always fit into int. | |
| 996 symbolizer.SymbolizeAsync(int(address), frames) | |
| 997 | |
| 998 symbolizer.Join() | |
| 999 | |
| 1000 | |
| 1001 def _SymbolizeMac(self, symfile): | |
| 1002 chars_max = int(subprocess.check_output("getconf ARG_MAX", shell=True)) | |
| 1003 | |
| 1004 # 16 for the address, 2 for "0x", 1 for the space | |
| 1005 chars_per_address = 19 | |
| 1006 | |
| 1007 load_address = (symbolize_trace_macho_reader. | |
| 1008 ReadMachOTextLoadAddress(symfile.symbolizable_path)) | |
| 1009 assert load_address is not None | |
| 1010 | |
| 1011 cmd_base = [self.symbolizer_path, '-arch', 'x86_64', '-l', | |
| 1012 '0x%x' % load_address, '-o', | |
| 1013 symfile.symbolizable_path] | |
| 1014 chars_for_other_arguments = len(' '.join(cmd_base)) + 1 | |
| 1015 | |
| 1016 # The maximum number of inputs that can be processed at once is limited by | |
| 1017 # ARG_MAX. This currently evaluates to ~13000 on macOS. | |
| 1018 max_inputs = (chars_max - chars_for_other_arguments) / chars_per_address | |
| 1019 | |
| 1020 all_keys = symfile.frames_by_address.keys() | |
| 1021 processed_keys_count = 0 | |
| 1022 while len(all_keys): | |
| 1023 input_count = min(len(all_keys), max_inputs) | |
| 1024 keys_to_process = all_keys[0:input_count] | |
| 1025 cmd = list(cmd_base) | |
| 1026 cmd.extend([hex(int(x) + load_address) | |
| 1027 for x in keys_to_process]) | |
| 1028 output_array = subprocess.check_output(cmd).split('\n') | |
| 1029 for i in range(len(keys_to_process)): | |
| 1030 for frame in (symfile.frames_by_address.values() | |
| 1031 [i + processed_keys_count]): | |
| 1032 frame.name = self._matcher.Match(output_array[i]) | |
| 1033 processed_keys_count += len(keys_to_process) | |
| 1034 all_keys = all_keys[input_count:] | |
| 1035 | |
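The Mac path above batches addresses because every address is passed to `atos` on the command line, which is capped by ARG_MAX. A sketch of the batching arithmetic (the ARG_MAX value and the fixed-argument length are assumed, not measured):

```python
# Hypothetical numbers: ARG_MAX on macOS is typically 262144 bytes.
chars_max = 262144
chars_per_address = 19          # 16 hex digits + '0x' + separating space
chars_for_other_arguments = 80  # assumed length of the fixed atos arguments

# Integer division caps how many addresses fit in one command line.
max_inputs = (chars_max - chars_for_other_arguments) // chars_per_address

addresses = list(range(30000))  # pretend workload
batches = [addresses[i:i + max_inputs]
           for i in range(0, len(addresses), max_inputs)]
print(max_inputs, len(batches))
```

With these numbers `max_inputs` lands near the ~13000 the comment mentions, so a large trace needs only a handful of `atos` invocations.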
| 1036 def _SymbolizeWin(self, symfile): | |
| 1037 """Invoke symbolizer binary on windows and write all input in one go. | |
| 1038 | |
| 1039 Unlike linux, on windows, symbolization talks through a shared system | |
| 1040 service that handles communication with the NT symbol servers. This | |
| 1041 creates an explicit serialization (and therefor lock contention) of | |
| 1042 any process using the symbol API for files do not have a local PDB. | |
| 1043 | |
| 1044 Thus, even though the windows symbolizer binary can be make command line | |
| 1045 compatible with the POSIX addr2line interface, paralellizing the | |
| 1046 symbolization does not yield the same performance effects. Running | |
| 1047 just one symbolizer seems good enough for now. Can optimize later | |
| 1048 if this becomes a bottleneck. | |
| 1049 """ | |
| 1050 cmd = [self.symbolizer_path, '--functions', '--demangle', '--exe', | |
| 1051 symfile.symbolizable_path] | |
| 1052 | |
| 1053 proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE, | |
| 1054 stderr=sys.stderr) | |
| 1055 addrs = ["%x" % relative_pc for relative_pc in | |
| 1056 symfile.frames_by_address.keys()] | |
| 1057 (stdout_data, stderr_data) = proc.communicate('\n'.join(addrs)) | |
| 1058 stdout_data = stdout_data.split('\n') | |
| 1059 | |
| 1060 # This is known to be in the same order as stdout_data. | |
| 1061 for i, addr in enumerate(addrs): | |
| 1062 for frame in symfile.frames_by_address[int(addr, 16)]: | |
| 1063 # Output of addr2line with --functions is always 2 outputs per | |
| 1064 # symbol, function name followed by source line number. Only grab | |
| 1065 # the function name as line info is not always available. | |
| 1066 frame.name = stdout_data[i * 2] | |
| 1067 | |
| 1068 def Symbolize(self, symfile, unsymbolized_name): | |
| 1069 if self.is_mac: | |
| 1070 self._SymbolizeMac(symfile) | |
| 1071 elif self.is_win: | |
| 1072 self._SymbolizeWin(symfile) | |
| 1073 else: | |
| 1074 self._SymbolizeLinuxAndAndroid(symfile, unsymbolized_name) | |
| 1075 | |
| 1076 def IsSymbolizableFile(self, file_path): | |
| 1077 if self.is_win: | |
| 1078 extension = os.path.splitext(file_path)[1].lower() | |
| 1079 return extension in ['.dll', '.exe'] | |
| 1080 else: | |
| 1081 result = subprocess.check_output(['file', '-0', file_path]) | |
| 1082 type_string = result[result.find('\0') + 1:] | |
| 1083 return bool(re.match(r'.*(ELF|Mach-O) (32|64)-bit\b.*', | |
| 1084 type_string, re.DOTALL)) | |
| 1085 | |
| 1086 | |
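The Windows path relies on addr2line-style output: with `--functions`, every input address yields exactly two output lines (function name, then `file:line`), so pairing is purely positional via `i * 2`. A sketch with made-up addresses and output:

```python
# addr2line with --functions emits two lines per input address:
# the function name, then 'file:line'. Pairing is by position.
addrs = ['1a2b', '3c4d']
stdout_lines = ['Foo::Bar()', 'foo.cc:42', 'Baz()', '??:0']

# Keep only the function name; line info is not always available.
name_by_addr = {addr: stdout_lines[i * 2] for i, addr in enumerate(addrs)}
print(name_by_addr)
```

Any tool substituted for `addr2line-pdb.exe` must preserve this two-lines-per-address contract, or the positional pairing silently mislabels frames.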
| 400 def SymbolizeFiles(symfiles, symbolizer): | 1087 def SymbolizeFiles(symfiles, symbolizer): |
| 401 """Symbolizes each file in the given list of SymbolizableFiles | 1088 """Symbolizes each file in the given list of SymbolizableFiles |
| 402 and updates stack frames with symbolization results.""" | 1089 and updates stack frames with symbolization results.""" |
| 1090 | |
| 1091 if not symfiles: | |
| 1092 print 'Nothing to symbolize.' | |
| 1093 return | |
| 1094 | |
| 403 print 'Symbolizing...' | 1095 print 'Symbolizing...' |
| 404 | 1096 |
| 405 def _SubPrintf(message, *args): | 1097 def _SubPrintf(message, *args): |
| 406 print (' ' + message).format(*args) | 1098 print (' ' + message).format(*args) |
| 407 | 1099 |
| 408 symbolized = False | |
| 409 for symfile in symfiles: | 1100 for symfile in symfiles: |
| 410 unsymbolized_name = '<{}>'.format( | 1101 unsymbolized_name = '<{}>'.format( |
| 411 symfile.path if symfile.path else 'unnamed') | 1102 symfile.path if symfile.path else 'unnamed') |
| 412 | 1103 |
| 413 problem = None | 1104 problem = None |
| 414 if not os.path.isabs(symfile.symbolizable_path): | 1105 if not os.path.isabs(symfile.symbolizable_path): |
| 415 problem = 'not a file' | 1106 problem = 'not a file' |
| 416 elif not os.path.isfile(symfile.symbolizable_path): | 1107 elif not os.path.isfile(symfile.symbolizable_path): |
| 417 problem = "file doesn't exist" | 1108 problem = "file doesn't exist" |
| 418 elif not symbolizer.IsSymbolizableFile(symfile.symbolizable_path): | 1109 elif not symbolizer.IsSymbolizableFile(symfile.symbolizable_path): |
| 419 problem = 'file is not symbolizable' | 1110 problem = 'file is not symbolizable' |
| 420 if problem: | 1111 if problem: |
| 421 _SubPrintf("Won't symbolize {} PCs for '{}': {}.", | 1112 _SubPrintf("Won't symbolize {} PCs for '{}': {}.", |
| 422 len(symfile.frames_by_address), | 1113 len(symfile.frames_by_address), |
| 423 symfile.symbolizable_path, | 1114 symfile.symbolizable_path, |
| 424 problem) | 1115 problem) |
| 425 for frames in symfile.frames_by_address.itervalues(): | 1116 for frames in symfile.frames_by_address.itervalues(): |
| 426 for frame in frames: | 1117 for frame in frames: |
| 427 frame.name = unsymbolized_name | 1118 frame.name = unsymbolized_name |
| 428 continue | 1119 continue |
| 429 | 1120 |
| 430 _SubPrintf('Symbolizing {} PCs from {}...', | 1121 _SubPrintf('Symbolizing {} PCs from {}...', |
| 431 len(symfile.frames_by_address), | 1122 len(symfile.frames_by_address), |
| 432 symfile.path) | 1123 symfile.path) |
| 433 | 1124 |
| 434 symbolizer.Symbolize(symfile, unsymbolized_name) | 1125 symbolizer.Symbolize(symfile, unsymbolized_name) |
| 435 symbolized = True | |
| 436 | 1126 |
| 437 return symbolized | 1127 |
| 1128 # Matches Android library paths, supporting both K (/data/app-lib/<>/lib.so) | |
| 1129 # and L+ (/data/app/<>/lib/<>/lib.so) layouts. The library name is available | |
| 1130 # via the 'name' group. | |
| 1131 ANDROID_PATH_MATCHER = re.compile( | |
| 1132 r'^/data/(?:' | |
| 1133 r'app/[^/]+/lib/[^/]+/|' | |
| 1134 r'app-lib/[^/]+/|' | |
| 1135 r'data/[^/]+/incremental-install-files/lib/' | |
| 1136 r')(?P<name>.*\.so)') | |
| 1137 | |
| 1138 # Subpath of output path where unstripped libraries are stored. | |
| 1139 ANDROID_UNSTRIPPED_SUBPATH = 'lib.unstripped' | |
| 438 | 1140 |
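The regex above extracts the library file name from the on-device path so it can be remapped to the unstripped copy under the build output directory. Exercising it with hypothetical package paths (`com.chrome-1` is illustrative):

```python
import re

ANDROID_PATH_MATCHER = re.compile(
    r'^/data/(?:'
    r'app/[^/]+/lib/[^/]+/|'
    r'app-lib/[^/]+/|'
    r'data/[^/]+/incremental-install-files/lib/'
    r')(?P<name>.*\.so)')

# L+ layout: /data/app/<package>/lib/<abi>/lib.so
m = ANDROID_PATH_MATCHER.match('/data/app/com.chrome-1/lib/arm/libchrome.so')
print(m.group('name'))

# K layout: /data/app-lib/<package>/lib.so
m = ANDROID_PATH_MATCHER.match('/data/app-lib/com.chrome-1/libchrome.so')
print(m.group('name'))
```

A match means `RemapAndroidFiles()` can point the symbolizer at `<output_path>/lib.unstripped/<name>`; non-matching paths are clobbered so they surface as "not a file".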
| 439 | 1141 |
| 440 def HaveFilesFromAndroid(symfiles): | 1142 def HaveFilesFromAndroid(symfiles): |
| 441 return any(ANDROID_PATH_MATCHER.match(f.path) for f in symfiles) | 1143 return any(ANDROID_PATH_MATCHER.match(f.path) for f in symfiles) |
| 442 | 1144 |
| 443 | 1145 |
| 444 def RemapAndroidFiles(symfiles, output_path): | 1146 def RemapAndroidFiles(symfiles, output_path): |
| 445 for symfile in symfiles: | 1147 for symfile in symfiles: |
| 446 match = ANDROID_PATH_MATCHER.match(symfile.path) | 1148 match = ANDROID_PATH_MATCHER.match(symfile.path) |
| 447 if match: | 1149 if match: |
| 448 name = match.group('name') | 1150 name = match.group('name') |
| 449 symfile.symbolizable_path = os.path.join( | 1151 symfile.symbolizable_path = os.path.join( |
| 450 output_path, ANDROID_UNSTRIPPED_SUBPATH, name) | 1152 output_path, ANDROID_UNSTRIPPED_SUBPATH, name) |
| 451 else: | 1153 else: |
| 452 # Clobber file path to trigger "not a file" problem in SymbolizeFiles(). | 1154 # Clobber file path to trigger "not a file" problem in SymbolizeFiles(). |
| 453 # Without this, files would be reported with a "file not found" problem, | 1155 # Without this, files would be reported with a "file not found" problem, |
| 454 # which is not accurate. | 1156 # which is not accurate. |
| 455 symfile.symbolizable_path = 'android://{}'.format(symfile.path) | 1157 symfile.symbolizable_path = 'android://{}'.format(symfile.path) |
| 456 | 1158 |
| 457 | 1159 |
| 1160 def Symbolize(options, trace, symbolizer): | |
| 1161 symfiles = ResolveSymbolizableFiles(trace.processes) | |
| 1162 | |
| 1163 # Android trace files don't have any indication they are from Android. | |
> Primiano Tucci (use gerrit), 2017/05/03 17:25:04: "As per discussion offline, maybe specify: traces c…"
> fmeawad, 2017/05/03 18:19:34: "look for os-name in the metadata"
| 1164 # So we're checking for Android-specific paths. | |
| 1165 if HaveFilesFromAndroid(symfiles): | |
| 1166 if not options.output_directory: | |
| 1167 sys.exit('The trace file appears to be from Android. Please ' | |
| 1168 'specify output directory to properly symbolize it.') | |
| 1169 RemapAndroidFiles(symfiles, os.path.abspath(options.output_directory)) | |
| 1170 | |
| 1171 SymbolizeFiles(symfiles, symbolizer) | |
| 1172 | |
| 1173 | |
| 1174 def OpenTraceFile(file_path, mode): | |
| 1175 if file_path.endswith('.gz'): | |
| 1176 return gzip.open(file_path, mode + 'b') | |
| 1177 else: | |
| 1178 return open(file_path, mode + 't') | |
| 1179 | |
| 1180 | |
| 458 # Suffix used for backup files. | 1181 # Suffix used for backup files. |
| 459 BACKUP_FILE_TAG = '.BACKUP' | 1182 BACKUP_FILE_TAG = '.BACKUP' |
| 460 | 1183 |
| 461 def main(): | 1184 def main(): |
| 462 parser = argparse.ArgumentParser() | 1185 class MultilineHelpFormatter(argparse.HelpFormatter): |
> Primiano Tucci (use gerrit), 2017/05/03 17:25:05: "For a one file python script, having a custom form…"
> DmitrySkiba, 2017/05/04 00:30:55: "Hmm, actually this is a leftover from a version th…"
| 463 parser.add_argument('file', | 1186 def _split_lines(self, text, width): |
| 464 help='Trace file to symbolize (.json or .json.gz)') | 1187 extra_lines = [] |
| 465 parser.add_argument('--no-backup', | 1188 if '\n' in text: |
| 466 dest='backup', default='true', action='store_false', | 1189 lines = text.splitlines() |
| 467 help="Don't create {} files".format(BACKUP_FILE_TAG)) | 1190 text = lines[0] |
| 468 parser.add_argument('--output-directory', | 1191 extra_lines = lines[1:] |
| 469 help='The path to the build output directory, such ' + | 1192 return super(MultilineHelpFormatter, self)._split_lines(text, width) + \ |
| 470 'as out/Debug. Only needed for Android.') | 1193 extra_lines |
| 471 options = parser.parse_args() | |
| 472 | 1194 |
| 473 trace_file_path = options.file | 1195 parser = argparse.ArgumentParser(formatter_class=MultilineHelpFormatter) |
| 474 def _OpenTraceFile(mode): | 1196 parser.add_argument( |
| 475 if trace_file_path.endswith('.gz'): | 1197 'file', |
| 476 return gzip.open(trace_file_path, mode + 'b') | 1198 help='Trace file to symbolize (.json or .json.gz)') |
| 477 else: | 1199 |
| 478 return open(trace_file_path, mode + 't') | 1200 parser.add_argument( |
| 1201 '--no-backup', dest='backup', default='true', action='store_false', | |
| 1202 help="Don't create {} files".format(BACKUP_FILE_TAG)) | |
| 1203 | |
| 1204 parser.add_argument( | |
| 1205 '--output-directory', | |
| 1206 help='The path to the build output directory, such as out/Debug.') | |
| 479 | 1207 |
| 480 symbolizer = Symbolizer() | 1208 symbolizer = Symbolizer() |
| 481 if symbolizer.symbolizer_path is None: | 1209 if symbolizer.symbolizer_path is None: |
| 482 sys.exit("Can't symbolize - no %s in PATH." % symbolizer.binary) | 1210 sys.exit("Can't symbolize - no %s in PATH." % symbolizer.binary) |
| 483 | 1211 |
| 1212 options = parser.parse_args() | |
| 1213 | |
| 1214 trace_file_path = options.file | |
| 1215 | |
| 484 print 'Reading trace file...' | 1216 print 'Reading trace file...' |
| 485 with _OpenTraceFile('r') as trace_file: | 1217 with OpenTraceFile(trace_file_path, 'r') as trace_file: |
| 486 trace = json.load(trace_file) | 1218 trace = Trace(json.load(trace_file)) |
| 487 | 1219 |
| 488 processes = CollectProcesses(trace) | 1220 Symbolize(options, trace, symbolizer) |
| 489 symfiles = ResolveSymbolizableFiles(processes) | |
| 490 | 1221 |
| 491 # Android trace files don't have any indication they are from Android. | 1222 if trace.modified: |
| 492 # So we're checking for Android-specific paths. | 1223 trace.ApplyModifications() |
| 493 if HaveFilesFromAndroid(symfiles): | |
| 494 if not options.output_directory: | |
| 495 parser.error('The trace file appears to be from Android. Please ' | |
| 496 "specify output directory (e.g. 'out/Debug') to properly " | |
| 497 'symbolize it.') | |
| 498 RemapAndroidFiles(symfiles, os.path.abspath(options.output_directory)) | |
| 499 | 1224 |
| 500 if SymbolizeFiles(symfiles, symbolizer): | |
| 501 if options.backup: | 1225 if options.backup: |
| 502 backup_file_path = trace_file_path + BACKUP_FILE_TAG | 1226 backup_file_path = trace_file_path + BACKUP_FILE_TAG |
| 503 print 'Backing up trace file to {}...'.format(backup_file_path) | 1227 if os.path.exists(backup_file_path): |
> Primiano Tucci (use gerrit), 2017/05/03 17:25:05: "isn't this a bit too much and really worth the com…"
> DmitrySkiba, 2017/05/04 00:30:55: "Also a leftover from a previous versions. Removed."
| 1228 for i in itertools.count(1): | |
| 1229 unique_file_path = '{}{}'.format(backup_file_path, i) | |
| 1230 if not os.path.exists(unique_file_path): | |
| 1231 backup_file_path = unique_file_path | |
| 1232 break | |
| 1233 print 'Backing up trace file to {}'.format(backup_file_path) | |
| 504 os.rename(trace_file_path, backup_file_path) | 1234 os.rename(trace_file_path, backup_file_path) |
| 505 | 1235 |
| 506 print 'Updating trace file...' | 1236 print 'Updating the trace file...' |
| 507 with _OpenTraceFile('w') as trace_file: | 1237 with OpenTraceFile(trace_file_path, 'w') as trace_file: |
| 508 json.dump(trace, trace_file) | 1238 json.dump(trace.node, trace_file) |
| 509 else: | 1239 else: |
| 510 print 'No PCs symbolized - not updating trace file.' | 1240 print 'No modifications were made - not updating the trace file.' |
| 511 | 1241 |
| 512 | 1242 |
| 513 if __name__ == '__main__': | 1243 if __name__ == '__main__': |
| 514 main() | 1244 main() |