Index: tracing/bin/symbolize_trace |
diff --git a/tracing/bin/symbolize_trace b/tracing/bin/symbolize_trace |
deleted file mode 100755 |
index 50bb2c5c69edd3b856b7a71f9946215d764dfc40..0000000000000000000000000000000000000000 |
--- a/tracing/bin/symbolize_trace |
+++ /dev/null |
@@ -1,1413 +0,0 @@ |
-#!/usr/bin/env python |
-# Copyright 2016 The Chromium Authors. All rights reserved. |
-# Use of this source code is governed by a BSD-style license that can be |
-# found in the LICENSE file. |
- |
-""" |
-This script processes trace files and symbolizes stack frames generated by |
-Chrome's native heap profiler. This script assumes that the Chrome binary |
-referenced in the trace contains symbols, and is the same binary used to emit |
-the trace. |
- |
-=== Overview === |
- |
-Trace file is essentially a giant JSON array of dictionaries (events). |
-Events have some predefined keys (e.g. 'pid'), but otherwise are free to |
-have anything inside. Trace file contains events from all Chrome processes |
-that were sampled during tracing period. |
- |
-This script cares only about memory dump events generated with memory-infra |
-category enabled. |
- |
-When Chrome native heap profiling is enabled, some memory dump events |
-include the following extra information: |
- |
-* (Per allocator) Information about live allocations at the moment of the |
- memory dump (the information includes backtraces, types / categories, |
- sizes, and counts of allocations). There are several allocators in |
- Chrome: e.g. malloc, blink_gc, partition_alloc. |
- |
-* (Per process) Stack frame tree of all functions that called allocators |
- above. |
- |
-This script does the following: |
- |
-1. Parses the given trace file (loads JSON). |
-2. Finds memory dump events and parses stack frame tree for each process. |
-3. Finds stack frames that have PC addresses instead of function names. |
-4. Symbolizes PCs and modifies loaded JSON. |
-5. Writes modified JSON back to the file. |
- |
-The script supports trace files from the following platforms: |
- * Android (the script itself must be run on Linux) |
- * Linux |
- * macOS |
- * Windows |
- |
-Important note - the script doesn't check that it symbolizes same binaries |
-that were used at the time trace was taken. I.e. if you take a trace, change |
-and rebuild Chrome binaries, the script will blindly use the new binaries. |
- |
-=== Details === |
- |
-There are two formats of heap profiler information: legacy and modern. The |
-main differences relevant to this script are: |
- |
-* In the modern format the stack frame tree, type name mapping, and string |
- mapping nodes are dumped incrementally. These nodes are dumped in each |
- memory dump event and carry updates that occurred since the last event. |
- |
- For example, let's say that when the first memory dump event is generated |
- we only know about a function foo() (called from main()) allocating objects |
- of type "int": |
- |
- { |
- "args": { |
- "dumps": { |
- "heaps_v2": { |
- "maps": { |
- "nodes": [ |
- { "id": 1, "name_sid": 1 }, |
- { "id": 2, "parent": 1, "name_sid": 3 }, |
- ], |
- "types": [ |
- { "id": 1, "name_sid": 2 }, |
- ], |
- "strings": [ |
- { "id": 1, "string": "main()" }, |
- { "id": 2, "string": "int" }, |
- { "id": 3, "string": "foo()" }, |
- ] |
- }, |
- "allocators": { ...live allocations per allocator... }, |
- ... |
- }, |
- ... |
- } |
- }, |
- ... |
- } |
- |
- Here: |
- * 'nodes' node encodes stack frame tree |
- * 'types' node encodes type name mappings |
- * 'strings' node encodes string mapping (explained below) |
- |
- Then, by the time second memory dump even is generated, we learn about |
- bar() (called from main()), which also allocated "int" objects. Only the |
- new information is dumped, i.e. bar() stack frame: |
- |
- { |
- "args": { |
- "dumps": { |
- "heaps_v2": { |
- "maps": { |
- "nodes": [ |
- { "id": 2, "parent": 1, "name_sid": 4 }, |
- ], |
- "types": [], |
- "strings": [ |
- { "id": 4, "string": "bar()" }, |
- ] |
- }, |
- "allocators": { ...live allocations per allocator... }, |
- ... |
- }, |
- ... |
- } |
- }, |
- ... |
- } |
- |
- Note that 'types' node is empty, since there were no updates. All three |
- nodes ('nodes', types', and 'strings') can be empty if there were no updates |
- to them. |
- |
- For simplicity, when the script updates incremental nodes, it puts updated |
- content in the first node, and clears all others. I.e. the following stack |
- frame nodes: |
- |
- 'nodes': [ |
- { "id": 1, "name_sid": 1 }, |
- { "id": 2, "parent": 1, "name_sid": 2 }, |
- ] |
- 'nodes': [ |
- { "id": 3, "parent": 2, "name_sid": 3 }, |
- ] |
- 'nodes': [ |
- { "id": 4, "parent": 3, "name_sid": 4 }, |
- { "id": 5, "parent": 1, "name_sid": 5 }, |
- ] |
- |
- After symbolization are written as: |
- |
- 'nodes': [ |
- { "id": 1, "name_sid": 1 }, |
- { "id": 2, "parent": 1, "name_sid": 2 }, |
- { "id": 3, "parent": 2, "name_sid": 3 }, |
- { "id": 4, "parent": 3, "name_sid": 4 }, |
- { "id": 5, "parent": 1, "name_sid": 5 }, |
- ] |
- 'nodes': [] |
- 'nodes': [] |
- |
- |
-* In contrast, in the legacy format stack frame tree and type mappings are |
- dumped separately from memory dump events, once per process. |
- |
- Here is how trace file with two memory dump events looks like in the |
- legacy format: |
- |
- { |
- "args": { |
- "dumps": { |
- "heaps": { ...live allocations per allocator... }, |
- ... |
- } |
- }, |
- ... |
- } |
- |
- { |
- "args": { |
- "dumps": { |
- "heaps": { ...live allocations per allocator... }, |
- ... |
- } |
- }, |
- ... |
- } |
- |
- { |
- "args": { |
- "typeNames": { |
- 1: "int", |
- } |
- }, |
- "cat": "__metadata", |
- "name": "typeNames", |
- ... |
- } |
- |
- { |
- "args": { |
- "stackFrames": { |
- 1: { "name": "main" }, |
- 2: { "name": "foo", "parent": 1 }, |
- 3: { "name": "bar", "parent": 1 }, |
- } |
- }, |
- "cat": "__metadata", |
- "name": "stackFrames", |
- ... |
- } |
- |
- |
-* Another change in the modern format is 'strings' node, which was added |
- to deduplicate stack frame names (mainly for trace file size reduction). |
- For consistency 'types' node also uses string mappings. |
- |
- |
-See crbug.com/708930 for more information about the modern format. |
-""" |
- |
-import argparse |
-import bisect |
-import collections |
-import gzip |
-import itertools |
-import json |
-import os |
-import re |
-import shutil |
-import subprocess |
-import sys |
-import tarfile |
-import zipfile |
-import tempfile |
- |
-_SYMBOLS_PATH = os.path.abspath(os.path.join( |
- os.path.dirname(os.path.realpath(__file__)), |
- '..', |
- 'third_party', |
- 'symbols')) |
-sys.path.append(_SYMBOLS_PATH) |
-# pylint: disable=import-error |
-import symbols.elf_symbolizer as elf_symbolizer |
- |
-import symbolize_trace_atos_regex |
-import symbolize_trace_macho_reader |
- |
-_PY_UTILS_PATH = os.path.abspath(os.path.join( |
- os.path.dirname(os.path.realpath(__file__)), |
- '..', |
- '..', |
- 'common', |
- 'py_utils')) |
-sys.path.append(_PY_UTILS_PATH) |
-# pylint: disable=import-error |
-import py_utils.cloud_storage as cloud_storage |
- |
-class NodeWrapper(object): |
- """Wraps an event data node(s). |
- |
- A node is a reference into a trace event JSON. Wrappers parse nodes to |
- provide convenient APIs and update nodes when asked to propagate changes |
- back (see ApplyModifications() below). |
- |
- Here is an example of legacy metadata event that contains stack frame tree: |
- |
- { |
- "args": { |
- "stackFrames": { ... } |
- }, |
- "cat": "__metadata", |
- "name": "stackFrames", |
- "ph": "M", |
- ... |
- } |
- |
- When this event is encountered, a reference to the "stackFrames" dictionary |
- is obtained and passed down to a specific wrapped class, which knows how to |
- parse / update the dictionary. |
- |
- There are two parsing patterns depending on whether node is serialized |
- incrementally: |
- |
- * If node is not incremental, then parsing is done by __init__(), |
- see MemoryMap for an example. |
- |
- * If node is incremental, then __init__() does nothing, and instead |
- ParseNext() method is called when next node (from a next event) is |
- encountered. |
- |
- Some wrappers can also modify nodes they parsed. In such cases they have |
- additional APIs: |
- |
- * 'modified' flag, which indicates whether the wrapper was changed. |
- |
- * 'ApplyModifications' method, which propagates changes made to the wrapper |
- back to nodes. Successful invocation of ApplyModifications() resets |
- 'modified' flag. |
- |
- """ |
- pass |
- |
- |
-class MemoryMap(NodeWrapper): |
- """Wraps 'process_mmaps' node. |
- |
- 'process_mmaps' node contains information about file mappings. |
- |
- "process_mmaps": { |
- "vm_regions": [ |
- { |
- "mf": "<file_path>", |
- "sa": "<start_address>", |
- "sz": "<size>", |
- ... |
- }, |
- ... |
- ] |
- } |
- """ |
- |
- class Region(object): |
- def __init__(self, start_address, size, file_path): |
- self._start_address = start_address |
- self._size = size |
- self._file_path = file_path |
- |
- @property |
- def start_address(self): |
- return self._start_address |
- |
- @property |
- def end_address(self): |
- return self._start_address + self._size |
- |
- @property |
- def size(self): |
- return self._size |
- |
- @property |
- def file_path(self): |
- return self._file_path |
- |
- def __cmp__(self, other): |
- if isinstance(other, type(self)): |
- other_start_address = other._start_address |
- elif isinstance(other, (long, int)): |
- other_start_address = other |
- else: |
- raise Exception('Cannot compare with %s' % type(other)) |
- if self._start_address < other_start_address: |
- return -1 |
- elif self._start_address > other_start_address: |
- return 1 |
- else: |
- return 0 |
- |
- def __repr__(self): |
- return 'Region(0x{:X} - 0x{:X}, {})'.format( |
- self.start_address, self.end_address, self.file_path) |
- |
- def __init__(self, process_mmaps_node): |
- regions = [] |
- for region_node in process_mmaps_node['vm_regions']: |
- regions.append(self.Region( |
- long(region_node['sa'], 16), |
- long(region_node['sz'], 16), |
- region_node['mf'])) |
- regions.sort() |
- |
- # Copy regions without duplicates and check for overlaps. |
- self._regions = [] |
- previous_region = None |
- for region in regions: |
- if previous_region is not None: |
- if region == previous_region: |
- continue |
- assert region.start_address >= previous_region.end_address, \ |
- 'Regions {} and {} overlap.'.format(previous_region, region) |
- previous_region = region |
- self._regions.append(region) |
- |
- @property |
- def regions(self): |
- return self._regions |
- |
- def FindRegion(self, address): |
- """Finds region containing |address|. Returns None if none found.""" |
- |
- region_index = bisect.bisect_right(self._regions, address) - 1 |
- if region_index >= 0: |
- region = self._regions[region_index] |
- if address >= region.start_address and address < region.end_address: |
- return region |
- return None |
- |
- |
-class UnsupportedHeapDumpVersionError(Exception): |
- """Helper exception class to signal unsupported heap dump version.""" |
- |
- def __init__(self, version): |
- message = 'Unsupported heap dump version: {}'.format(version) |
- super(UnsupportedHeapDumpVersionError, self).__init__(message) |
- |
- |
-class StringMap(NodeWrapper): |
- """Wraps all 'strings' nodes for a process. |
- |
- 'strings' node contains incremental mappings between integer ids and strings. |
- |
- "strings": [ |
- { |
- "id": <string_id>, |
- "string": <string> |
- }, |
- ... |
- ] |
- """ |
- |
- def __init__(self): |
- self._modified = False |
- self._strings_nodes = [] |
- self._string_by_id = {} |
- self._id_by_string = {} |
- self._max_string_id = 0 |
- |
- @property |
- def modified(self): |
- """Returns True if the wrapper was modified (see NodeWrapper).""" |
- return self._modified |
- |
- @property |
- def string_by_id(self): |
- return self._string_by_id |
- |
- def ParseNext(self, heap_dump_version, strings_node): |
- """Parses and interns next node (see NodeWrapper).""" |
- |
- if heap_dump_version != Trace.HEAP_DUMP_VERSION_1: |
- raise UnsupportedHeapDumpVersionError(heap_dump_version) |
- |
- self._strings_nodes.append(strings_node) |
- for string_node in strings_node: |
- self._Insert(string_node['id'], string_node['string']) |
- |
- def Clear(self): |
- """Clears all string mappings.""" |
- if self._string_by_id: |
- self._modified = True |
- # ID #0 means 'no entry' and must always be present. Carry it over. |
- null_string = self._string_by_id[0] |
- self._string_by_id = {} |
- self._id_by_string = {} |
- self._Insert(0, null_string) |
- self._max_string_id = 0 |
- |
- def AddString(self, string): |
- """Adds a string (if it doesn't exist) and returns its integer id.""" |
- string_id = self._id_by_string.get(string) |
- if string_id is None: |
- string_id = self._max_string_id + 1 |
- self._Insert(string_id, string) |
- self._modified = True |
- return string_id |
- |
- def ApplyModifications(self): |
- """Propagates modifications back to nodes (see NodeWrapper).""" |
- if not self.modified: |
- return |
- |
- assert self._strings_nodes, 'no nodes' |
- |
- # Serialize into the first node, and clear all others. |
- |
- for strings_node in self._strings_nodes: |
- del strings_node[:] |
- strings_node = self._strings_nodes[0] |
- for string_id, string in self._string_by_id.iteritems(): |
- strings_node.append({'id': string_id, 'string': string}) |
- |
- self._modified = False |
- |
- def _Insert(self, string_id, string): |
- self._id_by_string[string] = string_id |
- self._string_by_id[string_id] = string |
- self._max_string_id = max(self._max_string_id, string_id) |
- |
- |
-class TypeNameMap(NodeWrapper): |
- """Wraps all 'types' nodes for a process. |
- |
- 'types' nodes encode mappings between integer type ids and integer |
- string ids (from 'strings' nodes). |
- |
- "types": [ |
- { |
- "id": <type_id>, |
- "name_sid": <name_string_id> |
- } |
- ... |
- ] |
- |
- For simplicity string ids are translated into strings during parsing, |
- and then translated back to ids in ApplyModifications(). |
- """ |
- def __init__(self): |
- self._modified = False |
- self._type_name_nodes = [] |
- self._name_by_id = {} |
- self._id_by_name = {} |
- self._max_type_id = 0 |
- |
- @property |
- def modified(self): |
- """Returns True if the wrapper was modified (see NodeWrapper).""" |
- return self._modified |
- |
- @property |
- def name_by_id(self): |
- """Returns {id -> name} dict (must not be changed directly).""" |
- return self._name_by_id |
- |
- def ParseNext(self, heap_dump_version, type_name_node, string_map): |
- """Parses and interns next node (see NodeWrapper). |
- |
- |string_map| - A StringMap object to use to translate string ids |
- to strings. |
- """ |
- if heap_dump_version != Trace.HEAP_DUMP_VERSION_1: |
- raise UnsupportedHeapDumpVersionError(heap_dump_version) |
- |
- self._type_name_nodes.append(type_name_node) |
- for type_node in type_name_node: |
- self._Insert(type_node['id'], |
- string_map.string_by_id[type_node['name_sid']]) |
- |
- def AddType(self, type_name): |
- """Adds a type name (if it doesn't exist) and returns its id.""" |
- type_id = self._id_by_name.get(type_name) |
- if type_id is None: |
- type_id = self._max_type_id + 1 |
- self._Insert(type_id, type_name) |
- self._modified = True |
- return type_id |
- |
- def ApplyModifications(self, string_map, force=False): |
- """Propagates modifications back to nodes. |
- |
- |string_map| - A StringMap object to use to translate strings to ids. |
- |force| - Whether to propagate changes regardless of 'modified' flag. |
- """ |
- if not self.modified and not force: |
- return |
- |
- assert self._type_name_nodes, 'no nodes' |
- |
- # Serialize into the first node, and clear all others. |
- |
- for types_node in self._type_name_nodes: |
- del types_node[:] |
- types_node = self._type_name_nodes[0] |
- for type_id, type_name in self._name_by_id.iteritems(): |
- types_node.append({ |
- 'id': type_id, |
- 'name_sid': string_map.AddString(type_name)}) |
- |
- self._modified = False |
- |
- def _Insert(self, type_id, type_name): |
- self._id_by_name[type_name] = type_id |
- self._name_by_id[type_id] = type_name |
- self._max_type_id = max(self._max_type_id, type_id) |
- |
- |
-class StackFrameMap(NodeWrapper): |
- """ Wraps stack frame tree nodes for a process. |
- |
- For the legacy format this wrapper expects a single 'stackFrames' node |
- (which comes from metadata event): |
- |
- "stackFrames": { |
- "<frame_id>": { |
- "name": "<frame_name>" |
- "parent": "<parent_frame_id>" |
- }, |
- ... |
- } |
- |
- For the modern format this wrapper expects several 'nodes' nodes: |
- |
- "nodes": [ |
- { |
- "id": <frame_id>, |
- "parent": <parent_frame_id>, |
- "name_sid": <name_string_id> |
- }, |
- ... |
- ] |
- |
- In both formats frame name is a string. Native heap profiler generates |
- specially formatted frame names (e.g. "pc:10eb78dba") for function |
- addresses (PCs). Inner Frame class below parses name and extracts PC, |
- if it's there. |
- """ |
- class Frame(object): |
- def __init__(self, frame_id, name, parent_frame_id): |
- self._modified = False |
- self._id = frame_id |
- self._name = name |
- self._pc = self._ParsePC(name) |
- self._parent_id = parent_frame_id |
- self._ext = None |
- |
- @property |
- def modified(self): |
- """Returns True if the frame was modified. |
- |
- For example changing frame's name sets this flag (since the change |
- needs to be propagated back to nodes). |
- """ |
- return self._modified |
- |
- @property |
- def id(self): |
- """Frame id (integer).""" |
- return self._id |
- |
- @property |
- def pc(self): |
- """Parsed (integer) PC of the frame, or None.""" |
- return self._pc |
- |
- @property |
- def name(self): |
- """Name of the frame (see above).""" |
- return self._name |
- |
- @name.setter |
- def name(self, value): |
- """Changes the name. Doesn't affect value of |pc|.""" |
- self._modified = True |
- self._name = value |
- |
- @property |
- def parent_id(self): |
- """Parent frame id (integer).""" |
- return self._parent_id |
- |
- _PC_TAG = 'pc:' |
- |
- def _ParsePC(self, name): |
- if not name.startswith(self._PC_TAG): |
- return None |
- return long(name[len(self._PC_TAG):], 16) |
- |
- def _ClearModified(self): |
- self._modified = False |
- |
- def __init__(self): |
- self._modified = False |
- self._heap_dump_version = None |
- self._stack_frames_nodes = [] |
- self._frame_by_id = {} |
- |
- @property |
- def modified(self): |
- """Returns True if the wrapper or any of its frames were modified.""" |
- return (self._modified or |
- any(f.modified for f in self._frame_by_id.itervalues())) |
- |
- @property |
- def frame_by_id(self): |
- """Returns {id -> frame} dict (must not be modified directly).""" |
- return self._frame_by_id |
- |
- def ParseNext(self, heap_dump_version, stack_frames_node, string_map): |
- """Parses the next stack frames node (see NodeWrapper). |
- |
- For the modern format |string_map| is used to translate string ids |
- to strings. |
- """ |
- |
- frame_by_id = {} |
- if heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: |
- if self._stack_frames_nodes: |
- raise Exception('Legacy stack frames node is expected only once.') |
- for frame_id, frame_node in stack_frames_node.iteritems(): |
- frame = self.Frame(frame_id, |
- frame_node['name'], |
- frame_node.get('parent')) |
- frame_by_id[frame.id] = frame |
- else: |
- if heap_dump_version != Trace.HEAP_DUMP_VERSION_1: |
- raise UnsupportedHeapDumpVersionError(heap_dump_version) |
- for frame_node in stack_frames_node: |
- frame = self.Frame(frame_node['id'], |
- string_map.string_by_id[frame_node['name_sid']], |
- frame_node.get('parent')) |
- frame_by_id[frame.id] = frame |
- |
- self._heap_dump_version = heap_dump_version |
- self._stack_frames_nodes.append(stack_frames_node) |
- |
- self._frame_by_id.update(frame_by_id) |
- |
- def ApplyModifications(self, string_map, force=False): |
- """Applies modifications back to nodes (see NodeWrapper).""" |
- |
- if not self.modified and not force: |
- return |
- |
- assert self._stack_frames_nodes, 'no nodes' |
- if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: |
- assert string_map is None, \ |
- 'string_map should not be used with the legacy format' |
- |
- # Serialize frames into the first node, clear all others. |
- |
- for frames_node in self._stack_frames_nodes: |
- if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: |
- frames_node.clear() |
- else: |
- del frames_node[:] |
- |
- frames_node = self._stack_frames_nodes[0] |
- for frame in self._frame_by_id.itervalues(): |
- if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: |
- frame_node = {'name': frame.name} |
- frames_node[frame.id] = frame_node |
- else: |
- frame_node = { |
- 'id': frame.id, |
- 'name_sid': string_map.AddString(frame.name) |
- } |
- frames_node.append(frame_node) |
- if frame.parent_id is not None: |
- frame_node['parent'] = frame.parent_id |
- frame._ClearModified() |
- |
- self._modified = False |
- |
- |
-class Trace(NodeWrapper): |
- """Wrapper for the root trace node (i.e. the trace JSON itself). |
- |
- This wrapper parses select nodes from memory-infra events and groups |
- parsed data per-process (see inner Process class below). |
- """ |
- |
- # Indicates legacy heap dump format. |
- HEAP_DUMP_VERSION_LEGACY = 'Legacy' |
- |
- # Indicates variation of a modern heap dump format. |
- HEAP_DUMP_VERSION_1 = 1 |
- |
- class Process(object): |
- """Collection of per-process data and wrappers.""" |
- |
- def __init__(self, pid): |
- self._pid = pid |
- self._name = None |
- self._memory_map = None |
- self._stack_frame_map = StackFrameMap() |
- self._type_name_map = TypeNameMap() |
- self._string_map = StringMap() |
- self._heap_dump_version = None |
- |
- @property |
- def modified(self): |
- return self._stack_frame_map.modified or self._type_name_map.modified |
- |
- @property |
- def pid(self): |
- return self._pid |
- |
- @property |
- def name(self): |
- return self._name |
- |
- @property |
- def unique_name(self): |
- """Returns string that includes both process name and its pid.""" |
- name = self._name if self._name else 'UnnamedProcess' |
- return '{}({})'.format(name, self._pid) |
- |
- @property |
- def memory_map(self): |
- return self._memory_map |
- |
- @property |
- def stack_frame_map(self): |
- return self._stack_frame_map |
- |
- @property |
- def type_name_map(self): |
- return self._type_name_map |
- |
- def ApplyModifications(self): |
- """Calls ApplyModifications() on contained wrappers.""" |
- if self._heap_dump_version == Trace.HEAP_DUMP_VERSION_LEGACY: |
- self._stack_frame_map.ApplyModifications(None) |
- else: |
- if self._stack_frame_map.modified or self._type_name_map.modified: |
- self._string_map.Clear() |
- self._stack_frame_map.ApplyModifications(self._string_map, force=True) |
- self._type_name_map.ApplyModifications(self._string_map, force=True) |
- self._string_map.ApplyModifications() |
- |
- def __init__(self, trace_node): |
- self._trace_node = trace_node |
- self._processes = [] |
- self._heap_dump_version = None |
- self._version = None |
- self._is_chromium = True |
- self._is_64bit = False |
- self._is_win = False |
- self._is_mac = False |
- |
- # Misc per-process information needed only during parsing. |
- class ProcessExt(object): |
- def __init__(self, pid): |
- self.process = Trace.Process(pid) |
- self.mapped_entry_names = set() |
- self.process_mmaps_node = None |
- self.seen_strings_node = False |
- |
- process_ext_by_pid = {} |
- |
- if isinstance(trace_node, dict): |
- metadata = trace_node['metadata'] |
- product_version = metadata['product-version'] |
- # product-version has the form "Chrome/60.0.3103.0" |
- self._version = product_version.split('/', 1)[-1] |
- |
- command_line = metadata['command_line'] |
- self._is_win = re.search('windows', metadata['os-name'] , re.IGNORECASE) |
- self._is_mac = re.search('mac', metadata['os-name'] , re.IGNORECASE) |
- |
- if self._is_win: |
- self._is_chromium = ( |
- not re.search('Chrome SxS\\\\Application\\\\chrome.exe', command_line, |
- re.IGNORECASE) and |
- not re.search('Chrome\\\\Application\\\\chrome.exe', command_line, |
- re.IGNORECASE)) |
- if self._is_mac: |
- self._is_chromium = re.search('chromium', command_line, re.IGNORECASE) |
- |
- self._is_64bit = ( |
- re.search('x86_64', metadata['os-arch'] , re.IGNORECASE) and |
- not re.search('WOW64', metadata['user-agent'] , re.IGNORECASE)) |
- |
- # Android traces produced via 'chrome://inspect/?tracing#devices' are |
- # just list of events. |
- events = trace_node if isinstance(trace_node, list) \ |
- else trace_node['traceEvents'] |
- for event in events: |
- name = event.get('name') |
- if not name: |
- continue |
- |
- pid = event['pid'] |
- process_ext = process_ext_by_pid.get(pid) |
- if process_ext is None: |
- process_ext = ProcessExt(pid) |
- process_ext_by_pid[pid] = process_ext |
- process = process_ext.process |
- |
- phase = event['ph'] |
- if phase == self._EVENT_PHASE_METADATA: |
- if name == 'process_name': |
- process._name = event['args']['name'] |
- elif name == 'stackFrames': |
- process._stack_frame_map.ParseNext( |
- self._UseHeapDumpVersion(self.HEAP_DUMP_VERSION_LEGACY), |
- event['args']['stackFrames'], |
- process._string_map) |
- elif phase == self._EVENT_PHASE_MEMORY_DUMP: |
- dumps = event['args']['dumps'] |
- process_mmaps = dumps.get('process_mmaps') |
- if process_mmaps: |
- # We want the most recent memory map, so parsing happens later |
- # once we finished reading all events. |
- process_ext.process_mmaps_node = process_mmaps |
- heaps = dumps.get('heaps_v2') |
- if heaps: |
- version = self._UseHeapDumpVersion(heaps['version']) |
- maps = heaps.get('maps') |
- if maps: |
- process_ext.mapped_entry_names.update(maps.iterkeys()) |
- types = maps.get('types') |
- stack_frames = maps.get('nodes') |
- strings = maps.get('strings') |
- if (strings is None and (types or stack_frames) |
- and not process_ext.seen_strings_node): |
- # ApplyModifications() for TypeNameMap and StackFrameMap puts |
- # everything into the first node and depends on StringMap. So |
- # we need to make sure that 'strings' node is there if any of |
- # other two nodes present. |
- strings = [] |
- maps['strings'] = strings |
- if strings is not None: |
- process_ext.seen_strings_node = True |
- process._string_map.ParseNext(version, strings) |
- if types: |
- process._type_name_map.ParseNext( |
- version, types, process._string_map) |
- if stack_frames: |
- process._stack_frame_map.ParseNext( |
- version, stack_frames, process._string_map) |
- |
- self._processes = [] |
- for pe in process_ext_by_pid.itervalues(): |
- pe.process._heap_dump_version = self._heap_dump_version |
- if pe.process_mmaps_node: |
- # Now parse the most recent memory map. |
- pe.process._memory_map = MemoryMap(pe.process_mmaps_node) |
- self._processes.append(pe.process) |
- |
- @property |
- def node(self): |
- """Root node (that was passed to the __init__).""" |
- return self._trace_node |
- |
- @property |
- def modified(self): |
- """Returns True if trace file needs to be updated. |
- |
- Before writing trace JSON back to a file ApplyModifications() needs |
- to be called. |
- """ |
- return any(p.modified for p in self._processes) |
- |
- @property |
- def processes(self): |
- return self._processes |
- |
- @property |
- def heap_dump_version(self): |
- return self._heap_dump_version |
- |
- @property |
- def version(self): |
- return self._version |
- |
- @property |
- def is_chromium(self): |
- return self._is_chromium |
- |
- @property |
- def is_mac(self): |
- return self._is_mac |
- |
- @property |
- def is_win(self): |
- return self._is_win |
- |
- @property |
- def is_64bit(self): |
- return self._is_64bit |
- |
- def ApplyModifications(self): |
- """Propagates modifications back to the trace JSON.""" |
- for process in self._processes: |
- process.ApplyModifications() |
- assert not self.modified, 'still modified' |
- |
- # Relevant trace event phases from Chromium's |
- # src/base/trace_event/common/trace_event_common.h. |
- _EVENT_PHASE_METADATA = 'M' |
- _EVENT_PHASE_MEMORY_DUMP = 'v' |
- |
- def _UseHeapDumpVersion(self, version): |
- if self._heap_dump_version is None: |
- self._heap_dump_version = version |
- return version |
- elif self._heap_dump_version != version: |
- raise Exception( |
- ("Inconsistent trace file: first saw '{}' heap dump version, " |
- "then '{}'.").format(self._heap_dump_version, version)) |
- else: |
- return version |
- |
- |
-class SymbolizableFile(object): |
- """Holds file path, addresses to symbolize and stack frames to update. |
- |
- This class is a link between ELFSymbolizer and a trace file: it specifies |
- what to symbolize (addresses) and what to update with the symbolization |
- result (frames). |
- """ |
- def __init__(self, file_path): |
- self.path = file_path |
- self.symbolizable_path = file_path # path to use for symbolization |
- self.frames_by_address = collections.defaultdict(list) |
- |
- |
-def ResolveSymbolizableFiles(processes): |
- """Resolves and groups PCs into list of SymbolizableFiles. |
- |
- As part of the grouping process, this function resolves PC from each stack |
- frame to the corresponding mmap region. Stack frames that failed to resolve |
- are symbolized with '<unresolved>'. |
- """ |
- symfile_by_path = {} |
- for process in processes: |
- if not process.memory_map: |
- continue |
- for frame in process.stack_frame_map.frame_by_id.itervalues(): |
- if frame.pc is None: |
- continue |
- region = process.memory_map.FindRegion(frame.pc) |
- if region is None: |
- frame.name = '<unresolved>' |
- continue |
- |
- symfile = symfile_by_path.get(region.file_path) |
- if symfile is None: |
- symfile = SymbolizableFile(region.file_path) |
- symfile_by_path[symfile.path] = symfile |
- |
- relative_pc = frame.pc - region.start_address |
- symfile.frames_by_address[relative_pc].append(frame) |
- return symfile_by_path.values() |
- |
- |
-def FindInSystemPath(binary_name): |
- paths = os.environ['PATH'].split(os.pathsep) |
- for path in paths: |
- binary_path = os.path.join(path, binary_name) |
- if os.path.isfile(binary_path): |
- return binary_path |
- return None |
- |
- |
class Symbolizer(object):
  """Encapsulates platform-specific symbolization logic.

  Selects the external symbolizer binary for the host platform (atos on
  macOS, addr2line-pdb.exe on Windows, addr2line elsewhere) and drives it
  to resolve the relative PCs collected in a SymbolizableFile.
  """

  def __init__(self):
    self.is_mac = sys.platform == 'darwin'
    self.is_win = sys.platform == 'win32'
    if self.is_mac:
      self.binary = 'atos'
      self._matcher = symbolize_trace_atos_regex.AtosRegexMatcher()
    elif self.is_win:
      self.binary = 'addr2line-pdb.exe'
    else:
      self.binary = 'addr2line'
    # None when the binary is not found in PATH; callers must check.
    self.symbolizer_path = FindInSystemPath(self.binary)

  def _SymbolizeLinuxAndAndroid(self, symfile, unsymbolized_name):
    """Symbolizes |symfile| asynchronously via ELFSymbolizer (addr2line).

    Frames whose PC cannot be resolved get |unsymbolized_name| instead.
    """
    def _SymbolizerCallback(sym_info, frames):
      # Unwind inline chain to the top.
      while sym_info.inlined_by:
        sym_info = sym_info.inlined_by

      symbolized_name = sym_info.name if sym_info.name else unsymbolized_name
      for frame in frames:
        frame.name = symbolized_name

    symbolizer = elf_symbolizer.ELFSymbolizer(symfile.symbolizable_path,
                                              self.symbolizer_path,
                                              _SymbolizerCallback,
                                              inlines=True)

    for address, frames in symfile.frames_by_address.iteritems():
      # SymbolizeAsync() asserts that the type of address is int. We operate
      # on longs (since they are raw pointers possibly from 64-bit processes).
      # It's OK to cast here because we're passing relative PC, which should
      # always fit into int.
      symbolizer.SymbolizeAsync(int(address), frames)

    symbolizer.Join()


  def _SymbolizeMac(self, symfile):
    """Symbolizes |symfile| by feeding a file of absolute addresses to atos.

    Relative PCs are rebased onto the Mach-O text load address before being
    handed to atos via its '-f' address-file option.
    """
    load_address = (symbolize_trace_macho_reader.
                    ReadMachOTextLoadAddress(symfile.symbolizable_path))
    assert load_address is not None

    address_os_file, address_file_path = tempfile.mkstemp()
    try:
      with os.fdopen(address_os_file, 'w') as address_file:
        for address in symfile.frames_by_address.iterkeys():
          address_file.write('{:x} '.format(address + load_address))

      cmd = [self.symbolizer_path, '-arch', 'x86_64', '-l',
             '0x%x' % load_address, '-o', symfile.symbolizable_path,
             '-f', address_file_path]
      output_array = subprocess.check_output(cmd).split('\n')

      # atos prints one line per input address in input order.
      # NOTE(review): this pairing relies on iterkeys() (above) and
      # itervalues() (here) walking the unmodified dict in the same order,
      # which CPython guarantees.
      for i, frames in enumerate(symfile.frames_by_address.itervalues()):
        symbolized_name = self._matcher.Match(output_array[i])
        for frame in frames:
          frame.name = symbolized_name
    finally:
      os.remove(address_file_path)

  def _SymbolizeWin(self, symfile):
    """Invoke symbolizer binary on windows and write all input in one go.

    Unlike linux, on windows, symbolization talks through a shared system
    service that handles communication with the NT symbol servers. This
    creates an explicit serialization (and therefore lock contention) of
    any process using the symbol API for files that do not have a local PDB.

    Thus, even though the windows symbolizer binary can be made command line
    compatible with the POSIX addr2line interface, parallelizing the
    symbolization does not yield the same performance effects. Running
    just one symbolizer seems good enough for now. Can optimize later
    if this becomes a bottleneck.
    """
    cmd = [self.symbolizer_path, '--functions', '--demangle', '--exe',
           symfile.symbolizable_path]

    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE,
                            stderr=sys.stderr)
    addrs = ["%x" % relative_pc for relative_pc in
             symfile.frames_by_address.keys()]
    (stdout_data, stderr_data) = proc.communicate('\n'.join(addrs))
    stdout_data = stdout_data.split('\n')

    # addr2line answers in input order, so stdout_data pairs with addrs.
    for i, addr in enumerate(addrs):
      for frame in symfile.frames_by_address[int(addr, 16)]:
        # Output of addr2line with --functions is always 2 outputs per
        # symbol, function name followed by source line number. Only grab
        # the function name as line info is not always available.
        frame.name = stdout_data[i * 2]

  def Symbolize(self, symfile, unsymbolized_name):
    """Dispatches to the platform-specific symbolization routine."""
    if self.is_mac:
      self._SymbolizeMac(symfile)
    elif self.is_win:
      self._SymbolizeWin(symfile)
    else:
      self._SymbolizeLinuxAndAndroid(symfile, unsymbolized_name)

  def IsSymbolizableFile(self, file_path):
    """Returns True if |file_path| looks like a symbolizable native binary."""
    if self.is_win:
      # Windows: judge by extension only (PE images).
      extension = os.path.splitext(file_path)[1].lower()
      return extension in ['.dll', '.exe']
    else:
      # 'file -0' prints "<path>\0<description>"; the NUL separator makes
      # extracting the description safe for unusual file names.
      result = subprocess.check_output(['file', '-0', file_path])
      type_string = result[result.find('\0') + 1:]
      return bool(re.match(r'.*(ELF|Mach-O) (32|64)-bit\b.*',
                           type_string, re.DOTALL))
- |
- |
def SymbolizeFiles(symfiles, symbolizer):
  """Symbolizes each file in the given list of SymbolizableFiles
  and updates stack frames with symbolization results."""

  if not symfiles:
    print 'Nothing to symbolize.'
    return

  print 'Symbolizing...'

  def _SubPrintf(message, *args):
    # Helper: prints an indented, formatted status line.
    print (' ' + message).format(*args)

  for symfile in symfiles:
    # Placeholder name given to frames we cannot (or will not) symbolize.
    unsymbolized_name = '<{}>'.format(
        symfile.path if symfile.path else 'unnamed')

    # Checks are ordered from cheapest to most expensive; the first failure
    # wins. Note: a relative symbolizable_path (e.g. the 'android://...'
    # sentinel set by RemapAndroidFiles) is reported as 'not a file'.
    problem = None
    if not os.path.isabs(symfile.symbolizable_path):
      problem = 'not a file'
    elif not os.path.isfile(symfile.symbolizable_path):
      problem = "file doesn't exist"
    elif not symbolizer.IsSymbolizableFile(symfile.symbolizable_path):
      problem = 'file is not symbolizable'
    if problem:
      _SubPrintf("Won't symbolize {} PCs for '{}': {}.",
                 len(symfile.frames_by_address),
                 symfile.symbolizable_path,
                 problem)
      # Tag every frame with the placeholder name and skip this file.
      for frames in symfile.frames_by_address.itervalues():
        for frame in frames:
          frame.name = unsymbolized_name
      continue

    _SubPrintf('Symbolizing {} PCs from {}...',
               len(symfile.frames_by_address),
               symfile.symbolizable_path)

    symbolizer.Symbolize(symfile, unsymbolized_name)
- |
- |
# Matches Android library paths, supports both K (/data/app-lib/<>/lib.so)
# as well as L+ (/data/app/<>/lib/<>/lib.so). Library name is available
# via 'name' group.
ANDROID_PATH_MATCHER = re.compile(
    r'^/data/(?:'
    r'app/[^/]+/lib/[^/]+/|'
    r'app-lib/[^/]+/|'
    r'data/[^/]+/incremental-install-files/lib/'
    r')(?P<name>.*\.so)')

# Subpath of output path where unstripped libraries are stored.
ANDROID_UNSTRIPPED_SUBPATH = 'lib.unstripped'


def HaveFilesFromAndroid(symfiles):
  """Returns True if any of |symfiles| has an Android-style library path."""
  for symfile in symfiles:
    if ANDROID_PATH_MATCHER.match(symfile.path):
      return True
  return False
- |
- |
def RemapAndroidFiles(symfiles, output_path):
  """Points each Android symfile at its unstripped library in |output_path|.

  Paths that don't look like Android library paths are replaced with a
  non-file sentinel so that SymbolizeFiles() reports them accurately.
  """
  for symfile in symfiles:
    match = ANDROID_PATH_MATCHER.match(symfile.path)
    if match is None:
      # Clobber file path to trigger "not a file" problem in SymbolizeFiles().
      # Without this, files won't be symbolized with "file not found" problem,
      # which is not accurate.
      symfile.symbolizable_path = 'android://{}'.format(symfile.path)
    else:
      symfile.symbolizable_path = os.path.join(
          output_path, ANDROID_UNSTRIPPED_SUBPATH, match.group('name'))
- |
- |
def RemapMacFiles(symfiles, symbol_base_directory, version):
  """Points Google Chrome Framework symfiles at the cached dSYM DWARF file."""
  dwarf_path = os.path.join(
      symbol_base_directory, version,
      "Google Chrome Framework.dSYM/Contents/Resources/DWARF/"
      "Google Chrome Framework")
  for symfile in symfiles:
    if not symfile.path.endswith("Google Chrome Framework"):
      continue
    symfile.symbolizable_path = dwarf_path
- |
def RemapWinFiles(symfiles, symbol_base_directory, version, is64bit):
  """Points Windows symfiles at cached images whose .pdb sits alongside."""
  arch_folder = "win64" if is64bit else "win"
  symbol_sub_dir = os.path.join(symbol_base_directory,
                                "chrome-" + arch_folder + "-" + version)
  for symfile in symfiles:
    candidate = os.path.join(symbol_sub_dir, os.path.basename(symfile.path))
    # Remap only when both the image and its PDB exist locally.
    if os.path.isfile(candidate) and os.path.isfile(candidate + ".pdb"):
      symfile.symbolizable_path = candidate
- |
def Symbolize(options, trace, symbolizer):
  """Resolves files referenced by |trace|, remaps them to local binaries
  with symbols, and symbolizes their stack frames in place."""
  symbolizable_files = ResolveSymbolizableFiles(trace.processes)

  # Android trace files carry no explicit platform marker, so detect them
  # by their characteristic library paths.
  if HaveFilesFromAndroid(symbolizable_files):
    if not options.output_directory:
      sys.exit('The trace file appears to be from Android. Please '
               'specify output directory to properly symbolize it.')
    RemapAndroidFiles(symbolizable_files,
                      os.path.abspath(options.output_directory))

  # Non-Chromium (official Google Chrome) traces rely on symbols that were
  # downloaded into the symbol base directory beforehand.
  if not trace.is_chromium:
    if symbolizer.is_mac:
      RemapMacFiles(symbolizable_files, options.symbol_base_directory,
                    trace.version)
    if symbolizer.is_win:
      RemapWinFiles(symbolizable_files, options.symbol_base_directory,
                    trace.version, trace.is_64bit)

  SymbolizeFiles(symbolizable_files, symbolizer)
- |
- |
def OpenTraceFile(file_path, mode):
  """Opens a trace file, transparently handling gzip compression.

  Gzipped traces ('.gz') are opened in binary mode, plain traces in text
  mode. |mode| is 'r' or 'w'.
  """
  is_gzipped = file_path.endswith('.gz')
  opener = gzip.open if is_gzipped else open
  return opener(file_path, mode + ('b' if is_gzipped else 't'))
- |
- |
def FetchAndExtractSymbolsMac(symbol_base_directory, version):
  """Downloads and extracts Google Chrome dSYM symbols for |version| on Mac.

  Symbols are cached under <symbol_base_directory>/<version>; if that
  directory already exists the download is skipped. Returns True on
  success (including cache hits), False when symbols are not on GCS.
  """
  def GetLocalPath(base_dir, version):
    # Local cache location for the downloaded tarball.
    return os.path.join(base_dir, version + ".tar.bz2")
  def GetSymbolsPath(version):
    # GCS object path; 'desktop-*' is matched by cloud_storage.
    return "desktop-*/" + version + "/mac64/Google Chrome.dSYM.tar.bz2"
  def ExtractSymbolTarFile(symbol_sub_dir, symbol_tar_file):
    # NOTE(review): extractall() on a downloaded archive trusts its member
    # paths; the bucket is Google-controlled, but verify this is acceptable.
    os.makedirs(symbol_sub_dir)
    with tarfile.open(os.path.expanduser(symbol_tar_file), "r:bz2") as tar:
      tar.extractall(symbol_sub_dir)

  symbol_sub_dir = os.path.join(symbol_base_directory, version)
  if os.path.isdir(symbol_sub_dir):
    return True

  bzip_path = GetLocalPath(symbol_base_directory, version)
  if not os.path.isfile(bzip_path):

    _CLOUD_STORAGE_BUCKET = "chrome-unsigned"
    if not cloud_storage.Exists(_CLOUD_STORAGE_BUCKET, GetSymbolsPath(version)):
      print "Can't find symbols on GCS."
      return False
    print "Downloading symbols files from GCS, please wait."
    cloud_storage.Get(_CLOUD_STORAGE_BUCKET, GetSymbolsPath(version), bzip_path)

  ExtractSymbolTarFile(symbol_sub_dir, bzip_path)
  return True
- |
- |
def FetchAndExtractSymbolsWin(symbol_base_directory, version, is64bit):
  """Downloads and extracts Chrome symbols and binaries for |version| on Win.

  Fetches two zips from GCS (symbols and the PGO build) and flattens their
  contents into <symbol_base_directory>/chrome-<win|win64>-<version>.
  Returns True on success or cache hit.

  NOTE(review): the inner helper's False returns are discarded by the
  callers below, so this function returns True even when a download
  fails — confirm whether that is intentional best-effort behavior.
  """
  def DownloadAndExtractZipFile(zip_path, source, destination):
    # Download |source| from GCS to |zip_path| (unless cached), then extract
    # all files (flattened, directories skipped) into symbol_sub_dir.
    if not os.path.isfile(zip_path):
      _CLOUD_STORAGE_BUCKET = "chrome-unsigned"
      if not cloud_storage.Exists(_CLOUD_STORAGE_BUCKET, source):
        print "Can't find symbols on GCS."
        return False
      print "Downloading symbols files from GCS, please wait."
      cloud_storage.Get(_CLOUD_STORAGE_BUCKET, source, zip_path)
      if not os.path.isfile(zip_path):
        print "Can't download symbols on GCS."
        return False
    # NOTE(review): 'zip' shadows the builtin and file() is Python-2-only.
    with zipfile.ZipFile(zip_path, "r") as zip:
      for member in zip.namelist():
        filename = os.path.basename(member)
        # Skip directories.
        if not filename:
          continue
        # Extract archived files.
        source = zip.open(member)
        target = file(os.path.join(symbol_sub_dir, filename), "wb")
        with source, target:
          shutil.copyfileobj(source, target)

  folder = "win64" if is64bit else "win"
  gcs_folder = "desktop-*/" + version + "/" + folder + "-pgo/"

  symbol_sub_dir = os.path.join(symbol_base_directory,
                                "chrome-" + folder + "-" + version)
  if os.path.isdir(symbol_sub_dir):
    return True

  os.makedirs(symbol_sub_dir)
  DownloadAndExtractZipFile(
      os.path.join(symbol_base_directory,
                   "chrome-" + folder + "-" + version + "-syms.zip"),
      gcs_folder + "chrome-win32-syms.zip",
      symbol_sub_dir)
  DownloadAndExtractZipFile(
      os.path.join(symbol_base_directory,
                   "chrome-" + folder + "-" + version + ".zip"),
      gcs_folder + "chrome-" + folder + "-pgo.zip",
      symbol_sub_dir)

  return True
- |
# Suffix appended to the trace file name when backing up the original
# before the symbolized trace is written in its place (see main()).
BACKUP_FILE_TAG = '.BACKUP'
- |
-def main(): |
- parser = argparse.ArgumentParser() |
- parser.add_argument( |
- 'file', |
- help='Trace file to symbolize (.json or .json.gz)') |
- |
- parser.add_argument( |
- '--no-backup', dest='backup', default='true', action='store_false', |
- help="Don't create {} files".format(BACKUP_FILE_TAG)) |
- |
- parser.add_argument( |
- '--output-directory', |
- help='The path to the build output directory, such as out/Debug.') |
- |
- home_dir = os.path.expanduser('~') |
- default_dir = os.path.join(home_dir, "symbols") |
- parser.add_argument( |
- '--symbol-base-directory', |
- default=default_dir, |
- help='Directory where symbols are downloaded and cached.') |
- |
- symbolizer = Symbolizer() |
- if symbolizer.symbolizer_path is None: |
- sys.exit("Can't symbolize - no %s in PATH." % symbolizer.binary) |
- |
- options = parser.parse_args() |
- |
- trace_file_path = options.file |
- |
- print 'Reading trace file...' |
- with OpenTraceFile(trace_file_path, 'r') as trace_file: |
- trace = Trace(json.load(trace_file)) |
- |
- # Perform some sanity checks. |
- if trace.is_win and sys.platform != 'win32': |
- print "Cannot symbolize a windows trace on this architecture!" |
- return False |
- |
- # If the trace is from Chromium, assume that symbols are already present. |
- # Otherwise the trace is from Google Chrome. Assume that this is not a local |
- # build of Google Chrome with symbols, and that we need to fetch symbols |
- # from gcs. |
- if not trace.is_chromium: |
- has_symbols = False |
- if symbolizer.is_mac: |
- has_symbols = FetchAndExtractSymbolsMac(options.symbol_base_directory, |
- trace.version) |
- if symbolizer.is_win: |
- has_symbols = FetchAndExtractSymbolsWin(options.symbol_base_directory, |
- trace.version, trace.is_64bit) |
- if not has_symbols: |
- print 'Cannot fetch symbols from GCS' |
- return False |
- |
- Symbolize(options, trace, symbolizer) |
- |
- if trace.modified: |
- trace.ApplyModifications() |
- |
- if options.backup: |
- backup_file_path = trace_file_path + BACKUP_FILE_TAG |
- print 'Backing up trace file to {}'.format(backup_file_path) |
- os.rename(trace_file_path, backup_file_path) |
- |
- print 'Updating the trace file...' |
- with OpenTraceFile(trace_file_path, 'w') as trace_file: |
- json.dump(trace.node, trace_file) |
- else: |
- print 'No modifications were made - not updating the trace file.' |
- |
- |
-if __name__ == '__main__': |
- main() |