Chromium Code Reviews

| OLD | NEW |
|---|---|
| 1 # Copyright 2014 The Chromium Authors. All rights reserved. | 1 # Copyright 2014 The Chromium Authors. All rights reserved. |
| 2 # Use of this source code is governed by a BSD-style license that can be | 2 # Use of this source code is governed by a BSD-style license that can be |
| 3 # found in the LICENSE file. | 3 # found in the LICENSE file. |
| 4 | 4 |
| 5 import collections | 5 import collections |
| 6 import datetime | 6 import datetime |
| 7 import logging | 7 import logging |
| 8 import multiprocessing | 8 import multiprocessing |
| 9 import os | 9 import os |
| 10 import posixpath | 10 import posixpath |
| 11 import Queue | 11 import Queue |
| 12 import re | 12 import re |
| 13 import subprocess | 13 import subprocess |
| 14 import sys | 14 import sys |
| 15 import threading | 15 import threading |
| 16 | 16 from sets import Set |
|
Primiano Tucci (use gerrit)
2014/06/18 09:49:11
just use the builtin set. We don't target python <
| |
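
A minimal sketch of the suggestion above, assuming the patch only needs `Set` to remember duplicate basenames while it builds the disambiguation table in `_CreateDisambiguationTable`. The builtin `set` covers that directly. The helper name `build_lookup_table`, its arguments, and the shortened extension list are hypothetical and not part of the patch. The sketch also uses `os.path.join` in place of the `"%s/%s"` formatting.

```python
import os

# Hypothetical standalone helper (not part of the patch): maps each unique
# source-file basename under |src_root| to its full path.
SOURCE_EXTENSIONS = frozenset(['.c', '.cc', '.h', '.cpp', '.cxx', '.hxx'])


def build_lookup_table(src_root, extensions=SOURCE_EXTENSIONS):
  lookup_table = {}
  duplicates = set()  # Builtin set; no "from sets import Set" needed.
  for root, _, filenames in os.walk(os.path.abspath(src_root)):
    for filename in filenames:
      if os.path.splitext(filename)[1] not in extensions:
        continue
      if filename in lookup_table:
        duplicates.add(filename)
      else:
        lookup_table[filename] = os.path.join(root, filename)
  # A basename seen more than once cannot be resolved unambiguously.
  for filename in duplicates:
    del lookup_table[filename]
  return lookup_table
```

With this shape, a tree containing `a/foo.cc`, `b/foo.cc` and `a/bar.h` would yield a table holding only `bar.h`.
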
| 17 | 17 |
| 18 # addr2line builds a possibly infinite memory cache that can exhaust | 18 # addr2line builds a possibly infinite memory cache that can exhaust |
| 19 # the computer's memory if allowed to grow for too long. This constant | 19 # the computer's memory if allowed to grow for too long. This constant |
| 20 # controls how many lookups we do before restarting the process. 4000 | 20 # controls how many lookups we do before restarting the process. 4000 |
| 21 # gives near peak performance without extreme memory usage. | 21 # gives near peak performance without extreme memory usage. |
| 22 ADDR2LINE_RECYCLE_LIMIT = 4000 | 22 ADDR2LINE_RECYCLE_LIMIT = 4000 |
| 23 | 23 |
| 24 | 24 |
| 25 class ELFSymbolizer(object): | 25 class ELFSymbolizer(object): |
| 26 """An uber-fast (multiprocessing, pipelined and asynchronous) ELF symbolizer. | 26 """An uber-fast (multiprocessing, pipelined and asynchronous) ELF symbolizer. |
| (...skipping 41 matching lines...) | |
| 68 |max_queue_size| bound, a new addr2line instance is kicked in. | 68 |max_queue_size| bound, a new addr2line instance is kicked in. |
| 69 In the case of a very eager producer (i.e. all |max_concurrent_jobs| instances | 69 In the case of a very eager producer (i.e. all |max_concurrent_jobs| instances |
| 70 have a backlog of |max_queue_size|), back-pressure is applied on the caller by | 70 have a backlog of |max_queue_size|), back-pressure is applied on the caller by |
| 71 blocking the SymbolizeAsync method. | 71 blocking the SymbolizeAsync method. |
| 72 | 72 |
| 73 This module has been deliberately designed to be dependency free (w.r.t. of | 73 This module has been deliberately designed to be dependency free (w.r.t. of |
| 74 other modules in this project), to allow easy reuse in external projects. | 74 other modules in this project), to allow easy reuse in external projects. |
| 75 """ | 75 """ |
| 76 | 76 |
| 77 def __init__(self, elf_file_path, addr2line_path, callback, inlines=False, | 77 def __init__(self, elf_file_path, addr2line_path, callback, inlines=False, |
| 78 max_concurrent_jobs=None, addr2line_timeout=30, max_queue_size=50): | 78 max_concurrent_jobs=None, addr2line_timeout=30, max_queue_size=50, |
| 79 disambiguate=False, disambiguation_source_path=''): | |
|
Primiano Tucci (use gerrit)
2014/06/18 10:48:53
I had a quick chat with Andrew and we feel the bes
| |
| 79 """Args: | 80 """Args: |
| 80 elf_file_path: path of the elf file to be symbolized. | 81 elf_file_path: path of the elf file to be symbolized. |
| 81 addr2line_path: path of the toolchain's addr2line binary. | 82 addr2line_path: path of the toolchain's addr2line binary. |
| 82 callback: a callback which will be invoked for each resolved symbol with | 83 callback: a callback which will be invoked for each resolved symbol with |
| 83 the two args (sym_info, callback_arg). The former is an instance of | 84 the two args (sym_info, callback_arg). The former is an instance of |
| 84 |ELFSymbolInfo| and contains the symbol information. The latter is an | 85 |ELFSymbolInfo| and contains the symbol information. The latter is an |
| 85 embedder-provided argument which is passed to SymbolizeAsync(). | 86 embedder-provided argument which is passed to SymbolizeAsync(). |
| 86 inlines: when True, the ELFSymbolInfo will contain also the details about | 87 inlines: when True, the ELFSymbolInfo will contain also the details about |
| 87 the outer inlining functions. When False, only the innermost function | 88 the outer inlining functions. When False, only the innermost function |
| 88 will be provided. | 89 will be provided. |
| 89 max_concurrent_jobs: Max number of addr2line instances spawned. | 90 max_concurrent_jobs: Max number of addr2line instances spawned. |
| 90 Parallelize responsibly, addr2line is a memory and I/O monster. | 91 Parallelize responsibly, addr2line is a memory and I/O monster. |
| 91 max_queue_size: Max number of outstanding requests per addr2line instance. | 92 max_queue_size: Max number of outstanding requests per addr2line instance. |
| 92 addr2line_timeout: Max time (in seconds) to wait for a addr2line response. | 93 addr2line_timeout: Max time (in seconds) to wait for a addr2line response. |
| 93 After the timeout, the instance will be considered hung and respawned. | 94 After the timeout, the instance will be considered hung and respawned. |
| 95 disambiguate: Whether to run a disambiguation process or not. | |
|
Primiano Tucci (use gerrit)
2014/06/18 09:49:11
Can we use just one variable (call it source_root_
| |
| 96 Disambiguation means to resolve ambiguous source_paths, for | |
| 97 example turn addr2line output "unicode.cc" into a full and absolute | |
| 98 path. In some toolchains only the name of the source file is output, | |
|
Primiano Tucci (use gerrit)
2014/06/18 09:49:11
I'd love to know more about this btw. In which cas
| |
| 99 without any path information; disambiguation searches through the | |
| 100 source directory specified by 'disambiguation_source_path' argument | |
| 101 for files whose name matches. If there are multiple files with the | |
| 102 same name, disambiguation will fail. | |
| 103 disambiguation_source_path: The path to the directory where the source | |
| 104 files are located, used for disambiguating paths. | |
| 94 """ | 105 """ |
| 95 assert(os.path.isfile(addr2line_path)), 'Cannot find ' + addr2line_path | 106 assert(os.path.isfile(addr2line_path)), 'Cannot find ' + addr2line_path |
| 96 self.elf_file_path = elf_file_path | 107 self.elf_file_path = elf_file_path |
| 97 self.addr2line_path = addr2line_path | 108 self.addr2line_path = addr2line_path |
| 98 self.callback = callback | 109 self.callback = callback |
| 99 self.inlines = inlines | 110 self.inlines = inlines |
| 100 self.max_concurrent_jobs = (max_concurrent_jobs or | 111 self.max_concurrent_jobs = (max_concurrent_jobs or |
| 101 min(multiprocessing.cpu_count(), 4)) | 112 min(multiprocessing.cpu_count(), 4)) |
| 102 self.max_queue_size = max_queue_size | 113 self.max_queue_size = max_queue_size |
| 103 self.addr2line_timeout = addr2line_timeout | 114 self.addr2line_timeout = addr2line_timeout |
| 104 self.requests_counter = 0 # For generating monotonic request IDs. | 115 self.requests_counter = 0 # For generating monotonic request IDs. |
| 105 self._a2l_instances = [] # Up to |max_concurrent_jobs| _Addr2Line inst. | 116 self._a2l_instances = [] # Up to |max_concurrent_jobs| _Addr2Line inst. |
| 106 | 117 |
| 118 # If necessary, create disambiguation lookup table | |
| 119 self.disambiguate = disambiguate | |
| 120 self.commonprefix = '' | |
| 121 self.lookup_table = {} | |
| 122 if(self.disambiguate): | |
| 123 self._CreateDisambiguationTable(disambiguation_source_path) | |
| 124 | |
| 107 # Create one addr2line instance. More instances will be created on demand | 125 # Create one addr2line instance. More instances will be created on demand |
| 108 # (up to |max_concurrent_jobs|) depending on the rate of the requests. | 126 # (up to |max_concurrent_jobs|) depending on the rate of the requests. |
| 109 self._CreateNewA2LInstance() | 127 self._CreateNewA2LInstance() |
| 110 | 128 |
| 111 def SymbolizeAsync(self, addr, callback_arg=None): | 129 def SymbolizeAsync(self, addr, callback_arg=None): |
| 112 """Requests symbolization of a given address. | 130 """Requests symbolization of a given address. |
| 113 | 131 |
| 114 This method is not guaranteed to return immediately. It generally does, but | 132 This method is not guaranteed to return immediately. It generally does, but |
| 115 in some scenarios (e.g. all addr2line instances have full queues) it can | 133 in some scenarios (e.g. all addr2line instances have full queues) it can |
| 116 block to create back-pressure. | 134 block to create back-pressure. |
| (...skipping 37 matching lines...) | |
| 154 for a2l in self._a2l_instances: | 172 for a2l in self._a2l_instances: |
| 155 a2l.WaitForIdle() | 173 a2l.WaitForIdle() |
| 156 a2l.Terminate() | 174 a2l.Terminate() |
| 157 | 175 |
| 158 def _CreateNewA2LInstance(self): | 176 def _CreateNewA2LInstance(self): |
| 159 assert(len(self._a2l_instances) < self.max_concurrent_jobs) | 177 assert(len(self._a2l_instances) < self.max_concurrent_jobs) |
| 160 a2l = ELFSymbolizer.Addr2Line(self) | 178 a2l = ELFSymbolizer.Addr2Line(self) |
| 161 self._a2l_instances.append(a2l) | 179 self._a2l_instances.append(a2l) |
| 162 return a2l | 180 return a2l |
| 163 | 181 |
| 182 def _CreateDisambiguationTable(self, src_root_path): | |
|
Primiano Tucci (use gerrit)
2014/06/18 09:49:11
I don't really like this approach. This is trying
Primiano Tucci (use gerrit)
2014/06/18 10:48:53
After talking with Andrew, given your use case, I'
| |
| 183 """ Creates a table of files used for disambiguation later | |
| 184 Disambiguation: | |
|
Primiano Tucci (use gerrit)
2014/06/18 09:49:11
This comment is redundant. You already explained t
| |
| 185 addr2line sometimes returns an ambiguous file name rather than the | |
| 186 full path of the file where the symbol is located. | |
| 187 | |
| 188 adapted from andrewhayden's implementation in earlier commits """ | |
| 189 interesting_file_endings = { ".c", ".cc", ".h", ".cp", ".cpp", ".cxx", | |
| 190 ".c++", ".asm", ".inc", ".s", ".hxx" } | |
| 191 duplicates = Set() | |
| 192 self.lookup_table = {} | |
| 193 src_root_path = os.path.abspath(src_root_path) | |
| 194 | |
| 195 for root, _, filenames in os.walk(src_root_path): | |
| 196 for f in filenames: | |
| 197 _, ext = os.path.splitext(f) | |
| 198 if not ext in interesting_file_endings: | |
| 199 continue | |
| 200 | |
| 201 base = os.path.basename(f) # Just in case | |
| 202 if self.lookup_table.get(base) is None: | |
| 203 self.lookup_table[base] = "%s/%s" % (root, f) | |
| 204 else: | |
| 205 duplicates.add(base) | |
| 206 | |
| 207 # Duplicates can not be used for disambiguation, as we can not determine | |
| 208 # the true source if we have more than one to choose from | |
| 209 for d in duplicates: | |
| 210 del self.lookup_table[d] | |
| 211 | |
| 212 # Get the common prefix for the source paths | |
| 213 self.commonprefix = os.path.commonprefix(self.lookup_table.values()) | |
| 164 | 214 |
| 165 class Addr2Line(object): | 215 class Addr2Line(object): |
| 166 """A python wrapper around an addr2line instance. | 216 """A python wrapper around an addr2line instance. |
| 167 | 217 |
| 168 The communication with the addr2line process looks as follows: | 218 The communication with the addr2line process looks as follows: |
| 169 [STDIN] [STDOUT] (from addr2line's viewpoint) | 219 [STDIN] [STDOUT] (from addr2line's viewpoint) |
| 170 > f001111 | 220 > f001111 |
| 171 > f002222 | 221 > f002222 |
| 172 < Symbol::Name(foo, bar) for f001111 | 222 < Symbol::Name(foo, bar) for f001111 |
| 173 < /path/to/source/file.c:line_number | 223 < /path/to/source/file.c:line_number |
| (...skipping 131 matching lines...) | |
| 305 source_line = None | 355 source_line = None |
| 306 m = ELFSymbolizer.Addr2Line.SYM_ADDR_RE.match(line2) | 356 m = ELFSymbolizer.Addr2Line.SYM_ADDR_RE.match(line2) |
| 307 if m: | 357 if m: |
| 308 if not m.group(1).startswith('?'): | 358 if not m.group(1).startswith('?'): |
| 309 source_path = m.group(1) | 359 source_path = m.group(1) |
| 310 if not m.group(2).startswith('?'): | 360 if not m.group(2).startswith('?'): |
| 311 source_line = int(m.group(2)) | 361 source_line = int(m.group(2)) |
| 312 else: | 362 else: |
| 313 logging.warning('Got invalid symbol path from addr2line: %s' % line2) | 363 logging.warning('Got invalid symbol path from addr2line: %s' % line2) |
| 314 | 364 |
| 315 sym_info = ELFSymbolInfo(name, source_path, source_line) | 365 # In case disambiguation is on, and needed |
| 366 disambiguated = False | |
| 367 failed_disambiguation = False | |
| 368 if self._symbolizer.disambiguate: | |
| 369 if not source_path is None and not source_path.startswith('/'): | |
|
Andrew Hayden (chromium.org)
2014/06/18 09:10:59
Again, let's avoid double-negation:
if source_path
| |
| 370 source_path = self._symbolizer.lookup_table.get(source_path) | |
| 371 failed_disambiguation = source_path is None | |
| 372 disambiguated = not failed_disambiguation | |
|
Andrew Hayden (chromium.org)
2014/06/18 09:10:59
As written, you don't need two booleans. They're j
| |
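
A minimal sketch of how the two comments above might be combined: test `source_path` positively, and return a single flag instead of two booleans that mirror each other whenever a lookup is attempted. `resolve_source_path` is a hypothetical helper, not part of the patch. The tri-state flag (`None` for "no lookup needed") is an assumption made for illustration, and the sketch treats an empty string like `None`, which differs slightly from the `is None` check in the patch.

```python
def resolve_source_path(source_path, lookup_table):
  """Returns (path, disambiguated) for a source path reported by addr2line.

  |disambiguated| is None when no lookup was needed, True when the bare file
  name was resolved through |lookup_table|, and False when the lookup failed.
  Hypothetical helper, for illustration only.
  """
  if source_path and not source_path.startswith('/'):
    resolved = lookup_table.get(source_path)
    return resolved, resolved is not None
  return source_path, None
```

A caller that still wants the `failed_disambiguation` notion can derive it as `disambiguated is False`.
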
| 373 | |
| 374 # Use the absolute path | |
| 375 if not source_path is None: | |
|
Andrew Hayden (chromium.org)
2014/06/18 09:10:59
Again let's avoid double negation:
if source_path
| |
| 376 source_path = os.path.abspath(source_path) | |
|
Primiano Tucci (use gerrit)
2014/06/18 09:49:11
What is this for?
Looks like that even if you're n
Primiano Tucci (use gerrit)
2014/06/18 10:48:53
At this point you also check if source_path.starts
| |
| 377 | |
| 378 sym_info = ELFSymbolInfo(name, source_path, source_line, disambiguated, | |
| 379 failed_disambiguation) | |
| 316 if prev_sym_info: | 380 if prev_sym_info: |
| 317 prev_sym_info.inlined_by = sym_info | 381 prev_sym_info.inlined_by = sym_info |
| 318 if not innermost_sym_info: | 382 if not innermost_sym_info: |
| 319 innermost_sym_info = sym_info | 383 innermost_sym_info = sym_info |
| 320 | 384 |
| 321 self._processed_symbols_count += 1 | 385 self._processed_symbols_count += 1 |
| 322 self._symbolizer.callback(innermost_sym_info, callback_arg) | 386 self._symbolizer.callback(innermost_sym_info, callback_arg) |
| 323 | 387 |
| 324 def _RestartAddr2LineProcess(self): | 388 def _RestartAddr2LineProcess(self): |
| 325 if self._proc: | 389 if self._proc: |
| (...skipping 60 matching lines...) | |
| 386 | 450 |
| 387 @property | 451 @property |
| 388 def first_request_id(self): | 452 def first_request_id(self): |
| 389 """Returns the request_id of the oldest pending request in the queue.""" | 453 """Returns the request_id of the oldest pending request in the queue.""" |
| 390 return self._request_queue[0][2] if self._request_queue else 0 | 454 return self._request_queue[0][2] if self._request_queue else 0 |
| 391 | 455 |
| 392 | 456 |
| 393 class ELFSymbolInfo(object): | 457 class ELFSymbolInfo(object): |
| 394 """The result of the symbolization passed as first arg. of each callback.""" | 458 """The result of the symbolization passed as first arg. of each callback.""" |
| 395 | 459 |
| 396 def __init__(self, name, source_path, source_line): | 460 def __init__(self, name, source_path, source_line, disambiguated=False, |
|
Primiano Tucci (use gerrit)
2014/06/18 09:49:11
Do you need to pass these booleans at all? Can't y
| |
| 461 failed_disambiguation=False): | |
| 397 """All the fields here can be None (if addr2line replies with '??').""" | 462 """All the fields here can be None (if addr2line replies with '??').""" |
| 398 self.name = name | 463 self.name = name |
| 399 self.source_path = source_path | 464 self.source_path = source_path |
| 400 self.source_line = source_line | 465 self.source_line = source_line |
| 401 # In the case of |inlines|=True, the |inlined_by| points to the outer | 466 # In the case of |inlines|=True, the |inlined_by| points to the outer |
| 402 # function inlining the current one (and so on, to form a chain). | 467 # function inlining the current one (and so on, to form a chain). |
| 403 self.inlined_by = None | 468 self.inlined_by = None |
| 469 self.disambiguated = disambiguated | |
| 470 self.failed_disambiguation = failed_disambiguation | |
| 404 | 471 |
| 405 def __str__(self): | 472 def __str__(self): |
| 406 return '%s [%s:%d]' % ( | 473 return '%s [%s:%d]' % ( |
| 407 self.name or '??', self.source_path or '??', self.source_line or 0) | 474 self.name or '??', self.source_path or '??', self.source_line or 0) |
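
To illustrate the |inlined_by| chain described in the constructor comment, here is a minimal sketch of how a callback might walk it from the innermost function outwards. `SymbolInfo` is a stand-in that mirrors the pre-patch fields of `ELFSymbolInfo` so the example is self-contained, and `format_inline_chain` is a hypothetical helper. Neither is part of the module under review.

```python
class SymbolInfo(object):
  """Stand-in mirroring the fields of ELFSymbolInfo, for illustration only."""

  def __init__(self, name, source_path, source_line):
    self.name = name
    self.source_path = source_path
    self.source_line = source_line
    self.inlined_by = None

  def __str__(self):
    return '%s [%s:%d]' % (
        self.name or '??', self.source_path or '??', self.source_line or 0)


def format_inline_chain(sym_info):
  """Returns one formatted line per frame, innermost function first."""
  lines = []
  while sym_info:
    lines.append(str(sym_info))
    sym_info = sym_info.inlined_by
  return lines
```

A frame whose fields are all `None` (addr2line replied with '??') formats as `?? [??:0]`, matching the `__str__` fallback shown in the diff.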