tools/binary_size/explain_binary_size_delta.py - Issue 258633003: Graphical version of the run_binary_size_analysis tool.

Unified Diff: tools/binary_size/explain_binary_size_delta.py

Issue 258633003: Graphical version of the run_binary_size_analysis tool. (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master

Patch Set: Using the python addr2line wrapper. Created 6 years, 7 months ago


Index: tools/binary_size/explain_binary_size_delta.py

diff --git a/tools/binary_size/explain_binary_size_delta.py b/tools/binary_size/explain_binary_size_delta.py

new file mode 100755

index 0000000000000000000000000000000000000000..9552ebc1e5428c36f5faec9cf89a0f66d78ff865

--- /dev/null

+++ b/tools/binary_size/explain_binary_size_delta.py

@@ -0,0 +1,414 @@

+#!/usr/bin/python

Primiano Tucci (use gerrit) 2014/05/20 15:22:57 #!/usr/bin/env python ? The difference is that the

+# Use of this source code is governed by a BSD-style license that can be

+# found in the LICENSE file.

+"""Generate a description of the size differences between two binaries

Primiano Tucci (use gerrit) 2014/05/20 15:22:57 The first line of a docstring should fit in one li

+based on an analysis of symbols.

+This tool needs "nm" dumps of each binary with full symbol information.

+In order to obtain such dumps you need full source checkouts of each

+binary that you want to analyze. You can obtain the necessary dumps by

Primiano Tucci (use gerrit) 2014/05/20 15:22:57 Out of curiosity, why you need the source? Isn't t

Daniel Bratell 2014/05/21 08:42:13 I don't know for sure, but we've talked about comp

+running the run_binary_size_analysis.py script upon each binary, with

+the "--nm-out" parameter set to the location in which you want to save

+the dumps. Example:

+    # obtain symbol data from first binary in /tmp/nm1.dump
+    cd $CHECKOUT1_SRC
+    ninja -C out/Release binary_size_tool
+    tools/binary_size/run_binary_size_analysis.py \
+        --library <path_to_library> \
+        --destdir /tmp/throwaway \
+        --nm-out /tmp/nm1.dump
+    # obtain symbol data from second binary in /tmp/nm2.dump
+    cd $CHECKOUT2_SRC
+    ninja -C out/Release binary_size_tool
+    tools/binary_size/run_binary_size_analysis.py \
+        --library <path_to_library> \
+        --destdir /tmp/throwaway \
+        --nm-out /tmp/nm2.dump
+    # cleanup useless files
+    rm -r /tmp/throwaway
+    # run this tool
+    explain_binary_size_delta.py --nm1 /tmp/nm1.dump --nm2 /tmp/nm2.dump

+"""

+import collections

+import fileinput

+import json

+import optparse

+import os

+import pprint

+import sys

+import binary_size_utils

Primiano Tucci (use gerrit) 2014/05/20 15:22:57 Nit, add an extra line (two lines between top leve

+def compare(symbols1, symbols2):

Primiano Tucci (use gerrit) 2014/05/20 15:22:57 Nit: Compare (capital case) Also top level functio

+  added = []  # tuples
+  removed = []  # tuples
+  changed = []  # tuples
+  unchanged = []  # tuples
+  cache1 = {}
+  cache2 = {}
+  # Make a map of (file, symbol_type) : (symbol_name, symbol_size)
+  for cache, symbols in ((cache1, symbols1), (cache2, symbols2)):
+    for symbol_name, symbol_type, symbol_size, file_path in symbols:
+      if 'vtable for ' in symbol_name:
+        symbol_type = '@'  # hack to categorize these separately
+      if file_path:
+        file_path = os.path.normpath(file_path)
+      else:
+        file_path = '(No Path)'
+      key = (file_path, symbol_type)
+      bucket = cache.get(key, None)
+      if not bucket:

Primiano Tucci (use gerrit) 2014/05/20 15:22:57 lines 68.70 can be just: bucket = cache.setdefault

+        bucket = {}
+        cache[key] = bucket
+      bucket[symbol_name] = symbol_size
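
The reviewer's `setdefault` comment above is truncated, but it points at a common idiom. A minimal sketch of the same grouping step, assuming the suggestion was `dict.setdefault` with an empty dict as the default; `build_cache` is a hypothetical helper name and is not part of the patch:

```python
import os


def build_cache(symbols):
  """Groups symbols into {(file_path, symbol_type): {symbol_name: size}}."""
  cache = {}
  for symbol_name, symbol_type, symbol_size, file_path in symbols:
    if 'vtable for ' in symbol_name:
      symbol_type = '@'  # same categorization hack as in the patch
    file_path = os.path.normpath(file_path) if file_path else '(No Path)'
    # setdefault returns the existing bucket, or stores and returns a new one.
    bucket = cache.setdefault((file_path, symbol_type), {})
    bucket[symbol_name] = symbol_size
  return cache


example = build_cache([
    ('foo', 't', 10, '/a/./b.cc'),
    ('vtable for Bar', 'd', 20, '/a/b.cc'),
    ('baz', 't', 30, None),
])
```

The three-line get/check/assign sequence collapses into one call, and an already-present bucket is never replaced.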

+  # Now diff them. We iterate over the elements in cache1. For each symbol
+  # that we find in cache2, we record whether it was deleted, changed, or
+  # unchanged. We then remove it from cache2; all the symbols that remain
+  # in cache2 at the end of the iteration over cache1 are the 'new' symbols.
+  for key, bucket1 in cache1.items():

Primiano Tucci (use gerrit) 2014/05/20 15:22:57 s/items/iteritems/

Daniel Bratell 2014/05/21 08:42:13 cache1 is mutated during the iteration so it has t

+    bucket2 = cache2.get(key, None)

Primiano Tucci (use gerrit) 2014/05/20 15:22:57 ", None" is redundant, the default value for the

+    if not bucket2:
+      # A file was removed. Everything in bucket1 is dead.
+      for symbol_name, symbol_size in bucket1.items():
+        removed.append((key[0], key[1], symbol_name, symbol_size, None))
+    else:
+      # File still exists, look for changes within.
+      for symbol_name, symbol_size in bucket1.items():
+        size2 = bucket2.get(symbol_name, None)
+        if not size2:
+          # Symbol no longer exists in bucket2.
+          removed.append((key[0], key[1], symbol_name, symbol_size, None))
+        else:
+          del bucket2[symbol_name]  # Symbol is not new, delete from cache2.
+          if len(bucket2) == 0:
+            del cache1[key]  # Entire bucket is empty, delete from cache2
+          if symbol_size != size2:
+            # Symbol has changed size in bucket.
+            changed.append((key[0], key[1], symbol_name, symbol_size, size2))
+          else:
+            # Symbol is unchanged.
+            unchanged.append((key[0], key[1], symbol_name, symbol_size, size2))
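
The classification performed by the loop above can be sketched on two plain `{name: size}` dictionaries. This is an illustration under assumptions, not the patch's code: `diff_buckets` is a hypothetical name, it pops matches out of a copy of the new dictionary instead of deleting from a dictionary that is being iterated (which sidesteps the mutation concern in the `items`/`iteritems` exchange), and it tests `is None` rather than truthiness so that a zero-sized symbol is not reported as removed.

```python
def diff_buckets(old, new):
  """Classifies names in two {name: size} dicts; neither input is modified."""
  remaining = dict(new)  # work on a copy so the caller's dict survives
  removed, changed, unchanged = [], [], []
  for name, old_size in old.items():
    new_size = remaining.pop(name, None)
    if new_size is None:
      # Name is absent from the new dict.
      removed.append((name, old_size, None))
    elif new_size != old_size:
      changed.append((name, old_size, new_size))
    else:
      unchanged.append((name, old_size, new_size))
  # Whatever was never popped exists only in the new dict.
  added = [(name, None, size) for name, size in sorted(remaining.items())]
  return added, removed, changed, unchanged


added, removed, changed, unchanged = diff_buckets(
    {'kept': 1000, 'grew': 1000, 'gone': 1000, 'empty': 0},
    {'kept': 1000, 'grew': 2000, 'new': 500, 'empty': 0})
```

Whether zero-sized symbols can occur in real `nm` dumps is not stated in the review; the `is None` test is simply the more defensive choice for this sketch.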

+  # We have now analyzed all symbols that are in cache1 and removed all of
+  # the encountered symbols from cache2. What's left in cache2 is the new
+  # symbols.
+  for key, bucket2 in cache2.items():

Primiano Tucci (use gerrit) 2014/05/20 15:22:57 s/items/iteritems/ here and below

+    for symbol_name, symbol_size in bucket2.items():
+      added.append((key[0], key[1], symbol_name, None, symbol_size))
+  return [added, removed, changed, unchanged]

Primiano Tucci (use gerrit) 2014/05/20 15:22:57 Shouldn't this be a tuple rather than a list? You
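
One possible way to act on the tuple suggestion, sketched under assumptions: `SymbolDelta` and `make_delta` are hypothetical names that do not appear in the patch. A `collections.namedtuple` still unpacks positionally like the four-element return value above, and it also gives each field a name.

```python
import collections

SymbolDelta = collections.namedtuple(
    'SymbolDelta', ['added', 'removed', 'changed', 'unchanged'])


def make_delta(added, removed, changed, unchanged):
  """Freezes the four result lists into an immutable, named container."""
  return SymbolDelta(tuple(added), tuple(removed),
                     tuple(changed), tuple(unchanged))


delta = make_delta(
    added=[('/file_new', 't', 'added', None, 1000)],
    removed=[], changed=[], unchanged=[])
# Positional unpacking keeps working for existing callers.
added, removed, changed, unchanged = delta
```

Callers that prefer names can write `delta.added` instead of relying on position.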

+def test_compare():

Primiano Tucci (use gerrit) 2014/05/20 15:22:57 Many thanks for the test, yay! :) Just, we typical

Daniel Bratell 2014/05/21 08:42:13 Let me make a case for keeping it in the same file

Primiano Tucci (use gerrit) 2014/05/21 10:05:59 Oh, actually, good point. This should have a PRESU

+  # List entries have form: symbol_name, symbol_type, symbol_size, file_path
+  symbol_list1 = (
+    # File with one symbol, left as-is.
+    ( 'unchanged', 't', 1000, '/file_unchanged' ),
+    # File with one symbol, changed.
+    ( 'changed', 't', 1000, '/file_all_changed' ),
+    # File with one symbol, deleted.
+    ( 'removed', 't', 1000, '/file_all_deleted' ),
+    # File with two symbols, one unchanged, one changed, same bucket
+    ( 'unchanged', 't', 1000, '/file_pair_unchanged_changed' ),
+    ( 'changed', 't', 1000, '/file_pair_unchanged_changed' ),
+    # File with two symbols, one unchanged, one deleted, same bucket
+    ( 'unchanged', 't', 1000, '/file_pair_unchanged_removed' ),
+    ( 'removed', 't', 1000, '/file_pair_unchanged_removed' ),
+    # File with two symbols, one unchanged, one added, same bucket
+    ( 'unchanged', 't', 1000, '/file_pair_unchanged_added' ),
+    # File with two symbols, one unchanged, one changed, different bucket
+    ( 'unchanged', 't', 1000, '/file_pair_unchanged_diffbuck_changed' ),
+    ( 'changed', '@', 1000, '/file_pair_unchanged_diffbuck_changed' ),
+    # File with two symbols, one unchanged, one deleted, different bucket
+    ( 'unchanged', 't', 1000, '/file_pair_unchanged_diffbuck_removed' ),
+    ( 'removed', '@', 1000, '/file_pair_unchanged_diffbuck_removed' ),
+    # File with two symbols, one unchanged, one added, different bucket
+    ( 'unchanged', 't', 1000, '/file_pair_unchanged_diffbuck_added' ),
+    # File with four symbols, one added, one removed, one changed, one unchanged
+    ( 'size_changed', 't', 1000, '/file_tetra' ),
+    ( 'removed', 't', 1000, '/file_tetra' ),
+    ( 'unchanged', 't', 1000, '/file_tetra' ),
+  );
+  symbol_list2 = (
+    # File with one symbol, left as-is.
+    ( 'unchanged', 't', 1000, '/file_unchanged' ),
+    # File with one symbol, changed.
+    ( 'changed', 't', 2000, '/file_all_changed' ),
+    # File with two symbols, one unchanged, one changed, same bucket
+    ( 'unchanged', 't', 1000, '/file_pair_unchanged_changed' ),
+    ( 'changed', 't', 2000, '/file_pair_unchanged_changed' ),
+    # File with two symbols, one unchanged, one deleted, same bucket
+    ( 'unchanged', 't', 1000, '/file_pair_unchanged_removed' ),
+    # File with two symbols, one unchanged, one added, same bucket
+    ( 'unchanged', 't', 1000, '/file_pair_unchanged_added' ),
+    ( 'added', 't', 1000, '/file_pair_unchanged_added' ),
+    # File with two symbols, one unchanged, one changed, different bucket
+    ( 'unchanged', 't', 1000, '/file_pair_unchanged_diffbuck_changed' ),
+    ( 'changed', '@', 2000, '/file_pair_unchanged_diffbuck_changed' ),
+    # File with two symbols, one unchanged, one deleted, different bucket
+    ( 'unchanged', 't', 1000, '/file_pair_unchanged_diffbuck_removed' ),
+    # File with two symbols, one unchanged, one added, different bucket
+    ( 'unchanged', 't', 1000, '/file_pair_unchanged_diffbuck_added' ),
+    ( 'added', '@', 1000, '/file_pair_unchanged_diffbuck_added' ),
+    # File with four symbols, one added, one removed, one changed, one unchanged
+    ( 'size_changed', 't', 2000, '/file_tetra' ),
+    ( 'unchanged', 't', 1000, '/file_tetra' ),
+    ( 'added', 't', 1000, '/file_tetra' ),
+    # New file with one symbol added
+    ( 'added', 't', 1000, '/file_new' ),
+  );
+  # Here we go
+  (added, removed, changed, unchanged) = compare(symbol_list1, symbol_list2)
+  # File with one symbol, left as-is.
+  assert ('/file_unchanged', 't', 'unchanged', 1000, 1000) in unchanged

Primiano Tucci (use gerrit) 2014/05/20 15:22:57 Also, you might want to take a look to python unit
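
A minimal sketch of how a few of these assertions might look under Python's `unittest` module, which the reviewer mentions. Everything here is an assumption for illustration: `CompareTest` is a hypothetical class name, and the `compare` below is a simplified stand-in that only mimics the tuple shapes of the patch's function so the example runs on its own.

```python
import unittest


def compare(symbols1, symbols2):
  """Simplified stand-in for the patch's compare(); same tuple shapes."""
  def index(symbols):
    return {(path, typ, name): size for name, typ, size, path in symbols}
  old, new = index(symbols1), index(symbols2)
  added = [k + (None, new[k]) for k in new if k not in old]
  removed = [k + (old[k], None) for k in old if k not in new]
  changed = [k + (old[k], new[k]) for k in old
             if k in new and old[k] != new[k]]
  unchanged = [k + (old[k], new[k]) for k in old
               if k in new and old[k] == new[k]]
  return added, removed, changed, unchanged


class CompareTest(unittest.TestCase):

  def setUp(self):
    symbols1 = (('unchanged', 't', 1000, '/file_unchanged'),
                ('changed', 't', 1000, '/file_all_changed'),
                ('removed', 't', 1000, '/file_all_deleted'))
    symbols2 = (('unchanged', 't', 1000, '/file_unchanged'),
                ('changed', 't', 2000, '/file_all_changed'),
                ('added', 't', 1000, '/file_new'))
    (self.added, self.removed,
     self.changed, self.unchanged) = compare(symbols1, symbols2)

  def testUnchanged(self):
    self.assertIn(('/file_unchanged', 't', 'unchanged', 1000, 1000),
                  self.unchanged)

  def testChanged(self):
    self.assertIn(('/file_all_changed', 't', 'changed', 1000, 2000),
                  self.changed)

  def testRemoved(self):
    self.assertIn(('/file_all_deleted', 't', 'removed', 1000, None),
                  self.removed)

  def testAdded(self):
    self.assertIn(('/file_new', 't', 'added', None, 1000), self.added)
```

Saved as a module, this could be run with `python -m unittest <module_name>`; a failing `assertIn` reports both the missing tuple and the list that was searched, which a bare `assert` does not.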

+  # File with one symbol, changed.
+  assert ('/file_all_changed', 't', 'changed', 1000, 2000) in changed
+  # File with one symbol, deleted.
+  assert ('/file_all_deleted', 't', 'removed', 1000, None) in removed
+  # New file with one symbol added
+  assert ('/file_new', 't', 'added', None, 1000) in added
+  # File with two symbols, one unchanged, one changed, same bucket
+  assert ('/file_pair_unchanged_changed',