tools/remove_duplicate_includes.py - Issue 2379993006: Created a tool to remove duplicate includes between h and cc files.

Side by Side Diff: tools/remove_duplicate_includes.py

Issue 2379993006: Created a tool to remove duplicate includes between h and cc files. (Closed)

Patch Set: Making file executable and reworking dry-run. Created 4 years, 2 months ago



OLD	NEW
(Empty)
	1 #!/usr/bin/env python
	2 # Copyright 2016 The Chromium Authors. All rights reserved.
	3 # Use of this source code is governed by a BSD-style license that can be
	4 # found in the LICENSE file.
	5
	6 """This script will search through the target folder specified and try to find
	7 duplicate includes from h and cc files, and remove them from the cc files.
	8
	9 Usage remove_duplicate_includes.py --dry-run components/foo components/bar
	10 """
	11
	12 import argparse;
	13 import collections;
	14 import os;
	15 import re;
	16
	17 parser = argparse.ArgumentParser()
	18 parser.add_argument('--dry-run', action='store_true',
	19     help='Does not actually remove lines when specified.')
	20 parser.add_argument('--span-dirs', action='store_true',
	21     help='Mapping between h and cc files will not be limited to same folders.')
	22 parser.add_argument('targets', nargs='+',
	23     help="Folders to search for duplicate includes in.")
	24 args = parser.parse_args()
	25
	26 # This could be generlized if desired, and moved to command line arguments.
	maxbogue 2016/09/30 17:40:31 generalized
	skym 2016/09/30 19:18:12 Done.
	27 H_FILE_SUFFIX = ".h"
	28 CC_FILE_SUFFIX = ".cc"
	29
	30 def HasSuffix(file_name, suffix):
	31   return os.path.splitext(file_name)[1] == suffix
	32
	33 def IsEmpty(line):
	34   return not line.strip()
	35
	36 # The \s should allow us to ignore any whitespace and only focus on the group
	37 # captured when comparing between files.
	38 regex = re.compile('^\s*#include\s+(.*?)\s*$')
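	Note: a quick way to check what an include pattern of this shape captures. This is a minimal Python 3 sketch, not part of the patch (the patch itself is Python 2). It assumes the pattern is meant to skip leading whitespace and capture the include target lazily; the name `INCLUDE_RE` and the sample lines are made up for illustration.

```python
import re

# Assumed shape of the include pattern, written as a raw string.
INCLUDE_RE = re.compile(r'^\s*#include\s+(.*?)\s*$')

# Hypothetical sample lines, as they would come out of iterating a file.
samples = [
    '#include "base/logging.h"\n',
    '  #include   <vector>  \n',
    'int x = 0;\n',
]

for line in samples:
    match = INCLUDE_RE.search(line)
    # group(1) is the include target with surrounding whitespace dropped.
    print(repr(line), '->', match.group(1) if match else None)
```

	Because only the captured group is compared between the h and cc files, differences in indentation or trailing spaces do not prevent a match.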

	39
	40 # The key here depends on the span-dirs flag, if specified then it will only be
	maxbogue 2016/09/30 17:40:31 "A map of header files to the includes they contain. The key is the full path of the header file unless the span-dirs flag is set, in which case will only be the file name (to allow mapping between files not in the same folder)."
	- The purpose of the variable first
	- The default key case (no flag) second
	- The abnormal key case (flag) third
	skym 2016/09/30 19:18:12 Done.
	41 # the file name, and this will allows mapping between files not in the same
	42 # folder. If this flag not present then full path is used.
	43 h_file_to_include_set = collections.defaultdict(set)
	44
	45 # Key is always the full path to the cc file.
	maxbogue 2016/09/30 17:40:31 Explain the purpose of this variable. I think it collects all the cc files during the traversal so you can go through them after the headers have been processed?
	skym 2016/09/30 19:18:12 Done.
	46 cc_file_path_set = set()
	47
	48 for relative_root in args.targets:
	49   absolute_root = os.path.join(os.getcwd(), relative_root)
	50   for (dir_path, dir_name_list, file_name_list) in os.walk(absolute_root):
	maxbogue 2016/09/30 17:40:31 I haven't worked with python in a while but I'm pretty sure you don't need the parens around the tuple.
	skym 2016/09/30 19:18:12 Done.
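	Note: on the parentheses question, tuple unpacking in a `for` target works with or without them. The Python 3 sketch below walks a throwaway directory both ways and collects the same results. The directory layout and the variable names `with_parens` and `without_parens` are invented for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny tree: <root>/sub/foo.h
    os.makedirs(os.path.join(root, 'sub'))
    open(os.path.join(root, 'sub', 'foo.h'), 'w').close()

    # Parenthesized tuple target, as in the patch.
    with_parens = []
    for (dir_path, dir_name_list, file_name_list) in os.walk(root):
        with_parens.append((dir_path, sorted(file_name_list)))

    # Bare tuple target, as the reviewer suggests.
    without_parens = []
    for dir_path, dir_name_list, file_name_list in os.walk(root):
        without_parens.append((dir_path, sorted(file_name_list)))

print(with_parens == without_parens)
```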
	51     for file_name in file_name_list:
	52       file_path = os.path.join(dir_path, file_name)
	53       if HasSuffix(file_name, H_FILE_SUFFIX):
	54         # Can be either name or path depending on flag.
	55         file_key = file_name if args.span_dirs else file_path
	56         with open(file_path) as file_handle:
	57           for line in file_handle:
	58             match = regex.search(line)
	59             if match:
	60               h_file_to_include_set[file_key].add(match.group(1))
	61       elif HasSuffix(file_name, CC_FILE_SUFFIX):
	62         cc_file_path_set.add(file_path)
	63
	64 for cc_file_path in cc_file_path_set:
	65   # The lookup must match index method when adding h files, depending on flag.
	66   cc_file_name = os.path.basename(cc_file_path)
	67   cc_file_key = cc_file_name if args.span_dirs else cc_file_path
	68   h_file_key = os.path.splitext(cc_file_key)[0] + H_FILE_SUFFIX
	maxbogue 2016/09/30 17:40:31 You could strip "_unittest" from the end here if it exists and suddenly this would catch duplicate includes in unittest files as well right?
	skym 2016/09/30 19:18:13 Oooooh, actually maybe I should be looking at the very first include in the .cc file and use that to calculate its appropriate .h file instead. And if it doesn't conform to the name of file.cc/file.h/file_unittest.cc then throw out a warning.
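	Note: the "_unittest" suggestion could look roughly like the Python 3 sketch below. It is one possible reading of the reviewer's idea, not code from the patch: the helper `header_key_for` and the constant `UNITTEST_SUFFIX` are hypothetical, and the alternative of reading the first include of the .cc file is not shown.

```python
import os

H_FILE_SUFFIX = '.h'
UNITTEST_SUFFIX = '_unittest'  # hypothetical constant, not in the patch


def header_key_for(cc_file_path, span_dirs=False):
    """Returns the header key a .cc file should be looked up under.

    With span_dirs the key is the bare file name, otherwise the full path,
    mirroring how the header map is keyed in the patch.
    """
    key = os.path.basename(cc_file_path) if span_dirs else cc_file_path
    stem = os.path.splitext(key)[0]
    # Map foo_unittest.cc onto foo.h as well as foo.cc.
    if stem.endswith(UNITTEST_SUFFIX):
        stem = stem[:-len(UNITTEST_SUFFIX)]
    return stem + H_FILE_SUFFIX


print(header_key_for('components/foo/bar_unittest.cc'))
print(header_key_for('components/foo/bar.cc', span_dirs=True))
```

	Stripping the suffix keeps the lookup a pure string operation, so no extra file reads are needed, at the cost of missing .cc files whose primary header follows a different naming scheme.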
	69
	70   if h_file_key in h_file_to_include_set:
	71     include_set = h_file_to_include_set[h_file_key]
	72
	73     # Read out all the data and reset file position to start overwriting.
	74     file_handle = open(cc_file_path, "r" if args.dry_run else "r+")
	75     data = file_handle.readlines()
	maxbogue 2016/09/30 17:40:31 I'd probably just call this "lines" tbh
	skym 2016/09/30 19:18:12 How about line_list?
	76     file_handle.seek(0)
	77
	78     # When a section of includes are completely removed, we want to remove the
	79     # trailing empty as well.
	80     lastCopiedLineWasEmpty = False
	81     lastLineWasOmitted = False
	82     for line in data:
	83       match = regex.search(line)
	84       if match is not None and match.group(1) in include_set:
	85         print "Removed " + match.group(1) + " " + cc_file_name
	86         lastLineWasOmitted = True
	87       elif lastCopiedLineWasEmpty and lastLineWasOmitted and IsEmpty(line):
	88         print "Removed empty line " + cc_file_name
	89         lastLineWasOmitted = True
	90       else:
	91         lastCopiedLineWasEmpty = IsEmpty(line)
	92         lastLineWasOmitted = False
	93         if not args.dry_run:
	94           file_handle.write(line)
	95     if not args.dry_run:
	96       file_handle.truncate()
	97     file_handle.close()
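
	Note: the line-filtering loop at the end of the patch can be exercised without touching files if it is pulled into a pure function. The Python 3 sketch below does that under stated assumptions: `filter_lines` and `INCLUDE_RE` are hypothetical names, the include pattern is assumed to skip leading whitespace and capture the target, and the sample input is made up.

```python
import re

# Assumed include pattern; captures the include target in group 1.
INCLUDE_RE = re.compile(r'^\s*#include\s+(.*?)\s*$')


def filter_lines(lines, include_set):
    """Returns the lines to keep after dropping includes found in include_set.

    A blank line is also dropped when it directly follows omitted lines and
    the last kept line was itself blank, so a fully removed include section
    does not leave two blank lines behind.
    """
    kept = []
    last_copied_was_empty = False
    last_was_omitted = False
    for line in lines:
        match = INCLUDE_RE.search(line)
        if match is not None and match.group(1) in include_set:
            last_was_omitted = True
        elif last_copied_was_empty and last_was_omitted and not line.strip():
            last_was_omitted = True
        else:
            last_copied_was_empty = not line.strip()
            last_was_omitted = False
            kept.append(line)
    return kept


cc_lines = [
    '#include "foo.h"\n',
    '\n',
    '#include <vector>\n',
    '\n',
    '#include "bar.h"\n',
]
print(filter_lines(cc_lines, {'<vector>'}))
```

	Separating the decision from the write also makes the dry-run case trivial: the caller compares the returned list with the input and only rewrites the file when they differ and dry-run is off.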

