build/scan_sources.py - Issue 8037013: Create scanning script to determine header dependencies.

Unified Diff: build/scan_sources.py

Issue 8037013: Create scanning script to determine header dependencies. (Closed) Base URL: svn://chrome-svn/chrome/trunk/src/

Patch Set: Created 9 years, 3 months ago

Index: build/scan_sources.py

===================================================================

--- build/scan_sources.py (revision 0)

+++ build/scan_sources.py (revision 0)

@@ -0,0 +1,163 @@

+#!/usr/bin/python

+# Use of this source code is governed by a BSD-style license that can be

+# found in the LICENSE file.

+import os

+import re

+import sys

+"""

bradn 2011/09/26 06:21:57 Pull this up onto the line with the """ (per style

noelallen1 2011/09/30 01:12:45 Done.

+ Header Scanner.

+This module will scan a set of input sources for include dependencies. Use

+the command-line switch -Ixxxx to add include paths.

+"""

bradn 2011/09/26 06:21:57 Extra cr here.

noelallen1 2011/09/30 01:12:45 Done.

+class Resolver(object):
+  """
+  The Resolver object provides a mechanism to find and convert a source or
+  include filename into a relative path based on provided search paths.
+  """
+  def __init__(self):
+    self.search_dirs = []
+    self.AddDir('.')

bradn 2011/09/26 06:21:57 This is adding cwd to the search order (the cwd wh

noelallen1 2011/09/30 01:12:45 Removed, I force you to specify it now.

+    self.cwd = os.path.realpath(os.getcwd())

bradn 2011/09/26 06:21:57 I've noticed you used realpath throughout. I actua

noelallen1 2011/09/30 01:12:45 I use realpath so that two links to the same locat
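
To illustrate the point about links: `os.path.realpath` resolves symbolic links, so two different links to one directory collapse to a single search entry. This is a minimal sketch, assuming a POSIX filesystem where symlinks can be created. `canonical_dirs`, `demo`, and the directory names are hypothetical and are not part of the patch.

```python
import os
import shutil
import tempfile


def canonical_dirs(paths):
    """Return the unique real paths for `paths`, keeping first-seen order."""
    seen = []
    for path in paths:
        real = os.path.realpath(path)
        if real not in seen:
            seen.append(real)
    return seen


def demo():
    """Build one directory plus two symlinks to it and de-duplicate them."""
    root = tempfile.mkdtemp()
    try:
        target = os.path.join(root, 'include')
        os.mkdir(target)
        link_a = os.path.join(root, 'link_a')
        link_b = os.path.join(root, 'link_b')
        os.symlink(target, link_a)
        os.symlink(target, link_b)
        # All three spellings resolve to the same real directory.
        return canonical_dirs([link_a, link_b, target]), os.path.realpath(target)
    finally:
        shutil.rmtree(root)
```

With `os.path.abspath` instead, the three spellings would stay distinct, which is the trade-off being discussed in this thread.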

+    self.offs = len(self.cwd)
+  def AddDir(self, pathname):
+    """Add an include search path."""
+    pathname = os.path.realpath(pathname)
+    if pathname not in self.search_dirs:
+      if os.path.isdir(pathname):
+        self.search_dirs.append(pathname)
+        print 'Added dir: %s' % pathname

bradn 2011/09/26 06:21:57 Leftover? I assume these are something you want in

noelallen1 2011/09/30 01:12:45 Not sure what we should do here. We "failed" beca

+      else:
+        print 'Not a directory: %s' % pathname
+        return False
+    return True
+  def ToRelative(self, filepath):
+    """Returns a relative path from CWD to filepath."""
+    filepath = os.path.realpath(filepath)
+    basepath = self.cwd
+    path_parts = filepath.split(os.sep)
+    base_parts = basepath.split(os.sep)
+    print "filepath vs basepath"
+    print " %s\n %s\n" % (filepath, basepath)

bradn 2011/09/26 06:21:57 Leftover?

noelallen1 2011/09/30 01:12:45 Done.

+    while path_parts and base_parts and path_parts[0] == base_parts[0]:
+      path_parts = path_parts[1:]
+      base_parts = base_parts[1:]
+    rel_parts = ['..'] * len(base_parts) + path_parts
+    return os.sep.join(rel_parts)
+  def FindFile(self, filename):
+    """Search for <filename> across the search directories; return the
+    filepath relative to the CWD if found, or None otherwise."""
+    if filename[0] == os.path.sep:

bradn 2011/09/26 06:21:57 os.path.isabs(filename)

noelallen1 2011/09/30 01:12:45 Done.
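
The two checks differ on empty strings and on non-POSIX path syntax. The sketch below uses `posixpath` and `ntpath` directly so it behaves the same on any host. `starts_with_sep` is a hypothetical stand-in for the original first-character test.

```python
import ntpath
import posixpath


def starts_with_sep(filename, sep):
    """Stand-in for the original check: compare the first character to sep."""
    return filename[0] == sep


def is_abs_posix(filename):
    """POSIX rule: a path is absolute if it starts with '/'."""
    return posixpath.isabs(filename)


def is_abs_windows(filename):
    """Windows rule: drive-qualified paths such as C:\\src are absolute."""
    return ntpath.isabs(filename)
```

On POSIX the two approaches agree for non-empty names. The first-character test raises `IndexError` on an empty string and misses drive-letter paths on Windows, while `os.path.isabs` handles both.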

+      if os.path.exists(filename):
+        return self.ToRelative(filename)
+      return None
+    for pathname in self.search_dirs:
+      fullname = os.path.join(pathname, filename)
+      if os.path.exists(fullname):
+        return self.ToRelative(fullname)
+    return None
+class Scanner(object):
+  """

bradn 2011/09/26 06:21:57 Give a one line description per the style guide.

+  Scanner does the regular expression work, loading and scanning the source
+  files for the key '#include' to find dependencies.
+  """
+  def __init__(self, parent):
+    regex = r'(?P<inc>\#include [<"].+[>"])'

bradn 2011/09/26 06:21:57 You did P<inc> but never used that group name.

bradn 2011/09/26 06:21:57 .+ -> [^>"]+
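
One way to see why the negated character class is safer: `.+` is greedy, so when a line carries a trailing comment that contains a quote or `>`, the match runs past the closing delimiter. A minimal sketch follows; the sample line and the names `GREEDY`, `BOUNDED`, and `first_include` are made up for illustration.

```python
import re

# Pattern in the spirit of the patch: greedy ".+" between the delimiters.
GREEDY = re.compile(r'#include [<"](.+)[>"]')
# Suggested form: stop at the first closing delimiter.
BOUNDED = re.compile(r'#include [<"]([^>"]+)[>"]')

LINE = '#include "base/logging.h" // see "notes"'


def first_include(pattern, text):
    """Return the captured filename of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None
```

Here `first_include(BOUNDED, LINE)` yields the filename alone, while the greedy pattern also swallows the comment up to the last quote on the line.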

+    self.work_q = parent

bradn 2011/09/26 06:21:57 So include <> vs include "" have different behaviors. Though unfortunately different in preprocessor-specific ways.

Typically "" prepends the directory containing the file doing the including to the search order, whereas <> does not. In some preprocessors this extra path is propagated to child includes (msvs), but not in gcc, I believe.

Not sure how far we want to go on correctness, but I guess we do hope to use this all over. My guess is it's worked locally due to cwd being in the include order?

Dug up what scons is doing, for instance:

    """
    A Classic Scanner subclass which takes into account the type of
    bracketing used to include the file, and uses classic CPP rules
    for searching for the files based on the bracketing.

    Note that in order for this to work, the regular expression passed
    to the constructor must return the leading bracket in group 0, and
    the contained filename in group 1.
    """
    def find_include(self, include, source_dir, path):
        if include[0] == '"':
            paths = (source_dir,) + tuple(path)
        else:
            paths = tuple(path) + (source_dir,)

        n = SCons.Node.FS.find_file(include[1], paths)

        return n, include[1]

Appears they assume source_dir is prepended for " and appended for < just in case.

Oh interesting, here's the regex scons uses:

    ^[ \t]*#[ \t]*(?:include|import)[ \t]*(<|")([^>"]+)(>|")

You likely don't care about the #import Objective-C-ism, but I think gcc supports it in general. Oh, but you do want to support the spaces/tabs between # and include and </".

I don't know if we have any in our code base, but I assume you've seen indented processor style:

    #ifdef foo
    # include "a/b/c/d"
    #else
    # include "x/y/z"
    #endif

noelallen1 2011/09/30 01:12:45 Done.
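
The search-order rule from the SCons snippet can be sketched on its own as follows. This is an illustration under assumptions, not what the revised patch does: `candidate_dirs` and `find_include` are hypothetical names, and the `exists` parameter is added only so the lookup can be exercised without touching the disk.

```python
import os


def candidate_dirs(bracket, source_dir, include_dirs):
    """Order the directories to search, following the quoted SCons rule."""
    if bracket == '"':
        # Quoted form: the including file's directory is tried first.
        return [source_dir] + list(include_dirs)
    # Angle form: include paths first, source directory as a last resort.
    return list(include_dirs) + [source_dir]


def find_include(bracket, name, source_dir, include_dirs,
                 exists=os.path.exists):
    """Return the first existing candidate path for `name`, or None."""
    for directory in candidate_dirs(bracket, source_dir, include_dirs):
        candidate = os.path.join(directory, name)
        if exists(candidate):
            return candidate
    return None
```

When a header with the same name sits both next to the source file and in an include path, the two forms resolve to different files, which is the case a single flat search list cannot express.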

+    self.parser = re.compile(regex)
+  def GetIncludes(self, data):
+    """Generate a list of includes."""
+    out = []
+    for token in self.parser.split(data):

bradn 2011/09/26 06:21:57 Split's a little weird, I'd use findall here as if
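
The comment above is cut off, so the exact suggestion is not recoverable here. One possible `findall`-based shape is sketched below, reusing the SCons-style pattern quoted earlier in this review with the stray backslash before the character class dropped. `INCLUDE_RE` and `get_includes` are hypothetical names.

```python
import re

# Tolerates leading whitespace and spaces/tabs between '#', the keyword,
# and the opening delimiter. MULTILINE makes '^' match at every line start.
INCLUDE_RE = re.compile(
    r'^[ \t]*#[ \t]*(?:include|import)[ \t]*(<|")([^>"]+)(>|")',
    re.MULTILINE)


def get_includes(data):
    """Return (bracket, filename) pairs for every include directive in data."""
    return [(opener, name) for opener, name, _ in INCLUDE_RE.findall(data)]
```

Because the pattern has capture groups, `findall` returns one tuple per match, so no second pass of `split()` and slicing is needed to recover the filename.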

+      if len(token) > 8 and token[0:8] == '#include':
+        filepath = token.split()[1]
+        out.append(filepath[1:-1])
+    return out
+  def Scan(self, filename):
+    """Attempt to scan the given file for a list of includes."""
+    try:
+      data = open(filename).read()
+      return self.GetIncludes(data)
+    except:

bradn 2011/09/26 06:21:57 Technically the style guide forbids carte-blanc ex
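
A narrower handler might look like the sketch below. It assumes the intent is only to tolerate files that cannot be opened or read. `scan_file` and its `get_includes` callback are hypothetical names used for illustration.

```python
def scan_file(filename, get_includes):
    """Read filename and return its includes, or [] if it cannot be read."""
    try:
        with open(filename) as handle:
            data = handle.read()
    except (IOError, OSError):
        # Missing or unreadable file: treat it as having no dependencies.
        return []
    # Parsing happens outside the try block, so a bug in get_includes
    # surfaces as an exception instead of silently yielding [].
    return get_includes(data)
```

Keeping the parse call outside the `try` is the main design choice: a bare `except:` around both steps would also hide programming errors in the scanner itself.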

+      return []
+class WorkQueue(object):
+  """

bradn 2011/09/26 06:21:57 Pull out the first sentence as a one line descript

+  WorkQueue provides a queue of files to be processed. The scanner
+  will attempt to push new items into the queue, which will be ignored if the
+  item is already in the queue. If the item is new, it will be added to the
+  work list, which is drained by the scanner.
+  """
+  def __init__(self, resolver):
+    self.added_set = set()
+    self.todo_list = list()
+    self.scanner = Scanner(self)
+    self.resolver = resolver
+  def PushIfNew(self, filename):
+    """Add this dependency to the list if not already there."""
+    resolved_name = self.resolver.FindFile(filename)
+    if not resolved_name:
+      return
+    if resolved_name in self.added_set:
+      return
+    self.todo_list.append(resolved_name)
+    self.added_set |= set([resolved_name])

bradn 2011/09/26 06:21:57 -> self.added_set.add(resolved_name)

noelallen1 2011/09/30 01:12:45 Done.

+  def PopIfAvail(self):
+    """Fetch the next dependency to search."""
+    if not self.todo_list:
+      return None
+    return self.todo_list.pop()
+  def Run(self):
+    """Search through the available dependencies until the list becomes empty.
+    The list must be primed with one or more source files to search."""
+    scan_name = self.PopIfAvail()
+    while scan_name:
+      includes = self.scanner.Scan(scan_name)
+      for include_file in includes:
+        self.PushIfNew(include_file)
+      scan_name = self.PopIfAvail()
+    sorted_list = sorted(self.added_set)
+    for pathname in sorted_list:
+      print pathname
+def Main(argv):
+  resolver = Resolver()
+  files = []
+  failed = False
+  for arg in argv[1:]:
+    if len(arg) > 2 and arg[0:2] == '-I':

bradn 2011/09/26 06:21:57 Style guide calls for consistent use of single or

bradn 2011/09/26 06:21:57 I'm all for avoiding optparse when possible, but I

noelallen1 2011/09/30 01:12:45 Done.
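
The optparse comment above is truncated, so what was actually requested is not known. If the aim was to let a standard parser handle `-I`, one possible shape is sketched here with `argparse`, the successor to `optparse` in current Python. `parse_args` and the `include_dirs`/`sources` names are hypothetical.

```python
import argparse


def parse_args(argv):
    """Split argv (without the program name) into include dirs and sources."""
    parser = argparse.ArgumentParser(description='Scan sources for includes.')
    # action='append' collects every -I occurrence, in order.
    parser.add_argument('-I', dest='include_dirs', action='append',
                        default=[], metavar='DIR',
                        help='add an include search path')
    parser.add_argument('sources', nargs='*', help='source files to scan')
    options = parser.parse_args(argv)
    return options.include_dirs, options.sources
```

Both the attached form (`-Ifoo`) and the separated form (`-I foo`) are accepted, which the hand-rolled `arg[0:2] == '-I'` test does not handle.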

+      if not resolver.AddDir(arg[2:]):
+        print "Failed to add path: %s" % arg[2:]
+        failed = True
+    else:
+      files.append(arg)
+  if failed: return -1
+  workQ = WorkQueue(resolver)
+  for filename in files:
+    workQ.PushIfNew(filename)
+  workQ.Run()
+  return 0

bradn 2011/09/26 06:21:57 Style guide is a little vague, but most places see

noelallen1 2011/09/30 01:12:45 Done.

+if __name__ == '__main__':
+  sys.exit(Main(sys.argv))

bradn 2011/09/26 06:21:57 Drop trailing line.

noelallen1 2011/09/30 01:12:45 Done.

Property changes on: build/scan_sources.py

___________________________________________________________________

Added: svn:executable

+ *
