| 
 | 
 | 
 Chromium Code Reviews
 Chromium Code Reviews Issue 
            2905263002:
    Filter added to prune files before applying network annotation extractor.  (Closed)
    
  
    Issue 
            2905263002:
    Filter added to prune files before applying network annotation extractor.  (Closed) 
  | Created: 3 years, 7 months ago by Ramin Halavati Modified: 3 years, 6 months ago CC: chromium-reviews, battre Target Ref: refs/heads/master Project: chromium Visibility: Public. | DescriptionFilter added to prune files before applying network annotation extractor.
Network traffic annotation extractor is too slow for unitttests. A filter
based on simple keyword matching on file content is added to remove unrelated
files.
BUG=656607
Review-Url: https://codereview.chromium.org/2905263002
Cr-Commit-Position: refs/heads/master@{#476004}
Committed: https://chromium.googlesource.com/chromium/src/+/18d4688f0d4f8075a9cfaf2d31e7b987336921a9
   Patch Set 1 #
      Total comments: 54
      
     Patch Set 2 : Comments addressed. #
      Total comments: 4
      
     Patch Set 3 : Comments addressed. #Patch Set 4 : run_tool.py reverted. #
 Messages
    Total messages: 27 (15 generated)
     
 Patchset #1 (id:1) has been deleted 
 Patchset #1 (id:20001) has been deleted 
 Patchset #1 (id:40001) has been deleted 
 Patchset #1 (id:60001) has been deleted 
 Patchset #1 (id:80001) has been deleted 
 rhalavati@chromium.org changed reviewers: + dcheng@chromium.org, msramek@chromium.org 
 Daniel, Martin, Running the clang tool on the whole repository took more than 40 minutes on my Z840 which was too slow for unit testing. This CL reduces it to less than 1 minute by text matching files to the patterns that should be in relevent files and passing the list to run_tool.py. I will added tests later to periodically compare the results of auditor with and without this pre-filtering to ensure it's working correctly. Please review. 
 On 2017/05/30 13:50:22, Ramin Halavati wrote: > Daniel, Martin, > > Running the clang tool on the whole repository took more than 40 minutes on my > Z840 which was too slow for unit testing. This CL reduces it to less than 1 > minute by text matching files to the patterns that should be in relevent files > and passing the list to run_tool.py. > I will added tests later to periodically compare the results of auditor with and > without this pre-filtering to ensure it's working correctly. > > Please review. run_tool.py already allows filtering based on command-line arguments doesn't it? What does --source-filenames add? 
 On 2017/05/30 19:49:16, dcheng wrote: > On 2017/05/30 13:50:22, Ramin Halavati wrote: > > Daniel, Martin, > > > > Running the clang tool on the whole repository took more than 40 minutes on my > > Z840 which was too slow for unit testing. This CL reduces it to less than 1 > > minute by text matching files to the patterns that should be in relevent files > > and passing the list to run_tool.py. > > I will added tests later to periodically compare the results of auditor with > and > > without this pre-filtering to ensure it's working correctly. > > > > Please review. > > run_tool.py already allows filtering based on command-line arguments doesn't it? > What does --source-filenames add? Yes, but I don't think that the set that I want to give it can be producible by command line arguments. From what I understand from run_tool.py, we can either have all files from git or all files from CompileDB, but I want to select a fraction of them based on the keywords, and give them to run_tool.py to run them in parallel. Can it be done by current arguments? 
 I only looked at traffic_annotation/, I'm leaving the other file to Daniel. The code seems correct, but I left a bunch of comments. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... File tools/traffic_annotation/auditor/annotation_relevent_filter.py (right): https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:2: # Copyright (c) 2017 The Chromium Authors. All rights reserved. nit: We don't use (c) anymore. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:14: def NormalizePath(path): nit: make this also private? https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:20: """Returns set of normalized paths to subdirectories containing sources nit: the set https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:25: git_cmd = 'git.bat' if os.name == 'nt' else 'git' Where does this come from? Can you add a comment with a link to a documentation which states that this is how the command looks on different platforms? https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:63: 'SSLClientSocket', # SSLClientSocket:: style: two spaces before comments https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:79: base_directory: str Local path to root of checkout, e.g. C:\chr\src. The description says local path, but the example is an absolute one. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:83: os.path.join(os.path.dirname(__file__), Can you document this part (i.e. what happens if base_directory is None)? Also explain why we're going two directories up. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:84: os.path.pardir, os.path.pardir)) style: should be offset +4 from the beginning of the statement, which is os.path.join https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:90: elif os.path.exists(os.path.join(base_directory, '.svn')): Why do we need to support SVN? https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:93: raise "%s is not a repository root" % base_directory Ditto - please be consistent with the single and double quotes. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:95: self.content_matcher = re.compile(".*(" + "|".join(self.KEYWORDS) + ").*") The keywords contain "." which will be interpreted as any character. We should assert that the keywords themselves are [A-Za-z.:_]+, and escape the "." into "\.". https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:98: '.*(\.cc|\.mm)') nit: Maybe add ^$? This will currently match "file.mmmmmmmmmmmmmmm". https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:101: def FileHasReleventContent(self, filename): s/relevent/relevant/ here and elsewhere, including the filename https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:114: directories would also be serached. typo: searched https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... File tools/traffic_annotation/auditor/traffic_annotation_auditor.py (right): https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:2: # Copyright (c) 2017 The Chromium Authors. All rights reserved. Ditto. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:16: from annotation_relevent_filter import NetworkTrafficAnnotationFileFilter How about renaming the module to traffic_annotation_file_filter, and the class to TrafficAnnotationFileFilter? That way they're consistent with each other, and with this file. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:73: source_filenames = tempfile.NamedTemporaryFile(mode='w+t', delete=False) nit: please be consistent in the usage of single / double quotes https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:81: args[-1] = "--source-filenames=" + source_filenames.name Adding None and then optionally overwriting it seems strange. Why not just append at the end? https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:93: stderr=subprocess.PIPE) style: offset https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:94: stdout_text, stderr_text = command.communicate() Why is this part duplicated? It can be extracted after the if-else statement. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:98: return raw_annotations Maybe return null if command exited with an error? https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:299: args.path_filters if args.path_filters else [""], Can't this [""] be the "default" argument to argparse? 
 Patchset #2 (id:120001) has been deleted 
 Thanks Martin, all comments addressed, please review. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... File tools/traffic_annotation/auditor/annotation_relevent_filter.py (right): https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:2: # Copyright (c) 2017 The Chromium Authors. All rights reserved. On 2017/05/31 11:31:36, msramek wrote: > nit: We don't use (c) anymore. Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:14: def NormalizePath(path): On 2017/05/31 11:31:35, msramek wrote: > nit: make this also private? Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:20: """Returns set of normalized paths to subdirectories containing sources On 2017/05/31 11:31:35, msramek wrote: > nit: the set Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:25: git_cmd = 'git.bat' if os.name == 'nt' else 'git' On 2017/05/31 11:31:35, msramek wrote: > Where does this come from? Can you add a comment with a link to a documentation > which states that this is how the command looks on different platforms? Comment added in docstring. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:63: 'SSLClientSocket', # SSLClientSocket:: On 2017/05/31 11:31:35, msramek wrote: > style: two spaces before comments Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:79: base_directory: str Local path to root of checkout, e.g. C:\chr\src. On 2017/05/31 11:31:36, msramek wrote: > The description says local path, but the example is an absolute one. Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:83: os.path.join(os.path.dirname(__file__), On 2017/05/31 11:31:35, msramek wrote: > Can you document this part (i.e. what happens if base_directory is None)? > > Also explain why we're going two directories up. Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:84: os.path.pardir, os.path.pardir)) On 2017/05/31 11:31:35, msramek wrote: > style: should be offset +4 from the beginning of the statement, which is > os.path.join Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:90: elif os.path.exists(os.path.join(base_directory, '.svn')): On 2017/05/31 11:31:35, msramek wrote: > Why do we need to support SVN? This part is also copied fro builddeps.py. Is it extra? Should I ask them? https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:93: raise "%s is not a repository root" % base_directory On 2017/05/31 11:31:36, msramek wrote: > Ditto - please be consistent with the single and double quotes. Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:95: self.content_matcher = re.compile(".*(" + "|".join(self.KEYWORDS) + ").*") On 2017/05/31 11:31:35, msramek wrote: > The keywords contain "." which will be interpreted as any character. We should > assert that the keywords themselves are [A-Za-z.:_]+, and escape the "." into > "\.". Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:98: '.*(\.cc|\.mm)') On 2017/05/31 11:31:35, msramek wrote: > nit: Maybe add ^$? This will currently match "file.mmmmmmmmmmmmmmm". $ added, ^ is required? https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:101: def FileHasReleventContent(self, filename): On 2017/05/31 11:31:35, msramek wrote: > s/relevent/relevant/ > > here and elsewhere, including the filename Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:114: directories would also be serached. On 2017/05/31 11:31:36, msramek wrote: > typo: searched Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... File tools/traffic_annotation/auditor/traffic_annotation_auditor.py (right): https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:2: # Copyright (c) 2017 The Chromium Authors. All rights reserved. On 2017/05/31 11:31:36, msramek wrote: > Ditto. Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:16: from annotation_relevent_filter import NetworkTrafficAnnotationFileFilter On 2017/05/31 11:31:36, msramek wrote: > How about renaming the module to traffic_annotation_file_filter, and the class > to TrafficAnnotationFileFilter? That way they're consistent with each other, and > with this file. Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:73: source_filenames = tempfile.NamedTemporaryFile(mode='w+t', delete=False) On 2017/05/31 11:31:36, msramek wrote: > nit: please be consistent in the usage of single / double quotes Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:81: args[-1] = "--source-filenames=" + source_filenames.name On 2017/05/31 11:31:36, msramek wrote: > Adding None and then optionally overwriting it seems strange. Why not just > append at the end? Because when prefilter_files is not selected, this value is replaced for each provided path. I can remove None and keep adding and removing it in the else part. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:93: stderr=subprocess.PIPE) On 2017/05/31 11:31:36, msramek wrote: > style: offset Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:94: stdout_text, stderr_text = command.communicate() On 2017/05/31 11:31:36, msramek wrote: > Why is this part duplicated? It can be extracted after the if-else statement. In the else part, we call it separately for each path_filter and concatenate the results. I could add a helper function or collect them in a set and execute them without duplicate code, but don't know which is neater. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:98: return raw_annotations On 2017/05/31 11:31:36, msramek wrote: > Maybe return null if command exited with an error? Clang issues some errors for files that are not part of compiledb, but we don't care about them. I can remove printing stderr, but thought it may attract some attention sometime. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:299: args.path_filters if args.path_filters else [""], On 2017/05/31 11:31:36, msramek wrote: > Can't this [""] be the "default" argument to argparse? Done. 
 LGTM with more comments. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... File tools/traffic_annotation/auditor/annotation_relevent_filter.py (right): https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:25: git_cmd = 'git.bat' if os.name == 'nt' else 'git' On 2017/05/31 12:28:24, Ramin Halavati wrote: > On 2017/05/31 11:31:35, msramek wrote: > > Where does this come from? Can you add a comment with a link to a > documentation > > which states that this is how the command looks on different platforms? > > Comment added in docstring. I hoped to be able to point to some documentation that explains it. If you copied this from another tool, and the original does not have such a documentation, then I guess we'll have to trust it. This comment is not really relevant and can get easily outdated, so I would just remove it. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:90: elif os.path.exists(os.path.join(base_directory, '.svn')): On 2017/05/31 12:28:25, Ramin Halavati wrote: > On 2017/05/31 11:31:35, msramek wrote: > > Why do we need to support SVN? > > This part is also copied fro builddeps.py. Is it extra? Should I ask them? Chromium used to support both SVN and GIT back in the day, but I think that's been deprecated for at least a year. Even if there was a way to obtain an SVN checkout, I don't we're going to run these scripts on it. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:98: '.*(\.cc|\.mm)') On 2017/05/31 12:28:24, Ramin Halavati wrote: > On 2017/05/31 11:31:35, msramek wrote: > > nit: Maybe add ^$? This will currently match "file.mmmmmmmmmmmmmmm". > > $ added, ^ is required? I'm aware that re.match() starts matching from the beginning, but I typically prefer to add "^" when I add "$" to make it obvious to the reader that I want an exact string. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... File tools/traffic_annotation/auditor/traffic_annotation_auditor.py (right): https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:81: args[-1] = "--source-filenames=" + source_filenames.name On 2017/05/31 12:28:25, Ramin Halavati wrote: > On 2017/05/31 11:31:36, msramek wrote: > > Adding None and then optionally overwriting it seems strange. Why not just > > append at the end? > > Because when prefilter_files is not selected, this value is replaced for each > provided path. I can remove None and keep adding and removing it in the else > part. Oh, I see. Could you just comment on the line 70? Something like: None] # Placeholder for a path argument. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:98: return raw_annotations On 2017/05/31 12:28:25, Ramin Halavati wrote: > On 2017/05/31 11:31:36, msramek wrote: > > Maybe return null if command exited with an error? > > Clang issues some errors for files that are not part of compiledb, but we don't > care about them. I can remove printing stderr, but thought it may attract some > attention sometime. Acknowledged. Then keep it as it is :) https://codereview.chromium.org/2905263002/diff/140001/tools/traffic_annotati... File tools/traffic_annotation/auditor/traffic_annotation_auditor.py (right): https://codereview.chromium.org/2905263002/diff/140001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:65: "--generate-compdb", nit: Still a different type of quotes. https://codereview.chromium.org/2905263002/diff/140001/tools/traffic_annotati... File tools/traffic_annotation/auditor/traffic_annotation_file_filter.py (right): https://codereview.chromium.org/2905263002/diff/140001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_file_filter.py:99: assert(all(not re.match('.*[^A-Za-z:_].*', keyword) nit: I'd suggest avoiding the double negative - that the keywords must not contain characters other than... re.match('^[A-Za-z:_]+$') 
 On 2017/05/30 20:37:40, Ramin Halavati wrote: > On 2017/05/30 19:49:16, dcheng wrote: > > On 2017/05/30 13:50:22, Ramin Halavati wrote: > > > Daniel, Martin, > > > > > > Running the clang tool on the whole repository took more than 40 minutes on > my > > > Z840 which was too slow for unit testing. This CL reduces it to less than 1 > > > minute by text matching files to the patterns that should be in relevent > files > > > and passing the list to run_tool.py. > > > I will added tests later to periodically compare the results of auditor with > > and > > > without this pre-filtering to ensure it's working correctly. > > > > > > Please review. > > > > run_tool.py already allows filtering based on command-line arguments doesn't > it? > > What does --source-filenames add? > > Yes, but I don't think that the set that I want to give it can be producible by > command line arguments. > From what I understand from run_tool.py, we can either have all files from git > or all files from CompileDB, but I want to select a fraction of them based on > the keywords, and give them to run_tool.py to run them in parallel. Can it be > done by current arguments? run_tool.py --tool <tool> -C out/Debug base/at_exit.cc base/base64.cc should run it on just those two files. 
 On 2017/05/31 17:12:37, dcheng wrote: > On 2017/05/30 20:37:40, Ramin Halavati wrote: > > On 2017/05/30 19:49:16, dcheng wrote: > > > On 2017/05/30 13:50:22, Ramin Halavati wrote: > > > > Daniel, Martin, > > > > > > > > Running the clang tool on the whole repository took more than 40 minutes > on > > my > > > > Z840 which was too slow for unit testing. This CL reduces it to less than > 1 > > > > minute by text matching files to the patterns that should be in relevent > > files > > > > and passing the list to run_tool.py. > > > > I will added tests later to periodically compare the results of auditor > with > > > and > > > > without this pre-filtering to ensure it's working correctly. > > > > > > > > Please review. > > > > > > run_tool.py already allows filtering based on command-line arguments doesn't > > it? > > > What does --source-filenames add? > > > > Yes, but I don't think that the set that I want to give it can be producible > by > > command line arguments. > > From what I understand from run_tool.py, we can either have all files from git > > or all files from CompileDB, but I want to select a fraction of them based on > > the keywords, and give them to run_tool.py to run them in parallel. Can it be > > done by current arguments? > > run_tool.py --tool <tool> -C out/Debug base/at_exit.cc base/base64.cc > > should run it on just those two files. I need to run it on around 600 files in parallel. Is it OK to pass them all (with their paths) as command line? 
 Thank you Martin, all comments addressed. Thank you Daniel, you were right, I reverted run_tool.py https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... File tools/traffic_annotation/auditor/annotation_relevent_filter.py (right): https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:25: git_cmd = 'git.bat' if os.name == 'nt' else 'git' On 2017/05/31 14:55:24, msramek wrote: > On 2017/05/31 12:28:24, Ramin Halavati wrote: > > On 2017/05/31 11:31:35, msramek wrote: > > > Where does this come from? Can you add a comment with a link to a > > documentation > > > which states that this is how the command looks on different platforms? > > > > Comment added in docstring. > > I hoped to be able to point to some documentation that explains it. If you > copied this from another tool, and the original does not have such a > documentation, then I guess we'll have to trust it. > > This comment is not really relevant and can get easily outdated, so I would just > remove it. Function Replaced! https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:90: elif os.path.exists(os.path.join(base_directory, '.svn')): On 2017/05/31 14:55:24, msramek wrote: > On 2017/05/31 12:28:25, Ramin Halavati wrote: > > On 2017/05/31 11:31:35, msramek wrote: > > > Why do we need to support SVN? > > > > This part is also copied fro builddeps.py. Is it extra? Should I ask them? > > Chromium used to support both SVN and GIT back in the day, but I think that's > been deprecated for at least a year. Even if there was a way to obtain an SVN > checkout, I don't we're going to run these scripts on it. Acknowledged. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/annotation_relevent_filter.py:98: '.*(\.cc|\.mm)') On 2017/05/31 14:55:24, msramek wrote: > On 2017/05/31 12:28:24, Ramin Halavati wrote: > > On 2017/05/31 11:31:35, msramek wrote: > > > nit: Maybe add ^$? This will currently match "file.mmmmmmmmmmmmmmm". > > > > $ added, ^ is required? > > I'm aware that re.match() starts matching from the beginning, but I typically > prefer to add "^" when I add "$" to make it obvious to the reader that I want an > exact string. Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... File tools/traffic_annotation/auditor/traffic_annotation_auditor.py (right): https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:81: args[-1] = "--source-filenames=" + source_filenames.name On 2017/05/31 14:55:24, msramek wrote: > On 2017/05/31 12:28:25, Ramin Halavati wrote: > > On 2017/05/31 11:31:36, msramek wrote: > > > Adding None and then optionally overwriting it seems strange. Why not just > > > append at the end? > > > > Because when prefilter_files is not selected, this value is replaced for each > > provided path. I can remove None and keep adding and removing it in the else > > part. > > Oh, I see. Could you just comment on the line 70? Something like: > > None] # Placeholder for a path argument. Done. https://codereview.chromium.org/2905263002/diff/100001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:98: return raw_annotations On 2017/05/31 14:55:24, msramek wrote: > On 2017/05/31 12:28:25, Ramin Halavati wrote: > > On 2017/05/31 11:31:36, msramek wrote: > > > Maybe return null if command exited with an error? > > > > Clang issues some errors for files that are not part of compiledb, but we > don't > > care about them. I can remove printing stderr, but thought it may attract some > > attention sometime. > > Acknowledged. Then keep it as it is :) Done. https://codereview.chromium.org/2905263002/diff/140001/tools/traffic_annotati... File tools/traffic_annotation/auditor/traffic_annotation_auditor.py (right): https://codereview.chromium.org/2905263002/diff/140001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_auditor.py:65: "--generate-compdb", On 2017/05/31 14:55:24, msramek wrote: > nit: Still a different type of quotes. Replaced all quotes in this file with ". https://codereview.chromium.org/2905263002/diff/140001/tools/traffic_annotati... File tools/traffic_annotation/auditor/traffic_annotation_file_filter.py (right): https://codereview.chromium.org/2905263002/diff/140001/tools/traffic_annotati... tools/traffic_annotation/auditor/traffic_annotation_file_filter.py:99: assert(all(not re.match('.*[^A-Za-z:_].*', keyword) On 2017/05/31 14:55:24, msramek wrote: > nit: I'd suggest avoiding the double negative - that the keywords must not > contain characters other than... > > re.match('^[A-Za-z:_]+$') Done. 
 The CQ bit was checked by rhalavati@chromium.org to run a CQ dry run 
 Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or... 
 The CQ bit was unchecked by commit-bot@chromium.org 
 Dry run: This issue passed the CQ dry run. 
 On 2017/05/31 17:15:36, Ramin Halavati wrote: > On 2017/05/31 17:12:37, dcheng wrote: > > On 2017/05/30 20:37:40, Ramin Halavati wrote: > > > On 2017/05/30 19:49:16, dcheng wrote: > > > > On 2017/05/30 13:50:22, Ramin Halavati wrote: > > > > > Daniel, Martin, > > > > > > > > > > Running the clang tool on the whole repository took more than 40 minutes > > on > > > my > > > > > Z840 which was too slow for unit testing. This CL reduces it to less > than > > 1 > > > > > minute by text matching files to the patterns that should be in relevent > > > files > > > > > and passing the list to run_tool.py. > > > > > I will added tests later to periodically compare the results of auditor > > with > > > > and > > > > > without this pre-filtering to ensure it's working correctly. > > > > > > > > > > Please review. > > > > > > > > run_tool.py already allows filtering based on command-line arguments > doesn't > > > it? > > > > What does --source-filenames add? > > > > > > Yes, but I don't think that the set that I want to give it can be producible > > by > > > command line arguments. > > > From what I understand from run_tool.py, we can either have all files from > git > > > or all files from CompileDB, but I want to select a fraction of them based > on > > > the keywords, and give them to run_tool.py to run them in parallel. Can it > be > > > done by current arguments? > > > > run_tool.py --tool <tool> -C out/Debug base/at_exit.cc base/base64.cc > > > > should run it on just those two files. > > I need to run it on around 600 files in parallel. Is it OK to pass them all > (with their paths) as command line? It's possible we'll run into command-line limits at some point, but that hasn't happened yet =) 
 The CQ bit was checked by rhalavati@chromium.org 
 The patchset sent to the CQ was uploaded after l-g-t-m from msramek@chromium.org Link to the patchset: https://codereview.chromium.org/2905263002/#ps180001 (title: "run_tool.py reverted.") 
 CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or... 
 CQ is committing da patch.
Bot data: {"patchset_id": 180001, "attempt_start_ts": 1496261768982170,
"parent_rev": "d9279f0c0358aa6492a7a271e651eabe5e0f5086", "commit_rev":
"18d4688f0d4f8075a9cfaf2d31e7b987336921a9"}
 
            
              
                Message was sent while issue was closed.
              
            
             Description was changed from ========== Filter added to prune files before applying network annotation extractor. Network traffic annotation extractor is too slow for unitttests. A filter based on simple keyword matching on file content is added to remove unrelated files. BUG=656607 ========== to ========== Filter added to prune files before applying network annotation extractor. Network traffic annotation extractor is too slow for unitttests. A filter based on simple keyword matching on file content is added to remove unrelated files. BUG=656607 Review-Url: https://codereview.chromium.org/2905263002 Cr-Commit-Position: refs/heads/master@{#476004} Committed: https://chromium.googlesource.com/chromium/src/+/18d4688f0d4f8075a9cfaf2d31e7... ========== 
 
            
              
                Message was sent while issue was closed.
              
            
             Committed patchset #4 (id:180001) as https://chromium.googlesource.com/chromium/src/+/18d4688f0d4f8075a9cfaf2d31e7... | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
