|
|
Created:
7 years, 11 months ago by mnaganov (inactive) Modified:
7 years, 10 months ago CC:
chromium-reviews, android-webview-reviews_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionMake webview_licenses.py script to use licensecheck.pl
webview_licenses.py now uses licensecheck.pl for finding 3rd-party
code outside of third_party directories. This allows us to share the
tool code with Chromium and make the checking process more intelligent
in order to avoid false positives.
licensecheck.pl updated:
- to add back copyrights discovery;
- to fix generated files criteria;
- to optimize for execution speed.
BUG=161461
Patch Set 1 #Patch Set 2 : Small cleanups #
Total comments: 22
Patch Set 3 : Comments addressed #
Total comments: 1
Messages
Total messages: 8 (0 generated)
Let's try the updated copyright finder code on webview bots first. If it will be working with no significant flakes, we might migrate it to the main linux bot.
https://codereview.chromium.org/12047113/diff/5001/android_webview/tools/thir... File android_webview/tools/third_party_files_whitelist.txt (right): https://codereview.chromium.org/12047113/diff/5001/android_webview/tools/thir... android_webview/tools/third_party_files_whitelist.txt:154: # !!! a real issue !!! How about a bug *and* a TODO? https://codereview.chromium.org/12047113/diff/5001/android_webview/tools/webv... File android_webview/tools/webview_licenses.py (right): https://codereview.chromium.org/12047113/diff/5001/android_webview/tools/webv... android_webview/tools/webview_licenses.py:106: '--check', '\.(c(c|pp|xx)?|h(h|pp|xx)?|p(l|m)|xs|sh|php|py(|x)|rb' \ Do you need to use --check? Why? https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... File third_party/devscripts/licensecheck.pl (right): https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... third_party/devscripts/licensecheck.pl:296: unless ($. > $opt_lines) { This change seems pretty invasive compared to just adding a regex. What are you trying to do? https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... third_party/devscripts/licensecheck.pl:344: my $line = $_[0]; This seems pretty invasive too - what are you trying to do? https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... third_party/devscripts/licensecheck.pl:451: # The regexp is slow. First, do a quick check using index. Is this really needed?
https://codereview.chromium.org/12047113/diff/5001/android_webview/tools/thir... File android_webview/tools/third_party_files_whitelist.txt (right): https://codereview.chromium.org/12047113/diff/5001/android_webview/tools/thir... android_webview/tools/third_party_files_whitelist.txt:154: # !!! a real issue !!! On 2013/01/25 17:54:33, Paweł Hajdan Jr. wrote: > How about a bug *and* a TODO? Done. https://codereview.chromium.org/12047113/diff/5001/android_webview/tools/webv... File android_webview/tools/webview_licenses.py (right): https://codereview.chromium.org/12047113/diff/5001/android_webview/tools/webv... android_webview/tools/webview_licenses.py:106: '--check', '\.(c(c|pp|xx)?|h(h|pp|xx)?|p(l|m)|xs|sh|php|py(|x)|rb' \ On 2013/01/25 17:54:33, Paweł Hajdan Jr. wrote: > Do you need to use --check? Why? Because the pattern in licensecheck.pl apparently misses "js|html|pac|mm|asm|idl" https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... File third_party/devscripts/licensecheck.pl (right): https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... third_party/devscripts/licensecheck.pl:296: unless ($. > $opt_lines) { On 2013/01/25 17:54:33, Paweł Hajdan Jr. wrote: > This change seems pretty invasive compared to just adding a regex. What are you > trying to do? I'm feeding all the lines (not only up to $opt_lines) to the copyrights detector. $opt_lines still restricts the number of lines used for license detection. If $opt_copyright isn't set, the behavior is unchanged compared to your modified version. We need to look for copyrights in the whole file, since it can contain a copyrighted function somewhere in the middle. But remember, that we only need to look for copyrights outside of third_party dirs of the script. https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... third_party/devscripts/licensecheck.pl:344: my $line = $_[0]; On 2013/01/25 17:54:33, Paweł Hajdan Jr. wrote: > This seems pretty invasive too - what are you trying to do? Just speeding things up. Since for Chromium, about 44M lines of code are passing through this check, matching becomes a bottleneck (found that using NYTDprof). Using index for early bailout significantly reduces the processing time. https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... third_party/devscripts/licensecheck.pl:451: # The regexp is slow. First, do a quick check using index. On 2013/01/25 17:54:33, Paweł Hajdan Jr. wrote: > Is this really needed? Yes. According to the profiler, this expression matches really slow (not sure why, though). Using index, the bottleneck was removed.
https://codereview.chromium.org/12047113/diff/5001/android_webview/tools/thir... File android_webview/tools/third_party_files_whitelist.txt (right): https://codereview.chromium.org/12047113/diff/5001/android_webview/tools/thir... android_webview/tools/third_party_files_whitelist.txt:154: # !!! a real issue !!! On 2013/01/28 14:51:54, Mikhail Naganov (Chromium) wrote: > On 2013/01/25 17:54:33, Paweł Hajdan Jr. wrote: > > How about a bug *and* a TODO? > > Done. Please upload an updated patch set. https://codereview.chromium.org/12047113/diff/5001/android_webview/tools/webv... File android_webview/tools/webview_licenses.py (right): https://codereview.chromium.org/12047113/diff/5001/android_webview/tools/webv... android_webview/tools/webview_licenses.py:106: '--check', '\.(c(c|pp|xx)?|h(h|pp|xx)?|p(l|m)|xs|sh|php|py(|x)|rb' \ On 2013/01/28 14:51:54, Mikhail Naganov (Chromium) wrote: > On 2013/01/25 17:54:33, Paweł Hajdan Jr. wrote: > > Do you need to use --check? Why? > > Because the pattern in licensecheck.pl apparently misses > "js|html|pac|mm|asm|idl" Why not update the pattern in licensecheck.pl then? :) https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... File third_party/devscripts/licensecheck.pl (right): https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... third_party/devscripts/licensecheck.pl:296: unless ($. > $opt_lines) { On 2013/01/28 14:51:54, Mikhail Naganov (Chromium) wrote: > On 2013/01/25 17:54:33, Paweł Hajdan Jr. wrote: > > This change seems pretty invasive compared to just adding a regex. What are > you > > trying to do? > > I'm feeding all the lines (not only up to $opt_lines) to the copyrights > detector. $opt_lines still restricts the number of lines used for license > detection. If $opt_copyright isn't set, the behavior is unchanged compared to > your modified version. > > We need to look for copyrights in the whole file, since it can contain a > copyrighted function somewhere in the middle. But remember, that we only need to > look for copyrights outside of third_party dirs of the script. If that's the case, could you rather implement that in your Android-specific script instead of here? It seems that this change goes beyond what licensecheck.pl was supposed to do. https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... third_party/devscripts/licensecheck.pl:344: my $line = $_[0]; On 2013/01/28 14:51:54, Mikhail Naganov (Chromium) wrote: > On 2013/01/25 17:54:33, Paweł Hajdan Jr. wrote: > > This seems pretty invasive too - what are you trying to do? > > Just speeding things up. Since for Chromium, about 44M lines of code are passing > through this check, matching becomes a bottleneck (found that using NYTDprof). > Using index for early bailout significantly reduces the processing time. Please include difference in run time when speed is the reason. I think this would be better done in Android-specific script. https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... third_party/devscripts/licensecheck.pl:451: # The regexp is slow. First, do a quick check using index. On 2013/01/28 14:51:54, Mikhail Naganov (Chromium) wrote: > On 2013/01/25 17:54:33, Paweł Hajdan Jr. wrote: > > Is this really needed? > > Yes. According to the profiler, this expression matches really slow (not sure > why, though). Using index, the bottleneck was removed. What is the difference in run time?
Thanks for the comments! Sorry for not updating the patch for the first time--I've got distracted and didn't finish the update. https://codereview.chromium.org/12047113/diff/5001/android_webview/tools/webv... File android_webview/tools/webview_licenses.py (right): https://codereview.chromium.org/12047113/diff/5001/android_webview/tools/webv... android_webview/tools/webview_licenses.py:106: '--check', '\.(c(c|pp|xx)?|h(h|pp|xx)?|p(l|m)|xs|sh|php|py(|x)|rb' \ On 2013/01/28 16:28:29, Paweł Hajdan Jr. wrote: > On 2013/01/28 14:51:54, Mikhail Naganov (Chromium) wrote: > > On 2013/01/25 17:54:33, Paweł Hajdan Jr. wrote: > > > Do you need to use --check? Why? > > > > Because the pattern in licensecheck.pl apparently misses > > "js|html|pac|mm|asm|idl" > > Why not update the pattern in licensecheck.pl then? :) I was considering these additions to be Chromium-specific. Moved to licensecheck.pl. https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... File third_party/devscripts/licensecheck.pl (right): https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... third_party/devscripts/licensecheck.pl:296: unless ($. > $opt_lines) { On 2013/01/28 16:28:29, Paweł Hajdan Jr. wrote: > On 2013/01/28 14:51:54, Mikhail Naganov (Chromium) wrote: > > On 2013/01/25 17:54:33, Paweł Hajdan Jr. wrote: > > > This change seems pretty invasive compared to just adding a regex. What are > > you > > > trying to do? > > > > I'm feeding all the lines (not only up to $opt_lines) to the copyrights > > detector. $opt_lines still restricts the number of lines used for license > > detection. If $opt_copyright isn't set, the behavior is unchanged compared to > > your modified version. > > > > We need to look for copyrights in the whole file, since it can contain a > > copyrighted function somewhere in the middle. But remember, that we only need > to > > look for copyrights outside of third_party dirs of the script. > > If that's the case, could you rather implement that in your Android-specific > script instead of here? It seems that this change goes beyond what > licensecheck.pl was supposed to do. Finding copyrighted code isn't Android-specific (I believe, we had agreed on this before). The original script was also capable of doing this (with --copyright flag in place). My changes are: - to scan the whole file instead of the number of '--lines'. Perhaps, we can generalize that by introducing '--copyright-lines' argument that will default to '--lines', so the original behavior could be preserved; - to avoid some false positives by cutting out C/C++ string contents, as they can contain the 'copyright' word. https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... third_party/devscripts/licensecheck.pl:344: my $line = $_[0]; On 2013/01/28 16:28:29, Paweł Hajdan Jr. wrote: > On 2013/01/28 14:51:54, Mikhail Naganov (Chromium) wrote: > > On 2013/01/25 17:54:33, Paweł Hajdan Jr. wrote: > > > This seems pretty invasive too - what are you trying to do? > > > > Just speeding things up. Since for Chromium, about 44M lines of code are > passing > > through this check, matching becomes a bottleneck (found that using NYTDprof). > > Using index for early bailout significantly reduces the processing time. > > Please include difference in run time when speed is the reason. > > I think this would be better done in Android-specific script. Not sure, what do you mean by "Android-specific script". My intention is to reuse code as much as possible. There is nothing Android-specific in this optimization. The optimization reduces copyrights scan time from ~22 sec to ~12 sec with populated file cache (Z620). To me, this looks like a significant win--almost twice as fast. https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... third_party/devscripts/licensecheck.pl:451: # The regexp is slow. First, do a quick check using index. On 2013/01/28 16:28:29, Paweł Hajdan Jr. wrote: > On 2013/01/28 14:51:54, Mikhail Naganov (Chromium) wrote: > > On 2013/01/25 17:54:33, Paweł Hajdan Jr. wrote: > > > Is this really needed? > > > > Yes. According to the profiler, this expression matches really slow (not sure > > why, though). Using index, the bottleneck was removed. > > What is the difference in run time? For the Android use case (scan excluding third-party dirs) this reduces scan time from ~19 sec to ~12 sec (-1/3 of time), with populated cache on Z620. For the Chromium use case (scan everything) this reduces scan time from ~46 sec to ~24 sec with the same conditions. I have started to optimize things out, because the original method we were using (with plain grep) was taking about 3 sec on a hot cache (although, as you can see, it had unacceptably more false positives).
https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... File third_party/devscripts/licensecheck.pl (right): https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... third_party/devscripts/licensecheck.pl:296: unless ($. > $opt_lines) { On 2013/01/28 17:12:38, Mikhail Naganov (Chromium) wrote: > Finding copyrighted code isn't Android-specific (I believe, we had agreed on > this before). The original script was also capable of doing this (with > --copyright flag in place). My changes are: > > - to scan the whole file instead of the number of '--lines'. Perhaps, we can > generalize that by introducing '--copyright-lines' argument that will default to > '--lines', so the original behavior could be preserved; > > - to avoid some false positives by cutting out C/C++ string contents, as they > can contain the 'copyright' word. And this is way beyond what the original licensecheck.pl was doing. I still think this belongs in an Android-specific file, even though licensecheck.pl has some *basic* capability of printing copyright info. https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... third_party/devscripts/licensecheck.pl:344: my $line = $_[0]; On 2013/01/28 17:12:38, Mikhail Naganov (Chromium) wrote: > Not sure, what do you mean by "Android-specific script". My intention is to > reuse code as much as possible. There is nothing Android-specific in this > optimization. Changes should be upstreamable though. > The optimization reduces copyrights scan time from ~22 sec to ~12 sec with > populated file cache (Z620). To me, this looks like a significant win--almost > twice as fast. IMHO not worth it. If you go the Android-specific file path as I suggested, feel free to do it the way you do here, saving the time. I'm fine with that approach, just not in licensecheck.pl . https://codereview.chromium.org/12047113/diff/5001/third_party/devscripts/lic... third_party/devscripts/licensecheck.pl:451: # The regexp is slow. First, do a quick check using index. On 2013/01/28 17:12:38, Mikhail Naganov (Chromium) wrote: > On 2013/01/28 16:28:29, Paweł Hajdan Jr. wrote: > > On 2013/01/28 14:51:54, Mikhail Naganov (Chromium) wrote: > > > On 2013/01/25 17:54:33, Paweł Hajdan Jr. wrote: > > > > Is this really needed? > > > > > > Yes. According to the profiler, this expression matches really slow (not > sure > > > why, though). Using index, the bottleneck was removed. > > > > What is the difference in run time? > > For the Android use case (scan excluding third-party dirs) this reduces scan > time from ~19 sec to ~12 sec (-1/3 of time), with populated cache on Z620. > > For the Chromium use case (scan everything) this reduces scan time from ~46 sec > to ~24 sec with the same conditions. > > I have started to optimize things out, because the original method we were using > (with plain grep) was taking about 3 sec on a hot cache (although, as you can > see, it had unacceptably more false positives). Again, not worth it. Each of these modifications has a maintenance cost, and a minute-long build step is already fine. Taking a few seconds away is a negligible win, even if it looks like a high percentage. The base value is pretty low anyway. https://codereview.chromium.org/12047113/diff/13001/third_party/devscripts/li... File third_party/devscripts/licensecheck.pl (right): https://codereview.chromium.org/12047113/diff/13001/third_party/devscripts/li... third_party/devscripts/licensecheck.pl:167: my $default_check_regex = '\.(c(c|pp|xx)?|h(h|pp|xx)?|f(77|90)?|p(l|m)|xs|sh|php|py(|x)|rb|java|vala|el|sc(i|e)|cs|pas|inc|dtd|xsl|mod|m|tex|mli?|js|html|pac|mm|asm|idl)$'; nit: Please move "mm" case near existing "m", changing it to "mm?".
Message was sent while issue was closed.
Pawel, thanks for the feedback! After moving out Android-specific copyright scanning into a separate script, this is the only change to licensecheck.pl left: my $default_check_regex = '\.(asm|c(c|pp|xx)?|h(h|pp|xx)?|f(77|90)?|p(l|m)|xs|sh|php|py(|x)|rb|idl|java|vala|el|sc(i|e)|cs|pas|inc|dtd|xsl|js|html|pac|mod|mm?|tex|mli?)$'; Which reveals a lot of .html and .js files in Chromium that doesn't contain a proper license. I don't have bandwidth now to raise bugs and whitelist files, and without that the changelist will be incomplete, so I'm abandoning this CL and will add an Android-specific script separately. Thanks again for your comments! |