|
|
Created:
5 years, 11 months ago by Alexei Svitkine (slow) Modified:
5 years, 11 months ago Reviewers:
Ilya Sherman CC:
chromium-reviews, asvitkine+watch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionImprove script to find user actions.
Previously, it didn't handle UserActionMetrics spanning
multiple lines which this addresses and as well as improving
the warning output produced.
Also adds some tests.
BUG=446197
Committed: https://crrev.com/3f4694f41eb28cada3f2e23161c444392dc2cbfe
Cr-Commit-Position: refs/heads/master@{#311582}
Patch Set 1 : #
Total comments: 8
Patch Set 2 : Address comments. #
Total comments: 6
Patch Set 3 : Fix nit. #Patch Set 4 : Address comments. #
Total comments: 12
Patch Set 5 : Address more comments. #
Messages
Total messages: 18 (3 generated)
Patchset #1 (id:1) has been deleted
asvitkine@chromium.org changed reviewers: + isherman@chromium.org
The whole point of the base::UserMetricsAction wrapper is to help the script find metrics. I'm worried that if we try to make the script too smart, we'll end up either (a) making it not behave quite correctly, or (b) implementing a C++ parser. If we want a really smart script, we could probably write some sort of clang-based tool, and eliminate base::UserMetricsAction. Specifically, for this CL, I think skipping over whitespace is probably fine, but doing anything more complex -- like looking for ternary operators -- is likely to cause misparses IMO. https://codereview.chromium.org/848453002/diff/20001/tools/metrics/actions/ex... File tools/metrics/actions/extract_actions.py (right): https://codereview.chromium.org/848453002/diff/20001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:44: USER_METRICS_ACTION_RE = re.compile(r'UserMetricsAction\(\s*(.+?)\)') It looks like you've changed the regexes quite a bit. Why did you drop the [^a-zA-Z] check, for example? Also, are you sure that ".+" is correct? I think that would allow the match to go past any parens, e.g. to match """UserMetricsAction("I'm an action!"), "I'm some other parameter.");""" https://codereview.chromium.org/848453002/diff/20001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:44: USER_METRICS_ACTION_RE = re.compile(r'UserMetricsAction\(\s*(.+?)\)') nit: I think it might be worth using the verbose regex format for this guy, and document the individual pieces, since it's a little bit complex. https://codereview.chromium.org/848453002/diff/20001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:424: The list action names (strings) that was found. Please update this -- it looks like there are two values returned. https://codereview.chromium.org/848453002/diff/20001/tools/metrics/actions/ex... File tools/metrics/actions/extract_actions_test.py (right): https://codereview.chromium.org/848453002/diff/20001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions_test.py:202: code = 'base::UserMetricsAction(foo ? "Foo.Bar" : "Bar.Foo"));' I'm not convinced that we want to support this. Instead, IMO, this should be written as """foo ? base::UserMetricsAction("Foo.Bar") : base::UserMetricsAction("Bar.Foo")""".
On 2015/01/09 22:25:44, Ilya Sherman wrote: > The whole point of the base::UserMetricsAction wrapper is to help the script > find metrics. I'm worried that if we try to make the script too smart, we'll > end up either (a) making it not behave quite correctly, or (b) implementing a > C++ parser. If we want a really smart script, we could probably write some sort > of clang-based tool, and eliminate base::UserMetricsAction. > > Specifically, for this CL, I think skipping over whitespace is probably fine, > but doing anything more complex -- like looking for ternary operators -- is > likely to cause misparses IMO. I agree that we shouldn't try to make too smart. I actually don't have code that tries to look specifically for ternary operators and just try to find all quoted strings with the ()'s which seemed good enough as a heuristic that seems to work in practice for the current offenders in the codebase without trying to make a C++ parser. Let me know if you still prefer we https://codereview.chromium.org/848453002/diff/20001/tools/metrics/actions/ex... File tools/metrics/actions/extract_actions.py (right): https://codereview.chromium.org/848453002/diff/20001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:44: USER_METRICS_ACTION_RE = re.compile(r'UserMetricsAction\(\s*(.+?)\)') On 2015/01/09 22:25:44, Ilya Sherman wrote: > It looks like you've changed the regexes quite a bit. Why did you drop the > [^a-zA-Z] check, for example? I've re-added it now. > Also, are you sure that ".+" is correct? I think > that would allow the match to go past any parens, e.g. to match > """UserMetricsAction("I'm an action!"), "I'm some other parameter.");""" I added a test case for your example and it works correctly. https://codereview.chromium.org/848453002/diff/20001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:44: USER_METRICS_ACTION_RE = re.compile(r'UserMetricsAction\(\s*(.+?)\)') On 2015/01/09 22:25:43, Ilya Sherman wrote: > nit: I think it might be worth using the verbose regex format for this guy, and > document the individual pieces, since it's a little bit complex. Done. https://codereview.chromium.org/848453002/diff/20001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:424: The list action names (strings) that was found. On 2015/01/09 22:25:43, Ilya Sherman wrote: > Please update this -- it looks like there are two values returned. Done. https://codereview.chromium.org/848453002/diff/20001/tools/metrics/actions/ex... File tools/metrics/actions/extract_actions_test.py (right): https://codereview.chromium.org/848453002/diff/20001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions_test.py:202: code = 'base::UserMetricsAction(foo ? "Foo.Bar" : "Bar.Foo"));' On 2015/01/09 22:25:44, Ilya Sherman wrote: > I'm not convinced that we want to support this. Instead, IMO, this should be > written as """foo ? base::UserMetricsAction("Foo.Bar") : > base::UserMetricsAction("Bar.Foo")""". We could do that. The disadvantage is we'll need to go back and clean up code where code slips in that doesn't conform - which was my motivation for making the script support this - less maintenance work for us.
https://codereview.chromium.org/848453002/diff/40001/tools/metrics/actions/ex... File tools/metrics/actions/extract_actions.py (right): https://codereview.chromium.org/848453002/diff/40001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:49: (.+?) # A sequence of characters for the param. Ah, I see now that "+?" means non-greedy. Okay, that makes more sense to me now. I'm pretty sure that this regex will mismatch on the following: """base::UserMetricsAction(ReverseString("raB.ooF"));"""? In my experience, ternary operators within base::UserMetricsAction are fairly rare, so it's not too much effort to clean them up. OTOH, getting the regex right to match them, but to not accidentally mismatch other code, is fairly involved -- assuming it's even possible with a simple regex. https://codereview.chromium.org/848453002/diff/40001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:432: - The list action names (strings) that was found. nit: "list action names" -> "list of action names"
https://codereview.chromium.org/848453002/diff/40001/tools/metrics/actions/ex... File tools/metrics/actions/extract_actions.py (right): https://codereview.chromium.org/848453002/diff/40001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:49: (.+?) # A sequence of characters for the param. On 2015/01/12 22:17:19, Ilya Sherman wrote: > Ah, I see now that "+?" means non-greedy. Okay, that makes more sense to me > now. > > I'm pretty sure that this regex will mismatch on the following: > """base::UserMetricsAction(ReverseString("raB.ooF"));"""? In my experience, > ternary operators within base::UserMetricsAction are fairly rare, so it's not > too much effort to clean them up. OTOH, getting the regex right to match them, > but to not accidentally mismatch other code, is fairly involved -- assuming it's > even possible with a simple regex. Right. It will also catch base::UserMetricsAction(GetMetricNameFor("Foo")) - so there will be false positives. But I think this is OK. The script will prompt the user to add the bad strings to the actions.xml file - at which point they'll either realise that's not what they want and restructure their code or worse case scenario we catch it during metrics code review. https://codereview.chromium.org/848453002/diff/40001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:432: - The list action names (strings) that was found. On 2015/01/12 22:17:19, Ilya Sherman wrote: > nit: "list action names" -> "list of action names" Done.
https://codereview.chromium.org/848453002/diff/40001/tools/metrics/actions/ex... File tools/metrics/actions/extract_actions.py (right): https://codereview.chromium.org/848453002/diff/40001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:49: (.+?) # A sequence of characters for the param. On 2015/01/13 18:06:17, Alexei Svitkine wrote: > On 2015/01/12 22:17:19, Ilya Sherman wrote: > > Ah, I see now that "+?" means non-greedy. Okay, that makes more sense to me > > now. > > > > I'm pretty sure that this regex will mismatch on the following: > > """base::UserMetricsAction(ReverseString("raB.ooF"));"""? In my experience, > > ternary operators within base::UserMetricsAction are fairly rare, so it's not > > too much effort to clean them up. OTOH, getting the regex right to match > them, > > but to not accidentally mismatch other code, is fairly involved -- assuming > it's > > even possible with a simple regex. > > Right. It will also catch base::UserMetricsAction(GetMetricNameFor("Foo")) - so > there will be false positives. > > But I think this is OK. The script will prompt the user to add the bad strings > to the actions.xml file - at which point they'll either realise that's not what > they want and restructure their code or worse case scenario we catch it during > metrics code review. I guess this is where we disagree -- I would much rather have false negatives than false positives. I don't think that Chromium developers always understand the metrics infrastructure well enough to understand when changes to the XML files are or aren't surprising. And, if the code really looks like base::UserMetricsAction(GetMetricWithSuffix("UserClickedTheRedButton", one_of_a_hundred_possible_suffixes));, I'm not sure that I would catch the error in a metrics review, given that I'd see "UserClickedTheRedButton" in actions.xml. How common a problem are ternary statements within base::UserMetricsAction? I don't recall having encountered this issue ever before, even when I cleaned up a bunch of other warnings that the script generated.
https://codereview.chromium.org/848453002/diff/40001/tools/metrics/actions/ex... File tools/metrics/actions/extract_actions.py (right): https://codereview.chromium.org/848453002/diff/40001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:49: (.+?) # A sequence of characters for the param. On 2015/01/13 23:05:37, Ilya Sherman wrote: > On 2015/01/13 18:06:17, Alexei Svitkine wrote: > > On 2015/01/12 22:17:19, Ilya Sherman wrote: > > > Ah, I see now that "+?" means non-greedy. Okay, that makes more sense to me > > > now. > > > > > > I'm pretty sure that this regex will mismatch on the following: > > > """base::UserMetricsAction(ReverseString("raB.ooF"));"""? In my experience, > > > ternary operators within base::UserMetricsAction are fairly rare, so it's > not > > > too much effort to clean them up. OTOH, getting the regex right to match > > them, > > > but to not accidentally mismatch other code, is fairly involved -- assuming > > it's > > > even possible with a simple regex. > > > > Right. It will also catch base::UserMetricsAction(GetMetricNameFor("Foo")) - > so > > there will be false positives. > > > > But I think this is OK. The script will prompt the user to add the bad strings > > to the actions.xml file - at which point they'll either realise that's not > what > > they want and restructure their code or worse case scenario we catch it during > > metrics code review. > > I guess this is where we disagree -- I would much rather have false negatives > than false positives. I don't think that Chromium developers always understand > the metrics infrastructure well enough to understand when changes to the XML > files are or aren't surprising. And, if the code really looks like > base::UserMetricsAction(GetMetricWithSuffix("UserClickedTheRedButton", > one_of_a_hundred_possible_suffixes));, I'm not sure that I would catch the error > in a metrics review, given that I'd see "UserClickedTheRedButton" in > actions.xml. > > How common a problem are ternary statements within base::UserMetricsAction? I > don't recall having encountered this issue ever before, even when I cleaned up a > bunch of other warnings that the script generated. OK, I caved in and changed the code to not support ternary operators and to warn on all non-literal params. Turns out, to support this, I had to rejigger things quite a bit and so the code actually needed to be smarter / more complicated to make it not be smart enough to parse ternary operators - go figure. Anyway, all done, PTAL. (There were only a couple cases where the ternary operator was used.)
https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... File tools/metrics/actions/extract_actions.py (left): https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:425: malformed_action_re = re.compile(r'[^a-zA-Z]UserMetricsAction\([^"]') Hmm, if we're now in agreement to only loosen the restriction around newlines, why do you need more changes than just adding \s* to each of these regexes?
https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... File tools/metrics/actions/extract_actions.py (left): https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:425: malformed_action_re = re.compile(r'[^a-zA-Z]UserMetricsAction\([^"]') On 2015/01/14 18:48:50, Ilya Sherman wrote: > Hmm, if we're now in agreement to only loosen the restriction around newlines, > why do you need more changes than just adding \s* to each of these regexes? The old code iterates line by line, so that wouldn't work when you want to match multi-line statements. You need to use the .search() API on the full string and keep track of the |pos| pointer. The new code also provides better error messages, actually outputting the statement in question and is structured in a way that can be tested more easily.
https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... File tools/metrics/actions/extract_actions.py (left): https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:425: malformed_action_re = re.compile(r'[^a-zA-Z]UserMetricsAction\([^"]') On 2015/01/14 18:52:19, Alexei Svitkine wrote: > On 2015/01/14 18:48:50, Ilya Sherman wrote: > > Hmm, if we're now in agreement to only loosen the restriction around newlines, > > why do you need more changes than just adding \s* to each of these regexes? > > The old code iterates line by line, so that wouldn't work when you want to match > multi-line statements. You need to use the .search() API on the full string and > keep track of the |pos| pointer. > > The new code also provides better error messages, actually outputting the > statement in question and is structured in a way that can be tested more easily. Fair enough :) https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... File tools/metrics/actions/extract_actions.py (right): https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:52: re.VERBOSE | re.DOTALL # Verbose syntax and makes . also match new lines. Why does . need to match newlines? Is this behavior actually controlling whether the "\s*" matches newlines? https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:426: print >>sys.stderr, 'WARNING: ' + message Please use logging.warning() or logging.error(). (I see that you're just preserving the previous code, but I'm surprised that the code was previously written as it was.) https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:437: self.contents = contents nit: The private vars' names should all start with a double underscore. https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:482: action_name = finder.FindNextAction() It looks like you have dropped the line to update the set of actions. Is that intentional?
https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... File tools/metrics/actions/extract_actions.py (right): https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:52: re.VERBOSE | re.DOTALL # Verbose syntax and makes . also match new lines. On 2015/01/14 19:15:20, Ilya Sherman wrote: > Why does . need to match newlines? Is this behavior actually controlling > whether the "\s*" matches newlines? No, it's whether (.+?) matches newlines. Without it, it doesn't so testTernaryUserMetricsActionWithNewLines() fails without it. https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:426: print >>sys.stderr, 'WARNING: ' + message On 2015/01/14 19:15:20, Ilya Sherman wrote: > Please use logging.warning() or logging.error(). (I see that you're just > preserving the previous code, but I'm surprised that the code was previously > written as it was.) Done. https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:437: self.contents = contents On 2015/01/14 19:15:20, Ilya Sherman wrote: > nit: The private vars' names should all start with a double underscore. Done. https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:482: action_name = finder.FindNextAction() On 2015/01/14 19:15:20, Ilya Sherman wrote: > It looks like you have dropped the line to update the set of actions. Is that > intentional? Ack - no, this was accidentally lost when I removed some local print statements I was using for debugging. Re-added.
LGTM -- thanks, Alexei! https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... File tools/metrics/actions/extract_actions.py (right): https://codereview.chromium.org/848453002/diff/80001/tools/metrics/actions/ex... tools/metrics/actions/extract_actions.py:52: re.VERBOSE | re.DOTALL # Verbose syntax and makes . also match new lines. On 2015/01/14 20:28:21, Alexei Svitkine wrote: > On 2015/01/14 19:15:20, Ilya Sherman wrote: > > Why does . need to match newlines? Is this behavior actually controlling > > whether the "\s*" matches newlines? > > No, it's whether (.+?) matches newlines. Without it, it doesn't so > testTernaryUserMetricsActionWithNewLines() fails without it. Ah, I see -- it's important for the better error messages. Fair enough :)
The CQ bit was checked by isherman@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/848453002/100001
Message was sent while issue was closed.
Committed patchset #5 (id:100001)
Message was sent while issue was closed.
Patchset 5 (id:??) landed as https://crrev.com/3f4694f41eb28cada3f2e23161c444392dc2cbfe Cr-Commit-Position: refs/heads/master@{#311582} |