|
|
Created:
6 years, 8 months ago by vadimt Modified:
6 years, 8 months ago CC:
chromium-reviews, Ilya Sherman, jar (doing other things), asvitkine+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionNormalizing source file names before calculating hash by extracting the last component of the file path.
This hash is sent as TrackedObject.source_file_name_hash field.
This solves the problem described in the bug to a degree that we can build a functioning prototype for the server-side analysis tool. The tool will use (filename:line) pair to identify a piece of code that posts a task, and extract parameters to PostTask (and similar) functions to conveniently identify the task.
This solution is not final because the short filenames are not unique, and we’ll have about 6 conflicting pairs. This is OK for the proof-of-concept tool; further development might go in 1 of 2 different directions:
1. We like the results of the (file, line) method mentioned above, and we improve this solution, for example, by renaming files to make them unique.
2. We don’t like it, probably because of inconsistency of names on the server-side tool and the chrome://profiler page, and because it doesn’t show task entrypoints when the closure is created outside of the PostTask parameters list. In this case, we may want to explore adding an identifying parameter to the closure constructor. I’m not going to dive too deep into this, but we have a design for this. Anyways, in this case, we won’t use filename to identify tasks, and the fact that they are not unique will be OK.
For the record, there were other alternatives to extracting the last component, mainly:
1. Post-preprocessing (sic!) or compile-time or post-compile step that normalizes file names.
The issue was the high price (may be weeks of work), which is not OK for a proof-of-concept tool. For example, if we choose later to pass ID in the closure, this work would be wasted.
2. Identify the task by PC (program counter) instead of (file, line). This would require extending the UMA with a map of addresses of loaded modules, per process, and add 2 fields to TrackedObject: PC and module ID. This may be too much for a prototype. There also are some concerns regarding the server-side part of this approach.
BUG=357787
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=261407
Patch Set 1 #
Total comments: 12
Patch Set 2 : isherman@ comments #Patch Set 3 : Second round of comments. #
Total comments: 9
Patch Set 4 : N+1 st wave of comments #Patch Set 5 : N+2 nd wave of comments. #
Total comments: 2
Patch Set 6 : Final nits #Patch Set 7 : Rebasing #
Messages
Total messages: 29 (0 generated)
jar and/or isherman and/or asvitkine, please provide OWNERs approval.
https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... File chrome/browser/metrics/metrics_log.cc (right): https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... chrome/browser/metrics/metrics_log.cc:280: std::string NormalizeFileName(const std::string& file_name) { nit: The argument is actually a const char* (before you convert it to a string). You could have just used that... but perhaps this is wasteful optimization. It would look like: const char* Normalize(const char* file_name) { const char* result = file_name; while (*file_name) { if (*file_name == '/' || *file_name == '\\') result = file_name + 1; } return result; } ...or you could use strrch() const char* better = strrchr(file_name, '/'); if (better) file_name = better + 1; better = strrchr(file_name, '\\'); if (better) file_name = better + 1; return file_name; Thinking more, I guess this is not very perf critical. It may be safer(?) to stick with conversion to strings, and only hang onto using "const char*" in the perf critical sections of the profiler. YMMV. Your choice. https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... chrome/browser/metrics/metrics_log.cc:307: MetricsLogBase::Hash(NormalizeFileName(it->birth.location.file_name))); Should we consider expanding the protobuf, and putting this in a different (additional?) slot?
https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... File chrome/browser/metrics/metrics_log.cc (right): https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... chrome/browser/metrics/metrics_log.cc:280: std::string NormalizeFileName(const std::string& file_name) { In fact, the parameter std::string. https://code.google.com/p/chromium/codesearch#chromium/src/base/location.h&rc... So, to avoid conversion I chose to pass a string. BTW, given that these strings are passed between processes, it makes sense to keep them as std::string in Location class. https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... chrome/browser/metrics/metrics_log.cc:307: MetricsLogBase::Hash(NormalizeFileName(it->birth.location.file_name))); On 2014/03/29 00:32:25, jar wrote: > Should we consider expanding the protobuf, and putting this in a different > (additional?) slot? Given uselessness of the existing full filename hash, probably, no. We are replacing something that could not be used in any reasonable way with something less meaningless; no reason to keep the old one.
LGTM (but I'd like Ilya to chime in). https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... File chrome/browser/metrics/metrics_log.cc (right): https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... chrome/browser/metrics/metrics_log.cc:280: std::string NormalizeFileName(const std::string& file_name) { On 2014/03/29 00:45:50, vadimt wrote: > In fact, the parameter std::string. > https://code.google.com/p/chromium/codesearch#chromium/src/base/location.h&rc... > > So, to avoid conversion I chose to pass a string. > BTW, given that these strings are passed between processes, it makes sense to > keep them as std::string in Location class. Ah... quite correct. SGTM. https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... chrome/browser/metrics/metrics_log.cc:307: MetricsLogBase::Hash(NormalizeFileName(it->birth.location.file_name))); On 2014/03/29 00:45:50, vadimt wrote: > On 2014/03/29 00:32:25, jar wrote: > > Should we consider expanding the protobuf, and putting this in a different > > (additional?) slot? > > Given uselessness of the existing full filename hash, probably, no. We are > replacing something that could not be used in any reasonable way with something > less meaningless; no reason to keep the old one. OK. The old one disambiguates... but I guess if we *could* decode the old one, we could map to the "new" one. SGTM too.
https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... File chrome/browser/metrics/metrics_log.cc (right): https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... chrome/browser/metrics/metrics_log.cc:290: return file_name.substr(i); IMO this method would be clearer if written using the Tokenize() function from base/strings/string_util.h: tokenize, then return the last component. https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... File chrome/browser/metrics/metrics_log_unittest.cc (right): https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... chrome/browser/metrics/metrics_log_unittest.cc:579: process_data.tasks.back().death_data.queue_duration_sample = 103; I would prefer if you did not repeat the constants from above here. I originally chose all of these constants to be unique. IMO, it might even be better to just write this as a separate test case, where you don't set any of the numeric constants at all, and only verify that the names are hashed as you'd expect. (Assuming I'm correctly understanding the purpose of this test.)
https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... File chrome/browser/metrics/metrics_log.cc (right): https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... chrome/browser/metrics/metrics_log.cc:290: return file_name.substr(i); Yeah, but that is much less efficient, since we build multiple components (while needing only one), and implementation of Tokenize uses find_first_of, which build a lookup table per each token (and this inefficiency gets multiplied by the number of tokens). https://code.google.com/p/chromium/codesearch#chromium/src/base/strings/strin... https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... File chrome/browser/metrics/metrics_log_unittest.cc (right): https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... chrome/browser/metrics/metrics_log_unittest.cc:579: process_data.tasks.back().death_data.queue_duration_sample = 103; Yeah, I could add a separate test. But everything else is tested altogether, and I'm not motivated to break this structure. I've changed constants, though. If you feel very strong about creating an additional test, please let me know.
https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... File chrome/browser/metrics/metrics_log.cc (right): https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... chrome/browser/metrics/metrics_log.cc:290: return file_name.substr(i); On 2014/03/31 18:10:14, vadimt wrote: > Yeah, but that is much less efficient, since we build multiple components (while > needing only one), and implementation of Tokenize uses find_first_of, which > build a lookup table per each token (and this inefficiency gets multiplied by > the number of tokens). > https://code.google.com/p/chromium/codesearch#chromium/src/base/strings/strin... I'm not convinced that this is a place where such a minor efficiency difference matters. IMO the increased code clarity is worth the very minor inefficiency, unless you have some data to indicate that this would become a hot spot in the code.
https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... File chrome/browser/metrics/metrics_log.cc (right): https://codereview.chromium.org/217483006/diff/1/chrome/browser/metrics/metri... chrome/browser/metrics/metrics_log.cc:290: return file_name.substr(i); Done, but using find_last_of. Looks OK?
https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... File chrome/browser/metrics/metrics_log.cc (right): https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... chrome/browser/metrics/metrics_log.cc:281: return file_name.substr(file_name.find_last_of("\\/") + 1); I think the code works... but it probably shouldn't be trusted. When there is no character found, you get back string::npos. It turns out that this is usually -1. Hence you are counting on that, and adding one, to get 0. The function could instead have been defined to return -1, but that is not the API promize. I don't believe you should count on that use of -1. size_t offset = file_name.find_last_of("\\/"); offset = (offset == string::npos) ? 0 : offset + 1; ...
https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... File chrome/browser/metrics/metrics_log.cc (right): https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... chrome/browser/metrics/metrics_log.cc:281: return file_name.substr(file_name.find_last_of("\\/") + 1); It's not a problem for me to change the code, but "-1" is official: http://www.cplusplus.com/reference/string/string/npos/
LGTM https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... File chrome/browser/metrics/metrics_log.cc (right): https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... chrome/browser/metrics/metrics_log.cc:281: return file_name.substr(file_name.find_last_of("\\/") + 1); On 2014/04/01 18:59:01, vadimt wrote: > It's not a problem for me to change the code, but "-1" is official: > http://www.cplusplus.com/reference/string/string/npos/ OK... I can live with that. I hadn't seen it. Thanks.
https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... File chrome/browser/metrics/metrics_log.cc (right): https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... chrome/browser/metrics/metrics_log.cc:281: return file_name.substr(file_name.find_last_of("\\/") + 1); On 2014/04/01 19:16:17, jar wrote: > On 2014/04/01 18:59:01, vadimt wrote: > > It's not a problem for me to change the code, but "-1" is official: > > http://www.cplusplus.com/reference/string/string/npos/ > > OK... I can live with that. I hadn't seen it. > > Thanks. > I agree with Jim that relying on (string::npos + 1 == 0) in this way is a little too likely to confuse people, even if it is guaranteed to be correct. I'd prefer to separate out the case where no file path component is found from the case where one is. Alternately, we could just DCHECK that one of the characters exists in the file_name, since that should always be the case. https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... File chrome/browser/metrics/metrics_log_unittest.cc (right): https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... chrome/browser/metrics/metrics_log_unittest.cc:592: EXPECT_EQ(MetricsLogBase::Hash("file3"), IMO it's better to compare to a constant rather than to the output of a function. Otherwise, someone could come along and update MetricsLogBase::Hash() without breaking this test... but that *would* likely break the server-side processing.
https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... File chrome/browser/metrics/metrics_log.cc (right): https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... chrome/browser/metrics/metrics_log.cc:281: return file_name.substr(file_name.find_last_of("\\/") + 1); On 2014/04/02 06:56:15, Ilya Sherman wrote: > On 2014/04/01 19:16:17, jar wrote: > > On 2014/04/01 18:59:01, vadimt wrote: > > > It's not a problem for me to change the code, but "-1" is official: > > > http://www.cplusplus.com/reference/string/string/npos/ > > > > OK... I can live with that. I hadn't seen it. > > > > Thanks. > > > > I agree with Jim that relying on (string::npos + 1 == 0) in this way is a little > too likely to confuse people, even if it is guaranteed to be correct. I'd > prefer to separate out the case where no file path component is found from the > case where one is. Alternately, we could just DCHECK that one of the characters > exists in the file_name, since that should always be the case. Done. https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... File chrome/browser/metrics/metrics_log_unittest.cc (right): https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... chrome/browser/metrics/metrics_log_unittest.cc:592: EXPECT_EQ(MetricsLogBase::Hash("file3"), Replacing with metrics::HashMetricName (used in MetricsLogBase::Hash implementation), which has a dedicated unit test.
https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... File chrome/browser/metrics/metrics_log_unittest.cc (right): https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... chrome/browser/metrics/metrics_log_unittest.cc:592: EXPECT_EQ(MetricsLogBase::Hash("file3"), On 2014/04/02 17:50:25, vadimt wrote: > Replacing with metrics::HashMetricName (used in MetricsLogBase::Hash > implementation), which has a dedicated unit test. I would still prefer to keep a numeric constant as the expected value, rather than the output of a function. You can include both, if you really want to. The disadvantage of including *only* the function is that somebody can come along and change the implementation of the function. They'll know to fix the tests. They'll probably know to make sure the server is updated to decode histogram names correctly. I want to make sure that unittests remind them to also update names used by the profiler data, which they might not even be aware of.
https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... File chrome/browser/metrics/metrics_log_unittest.cc (right): https://codereview.chromium.org/217483006/diff/40001/chrome/browser/metrics/m... chrome/browser/metrics/metrics_log_unittest.cc:592: EXPECT_EQ(MetricsLogBase::Hash("file3"), How about this?
LGTM % a final nit. Thanks, Vadim! https://codereview.chromium.org/217483006/diff/80001/chrome/browser/metrics/m... File chrome/browser/metrics/metrics_log_unittest.cc (right): https://codereview.chromium.org/217483006/diff/80001/chrome/browser/metrics/m... chrome/browser/metrics/metrics_log_unittest.cc:475: metrics::HashMetricName("birth_thread*")); nit: Please indent this line to align with the "G" from "GG_UINT64_C".
https://codereview.chromium.org/217483006/diff/80001/chrome/browser/metrics/m... File chrome/browser/metrics/metrics_log_unittest.cc (right): https://codereview.chromium.org/217483006/diff/80001/chrome/browser/metrics/m... chrome/browser/metrics/metrics_log_unittest.cc:475: metrics::HashMetricName("birth_thread*")); On 2014/04/03 00:49:58, Ilya Sherman wrote: > nit: Please indent this line to align with the "G" from "GG_UINT64_C". Done.
The CQ bit was checked by vadimt@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/vadimt@chromium.org/217483006/100001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: tryserver.chromium on win_chromium_compile_dbg
The CQ bit was checked by vadimt@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/vadimt@chromium.org/217483006/120001
The CQ bit was unchecked by commit-bot@chromium.org
Commit queue rejected this change because the description was changed between the time the change entered the commit queue and the time it was ready to commit. You can safely check the commit box again.
The CQ bit was checked by vadimt@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/vadimt@chromium.org/217483006/120001
Message was sent while issue was closed.
Change committed as 261407 |