|
|
Created:
5 years, 9 months ago by vadimt Modified:
5 years, 7 months ago Reviewers:
Alexander Potapenko, jar (doing other things), cpu_(ooo_6.6-7.5), jam, dcheng, Alexei Svitkine (slow), Ilya Sherman, Dmitry Vyukov CC:
asvitkine+watch_chromium.org, chromium-reviews, darin-cc_chromium.org, erikwright+watch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@phase_splitting Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionDelivering the FIRST_NONEMPTY_PAINT phase changing event to base/. This concludes the client-side “phased profiling” work.
I’m removing TODO items, and the DIFFS introduced by this should remind the reviewers that there are open questions that they wanted to re-raise, assuming the the CL doesn’t make it clear that the way it’s implemented is OK.
If there are still concerns, we should agree on one important thing before we resume the discussions: phase change delivery chain doesn’t go through metrics provider. It starts at the event source (FirstWebContentsProfiler), then goes to TrackingSynchronizer, then goes to base/. Once we agree on the design of this chain, it will make sense to go to more detailed discussions and open questions.
BUG=456354
Committed: https://crrev.com/e2de4734b01f37454f939bc493a8a5cb59ce3c42
Cr-Commit-Position: refs/heads/master@{#327143}
Patch Set 1 #Patch Set 2 : Integration #
Total comments: 12
Patch Set 3 : asvitkine@ comments. #
Total comments: 4
Patch Set 4 : More asvitkine@ comments. #
Total comments: 12
Patch Set 5 : jar@ comments. #
Total comments: 44
Patch Set 6 : isherman@ comments. #
Total comments: 8
Patch Set 7 : More isherman@ comments. #
Total comments: 22
Patch Set 8 : More asvitkine@ comments. #Patch Set 9 : Fixing a typo. #
Total comments: 10
Patch Set 10 : Even more asvitkine@ comments. #
Total comments: 4
Patch Set 11 : Continuation of asvitkine@ comments. #Patch Set 12 : Evehn more isherman@ comments. #
Total comments: 22
Patch Set 13 : Continuation of dialogue with asvitkine@. #
Total comments: 12
Patch Set 14 : Short isherman@ note. #
Total comments: 2
Patch Set 15 : asvitkine@: more of. #
Total comments: 7
Patch Set 16 : Final asvitkine@ comments. #
Total comments: 34
Patch Set 17 : Next batch if isherman@ comments. #
Total comments: 4
Patch Set 18 : asvitkine@ post-review comments. #
Total comments: 4
Patch Set 19 : Attempting to fix a compiler error in Linux. #Patch Set 20 : Another platform-dependent error: no 'override' for virtual destructor. #
Total comments: 37
Patch Set 21 : Another platform-dependent error fix: refcounted objects shouldn't have public destructors. #Patch Set 22 : Continuing fighting with virtual destructors. #Patch Set 23 : More from isherman@ #
Total comments: 6
Patch Set 24 : Final isherman@ coments. #
Total comments: 14
Patch Set 25 : dchen@ comments #Patch Set 26 : More wisdom from isherman@. #
Total comments: 12
Patch Set 27 : More comments. #
Total comments: 6
Messages
Total messages: 121 (32 generated)
vadimt@chromium.org changed reviewers: + asvitkine@chromium.org, isherman@chromium.org
isherman@chromium.org, asvitkine@chromium.org: Please review an pre-review.
https://codereview.chromium.org/1021053003/diff/20001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/20001/base/tracked_objects.h#... base/tracked_objects.h:393: // phases, including the current one, identified by |current_profiling_phase|. I think I've made this comment before and don't remember the reply, but it's still confusing to me (which means there's room to improve this even if there's a good reason to keep the way it is). But if this is filling the snapshot for all profile phases, why is the |current_profiling_phase| needed? Is it because this tells the profiler that this phase ended or something for keeping track of for later? If so, this should be called out in the comment (and maybe the function name should change to reflect that in that case). https://codereview.chromium.org/1021053003/diff/20001/base/tracked_objects.h#... base/tracked_objects.h:427: Births* birth); Could you update comments for these two methods to mention what changes to |birth| this can cause? https://codereview.chromium.org/1021053003/diff/20001/base/tracked_objects_un... File base/tracked_objects_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/20001/base/tracked_objects_un... base/tracked_objects_unittest.cc:443: return; Why is this a return and not a failure? Needs a comment. https://codereview.chromium.org/1021053003/diff/20001/base/tracked_objects_un... base/tracked_objects_unittest.cc:451: const base::TimeTicks kDelayedStartTime = base::TimeTicks(); No need for base:: Please fix throughout. https://codereview.chromium.org/1021053003/diff/20001/base/tracked_objects_un... base/tracked_objects_unittest.cc:456: const unsigned int kStartOfRun = 5; Nit: "unsigned int" is rarely used in Chromium code unles when interfacing with third-party APIs that require it. Use size_t instead. https://codereview.chromium.org/1021053003/diff/20001/chrome/browser/metrics/... File chrome/browser/metrics/chrome_metrics_service_client.cc (left): https://codereview.chromium.org/1021053003/diff/20001/chrome/browser/metrics/... chrome/browser/metrics/chrome_metrics_service_client.cc:381: // to ReceivedProfilerData settles. crbug/456354. Did you consider this? What's the reasoning for not doing it? https://codereview.chromium.org/1021053003/diff/20001/components/metrics/prof... File components/metrics/profiler/tracking_synchronizer_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/20001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer_unittest.cc:81: explicit TestTrackingSynchronizer(const base::TimeTicks& now) I think this should be passed by value. Same below.
https://codereview.chromium.org/1021053003/diff/20001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/20001/base/tracked_objects.h#... base/tracked_objects.h:393: // phases, including the current one, identified by |current_profiling_phase|. First, please rest assured that no comment gets ignored and quietly forgotten :) |profiling_phase| is necessary because a child process can start after several phase-changing events, so it needs to receive the phase number from the central process. (If the process didn't receive phase-changing events after it was started, it has no information to deduce the current phase from). Added a comment. https://codereview.chromium.org/1021053003/diff/20001/base/tracked_objects.h#... base/tracked_objects.h:427: Births* birth); On 2015/03/30 18:31:21, Alexei Svitkine wrote: > Could you update comments for these two methods to mention what changes to > |birth| this can cause? Done. https://codereview.chromium.org/1021053003/diff/20001/base/tracked_objects_un... File base/tracked_objects_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/20001/base/tracked_objects_un... base/tracked_objects_unittest.cc:451: const base::TimeTicks kDelayedStartTime = base::TimeTicks(); On 2015/03/30 18:31:21, Alexei Svitkine wrote: > No need for base:: > > Please fix throughout. Actually, we are in tracked_objects namespace, so we do need base::. https://codereview.chromium.org/1021053003/diff/20001/base/tracked_objects_un... base/tracked_objects_unittest.cc:456: const unsigned int kStartOfRun = 5; These unsigned ints are ultimately returned by a a function of type NowFunction: https://code.google.com/p/chromium/codesearch#chromium/src/base/profiler/alte... So, it's right to keep them matching the return type of that function. https://codereview.chromium.org/1021053003/diff/20001/chrome/browser/metrics/... File chrome/browser/metrics/chrome_metrics_service_client.cc (left): https://codereview.chromium.org/1021053003/diff/20001/chrome/browser/metrics/... chrome/browser/metrics/chrome_metrics_service_client.cc:381: // to ReceivedProfilerData settles. crbug/456354. I'm waiting till we converge on the exact set of params that we pass here, which in turn depends completion of the whole CR. We might end up with a pretty short list of params, in which case wrapping won't be necessary. I have a big red item in my spreadsheet to not forget doing this.
https://codereview.chromium.org/1021053003/diff/40001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/40001/base/tracked_objects.h#... base/tracked_objects.h:433: // We don't modify anything in |birth| in this method. We store the pointer as Nit: Rephrase without "we". Same below. https://codereview.chromium.org/1021053003/diff/40001/base/tracked_objects.h#... base/tracked_objects.h:435: // SubtractBirths() on it. I think this comment needs to discuss the ownership guarantees of |birth| the caller is expected to make for this to be safe. I guess the assumption is that the birth should be something returned by ThreadData::TallyABirthIfActive() and thus owned by ThreadData? If so, this should be mentioned - as this API allows calling with any birth. In fact, I also suggest adding a DCHECK_EQ(birth, ThreadData::TallyABirthIfActive(birth->location()); to validate the contract. Also, shouldn't the param be called |births| rather than birth? Same for the other function below.
Rebase + comments. https://codereview.chromium.org/1021053003/diff/40001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/40001/base/tracked_objects.h#... base/tracked_objects.h:433: // We don't modify anything in |birth| in this method. We store the pointer as On 2015/03/30 22:05:21, Alexei Svitkine wrote: > Nit: Rephrase without "we". Same below. N/A anymore. https://codereview.chromium.org/1021053003/diff/40001/base/tracked_objects.h#... base/tracked_objects.h:435: // SubtractBirths() on it. Renamed |birth| to |births|. On "the ownership guarantees of |birth| the caller is expected to make for this to be safe.": Thanks for raising this! Long story short, I realized I had a race condition. I undone removing consts from Births. Resetting death counts become more complicated, and now requires 2 passes: 1 to set death counts to 0 under their thread' locks, and the other - to subtract death counts from birth counts under correct locks. Note that a thread, under its lock, can modify birth counts from its birth map and death counts from its death map. It cannot modify birth counts from its death map (and that's why we have const there). Also note that a Births instance, which is a value in the birth map of one thread, can be a (const) key in the death map of another thread (which means that the task was posted by one thread to the other). We'd have less complexities if we could eliminate birth counts. They are necessary for little-known and unused feature called "Still_Alive tasks". But I couldn't prove to jar@ that we should remove it, so, we are keeping birth counts in this CL.
jar@chromium.org changed reviewers: + jar@chromium.org
There is a lot of good stuff in this CL... but the attempt to reset the counts to zero seems IMO to be a wrong direction. I'd suggest having an in-person code review to discuss strategy, and allow you a chance to argue for why the changes are safe. This code was very carefully written... and is quite (unavoidably) complex. The challenge is to avoid the use of locks as much as possible, and rely on external locks (such as are applied during PostTask) to provide the requisite memory barriers to make this all safe without significant marginal cost. Most of this functionality can be found in files containing the word "subtle" and a warning not to casually use or interact with these features... since it is super delicate. There are also numerous data structures, such as lists, which are guaranteed to be invariant only in their tails, and hence safe to walk despite async (ongoing) insertions. There are maps which are (commonly) read from the owning thread with no lock (providing speed), but which use locks when they are rarely mutated (with corresponding locks taken to read from non-owning threads). There is a tangle of delicate pointers, and you've started to notice the const being used to prevent accidental violations of several of these invariant (and requirements). As it currently stands, there are, as I recall, no races with data. The snapshotting code has a racy read, which can be compensated for using a delta strategy. ...but there is probably no way to safely reset the data, which seems to be attempted in this CL. https://codereview.chromium.org/1021053003/diff/60001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/60001/base/tracked_objects.cc... base/tracked_objects.cc:232: birth_count_ -= count; This is unsafe, unless *all* folks that access this are using a lock. Since there is no apparent lock here... this is completely unsafe. https://codereview.chromium.org/1021053003/diff/60001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/60001/base/tracked_objects.h#... base/tracked_objects.h:550: bool reset, This is totally unsafe, and will instigate errors as large as the tally of births, due to races. https://codereview.chromium.org/1021053003/diff/60001/base/tracked_objects.h#... base/tracked_objects.h:566: // |death_reset_results| is not null, store current death counts in it and You can't get away with doing atomic operation so casually, as a race (without a lock) will result in garbage values (delta as large as the size of the current tally generally... and perhaps it will actually be worse).
https://codereview.chromium.org/1021053003/diff/60001/content/browser/profile... File content/browser/profiler_controller_impl.cc (right): https://codereview.chromium.org/1021053003/diff/60001/content/browser/profile... content/browser/profiler_controller_impl.cc:145: void ProfilerControllerImpl::OnProfilingPhaseCompletion(int profiling_phase) { Should probably have a comment in this function explaining why it's OK not to wait for the child processes to acknowledge this. i.e. if the assumption is that if this phase completes before the data collection for UMA, then that will be done in a follow-up Send() also on UI thread and that messages will be processed in sequence - so when we get reply with the profiler data, that would be guaranteed after those children handled any phase completion events they've received. (If this description is correct - please add something among these lines as a comment here.) https://codereview.chromium.org/1021053003/diff/60001/content/browser/profile... content/browser/profiler_controller_impl.cc:148: for (RenderProcessHost::iterator it(RenderProcessHost::AllHostsIterator()); Hmm, I realise you're just following the same pattern as the previous code, but I am confused why we notify renderers and child processes separately - and what prevents renderer processes being notified twice? (i.e. wouldn't renderers already be included in BrowserChildProcessHostIterator?). If indeed this is the correct thing to do, please add a comment explaining why that's the case - since it's entirely not obvious just from reading the code.
On 2015/04/01 17:10:25, jar (doing other things) wrote: > There is a lot of good stuff in this CL... but the attempt to reset the counts > to zero seems IMO to be a wrong direction. > > I'd suggest having an in-person code review to discuss strategy, and allow you a > chance to argue for why the changes are safe. This code was very carefully > written... and is quite (unavoidably) complex. > > The challenge is to avoid the use of locks as much as possible, and rely on > external locks (such as are applied during PostTask) to provide the requisite > memory barriers to make this all safe without significant marginal cost. Most > of this functionality can be found in files containing the word "subtle" and a > warning not to casually use or interact with these features... since it is super > delicate. > > There are also numerous data structures, such as lists, which are guaranteed to > be invariant only in their tails, and hence safe to walk despite async (ongoing) > insertions. There are maps which are (commonly) read from the owning thread > with no lock (providing speed), but which use locks when they are rarely mutated > (with corresponding locks taken to read from non-owning threads). There is a > tangle of delicate pointers, and you've started to notice the const being used > to prevent accidental violations of several of these invariant (and > requirements). > > As it currently stands, there are, as I recall, no races with data. The > snapshotting code has a racy read, which can be compensated for using a delta > strategy. > > ...but there is probably no way to safely reset the data, which seems to be > attempted in this CL. > > https://codereview.chromium.org/1021053003/diff/60001/base/tracked_objects.cc > File base/tracked_objects.cc (right): > > https://codereview.chromium.org/1021053003/diff/60001/base/tracked_objects.cc... > base/tracked_objects.cc:232: birth_count_ -= count; > This is unsafe, unless *all* folks that access this are using a lock. Since > there is no apparent lock here... this is completely unsafe. > > https://codereview.chromium.org/1021053003/diff/60001/base/tracked_objects.h > File base/tracked_objects.h (right): > > https://codereview.chromium.org/1021053003/diff/60001/base/tracked_objects.h#... > base/tracked_objects.h:550: bool reset, > This is totally unsafe, and will instigate errors as large as the tally of > births, due to races. > > https://codereview.chromium.org/1021053003/diff/60001/base/tracked_objects.h#... > base/tracked_objects.h:566: // |death_reset_results| is not null, store current > death counts in it and > You can't get away with doing atomic operation so casually, as a race (without a > lock) will result in garbage values (delta as large as the size of the current > tally generally... and perhaps it will actually be worse). FWIW: Summarizing the results of a meeting between VadimT and Jar: VadimT will add a second counter, parallel to the current death count. This new counter will only be for use as a denominator when deciding about recording a sample (re: cute trick for getting an independently random sample from an unbounded stream of events). The new counter will be the *only* counter that is "cleared asynchronously" towards zero. Any miscommunication in this smashing (due to races) will then *only* impact the future probability of taking a sample. Said another way: If all goes as planned with the async clearing, then the next death event will indeed be taken with probability 1. If things "Don't go well" in the clearing, and perhaps the old (larger) value persists, then the divisor will be large, and the only downside will be that it is less likely that a new sample will replace any existing sample. Hence there will be rare, and minimal damage ;-). This allows all existing code (birth/death counters) to continue to act in a race-free manner, and the snapshot-delta approach can be used to grab counts, and (when snapshot races appear) heal the delta during future snapshots.
jar@, thanks for the summary! Also, we'll clear asynchronously, towards zero, run_duration_max_ and queue_duration_max_.
On 2015/04/03 21:02:18, vadimt wrote: > jar@, thanks for the summary! > > Also, we'll clear asynchronously, towards zero, run_duration_max_ and > queue_duration_max_. +1 (Agreed) Those statistics are also all about what happened in an interval... and will not have the potential to cause "cascading damage" to anything else (in later intervals) if they are corrupted.
Modified base/ changes to make them race-proof. Now not resetting counters and sums on phase change from the main thread. The counters and sums keep growing, but every task’s DeathData has a list of snapshots made on phase completions. When it’s time to send the profiler data, we iterate through snapshots for each task and send back deltas for each profiling phase. https://codereview.chromium.org/1021053003/diff/60001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/60001/base/tracked_objects.cc... base/tracked_objects.cc:232: birth_count_ -= count; On 2015/04/01 17:10:25, jar (doing other things) wrote: > This is unsafe, unless *all* folks that access this are using a lock. Since > there is no apparent lock here... this is completely unsafe. Done. https://codereview.chromium.org/1021053003/diff/60001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/60001/base/tracked_objects.h#... base/tracked_objects.h:550: bool reset, On 2015/04/01 17:10:25, jar (doing other things) wrote: > This is totally unsafe, and will instigate errors as large as the tally of > births, due to races. Done. https://codereview.chromium.org/1021053003/diff/60001/base/tracked_objects.h#... base/tracked_objects.h:566: // |death_reset_results| is not null, store current death counts in it and On 2015/04/01 17:10:25, jar (doing other things) wrote: > You can't get away with doing atomic operation so casually, as a race (without a > lock) will result in garbage values (delta as large as the size of the current > tally generally... and perhaps it will actually be worse). Done. https://codereview.chromium.org/1021053003/diff/60001/content/browser/profile... File content/browser/profiler_controller_impl.cc (right): https://codereview.chromium.org/1021053003/diff/60001/content/browser/profile... content/browser/profiler_controller_impl.cc:145: void ProfilerControllerImpl::OnProfilingPhaseCompletion(int profiling_phase) { Hmm. I don't wait for any result simply because there is no result from phase completion... It's simply a signal to a child to start labelling tasks with a new phase number. Not sure that the order of messages is important too, and we don't rely on it. If a message collecting data for sending is sent after phase change, but somehow gets processed first in the child process - no big deal: the child will respond to Send with less phases, i.e. tasks from the last very short phase will be in the previous phase... So, I'm not sure what to write in the comment. https://codereview.chromium.org/1021053003/diff/60001/content/browser/profile... content/browser/profiler_controller_impl.cc:148: for (RenderProcessHost::iterator it(RenderProcessHost::AllHostsIterator()); I checked that everything works correctly, and that this is a correct way to do things. As for the deep roots for such separation, it would take time to figure this out. Not sure this belongs to the scope of this CL.
Sorry for being quite slow in reviewing this. I've finally had a bit of a break in my incoming review volume, so I had a chance to give this an earnest look. I've still only skimmed over the //base changes for now, as they require a much more closely focused review than the rest (and also Jim is very kindly reviewing those details -- thanks, Jim!). https://codereview.chromium.org/1021053003/diff/80001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/80001/base/tracked_objects.cc... base/tracked_objects.cc:33: const bool kTrackAllTaskObjects = true; Is this flag still needed? It might be helpful to remove it in a precursor CL, so that the new code doesn't need to worry about it. https://codereview.chromium.org/1021053003/diff/80001/chrome/browser/metrics/... File chrome/browser/metrics/first_web_contents_profiler.cc (right): https://codereview.chromium.org/1021053003/diff/80001/chrome/browser/metrics/... chrome/browser/metrics/first_web_contents_profiler.cc:15: #include "components/metrics/metrics_log.h" nit: This include seems wrong. If you just need the protocol buffer values, then please include the protocol buffer header directly. https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... File components/metrics/profiler/tracking_synchronizer.cc (right): https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer.cc:233: DCHECK(BrowserThread::CurrentlyOn(BrowserThread::UI)); Throughout, please use DCHECK_CURRENTLY_ON https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... File components/metrics/profiler/tracking_synchronizer.h (right): https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer.h:58: // associated with completing the phase and starting a new one. nit: Suggested rewording: "|profiling_event| is the event that triggered the completion of the current phase, and begins a new phase." Of course, feel free to improve further. https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... File components/metrics/profiler/tracking_synchronizer_observer.h (right): https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer_observer.h:48: const ProfilerEvents& past_events) = 0; I'd still prefer to pass TimeTicks in place of TimeDeltas, and I'd still prefer to omit the past_events list. https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... File components/metrics/profiler/tracking_synchronizer_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer_unittest.cc:89: } nit: Can this be written simply as "using TrackingSynchronizer::RegisiterPhaseCompletion;"? https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer_unittest.cc:91: ~TestTrackingSynchronizer() {} nit: Please move this up to the constructor. https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer_unittest.cc:99: }; I would strongly prefer to use a TickClock in the TrackingSynchronizer class, rather than needing to jump through hoops to pass a test time to each method that wants one. https://codereview.chromium.org/1021053003/diff/80001/content/browser/profile... File content/browser/profiler_controller_impl.cc (right): https://codereview.chromium.org/1021053003/diff/80001/content/browser/profile... content/browser/profiler_controller_impl.cc:90: sequence_number, current_profiling_phase))) { Do you understand why |sequence_number| is used in this IPC message? If so, could you please explain to me why it's needed here, and not needed for the new code path? If not, let's try to figure that out. https://codereview.chromium.org/1021053003/diff/80001/content/browser/profile... content/browser/profiler_controller_impl.cc:108: DCHECK_CURRENTLY_ON(BrowserThread::UI); Hmm, shouldn't this be the IO thread? Are you not running with DCHECKS? https://codereview.chromium.org/1021053003/diff/80001/content/browser/profile... content/browser/profiler_controller_impl.cc:125: DCHECK(BrowserThread::CurrentlyOn(BrowserThread::UI)); Please use DCHECK_CURRENTLY_ON https://codereview.chromium.org/1021053003/diff/80001/content/browser/profile... content/browser/profiler_controller_impl.cc:149: DCHECK(BrowserThread::CurrentlyOn(BrowserThread::UI)); Please use DCHECK_CURRENTLY_ON https://codereview.chromium.org/1021053003/diff/80001/content/browser/profile... File content/browser/profiler_controller_impl.h (right): https://codereview.chromium.org/1021053003/diff/80001/content/browser/profile... content/browser/profiler_controller_impl.h:49: void OnProfilingPhaseCompletion(int profiling_phase) override; nit: "Completion" -> "Completed" https://codereview.chromium.org/1021053003/diff/80001/content/child/child_thr... File content/child/child_thread_impl.cc (right): https://codereview.chromium.org/1021053003/diff/80001/content/child/child_thr... content/child/child_thread_impl.cc:613: OnProfilingPhaseCompletion) nit: s/completion/completed https://codereview.chromium.org/1021053003/diff/80001/content/common/child_pr... File content/common/child_process_messages.h (right): https://codereview.chromium.org/1021053003/diff/80001/content/common/child_pr... content/common/child_process_messages.h:111: // tracked_objects). Why is this comment the same as the one above? How do the two IPC methods differ? (Please reply by clarifying the comments, not by replying to this CL comment.) https://codereview.chromium.org/1021053003/diff/80001/content/common/child_pr... content/common/child_process_messages.h:112: IPC_MESSAGE_CONTROL1(ChildProcessMsg_OnProfilingPhase, nit: "OnProfilingPhase" -> "ProfilingPhaseCompleted" https://codereview.chromium.org/1021053003/diff/80001/content/public/browser/... File content/public/browser/profiler_controller.h (right): https://codereview.chromium.org/1021053003/diff/80001/content/public/browser/... content/public/browser/profiler_controller.h:45: // |current_profiling_phase| is the number of the current profiling phase. nit: s/number/0-indexed identifier, or something like that. https://codereview.chromium.org/1021053003/diff/80001/content/public/browser/... content/public/browser/profiler_controller.h:50: // |profiling_phase| completion. nit: "notify them of the profiling phase completion" -> "notify them that the |profiling_phase| has completed". https://codereview.chromium.org/1021053003/diff/80001/content/public/browser/... content/public/browser/profiler_controller.h:51: virtual void OnProfilingPhaseCompletion(int profiling_phase) = 0; nit: "Completion" -> "Completed"
Thanks! Addressed comments + rebase. https://codereview.chromium.org/1021053003/diff/80001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/80001/base/tracked_objects.cc... base/tracked_objects.cc:33: const bool kTrackAllTaskObjects = true; Done; plase see (and LGTM) https://codereview.chromium.org/1068933003/ https://codereview.chromium.org/1021053003/diff/80001/chrome/browser/metrics/... File chrome/browser/metrics/first_web_contents_profiler.cc (right): https://codereview.chromium.org/1021053003/diff/80001/chrome/browser/metrics/... chrome/browser/metrics/first_web_contents_profiler.cc:15: #include "components/metrics/metrics_log.h" On 2015/04/07 01:15:30, Ilya Sherman wrote: > nit: This include seems wrong. If you just need the protocol buffer values, > then please include the protocol buffer header directly. Done. https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... File components/metrics/profiler/tracking_synchronizer.cc (right): https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer.cc:233: DCHECK(BrowserThread::CurrentlyOn(BrowserThread::UI)); On 2015/04/07 01:15:30, Ilya Sherman wrote: > Throughout, please use DCHECK_CURRENTLY_ON Done. https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... File components/metrics/profiler/tracking_synchronizer.h (right): https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer.h:58: // associated with completing the phase and starting a new one. On 2015/04/07 01:15:30, Ilya Sherman wrote: > nit: Suggested rewording: "|profiling_event| is the event that triggered the > completion of the current phase, and begins a new phase." Of course, feel free > to improve further. Done. https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... File components/metrics/profiler/tracking_synchronizer_observer.h (right): https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer_observer.h:48: const ProfilerEvents& past_events) = 0; IMHO, time deltas make sense. phase_start and _end are relative times (i.e. deltas between the phase event and the start of recording), hence, we use TimeDelta. This makes the code easier to comprehend. Speaking of past_events: as you can see, the whole accounting for phases and events is done by one central component: tracking synchronizer. It has 2 possible consumers of the information: metrics component and the profiler tab. It makes the phase-related information equally available to both consumers. If metrics component didn’t exist (say, on a new hypothetical platform), the profiler tab would still have access to phasing information from the synchronizer. It would be strange to move the accounting to one of the consumers, i.e. metrics component, creating strange asymmetry between consumers. Hence, the accounting is done by the central component - tracking synchroniser, metrics component is NOT notified about phase change events (only about end results of multiphased snapshots), and tracking synchronizer reports the whole set of information, including past events, to consumers. https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... File components/metrics/profiler/tracking_synchronizer_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer_unittest.cc:89: } Technically, this works, at least on Windows, but our guide allows using this only for constructors. I'd not do this. https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer_unittest.cc:91: ~TestTrackingSynchronizer() {} On 2015/04/07 01:15:30, Ilya Sherman wrote: > nit: Please move this up to the constructor. Done. https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer_unittest.cc:99: }; On 2015/04/07 01:15:30, Ilya Sherman wrote: > I would strongly prefer to use a TickClock in the TrackingSynchronizer class, > rather than needing to jump through hoops to pass a test time to each method > that wants one. Done. https://codereview.chromium.org/1021053003/diff/80001/content/browser/profile... File content/browser/profiler_controller_impl.cc (right): https://codereview.chromium.org/1021053003/diff/80001/content/browser/profile... content/browser/profiler_controller_impl.cc:90: sequence_number, current_profiling_phase))) { See RequestContextMap. Sequence numbers are necessary to properly route responses from child processes to corresponding callbacks. Since the phase change broadcast doesn't take a callback, there is no need in unique numbers. https://codereview.chromium.org/1021053003/diff/80001/content/browser/profile... content/browser/profiler_controller_impl.cc:108: DCHECK_CURRENTLY_ON(BrowserThread::UI); Must have sneaked in since I ran this last time with UMA and phasing enabled; thanks! (I'd catch it later anyways, since I was planning another end-to-end test at the end anyways.) https://codereview.chromium.org/1021053003/diff/80001/content/browser/profile... content/browser/profiler_controller_impl.cc:125: DCHECK(BrowserThread::CurrentlyOn(BrowserThread::UI)); On 2015/04/07 01:15:30, Ilya Sherman wrote: > Please use DCHECK_CURRENTLY_ON Done. https://codereview.chromium.org/1021053003/diff/80001/content/browser/profile... content/browser/profiler_controller_impl.cc:149: DCHECK(BrowserThread::CurrentlyOn(BrowserThread::UI)); On 2015/04/07 01:15:30, Ilya Sherman wrote: > Please use DCHECK_CURRENTLY_ON Done. https://codereview.chromium.org/1021053003/diff/80001/content/browser/profile... File content/browser/profiler_controller_impl.h (right): https://codereview.chromium.org/1021053003/diff/80001/content/browser/profile... content/browser/profiler_controller_impl.h:49: void OnProfilingPhaseCompletion(int profiling_phase) override; On 2015/04/07 01:15:31, Ilya Sherman wrote: > nit: "Completion" -> "Completed" Done. https://codereview.chromium.org/1021053003/diff/80001/content/child/child_thr... File content/child/child_thread_impl.cc (right): https://codereview.chromium.org/1021053003/diff/80001/content/child/child_thr... content/child/child_thread_impl.cc:613: OnProfilingPhaseCompletion) On 2015/04/07 01:15:31, Ilya Sherman wrote: > nit: s/completion/completed Done. https://codereview.chromium.org/1021053003/diff/80001/content/common/child_pr... File content/common/child_process_messages.h (right): https://codereview.chromium.org/1021053003/diff/80001/content/common/child_pr... content/common/child_process_messages.h:111: // tracked_objects). On 2015/04/07 01:15:31, Ilya Sherman wrote: > Why is this comment the same as the one above? How do the two IPC methods > differ? (Please reply by clarifying the comments, not by replying to this CL > comment.) Done. https://codereview.chromium.org/1021053003/diff/80001/content/common/child_pr... content/common/child_process_messages.h:112: IPC_MESSAGE_CONTROL1(ChildProcessMsg_OnProfilingPhase, On 2015/04/07 01:15:31, Ilya Sherman wrote: > nit: "OnProfilingPhase" -> "ProfilingPhaseCompleted" Done. https://codereview.chromium.org/1021053003/diff/80001/content/common/child_pr... content/common/child_process_messages.h:112: IPC_MESSAGE_CONTROL1(ChildProcessMsg_OnProfilingPhase, On 2015/04/07 01:15:31, Ilya Sherman wrote: > nit: "OnProfilingPhase" -> "ProfilingPhaseCompleted" Done. https://codereview.chromium.org/1021053003/diff/80001/content/public/browser/... File content/public/browser/profiler_controller.h (right): https://codereview.chromium.org/1021053003/diff/80001/content/public/browser/... content/public/browser/profiler_controller.h:45: // |current_profiling_phase| is the number of the current profiling phase. On 2015/04/07 01:15:31, Ilya Sherman wrote: > nit: s/number/0-indexed identifier, or something like that. Done. https://codereview.chromium.org/1021053003/diff/80001/content/public/browser/... content/public/browser/profiler_controller.h:50: // |profiling_phase| completion. On 2015/04/07 01:15:31, Ilya Sherman wrote: > nit: "notify them of the profiling phase completion" -> "notify them that the > |profiling_phase| has completed". Done.
Thanks, Vadim. I've still just focused on the parts outside of //base for now. https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... File components/metrics/profiler/tracking_synchronizer_observer.h (right): https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer_observer.h:48: const ProfilerEvents& past_events) = 0; On 2015/04/07 21:44:14, vadimt wrote: > IMHO, time deltas make sense. phase_start and _end are relative times (i.e. > deltas between the phase event and the start of recording), hence, we use > TimeDelta. This makes the code easier to comprehend. My claim is that relative times only make sense for the metrics component, and not for the profiler UI. I'd prefer to use absolute timestamps and let clients process those timestamps as they wish. > Speaking of past_events: as you can see, the whole accounting for phases and > events is done by one central component: tracking synchronizer. It has 2 > possible consumers of the information: metrics component and the profiler tab. > It makes the phase-related information equally available to both consumers. If > metrics component didn’t exist (say, on a new hypothetical platform), the > profiler tab would still have access to phasing information from the > synchronizer. It would be strange to move the accounting to one of the > consumers, i.e. metrics component, creating strange asymmetry between consumers. > Hence, the accounting is done by the central component - tracking synchroniser, > metrics component is NOT notified about phase change events (only about end > results of multiphased snapshots), and tracking synchronizer reports the whole > set of information, including past events, to consumers. I agree that the tracking synchronizer is currently the code that is responsible for keeping track of the basics of phases. However, I'm also seeing the needs of the metrics uploader, which are distinct from the needs of the profiler tab, leak into the internals of the tracking synchronizer. I'd prefer that the tracking synchronizer be responsible for the minimum set of things that are actually shared between the two clients, and that everything that's only needed for the uploader actually be handled by the uploader. https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... File components/metrics/profiler/tracking_synchronizer_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer_unittest.cc:89: } On 2015/04/07 21:44:14, vadimt wrote: > Technically, this works, at least on Windows, but our guide allows using this > only for constructors. I'd not do this. Can you please link to the section of the guide where this is described? I've seen this used fairly frequently in tests. https://codereview.chromium.org/1021053003/diff/100001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer.h (right): https://codereview.chromium.org/1021053003/diff/100001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer.h:50: explicit TrackingSynchronizer(base::TickClock* clock); nit: Please pass a scoped_ptr, rather than simply documenting in comments about the ownership transfer. https://codereview.chromium.org/1021053003/diff/100001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/100001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer_unittest.cc:98: }; Can you test the public API of the class, rather than exposing these implementation details for the test to call directly? https://codereview.chromium.org/1021053003/diff/100001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer_unittest.cc:108: auto clock = new base::SimpleTestTickClock(); Please document why this isn't leaked, since it looks a lot like a memory leak. https://codereview.chromium.org/1021053003/diff/100001/content/common/child_p... File content/common/child_process_messages.h (right): https://codereview.chromium.org/1021053003/diff/100001/content/common/child_p... content/common/child_process_messages.h:111: // new one. nit: "to finish current profiling phase" -> "to mark the current profiling phase as finished"
Thanks! https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... File components/metrics/profiler/tracking_synchronizer_observer.h (right): https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer_observer.h:48: const ProfilerEvents& past_events) = 0; Done with the TimeTicks part. On past_events: I believe we should make it possible (in future) for the chrome://profiler to be able to filter by before/after a given phase. The current data passed to it doesn’t seem unreasonable, since it supports exactly this kind of query. This doesn’t seem like an obvious leaking of the UMA-related details. The current protocol may in fact help to avoid duplication between the tab and UMA code. Of course, there are some uncertainties about the future, and it may turn out that the current protocol will become inadequate. But we don’t know, and I’m not sure that we have to change the current implementation, perhaps to better, perhaps to worse (from the hypothetical future needs’ point of view), right now. So, the claim is twofold: (1) the current protocol doesn’t look leaky to me; (2) we don’t know the future. This is my motivation for having exactly this protocol. YMMV. https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... File components/metrics/profiler/tracking_synchronizer_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer_unittest.cc:89: } https://chromium-cpp.appspot.com/, Inherited Constructors. BTW, I've misread this: inherited constructors are blacklisted. Speaking of inherited methods, they are not present in the allowed features list, which makes me think that the tests you've seen are wrong. https://codereview.chromium.org/1021053003/diff/100001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer.h (right): https://codereview.chromium.org/1021053003/diff/100001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer.h:50: explicit TrackingSynchronizer(base::TickClock* clock); On 2015/04/07 22:58:35, Ilya Sherman wrote: > nit: Please pass a scoped_ptr, rather than simply documenting in comments about > the ownership transfer. Done. https://codereview.chromium.org/1021053003/diff/100001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/100001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer_unittest.cc:98: }; Not sure it's worth the effort. The purpose of the test if to test the non-trivial logic of TrackingSynchronizer::SendData rather than the boilerplate logic of collecting snapshots from the system. I'd need to mock the snapshot collection (which at the first sight requires significant effort) for essentially same result. So, testing SendData directly looks like a reasonable shortcut for me... https://codereview.chromium.org/1021053003/diff/100001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer_unittest.cc:108: auto clock = new base::SimpleTestTickClock(); On 2015/04/07 22:58:35, Ilya Sherman wrote: > Please document why this isn't leaked, since it looks a lot like a memory leak. Done. https://codereview.chromium.org/1021053003/diff/100001/content/common/child_p... File content/common/child_process_messages.h (right): https://codereview.chromium.org/1021053003/diff/100001/content/common/child_p... content/common/child_process_messages.h:111: // new one. On 2015/04/07 22:58:35, Ilya Sherman wrote: > nit: "to finish current profiling phase" -> "to mark the current profiling phase > as finished" Done.
https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.c... base/tracked_objects.cc:104: queue_duration_sample_ = 0; Instead of having all of these as members - which also exist in DeathDataSnapshot - why not actually have DeathDataSnapshot be a member of this? https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.c... base/tracked_objects.cc:189: last_phase_snapshot_ = new DeathDataPhaseSnapshot(profiling_phase, *this); Eek, I really don't like this circular reference between DeathData and DeathDataPhaseSnapshot. However, if DeathData instead has a DeathDataSnapshot as a member, this circle can be broken by making the ctor take the DeathDataSnapshot member param and the last_phase_snapshot, instead of *this. https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.c... base/tracked_objects.cc:212: const tracked_objects::DeathData& death_data) Nit: No need for tracked_objects:: namespace. https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.c... base/tracked_objects.cc:652: if (death_data.count > 0) Nit: {} https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h... base/tracked_objects.h:264: // Access to members of this class is never protected by the lock. The fields Nit: This comment should be above the class itself rather than above the public: statement. https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h... base/tracked_objects.h:299: // Called only in the main thread. Hmm, base/ generally doesn't know about "main thread" - since *specific* threads are a higher-level concept. Also, the tense of this sentence is describing behavior outside of this class - which doesn't make sense. The existing comments in this file discuss "death thread" and "birth thread" - which make sense - since they're relative. I'm guessing those can both be distinct from the "main thread". Is this object always created on the main thread? If so, you can say "Should only be called from the same thread that the object was created on.". If that's not correct, let me know and we can discuss more. https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h... base/tracked_objects.h:371: // pointed object. But who does? It seems in some cases, the object does - i.e. through the linked list chain on DeathData. This seems quite messy. How about having DeathData just keep a ScopedVector of DeathDataPhaseSnapshot snapshots so that the ownership is clear - and no manual iterating through the linked lists to delete these is needed?
2 snapshots for the last set of comments. https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.c... base/tracked_objects.cc:104: queue_duration_sample_ = 0; As the comment says, "Members are ordered from most regularly read and updated, to least". Looks like an optimization, like to get them to same cache line. Placing DeathDataSnapshot to replace snapshotable fields in principle kills this optimization, or has a potency to introduce inefficiencies if future, when/if we introduce more non-snapshotable fields between snapshotable ones. I'd leave the separation between serialized data and low-level-optimized realtime data representations. https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.c... base/tracked_objects.cc:189: last_phase_snapshot_ = new DeathDataPhaseSnapshot(profiling_phase, *this); Beautiful, but perhaps suboptimal, as per previous comment. https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.c... base/tracked_objects.cc:212: const tracked_objects::DeathData& death_data) On 2015/04/08 15:55:38, Alexei Svitkine wrote: > Nit: No need for tracked_objects:: namespace. Done. https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.c... base/tracked_objects.cc:652: if (death_data.count > 0) On 2015/04/08 15:55:39, Alexei Svitkine wrote: > Nit: {} Done. https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h... base/tracked_objects.h:299: // Called only in the main thread. I added some comments. The object gets created in the death thread for adding to the death map, and in the snapshot thread, as a part for creating snapshots for Still_Alive thread. https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h... base/tracked_objects.h:371: // pointed object. Having list nodes that don't own 'next' is a very common practice (the list itself is owned by some parent structure); not sure we can call this "messy". I’m not a big fan of scoped_vector. Vector has higher memory overhead than a list, and we shouldn't ignore this, given that we have thousands of DeathDatas. (Also, I like an elegant loop in ThreadData::SnapshotExecutedTasks, which starts at the current snapshot (with is not in the snapshotted list, and would not be in the vector), and goes on to the list of old snapshotted elements. This loop would look heavier if we used a vector, since we'd have to special case the current snapshot.) I changed comments; hoping that the perception of “mess” is less now.
https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.c... base/tracked_objects.cc:104: queue_duration_sample_ = 0; On 2015/04/08 20:32:34, vadimt wrote: > As the comment says, "Members are ordered from most regularly read and updated, > to least". Looks like an optimization, like to get them to same cache line. > Placing DeathDataSnapshot to replace snapshotable fields in principle kills this > optimization, or has a potency to introduce inefficiencies if future, when/if we > introduce more non-snapshotable fields between snapshotable ones. > I'd leave the separation between serialized data and low-level-optimized > realtime data representations. I am actually very skeptical of that original comment, especially since it says "it might help" - but there doesn't appear to be data to back that up. But - we can totally order the fields in the same way in the DeathDataSnapshot struct (with this same comment) and by including it at the same place in DeathData, the memory layout and machine code to update those values will be the same. Given that it's entirely possible to maintain the previous concerns about layout and that I believe it will improve the code quite a bit, I think its worth making this change. (The other alternative would be to introduce a 3rd struct - one that's only used for the in-memory data and use it within DeathData in the same way I suggest DeathDataSnapshot be used - this will also fix the same circular dependency issue that I mentioned, so I would still prefer that over the current approach - but I don't buy the value of keeping the runtime and snapshot data separate if the same struct can be used to represent both.) https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h... base/tracked_objects.h:371: // pointed object. On 2015/04/08 20:32:34, vadimt wrote: > Having list nodes that don't own 'next' is a very common practice (the list > itself is owned by some parent structure); not sure we can call this "messy". > > I’m not a big fan of scoped_vector. Vector has higher memory overhead than a > list, and we shouldn't ignore this, given that we have thousands of DeathDatas. > (Also, I like an elegant loop in ThreadData::SnapshotExecutedTasks, which starts > at the current snapshot (with is not in the snapshotted list, and would not be > in the vector), and goes on to the list of old snapshotted elements. This loop > would look heavier if we used a vector, since we'd have to special case the > current snapshot.) > > I changed comments; hoping that the perception of “mess” is less now. Sorry, let me clarify - I don't mind that the "prev" pointer doesn't own the data. My concern is that it's not consistent - because in some cases this structure is indeed used for ownership. But I agree that we should be careful about performance. Can you clarify (not in the code, just in the review) when exactly DeathData is destroyed? If the events are relatively rare (i.e. snapshotting and destruction), it seems better to opt for code clarity than raw performance. On the other hand, if it's frequent, of course performance is a bigger concern. My impression was that snapshotting and destruction of DeathData would be rare. Is that assumption correct or am I wrong? https://codereview.chromium.org/1021053003/diff/160001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/160001/base/tracked_objects.h... base/tracked_objects.h:304: // Must be called only in the snapshot thread. Nit: "in" -> "on"
https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.c... base/tracked_objects.cc:104: queue_duration_sample_ = 0; jar@ told me that cache lines on Android may be very small. I'd not risk with reordering the fields. It's not hard for me to reorder the fields in DeathData etc, and the could would indeed become more elegant. However, consider a non-snapshottable field sample_probability_count_. You can't include it in DeathDataSnapshot (no need to snapshot). So, you'd have to place it before DeathDataSnapshot. That's OK for now, but may a an early sign of future problems. Some non-snapshottable fields may need to be inserted in the middle of DeathData, and at that point, we are broken. This separation seems to be essentially right thing, and we don't leak runtime considerations to the serialized structure. https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h... base/tracked_objects.h:371: // pointed object. I still fail to see the difference between the patterns I use and the general list pattern, where there is a root object (List), pointing to the first element (Node), which points ... etc. The whole list is owned by the root object. Here, List == DeathData, and Node == DeathDataPhaseSnapshot, without any deviations, hacks and tricks. So, your "concern is that it's not consistent" is equally applicable to the general List pattern. However, our misunderstanding is perhaps because you think that I deviate from the classic List pattern? You phrase "because in some cases this structure is indeed used for ownership" seems to suggest this. Could you point to where this happens? DeatData is destroyed at the process' exit, and phase change/snapshottings are rare. However, my main concern regarding the vector is it's memory, not CPU, overhead. You know, 2 values (count, capacity), + allocated memory (of 'capacity' size). It's more expensive than just the list, both for 0 and non-0 list sizes. https://codereview.chromium.org/1021053003/diff/160001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/160001/base/tracked_objects.h... base/tracked_objects.h:304: // Must be called only in the snapshot thread. On 2015/04/08 20:54:26, Alexei Svitkine wrote: > Nit: "in" -> "on" Done.
https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.c... base/tracked_objects.cc:104: queue_duration_sample_ = 0; On 2015/04/08 21:27:17, vadimt wrote: > jar@ told me that cache lines on Android may be very small. I'd not risk with > reordering the fields. > > It's not hard for me to reorder the fields in DeathData etc, and the could would > indeed become more elegant. However, consider a non-snapshottable field > sample_probability_count_. You can't include it in DeathDataSnapshot (no need to > snapshot). So, you'd have to place it before DeathDataSnapshot. > That's OK for now, but may a an early sign of future problems. Some > non-snapshottable fields may need to be inserted in the middle of DeathData, and > at that point, we are broken. > This separation seems to be essentially right thing, and we don't leak runtime > considerations to the serialized structure. Fine, but I'm still not OK with the circular dependency between the two classes. I suggest then either passing all the values as params to the ctor of the snapshot - instead of *this - or just setting them on the snapshot after construction. That way, there's only a dependency in a single direction. https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h... base/tracked_objects.h:371: // pointed object. On 2015/04/08 21:27:17, vadimt wrote: > I still fail to see the difference between the patterns I use and the general > list pattern, where there is a root object (List), pointing to the first element > (Node), which points ... etc. The whole list is owned by the root object. > Here, List == DeathData, and Node == DeathDataPhaseSnapshot, without any > deviations, hacks and tricks. > > So, your "concern is that it's not consistent" is equally applicable to the > general List pattern. However, our misunderstanding is perhaps because you think > that I deviate from the classic List pattern? You phrase "because in some cases > this structure is indeed used for ownership" seems to suggest this. Could you > point to where this happens? > > DeatData is destroyed at the process' exit, and phase change/snapshottings are > rare. > > However, my main concern regarding the vector is it's memory, not CPU, overhead. > You know, 2 values (count, capacity), + allocated memory (of 'capacity' size). > It's more expensive than just the list, both for 0 and non-0 list sizes. My original comment was because you had a comment mentioning that this object doesn't own the prev pointer. Yet now you say that it does own it (and indeed this seems to match the dtor of DeathData's use of this - where it's used to delete the chain). I don't have a problem with the pattern per-se, but I do want the ownership model in the code to be very clear. If the snapshot indeed owns its prev chain, then it should be done using constructs that indicate this - i.e. using a scoped_ptr. If it doesn't own it, then a different construct (e.g. scoped vector) should be used for managing ownership (in parallel to the list). From looking at the code, it seems to me (correct me if I'm wrong) that in *some* cases it owns it (i.e. for the list in ThreadData) - but in other cases it doesn't (e.g. when the struct is copied around?) - that's the issue I have - the inconsistency. Now, perhaps you are right that the memory overhead of a scoped_vector is too much - and thinking about this, I guess I agree that it probably is, since this is once per task type and there are a lot of types of tasks.. Ideally, I'd like to have it either always own it, or never own it - but if there really is no good way to do this and stay performant - then we should at least add better comments above where the root of this owning list is declared about it. https://codereview.chromium.org/1021053003/diff/180001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/180001/base/tracked_objects.h... base/tracked_objects.h:304: // Must be called only on the snapshot thread. Is there a way to enforce these threading requirements in debug builds using some of the helper classes we have, such as ThreadChecker?
https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.c... base/tracked_objects.cc:104: queue_duration_sample_ = 0; Done, but not sure I like the result... Since DeathDataSnapshot and DeathDataPhaseSnapshot are created in several places, I chose to pass all fields as params, for safety. So, we see long lists of params here and there. Perhaps, there is a way to introduce an additional structure, but this'd come with added complexity (for understanding), too. Here we go. https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h... base/tracked_objects.h:371: // pointed object. Hmmm; DeathDataPhaseSnapshot is the node in the list, and, according to the list pattern, doesn't own its 'prev'. Always. DeathData is the root of the list, and (according to the same pattern) owns the whole list of DeathDataPhaseSnapshot's. Always. So, yes, I'm correcting you. Everything is consistent. Now, regarding comments: The comment before DeathDataPhaseSnapshot says "Used as an element of the list of phase snapshots owned by DeathData." This indicates not-ownership of prev. The comment before DeathData::last_phase_snapshot_ says "DeathData owns the whole list starting with this pointer." Did you somehow mix DeathData and DeathDataSnapshot while reading code? I'm trying to understand why we are seeing different things... https://codereview.chromium.org/1021053003/diff/180001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/180001/base/tracked_objects.h... base/tracked_objects.h:304: // Must be called only on the snapshot thread. Speaking of birth and death threads, I’d not bother. The very fact that methods like ThreadData::TallyADeath happen on an instance of ThreadData taken from to a certain thread’s TLS, guarantee that the operations happen in a correct thread. Also, I’m hesitant to overload pretty frequently executed code (task start/finish) with more checks even in debug mode. YMMV. I’ve included checks for the snapshot thread. Not using LazyInstance because in non-debug mode the instance resolves to an empty structure without a constructor.
https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... File components/metrics/profiler/tracking_synchronizer_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer_unittest.cc:89: } On 2015/04/08 01:09:50, vadimt wrote: > https://chromium-cpp.appspot.com/, Inherited Constructors. > BTW, I've misread this: inherited constructors are blacklisted. > Speaking of inherited methods, they are not present in the allowed features > list, which makes me think that the tests you've seen are wrong. This isn't a C++11 feature, so that list is not relevant. You're welcome to raise the issue on chromium-dev if you're really concerned. https://codereview.chromium.org/1021053003/diff/160001/chrome/browser/metrics... File chrome/browser/metrics/chrome_metrics_service_client.h (right): https://codereview.chromium.org/1021053003/diff/160001/chrome/browser/metrics... chrome/browser/metrics/chrome_metrics_service_client.h:188: base::WeakPtrFactory<ChromeMetricsServiceClient> weak_ptr_factory_; nit: The WeakPtrFactory should always be the final member, so that it is destroyed first. This is a helpful pattern to follow in general, so that weak pointers are always invalidated before other members are destroyed, and possibly try to access those weak pointers. In this case, there is no such risk; but there's also no harm to following the pattern, and this is the sort of pattern that is best to follow as uniformly as possible, so that readers don't need to think about whether an exception is justified or not. https://codereview.chromium.org/1021053003/diff/160001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer.cc (right): https://codereview.chromium.org/1021053003/diff/160001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer.cc:185: clock_(std::move(clock)) { nit: std::move is not yet permitted in Chromium code, except in a few exceptional circumstances. Please write "clock.Pass()" instead. https://codereview.chromium.org/1021053003/diff/160001/content/common/child_p... File content/common/child_process_messages.h (right): https://codereview.chromium.org/1021053003/diff/160001/content/common/child_p... content/common/child_process_messages.h:110: // Send to all the child processes to to mark the current profiling phase as nit: "to to" -> "to" https://codereview.chromium.org/1021053003/diff/160001/content/common/child_p... content/common/child_process_messages.h:111: // finished and start a new one. nit: "new one" -> "new phase" https://codereview.chromium.org/1021053003/diff/180001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer_observer.h (right): https://codereview.chromium.org/1021053003/diff/180001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer_observer.h:40: // FinishedReceivingData() is called. I'm still not thrilled with passing the set of |past_events|, because I think it's simpler to write the chrome://profiler UI page without using this data structure, and I think the data structure is relatively non-obvious in terms of how it's used. But, I think it's not really useful to spend more time bikeshedding over it, so let's go with your preference and keep it here. If we do so, let's make the documentation very clear about why such a structure is important and useful. Here's my attempt at refining the comment; feel free to refine further: "Each completed phase is associated with an event that triggered the completion of the phase. |past_events| contains the set of events that completed prior to the reported phase. This data structure is useful for quickly computing the full set of profiled traces that occurred before or after a given event."
https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... File components/metrics/profiler/tracking_synchronizer_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/80001/components/metrics/prof... components/metrics/profiler/tracking_synchronizer_unittest.cc:89: } On 2015/04/08 23:36:14, Ilya Sherman wrote: > On 2015/04/08 01:09:50, vadimt wrote: > > https://chromium-cpp.appspot.com/, Inherited Constructors. > > BTW, I've misread this: inherited constructors are blacklisted. > > Speaking of inherited methods, they are not present in the allowed features > > list, which makes me think that the tests you've seen are wrong. > > This isn't a C++11 feature, so that list is not relevant. You're welcome to > raise the issue on chromium-dev if you're really concerned. Done. https://codereview.chromium.org/1021053003/diff/160001/chrome/browser/metrics... File chrome/browser/metrics/chrome_metrics_service_client.h (right): https://codereview.chromium.org/1021053003/diff/160001/chrome/browser/metrics... chrome/browser/metrics/chrome_metrics_service_client.h:188: base::WeakPtrFactory<ChromeMetricsServiceClient> weak_ptr_factory_; On 2015/04/08 23:36:14, Ilya Sherman wrote: > nit: The WeakPtrFactory should always be the final member, so that it is > destroyed first. This is a helpful pattern to follow in general, so that weak > pointers are always invalidated before other members are destroyed, and possibly > try to access those weak pointers. In this case, there is no such risk; but > there's also no harm to following the pattern, and this is the sort of pattern > that is best to follow as uniformly as possible, so that readers don't need to > think about whether an exception is justified or not. Done. https://codereview.chromium.org/1021053003/diff/160001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer.cc (right): https://codereview.chromium.org/1021053003/diff/160001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer.cc:185: clock_(std::move(clock)) { On 2015/04/08 23:36:14, Ilya Sherman wrote: > nit: std::move is not yet permitted in Chromium code, except in a few > exceptional circumstances. Please write "clock.Pass()" instead. Done. https://codereview.chromium.org/1021053003/diff/160001/content/common/child_p... File content/common/child_process_messages.h (right): https://codereview.chromium.org/1021053003/diff/160001/content/common/child_p... content/common/child_process_messages.h:110: // Send to all the child processes to to mark the current profiling phase as On 2015/04/08 23:36:15, Ilya Sherman wrote: > nit: "to to" -> "to" Done. https://codereview.chromium.org/1021053003/diff/160001/content/common/child_p... content/common/child_process_messages.h:111: // finished and start a new one. On 2015/04/08 23:36:15, Ilya Sherman wrote: > nit: "new one" -> "new phase" Done. https://codereview.chromium.org/1021053003/diff/180001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer_observer.h (right): https://codereview.chromium.org/1021053003/diff/180001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer_observer.h:40: // FinishedReceivingData() is called. Done. Thanks for the suggested comment!
https://codereview.chromium.org/1021053003/diff/60001/content/browser/profile... File content/browser/profiler_controller_impl.cc (right): https://codereview.chromium.org/1021053003/diff/60001/content/browser/profile... content/browser/profiler_controller_impl.cc:148: for (RenderProcessHost::iterator it(RenderProcessHost::AllHostsIterator()); On 2015/04/06 23:25:12, vadimt wrote: > I checked that everything works correctly, and that this is a correct way to do > things. As for the deep roots for such separation, it would take time to figure > this out. Not sure this belongs to the scope of this CL. Sorry, forgot to follow-up on this earlier. When you said you checked, could you clarify how? I just want to make sure we don't send a message to the same process twice (as a result of the two mechanisms) that accidentally doesn't result in a problem. To avoid more back and forth discussing this - let me suggest a concrete thing to do that would convince me this is ok: Can you make the code print the process ids/types in both of the loops (the RenderProcessHost and BrowserChildProcessHostIterator loops) - and reply with the output from a Chrome that has a few tabs open. This should answer my question (i.e. whether the lists overlap or not - and if they don't overlap, the reason for it - i.e. presumably based on the type of process). Thanks! https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/120001/base/tracked_objects.h... base/tracked_objects.h:371: // pointed object. On 2015/04/08 23:31:24, vadimt wrote: > Hmmm; > DeathDataPhaseSnapshot is the node in the list, and, according to the list > pattern, doesn't own its 'prev'. Always. > > DeathData is the root of the list, and (according to the same pattern) owns the > whole list of DeathDataPhaseSnapshot's. Always. > > So, yes, I'm correcting you. Everything is consistent. > > Now, regarding comments: > The comment before DeathDataPhaseSnapshot says "Used as an element of the list > of phase snapshots owned by DeathData." This indicates not-ownership of prev. > > The comment before DeathData::last_phase_snapshot_ says "DeathData owns the > whole list starting with this pointer." > > Did you somehow mix DeathData and DeathDataSnapshot while reading code? I'm > trying to understand why we are seeing different things... I admit I missed the updated comments you added - probably because I was still looking at the old snapshot (which is where this comment is on) and not at the new one. The list root owning the whole list makes sense - though I guess I wasn't thinking of the DeathData as the root of the list (I was thinking of last_phase_snapshot_). I guess DeathData is just too decomposed for my preference - doing multiple things at the same time at a low level - i.e. if its runtime data was one struct and there was another struct/class member that was the root of the list (that encapsulated the ownership), I would find it cleaner. But I agree that it has a cost of complexity - as there's already a lot of classes in this module and adding more just makes it harder to understand (I already have a hard time reasoning about the existing ones because there are so many. :\). So I won't insist on more classes here, even though I think it would make things a bit cleaner in this instance - at the cost of increasing complexity overall as I said. :\ So I guess I'm OK with this now that I've read your updated comments. https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.cc File base/tracked_objects.cc (left): https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:612: for (const auto& parent_child : parent_child_set) { Is it intentional that this feature is being removed at the same time? https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:193: // 0. This comment needs to discuss *why* it's done this way - which as I understand to avoid problems in case of races. If so, I would mention that explicitly and go into detail into what's expected to happen - i.e. "it's okay if the sample_probability_count_ reset is not seen by the other thread immediately because XYZ". https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:229: void DeathDataSnapshot::CalculateDelta(const DeathDataSnapshot& older) { How about SubtractDelta()? Calculate doesn't make it obvious that it's changing this struct. https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:303: base::ThreadChecker ThreadData::snapshot_thread_checker_; Hmm, this isn't correct actually. I think this causes the initialization of the object to happen with a static initializer - whereas it should be done on the snapshot thread. It should probably be created on the first call to Snapshot() or OnProfilingPhaseComplete(). Honestly, I was hoping we could check some of the other thread assumptions - the snapshotting seems low value, so I would remove this one. I guess the value this provides is ensuring both Snapshot and OnProfilingPhaseCompleted() are not running at the same time? If so, perhaps we can just do that with a bool (i.e. snapshot_operation_running_) which we DCHECK is false on entry to Snapshot() and OnProfilingPhaseCompleted() and set true for the duration of the functions. https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:455: .tasks.push_back(TaskSnapshot( Nit: Can you make a local variable pointer outside the loop to: process_data_snapshot->phased_process_data_snapshots[current_profiling_phase].tasks Then, this statement can be much cleaner. https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:651: for (const auto& death : deaths) { Add a short comment above this outlining what this is doing. https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:656: DeathDataSnapshot death_data = phase->death_data; I'm guessing it's important that you do this on the temporary death_data - so that we don't subtract the deltas multiple times if multiple snapshots are taken. If so, I would add a comment pointing this out here since it's an important detail. https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:683: DeathDataPhaseSnapshot(profiling_phase, death.second.count(), To make this cleaner, how about adding a Snapshot(phase) method to DeathData, which can be called both from here and DeathData::OnProfilingPhaseCompleted(). That way, this long ctor invocation is only in one place (while still avoiding the circular dependency). https://codereview.chromium.org/1021053003/diff/220001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/220001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer_unittest.cc:87: explicit TestTrackingSynchronizer(base::TickClock* clock) This should take a scoped_ptr to indicate it's taking ownership. Then, you can just move the make_scoped_ptr() call to line 109.
https://codereview.chromium.org/1021053003/diff/60001/content/browser/profile... File content/browser/profiler_controller_impl.cc (right): https://codereview.chromium.org/1021053003/diff/60001/content/browser/profile... content/browser/profiler_controller_impl.cc:148: for (RenderProcessHost::iterator it(RenderProcessHost::AllHostsIterator()); I've set a breakpoint in the renderer, and checked that the message gets delivered only once. Will this work? https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.cc File base/tracked_objects.cc (left): https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:612: for (const auto& parent_child : parent_child_set) { Not quite; feel through cracks :) I'd remove it though. How about in this CL? Or, I can send a post-CL with a cleanup. (I prefer post- not pre- CL, since our consumers have already waited ridiculously long for this feature :( ) https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:193: // 0. Done https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:229: void DeathDataSnapshot::CalculateDelta(const DeathDataSnapshot& older) { Done, but we are not subtracting the delta, we are getting it. Hence the new name. https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:303: base::ThreadChecker ThreadData::snapshot_thread_checker_; Fixed the lazy instance initialization. I think, per-task birth/death thread checks would be too expensive, so I'd be hesitant to add checks. Speaking of the snapshot/reset thread check, we really need all resets and snapshots to happen in the same thread, not just serially. The reason is that executing serially in different threads may cause the second call to not see results of the first call, which can lead to unpleasant things like memory leaks in the list construction. In addition, a boolean check would catch potential issues with very low probability and reproability. https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:455: .tasks.push_back(TaskSnapshot( On 2015/04/09 15:39:05, Alexei Svitkine wrote: > Nit: Can you make a local variable pointer outside the loop to: > > process_data_snapshot->phased_process_data_snapshots[current_profiling_phase].tasks > > Then, this statement can be much cleaner. Done. https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:651: for (const auto& death : deaths) { On 2015/04/09 15:39:06, Alexei Svitkine wrote: > Add a short comment above this outlining what this is doing. Done. https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:656: DeathDataSnapshot death_data = phase->death_data; On 2015/04/09 15:39:06, Alexei Svitkine wrote: > I'm guessing it's important that you do this on the temporary death_data - so > that we don't subtract the deltas multiple times if multiple snapshots are > taken. If so, I would add a comment pointing this out here since it's an > important detail. Done. https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:683: DeathDataPhaseSnapshot(profiling_phase, death.second.count(), I'm not super-thrilled with the additional amount of copying of DeathDataPhaseSnapshot that this introduces. Like, we create a local object of type DeathDataPhaseSnapshot inside the helper function, copy it on return of the function, copy one more time to the object allocated with new(). Do you think it's worth doing? https://codereview.chromium.org/1021053003/diff/220001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/220001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer_unittest.cc:87: explicit TestTrackingSynchronizer(base::TickClock* clock) On 2015/04/09 15:39:06, Alexei Svitkine wrote: > This should take a scoped_ptr to indicate it's taking ownership. Then, you can > just move the make_scoped_ptr() call to line 109. Done.
https://codereview.chromium.org/1021053003/diff/240001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer_observer.h (right): https://codereview.chromium.org/1021053003/diff/240001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer_observer.h:37: // with an instance of ProfilerEventProto::ProfilerEvent. Note that my suggested comment was meant to replace the sentence starting with "All profiling phases prior..."
https://codereview.chromium.org/1021053003/diff/240001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer_observer.h (right): https://codereview.chromium.org/1021053003/diff/240001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer_observer.h:37: // with an instance of ProfilerEventProto::ProfilerEvent. On 2015/04/09 21:37:32, Ilya Sherman wrote: > Note that my suggested comment was meant to replace the sentence starting with > "All profiling phases prior..." Done.
https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:683: DeathDataPhaseSnapshot(profiling_phase, death.second.count(), On 2015/04/09 21:28:39, vadimt wrote: > I'm not super-thrilled with the additional amount of copying of > DeathDataPhaseSnapshot that this introduces. Like, we create a local object of > type DeathDataPhaseSnapshot inside the helper function, copy it on return of the > function, copy one more time to the object allocated with new(). Do you think > it's worth doing? I believe modern compilers will actually optimize the return value copies out, per: http://en.wikipedia.org/wiki/Return_value_optimization But yeah, I missed that the other one is allocated with new, so the return of the value would have to be copied over that. The other alternative would be to make it an out param that's filled. But I guess in that case there's extra zero-ing of the data before we fill it for the 'new' case - but can be avoided in this loop by re-using the same DeathDataPhaseSnapshot instance on the stack that's re-set for each entry (and push_backed() which does cause a copy - but the same one that's already the case). https://codereview.chromium.org/1021053003/diff/240001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/240001/base/tracked_objects.c... base/tracked_objects.cc:471: .tasks; Nit: The .tasks can definitely go on the previous line.
https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:683: DeathDataPhaseSnapshot(profiling_phase, death.second.count(), Given these not-quite-pleasant choices, the current implementation doesn't look that bad, right? https://codereview.chromium.org/1021053003/diff/240001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/240001/base/tracked_objects.c... base/tracked_objects.cc:471: .tasks; git cl format will return it to the new line after that.
Getting close I think :) https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:683: DeathDataPhaseSnapshot(profiling_phase, death.second.count(), On 2015/04/09 22:42:20, vadimt wrote: > Given these not-quite-pleasant choices, the current implementation doesn't look > that bad, right? All right, though probably worth recording the rationale somewhere in a comment. https://codereview.chromium.org/1021053003/diff/240001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/240001/base/tracked_objects.c... base/tracked_objects.cc:18: #include "base/stl_util.h" Nit: Is this include needed? https://codereview.chromium.org/1021053003/diff/240001/base/tracked_objects.c... base/tracked_objects.cc:209: // can cause accumulating or cascading effects. Thanks for this write-up - this makes sense. I would also mention why this is still correct if no new deaths happen for this task - i.e. what ensures that we don't send a sample value from the previous phase? (i.e. the check that we don't include this snapshot if it had a count of 0 during this phase ... done in function XYZ) https://codereview.chromium.org/1021053003/diff/240001/base/tracked_objects.c... base/tracked_objects.cc:447: BirthCountMap birth_counts; Nit: Move this closer to where it's used .. e.g. above the loop below the "Gather data serially." comment block. https://codereview.chromium.org/1021053003/diff/240001/base/tracked_objects.c... base/tracked_objects.cc:471: .tasks; On 2015/04/09 22:42:20, vadimt wrote: > git cl format will return it to the new line after that. Hmm, that's too bad. I really dislike excessive wrapping since it makes code much less concise and harder to read. How about renaming phased_process_data_snapshots to |phased_snapshots|? It's already a member of ProcessDataSnapshot - so the "process_data" substring is redundant. Then, it should make wrapping here and on line 460 better I think. https://codereview.chromium.org/1021053003/diff/240001/base/tracked_objects.c... base/tracked_objects.cc:671: (*birth_counts)[death.first] -= death.first->birth_count(); Can you add a DCHECK that the existing value is in the map and is >= death.first->birth_count()? (Or if it's not guaranteed due to subtle expected races that can happen, we should have extra code that protects this.) https://codereview.chromium.org/1021053003/diff/260001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer.cc (right): https://codereview.chromium.org/1021053003/diff/260001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer.cc:308: if (variations::GetVariationParamValue("UMALogUploadInterval", I suggest using a different field trial for rolling this out rather than re-purposing UMALogUploadInterval
https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/220001/base/tracked_objects.c... base/tracked_objects.cc:683: DeathDataPhaseSnapshot(profiling_phase, death.second.count(), On 2015/04/10 15:27:26, Alexei Svitkine wrote: > On 2015/04/09 22:42:20, vadimt wrote: > > Given these not-quite-pleasant choices, the current implementation doesn't > look > > that bad, right? > > All right, though probably worth recording the rationale somewhere in a comment. Done. Also reordered declarations to avoid forward declarations now that the circle is broken. https://codereview.chromium.org/1021053003/diff/240001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/240001/base/tracked_objects.c... base/tracked_objects.cc:18: #include "base/stl_util.h" On 2015/04/10 15:27:27, Alexei Svitkine wrote: > Nit: Is this include needed? Done. https://codereview.chromium.org/1021053003/diff/240001/base/tracked_objects.c... base/tracked_objects.cc:447: BirthCountMap birth_counts; On 2015/04/10 15:27:27, Alexei Svitkine wrote: > Nit: Move this closer to where it's used .. e.g. above the loop below the > "Gather data serially." comment block. Done. https://codereview.chromium.org/1021053003/diff/240001/base/tracked_objects.c... base/tracked_objects.cc:671: (*birth_counts)[death.first] -= death.first->birth_count(); This value can temporarily be negative. We may first encounter a thread with deaths for a given birth (and the counter becomes -deaths), and later go through a thread that contains births (the, we have births-deaths). I.e. the 2 loops in this method are NOT supposed to balance each other. ThreadData::Snapshot checks that the final value >= 0. https://codereview.chromium.org/1021053003/diff/260001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer.cc (right): https://codereview.chromium.org/1021053003/diff/260001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer.cc:308: if (variations::GetVariationParamValue("UMALogUploadInterval", On 2015/04/10 15:27:27, Alexei Svitkine wrote: > I suggest using a different field trial for rolling this out rather than > re-purposing UMALogUploadInterval Done.
lgtm https://codereview.chromium.org/1021053003/diff/280001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/280001/base/tracked_objects.h... base/tracked_objects.h:306: int profiling_phase; Nit: Separate by empty lines since they have comments. https://codereview.chromium.org/1021053003/diff/280001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer.cc (right): https://codereview.chromium.org/1021053003/diff/280001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer.cc:279: int current_profiling_phase = phase_completion_events_sequence_.size(); Nit: const int https://codereview.chromium.org/1021053003/diff/280001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer.cc:313: int profiling_phase = phase_completion_events_sequence_.size(); Nit: const int https://codereview.chromium.org/1021053003/diff/280001/content/browser/profil... File content/browser/profiler_controller_impl.cc (right): https://codereview.chromium.org/1021053003/diff/280001/content/browser/profil... content/browser/profiler_controller_impl.cc:164: base::Unretained(this), profiling_phase)); I think you can make NotifyChildProcessesOfProfilingPhaseCompletion static and then you don't need to pass base::Unretained(this).
Thanks asvitkine@! https://codereview.chromium.org/1021053003/diff/280001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/280001/base/tracked_objects.h... base/tracked_objects.h:306: int profiling_phase; On 2015/04/14 16:32:48, Alexei Svitkine wrote: > Nit: Separate by empty lines since they have comments. Done. https://codereview.chromium.org/1021053003/diff/280001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer.cc (right): https://codereview.chromium.org/1021053003/diff/280001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer.cc:313: int profiling_phase = phase_completion_events_sequence_.size(); On 2015/04/14 16:32:48, Alexei Svitkine wrote: > Nit: const int Done. https://codereview.chromium.org/1021053003/diff/280001/content/browser/profil... File content/browser/profiler_controller_impl.cc (right): https://codereview.chromium.org/1021053003/diff/280001/content/browser/profil... content/browser/profiler_controller_impl.cc:164: base::Unretained(this), profiling_phase)); On 2015/04/14 16:32:48, Alexei Svitkine wrote: > I think you can make NotifyChildProcessesOfProfilingPhaseCompletion static and > then you don't need to pass base::Unretained(this). Done.
Still wading through the //base changes, but here are some initial comments: https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.c... base/tracked_objects.cc:104: last_phase_snapshot_ = nullptr; Why aren't these being set as part of the initializer list? https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h... base/tracked_objects.h:264: // The alternative would be taking a DeathData parameter, bout this would nit: s/bout/but https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h... base/tracked_objects.h:266: // wrapper structure as a param or using an empty constructor for snapshotting nit: "Passing wrapper structure" -> "Passing a wrapper structure" https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h... base/tracked_objects.h:267: // DeathData would be less efficient. It looks like DeathData only refers to this class via a pointer. Why not forward-declare these classes, so that DeathData can refer to them, but define them below? https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h... base/tracked_objects.h:312: DeathDataPhaseSnapshot* prev; Why not use a scoped_ptr, both here and below? https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h... base/tracked_objects.h:327: // Default initializer. nit: Please omit this comment. https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h... base/tracked_objects.h:329: nit: Please omit this newline. https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h... base/tracked_objects.h:612: // |return_of_snapshot_hrenina|. Also updates the |birth_counts| tally for return_of_snapshot_hrenina? https://codereview.chromium.org/1021053003/diff/300001/content/browser/profil... File content/browser/profiler_controller_impl.h (right): https://codereview.chromium.org/1021053003/diff/300001/content/browser/profil... content/browser/profiler_controller_impl.h:60: int profiling_phase); Can this be moved into the anonymous namespace?
As-implemented, it's possible for the list of snapshotted phases to grow without bound, which could possibly impose a significant memory overhead on the Chrome browser. Could you please add a CHECK somewhere to ensure that this is not possible? If we ever want to implement more than one phase transition, we should give some thought to possibly switching over to a more efficient representation. https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.c... base/tracked_objects.cc:142: // currrent profiling phase. nit: "currrent" -> "the current" (add "the", drop an "r") https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.c... base/tracked_objects.cc:454: // This hackish approach *can* get some slighly corrupt tallies, as we are nit: hackish -> hacky or hackyish? https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.c... base/tracked_objects.cc:489: // Add snapshots for all death datas in all threads serially. nit: "data" is already plural. https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.c... base/tracked_objects.cc:684: // results. It seems easier for SubtractSnapshot() to just return a copy, rather than modifying the object in place. Why did you decide not to follow that route? https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.c... base/tracked_objects.cc:688: death_data.SubtractOlderSnapshot(phase->prev->death_data); It looks like you have to repeat this calculation multiple times. Why not just store the deltas? https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.c... base/tracked_objects.cc:729: void ThreadData::OnProfilingPhaseCompletionOnThread(int profiling_phase) { What does "OnThread" mean? https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.c... base/tracked_objects.cc:729: void ThreadData::OnProfilingPhaseCompletionOnThread(int profiling_phase) { nit: Should "Completion" be "Completed"?
Added a check for indefinite snapshots growth to TrackingSynchronizer::NotifyAllProcessesOfProfilingPhaseCompletion. Also wrapped part of arguments of ReceivedProfilerData. Now we have 3, which is not too many. I didn’t wrap the ones that are relatively expensive to copy. https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.c... base/tracked_objects.cc:104: last_phase_snapshot_ = nullptr; On 2015/04/15 00:37:35, Ilya Sherman wrote: > Why aren't these being set as part of the initializer list? Done. https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.c... base/tracked_objects.cc:142: // currrent profiling phase. On 2015/04/15 00:59:12, Ilya Sherman wrote: > nit: "currrent" -> "the current" (add "the", drop an "r") Done. https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.c... base/tracked_objects.cc:454: // This hackish approach *can* get some slighly corrupt tallies, as we are Hackish is more correct than Hackyish (based on Web frequency). As for Hacky, I believe it better reflects jar's idea that it's not very hacky, just slightly... https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.c... base/tracked_objects.cc:489: // Add snapshots for all death datas in all threads serially. On 2015/04/15 00:59:12, Ilya Sherman wrote: > nit: "data" is already plural. Done. https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.c... base/tracked_objects.cc:684: // results. Done. The original idea was to reduce number of copies. But seems like return value optimization can solve this. https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.c... base/tracked_objects.cc:688: death_data.SubtractOlderSnapshot(phase->prev->death_data); For the majority of users who don't open chrome://profiler, we calculate deltas only when/if we send profiler data, so I don't see perf issues. Storing deltas would make the algorithms less clear. It would be less clear what we store: actual values in DeathData, deltas in all chained snapshots except the earliest one, actual numbers again for the first one. We'd have to calculate deltas in 2 different places: when making snapshot - for the current numbers in DeathData, and also when making phase change-related snapshot. It's easier to mess with this... https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.c... base/tracked_objects.cc:729: void ThreadData::OnProfilingPhaseCompletionOnThread(int profiling_phase) { On 2015/04/15 00:59:12, Ilya Sherman wrote: > nit: Should "Completion" be "Completed"? Done. https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.c... base/tracked_objects.cc:729: void ThreadData::OnProfilingPhaseCompletionOnThread(int profiling_phase) { This is to differentiate the instance member from the static ThreadData::OnProfilingPhaseCompleted. Better ideas are welcome, but I'd dislike something like OnProfilingPhaseCompletedNonStatic :) https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h... base/tracked_objects.h:264: // The alternative would be taking a DeathData parameter, bout this would On 2015/04/15 00:37:35, Ilya Sherman wrote: > nit: s/bout/but Done. https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h... base/tracked_objects.h:266: // wrapper structure as a param or using an empty constructor for snapshotting On 2015/04/15 00:37:35, Ilya Sherman wrote: > nit: "Passing wrapper structure" -> "Passing a wrapper structure" Done. https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h... base/tracked_objects.h:267: // DeathData would be less efficient. We spent lots of effort breaking the loop between DeathDataSnapshot and DeathData (asvitkine@ preference). I think it would be in the spirit of these changes to now order the classes correspondingly, otherwise, what was the point of breaking the loop? https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h... base/tracked_objects.h:312: DeathDataPhaseSnapshot* prev; DeathData owns the whole list of DeathDataPhaseSnapshot's and deletes everything in its destructor in a loop. However, we happen to create copies of DeathDataPhaseSnapshot, and we don't want to deleted the list pointed by 'prev' when the copy is destructed. https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h... base/tracked_objects.h:327: // Default initializer. On 2015/04/15 00:37:35, Ilya Sherman wrote: > nit: Please omit this comment. Done. https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h... base/tracked_objects.h:329: On 2015/04/15 00:37:35, Ilya Sherman wrote: > nit: Please omit this newline. Done. https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h... base/tracked_objects.h:612: // |return_of_snapshot_hrenina|. Also updates the |birth_counts| tally for Done :) One of the primordial names leaked into the final code :) https://codereview.chromium.org/1021053003/diff/300001/content/browser/profil... File content/browser/profiler_controller_impl.h (right): https://codereview.chromium.org/1021053003/diff/300001/content/browser/profil... content/browser/profiler_controller_impl.h:60: int profiling_phase); I'd prefer this method to stay next to the similar GetProfilerDataFromChildProcesses, and be as similar to it as possible...
(still lgtm) https://codereview.chromium.org/1021053003/diff/320001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/320001/base/tracked_objects.c... base/tracked_objects.cc:683: const DeathDataSnapshot death_data = Nit: You can make this a reference and then in the case when there's no prev, it gets bound to phase->death_data avoiding the copy, otherwise it gets bound to the return value of Delta() (whose lifetime is extended to that of the reference). https://codereview.chromium.org/1021053003/diff/320001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer.cc (right): https://codereview.chromium.org/1021053003/diff/320001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer.cc:319: DCHECK(profiling_phase < 1); DCHECK_LT?
https://codereview.chromium.org/1021053003/diff/320001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/320001/base/tracked_objects.c... base/tracked_objects.cc:683: const DeathDataSnapshot death_data = Nice! https://codereview.chromium.org/1021053003/diff/320001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer.cc (right): https://codereview.chromium.org/1021053003/diff/320001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer.cc:319: DCHECK(profiling_phase < 1); On 2015/04/15 19:22:50, Alexei Svitkine wrote: > DCHECK_LT? Done.
I'm going to try CQ dry run. This is not an attempt to land before getting all approvals ;)
The CQ bit was checked by vadimt@chromium.org to run a CQ dry run
The patchset sent to the CQ was uploaded after l-g-t-m from asvitkine@chromium.org Link to the patchset: https://codereview.chromium.org/1021053003/#ps340001 (title: "asvitkine@ post-review comments.")
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1021053003/340001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_gn_chromeos_rel on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...) linux_chromium_gn_dbg on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...) linux_chromium_gn_rel on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
Had a compiler error in Linux because DeathData doesn't have a copy constructor. Uploaded a fix, will try it now.
The CQ bit was checked by vadimt@chromium.org to run a CQ dry run
The patchset sent to the CQ was uploaded after l-g-t-m from asvitkine@chromium.org Link to the patchset: https://codereview.chromium.org/1021053003/#ps360001 (title: "Attempting to fix a compiler error in Linux.")
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1021053003/360001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_gn_chromeos_rel on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by vadimt@chromium.org to run a CQ dry run
The patchset sent to the CQ was uploaded after l-g-t-m from asvitkine@chromium.org Link to the patchset: https://codereview.chromium.org/1021053003/#ps380001 (title: "Another platform-dependent error: no 'override' for virtual destructor.")
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1021053003/380001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios_dbg_simulator_ninja on tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_dbg_simulator...)
The CQ bit was checked by vadimt@chromium.org to run a CQ dry run
The patchset sent to the CQ was uploaded after l-g-t-m from asvitkine@chromium.org Link to the patchset: https://codereview.chromium.org/1021053003/#ps400001 (title: "Another platform-dependent error fix: refcounted objects shouldn't have public destructors.")
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1021053003/400001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios_dbg_simulator_ninja on tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_dbg_simulator...)
The CQ bit was checked by vadimt@chromium.org to run a CQ dry run
The patchset sent to the CQ was uploaded after l-g-t-m from asvitkine@chromium.org Link to the patchset: https://codereview.chromium.org/1021053003/#ps420001 (title: "Continuing fighting with virtual destructors.")
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1021053003/420001
Please note that there is a comment on patch set 16, and two on patch set 18. The rest are all on patch set 20. https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h... base/tracked_objects.h:312: DeathDataPhaseSnapshot* prev; On 2015/04/15 19:14:31, vadimt wrote: > DeathData owns the whole list of DeathDataPhaseSnapshot's and deletes everything > in its destructor in a loop. > However, we happen to create copies of DeathDataPhaseSnapshot, and we don't want > to deleted the list pointed by 'prev' when the copy is destructed. Hmm, where do we create copies? This seems like a somewhat fragile memory management situation. What guarantees that the memory won't be deleted out from under the copy? https://codereview.chromium.org/1021053003/diff/340001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer_observer.cc (right): https://codereview.chromium.org/1021053003/diff/340001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer_observer.cc:10: // ProfilerDataAttributes methods and members. nit: This comment does not seem very necessary in this tiny file. https://codereview.chromium.org/1021053003/diff/340001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer_observer.h (right): https://codereview.chromium.org/1021053003/diff/340001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer_observer.h:52: base::TimeTicks phase_finish; nit: Can all of the attributes be marked as const? https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.c... base/tracked_objects.cc:120: DCHECK(other.last_phase_snapshot_ == nullptr); nit: DCHECK_EQ https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.c... base/tracked_objects.cc:196: } nit: Hmm, all of these accessors should be inlined, per the common Chromium style. Could you please fix this up in a follow-up CL? https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.c... base/tracked_objects.cc:203: queue_duration_sample_, last_phase_snapshot_); Man, I really disagree with Alexei about not just passing |*this| here. It's just so easy to accidentally swap the order of two arguments, and not realize that you've done so. Can you please remind us to come back to this discussion as a follow-up? https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.c... base/tracked_objects.cc:206: // snapshot from previos phase. Resetting other fields. Sample values will be nit: "from previos" -> "from the previous" https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.c... base/tracked_objects.cc:479: nit: Please omit this newline. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.h... base/tracked_objects.h:317: // Access to members of this class is never protected by the lock. The fields nit: s/the lock/a lock https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.h... base/tracked_objects.h:321: // thread. It doesn't matter what thread it is, but it's important the same I find the sentence "All snapshot and phase change notifications ..." confusing. It looks like this refers to precisely two methods, OnProfilingPhaseCompleted() and last_phase_snapshot(). Is that accurate? If so, it might be clearer to simply refer to the methods by name. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.h... base/tracked_objects.h:346: DeathDataPhaseSnapshot* last_phase_snapshot() const; nit: Const methods should never return non-const pointers. Please either return a const pointer or mark this method as non-const. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.h... base/tracked_objects.h:377: // and rarely updated. They can be modified only on the death thread. nit: The existing comments in this file seem to use two periods between sentences. Please maintain consistency with that precedent. (Applies throughout.) https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.h... base/tracked_objects.h:387: void operator=(DeathData const&); nit: Please use DISALLOW_ASSIGN. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... File base/tracked_objects_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:126: // Don't run the test if task tracking is not compiled in. It looks to me like this case is no longer reachable. Can we change ThreadData::InitializeAndSetTrackingStatus(), as well as ThreadData::Initialize() and ThreadLocalStorage::StaticSlot::Initialize(), to be a void functions; and remove these checks from the tests? https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:333: data->OnProfilingPhaseCompleted(123); From here down seems to be an unrelated tests that you've appended to an existing test. Please create a new test instead. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:436: auto it0 = process_data.phased_snapshots.find(0); Why did you add a "0" to this variable name? (Applies throughout) https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:440: ASSERT_EQ(0u, process_data_phase0.tasks.size()); Why did you change this to an ASSERT, and why did you remove the descendants check? (Applies throughout) https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:862: } Would it be useful to also add a FirstEmpty test?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_rel_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/300001/base/tracked_objects.h... base/tracked_objects.h:312: DeathDataPhaseSnapshot* prev; Terminological clarification: not quite a 'copy', but an instance of DeathDataPhaseSnapshot gets created in ThreadData::SnapshotMaps. DeathDatas are simply leaked in normal Chrome, i.e. ThreadDatas are never deleted. This is by design, or at least, how it always was. Note that DeathDatas are never removed from the death map. In tests, ThreadData, and all DeathDatas that it owns inside its death map, are deleted at the end of each test. Just in case, I added a thread check in ~ThreadData to totally rule out races now and for all. https://codereview.chromium.org/1021053003/diff/340001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer_observer.cc (right): https://codereview.chromium.org/1021053003/diff/340001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer_observer.cc:10: // ProfilerDataAttributes methods and members. On 2015/04/17 02:14:26, Ilya Sherman wrote: > nit: This comment does not seem very necessary in this tiny file. Done. https://codereview.chromium.org/1021053003/diff/340001/components/metrics/pro... File components/metrics/profiler/tracking_synchronizer_observer.h (right): https://codereview.chromium.org/1021053003/diff/340001/components/metrics/pro... components/metrics/profiler/tracking_synchronizer_observer.h:52: base::TimeTicks phase_finish; On 2015/04/17 02:14:26, Ilya Sherman wrote: > nit: Can all of the attributes be marked as const? Done. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.c... base/tracked_objects.cc:120: DCHECK(other.last_phase_snapshot_ == nullptr); That's what I originally tried. Got an error. DCHECK_EQ doesn't know how to print values of DeathDataPhaseSnapshot. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.c... base/tracked_objects.cc:196: } Will do. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.c... base/tracked_objects.cc:203: queue_duration_sample_, last_phase_snapshot_); Will do. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.c... base/tracked_objects.cc:206: // snapshot from previos phase. Resetting other fields. Sample values will be On 2015/04/17 02:14:26, Ilya Sherman wrote: > nit: "from previos" -> "from the previous" Done. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.c... base/tracked_objects.cc:479: On 2015/04/17 02:14:26, Ilya Sherman wrote: > nit: Please omit this newline. Done. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.h... base/tracked_objects.h:317: // Access to members of this class is never protected by the lock. The fields On 2015/04/17 02:14:27, Ilya Sherman wrote: > nit: s/the lock/a lock Done. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.h... base/tracked_objects.h:321: // thread. It doesn't matter what thread it is, but it's important the same On 2015/04/17 02:14:26, Ilya Sherman wrote: > I find the sentence "All snapshot and phase change notifications ..." confusing. > It looks like this refers to precisely two methods, OnProfilingPhaseCompleted() > and last_phase_snapshot(). Is that accurate? If so, it might be clearer to > simply refer to the methods by name. Done. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.h... base/tracked_objects.h:346: DeathDataPhaseSnapshot* last_phase_snapshot() const; On 2015/04/17 02:14:27, Ilya Sherman wrote: > nit: Const methods should never return non-const pointers. Please either return > a const pointer or mark this method as non-const. Done. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects.h... base/tracked_objects.h:377: // and rarely updated. They can be modified only on the death thread. On 2015/04/17 02:14:27, Ilya Sherman wrote: > nit: The existing comments in this file seem to use two periods between > sentences. Please maintain consistency with that precedent. (Applies > throughout.) Done. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... File base/tracked_objects_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:126: // Don't run the test if task tracking is not compiled in. How about a separate CL? https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:333: data->OnProfilingPhaseCompleted(123); This test checks states of a single DeathData object. I don't break this paradigm by adding a phase completion... I could add another test, but it would have to include the first one, which makes the first one unnecessary, so we arrive to the same place. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:436: auto it0 = process_data.phased_snapshots.find(0); Uniformity. Some (other) tests also have it1 for phase 1 etc. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:440: ASSERT_EQ(0u, process_data_phase0.tasks.size()); Checks of size() are ASSERTS in other tests because they are followed by accessing elements of the array. Here too, for uniformity. Descendants got broken by this CL, and I make no attempts to fix them. Per an comment exchange with asvitkine@, I'll remove the descendants-collecting code in a separate CL. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:862: } Why not? Done.
The CQ bit was checked by vadimt@chromium.org to run a CQ dry run
The patchset sent to the CQ was uploaded after l-g-t-m from asvitkine@chromium.org Link to the patchset: https://codereview.chromium.org/1021053003/#ps440001 (title: "More from isherman@")
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1021053003/440001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_asan_rel on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
LGTM % the remaining nits. Thanks, Vadim. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... File base/tracked_objects_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:126: // Don't run the test if task tracking is not compiled in. On 2015/04/17 16:15:51, vadimt wrote: > How about a separate CL? Okay. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:333: data->OnProfilingPhaseCompleted(123); On 2015/04/17 16:15:51, vadimt wrote: > This test checks states of a single DeathData object. I don't break this > paradigm by adding a phase completion... > I could add another test, but it would have to include the first one, which > makes the first one unnecessary, so we arrive to the same place. You wouldn't need to include all of the EXPECTs from the first one, as they wouldn't be what the test is about. I'd still prefer that you separate this test case, because smaller tests are easier to reason about and maintain. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:436: auto it0 = process_data.phased_snapshots.find(0); On 2015/04/17 16:15:51, vadimt wrote: > Uniformity. Some (other) tests also have it1 for phase 1 etc. IMO this uniformity is more confusing than helpful. It makes me look around for an it1, and when I don't find one in this test, I'm annoyed to have spent time looking for it. I'd prefer to only use it0 as a name when necessary to distinguish from an it1. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:440: ASSERT_EQ(0u, process_data_phase0.tasks.size()); On 2015/04/17 16:15:51, vadimt wrote: > Checks of size() are ASSERTS in other tests because they are followed by > accessing elements of the array. Here too, for uniformity. > > Descendants got broken by this CL, and I make no attempts to fix them. Per an > comment exchange with asvitkine@, I'll remove the descendants-collecting code in > a separate CL. I'd slightly prefer to only use ASSERTs where necessary, because that seems to be the prevailing style in the Chromium code base. https://codereview.chromium.org/1021053003/diff/440001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/440001/base/tracked_objects.c... base/tracked_objects.cc:207: // snapshot from the previois phase. Resetting other fields. Sample values nit: s/previois/previous https://codereview.chromium.org/1021053003/diff/440001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/440001/base/tracked_objects.h... base/tracked_objects.h:312: DeathDataPhaseSnapshot* prev; nit: Can this be a "const DeathDataPhaseSnapshot*"? If so, perhaps you could restore the const on line 724 in the .cc file.
isherman@, thanks! https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... File base/tracked_objects_unittest.cc (right): https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:126: // Don't run the test if task tracking is not compiled in. On 2015/04/18 04:32:19, Ilya Sherman wrote: > On 2015/04/17 16:15:51, vadimt wrote: > > How about a separate CL? > > Okay. Acknowledged. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:333: data->OnProfilingPhaseCompleted(123); On 2015/04/18 04:32:19, Ilya Sherman wrote: > On 2015/04/17 16:15:51, vadimt wrote: > > This test checks states of a single DeathData object. I don't break this > > paradigm by adding a phase completion... > > I could add another test, but it would have to include the first one, which > > makes the first one unnecessary, so we arrive to the same place. > > You wouldn't need to include all of the EXPECTs from the first one, as they > wouldn't be what the test is about. I'd still prefer that you separate this > test case, because smaller tests are easier to reason about and maintain. Done. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:436: auto it0 = process_data.phased_snapshots.find(0); On 2015/04/18 04:32:19, Ilya Sherman wrote: > On 2015/04/17 16:15:51, vadimt wrote: > > Uniformity. Some (other) tests also have it1 for phase 1 etc. > > IMO this uniformity is more confusing than helpful. It makes me look around for > an it1, and when I don't find one in this test, I'm annoyed to have spent time > looking for it. I'd prefer to only use it0 as a name when necessary to > distinguish from an it1. Done. https://codereview.chromium.org/1021053003/diff/380001/base/tracked_objects_u... base/tracked_objects_unittest.cc:440: ASSERT_EQ(0u, process_data_phase0.tasks.size()); Yeah, but this piece may be copied somewhere else, with checking the array's elements added after it, in which case, we'd get a problem. So... here we are speaking not only about uniformity, but also safety, which is a slightly stronger argument. I'd leave as is with full respect etc. https://codereview.chromium.org/1021053003/diff/440001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/440001/base/tracked_objects.c... base/tracked_objects.cc:207: // snapshot from the previois phase. Resetting other fields. Sample values On 2015/04/18 04:32:19, Ilya Sherman wrote: > nit: s/previois/previous Done. https://codereview.chromium.org/1021053003/diff/440001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/440001/base/tracked_objects.h... base/tracked_objects.h:312: DeathDataPhaseSnapshot* prev; We need a non-const pointer because we use this pointer for deletion from ~DeathData.
vadimt@chromium.org changed reviewers: + dcheng@chromium.org
dcheng@chromium.org: Please review changes in child_process_messages.h Some background is here: https://codereview.chromium.org/985773002#msg32 What's changed: 1. Not the browser process sends commands to child processes to finish recording the current profiling phase and start recording new profiling phase. This will ultimately allow getting profiling information about tasks executed specifically during or after Chrome startup. (ChildProcessMsg_ProfilingPhaseCompleted) 2. Added to ChildProcessMsg_GetChildProfilerData the number of the current profiling phase (needed for child processes that started after a profiling phase completed).
vadimt@chromium.org changed reviewers: + cpu@chromium.org
cpu@chromium.org: Please provide OWNERs approval for: * base/ * chrome/browser/chrome_browser_main.cc * chrome/browser/ui/webui/profiler_ui.h * content/* (excluding child_process_messages.h) Sorry for the size, but this is pretty much an atomic change.
https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.c... base/tracked_objects.cc:725: deaths->push_back(DeathsSnapshot::value_type( std::make_pair would be slightly shorter to write (and make it slightly clearer, IMO) https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h... base/tracked_objects.h:89: // instance at a Location. In many cases, instances are only created on one These comment changes are kind of distracting. If it's really that important, normalize them in a separate patch? FWIW: dcheng@nausicaa:~/src/chrome/src$ git gs "\. " | wc -l 58515 dcheng@nausicaa:~/src/chrome/src$ git gs "\. " | grep -v "\. " | wc -l 287446 https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h... base/tracked_objects.h:268: DeathDataSnapshot(int count, It looks a little funny that this one parameter is int. https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h... base/tracked_objects.h:386: DeathDataPhaseSnapshot* last_phase_snapshot_; Any particular reason not to make this a scoped_ptr?
https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.c... base/tracked_objects.cc:725: deaths->push_back(DeathsSnapshot::value_type( On 2015/04/20 17:34:07, dcheng wrote: > std::make_pair would be slightly shorter to write (and make it slightly clearer, > IMO) Done. https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h... base/tracked_objects.h:89: // instance at a Location. In many cases, instances are only created on one Yeah, here we have differences in opinions of different code reviewers. I was explicitly asked to do so. Now, I can remove the changes, but this will be a considerable amount of extra work since the original comments had a mix of 2-space and 1-space separators between sentences, and also I'd need (in a cascading way) to move words 1 line higher to avoid deltas. At this point, my question is: would it be OK if I left everything as is? https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h... base/tracked_objects.h:268: DeathDataSnapshot(int count, Because it's the only int member in DeathData. There, it's int because it's a counter. Others are int32 because they are time durations and are results of Duration::InMilliseconds(), which is int32. https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h... base/tracked_objects.h:386: DeathDataPhaseSnapshot* last_phase_snapshot_; DeathData owns the whole linked list DeathDataPhaseSnapshot's, which head is last_phase_snapshot_. So, for this to make sense, we'd need to also make DeathDataPhaseSnapshot::prev a scoped_ptr. However, we happen to create temporary copies of DeathDataPhaseSnapshot that point to the list in DeathData. We don't want them to free the list upon destruction, hence not using scoped_ptr.
Will try a CQ dry run one more time...
The CQ bit was checked by vadimt@chromium.org to run a CQ dry run
The patchset sent to the CQ was uploaded after l-g-t-m from isherman@chromium.org, asvitkine@chromium.org Link to the patchset: https://codereview.chromium.org/1021053003/#ps480001 (title: "dchen@ comments")
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1021053003/480001
https://codereview.chromium.org/1021053003/diff/440001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/440001/base/tracked_objects.h... base/tracked_objects.h:312: DeathDataPhaseSnapshot* prev; On 2015/04/20 15:23:23, vadimt wrote: > We need a non-const pointer because we use this pointer for deletion from > ~DeathData. It should be possible to delete a const pointer. Note that I'm recommending "const DeathDataPhaseSnapshot* prev", which is semantically equivalent to "DeathDataPhaseSnapshot const * prev", not "DeathDataPhaseSnapshot* const prev". https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h... base/tracked_objects.h:89: // instance at a Location. In many cases, instances are only created on one On 2015/04/20 19:43:27, vadimt wrote: > Yeah, here we have differences in opinions of different code reviewers. > I was explicitly asked to do so. > > Now, I can remove the changes, but this will be a considerable amount of extra > work since the original comments had a mix of 2-space and 1-space separators > between sentences, and also I'd need (in a cascading way) to move words 1 line > higher to avoid deltas. At this point, my question is: would it be OK if I left > everything as is? I guess I'm fine with normalizing to either option, and doing so in a follow-up CL if necessary. I called this out because *this* file tends to use double-spacing most everywhere, and it looked very weird to have mixed spacing within individual comments.
https://codereview.chromium.org/1021053003/diff/440001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/440001/base/tracked_objects.h... base/tracked_objects.h:312: DeathDataPhaseSnapshot* prev; Indeed. Looks C++ has changed in last XX years. Way back, you couldn't delete a const pointer. https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h... base/tracked_objects.h:89: // instance at a Location. In many cases, instances are only created on one My point is that restoring the original state can't be simply done using grep. It's relatively unpleasant manual procedure. I wonder if we could land as is, since now the comments are normalized, and the only problem is some extra diffs, which IMHO one can tolerate...
https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h... base/tracked_objects.h:89: // instance at a Location. In many cases, instances are only created on one On 2015/04/21 00:42:52, vadimt wrote: > My point is that restoring the original state can't be simply done using grep. > It's relatively unpleasant manual procedure. > I wonder if we could land as is, since now the comments are normalized, and the > only problem is some extra diffs, which IMHO one can tolerate... I'm OK with leaving comments in their current state in this CL. That being said, it wouldn't actually be that hard to pull those changes out with git add -p. But it's not really worth spending time over at this point IMO. https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h... base/tracked_objects.h:386: DeathDataPhaseSnapshot* last_phase_snapshot_; On 2015/04/20 19:43:27, vadimt wrote: > DeathData owns the whole linked list DeathDataPhaseSnapshot's, which head is > last_phase_snapshot_. So, for this to make sense, we'd need to also make > DeathDataPhaseSnapshot::prev a scoped_ptr. > However, we happen to create temporary copies of DeathDataPhaseSnapshot that > point to the list in DeathData. We don't want them to free the list upon > destruction, hence not using scoped_ptr. I don't follow. I see one place that copies these (presumably when they're inserted into a STL container), and it asserts that last_phase_snapshot_ is null. But here, you are claiming that ownership of last_phase_snapshot_ is not unique? Who is clearing those last_phase_snapshot_ pointers on these copies then? I can't seem to find it in the code.
I'll review chrome/browser but please add jam@ for content. Also ipc changes need security review.
cpu@, security changes are already being reviewed by dcheng@. And I'll add jam@. https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h... base/tracked_objects.h:386: DeathDataPhaseSnapshot* last_phase_snapshot_; The DeathData copy constructor is indeed called upon construction, and at that point, the pointer is NULL, so no problems here. However, ThreadData::SnapshotMaps creates a copy (more precisely, an instance) of DeathDataPhaseSnapshot that contains a snapshot of DeathData, and points at the list that hangs from that DeathData. We don't want do delete that tail when the copy created in ThreadData::SnapshotMaps gets deleted. Hence, not using scoped_ptr in DeathDataPhaseSnapshot, hence in DeathData.
vadimt@chromium.org changed reviewers: + jam@chromium.org
jam@, please review content/ changes.
On 2015/04/21 16:55:00, vadimt wrote: > jam@, please review content/ changes. content lgtm
lgtm https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.c... base/tracked_objects.cc:215: // off-by-little corruptions to be likely fixed at the next snapshot. that sounds scary. Is TSAN going to complain? I guess this has not been too bad since we had the old code at line 550 - 580.
https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.c... base/tracked_objects.cc:215: // off-by-little corruptions to be likely fixed at the next snapshot. The old code did practically same thing, and TSAN didn't complain. Snapshotting of profiling data existed for long time, and snapshots were made when we send UMA data, and when the user opens chrome://profiler page. Now we also do a snapshot upon the first nonempty Paint. Given that there are no existing suppressions, and that TSAN didn't complain, it probably just doesn't detect the old pattern, so it won't be detecting the new one.
IPC changes lgtm https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/460001/base/tracked_objects.h... base/tracked_objects.h:386: DeathDataPhaseSnapshot* last_phase_snapshot_; On 2015/04/21 16:51:47, vadimt wrote: > The DeathData copy constructor is indeed called upon construction, and at that > point, the pointer is NULL, so no problems here. > However, ThreadData::SnapshotMaps creates a copy (more precisely, an instance) > of DeathDataPhaseSnapshot that contains a snapshot of DeathData, and points at > the list that hangs from that DeathData. We don't want do delete that tail when > the copy created in ThreadData::SnapshotMaps gets deleted. Hence, not using > scoped_ptr in DeathDataPhaseSnapshot, hence in DeathData. OK, I understand the problem now. The problem isn't DeathData; it's that making DeathDataPhaseSnapshot's prev pointer a scoped_ptr would cause problems. https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.c... base/tracked_objects.cc:492: auto current_phase_tasks = Nit: I like LLVM's style (http://llvm.org/docs/CodingStandards.html#beware-unnecessary-copies-with-auto) to use auto* when copying pointers, to make it clear that this isn't an expensive copy.
The CQ bit was checked by vadimt@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from asvitkine@chromium.org, isherman@chromium.org Link to the patchset: https://codereview.chromium.org/1021053003/#ps500001 (title: "More wisdom from isherman@.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1021053003/500001
glider@chromium.org changed reviewers: + glider@chromium.org
https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.c... base/tracked_objects.cc:215: // off-by-little corruptions to be likely fixed at the next snapshot. On 2015/04/23 17:57:17, vadimt wrote: > The old code did practically same thing, and TSAN didn't complain. Not sure why we don't have suppressions for the existing code (which exactly code are you talking about BTW?), but it has always been known that tracked_objects.cc is a pile of races. We've discussed that with Jim Roskind a couple of times, and his position was that this code is localized at a single place, so it'll be easy to fix when it breaks. On the other hand, fixing it prematurely was likely to slow this code down, so he didn't want to do that. > Snapshotting of profiling data existed for long time, and snapshots were made > when we send UMA data, and when the user opens chrome://profiler page. > Now we also do a snapshot upon the first nonempty Paint. > Given that there are no existing suppressions, and that TSAN didn't complain, it > probably just doesn't detect the old pattern, so it won't be detecting the new > one. Perhaps we're lacking tests that execute this code in multiple threads?
https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.c... base/tracked_objects.cc:215: // off-by-little corruptions to be likely fixed at the next snapshot. Everything jar@ said still holds now. And I discussed with him the updated design. We still do DeathData snapshotting in a racy way, still possible errors caused by this are not disastrous and "self-healing" if you repeat a snapshot etc. Speaking of adding tests that would execute the code in multiple threads, I don't have a good idea if we need them. If the only outcome would be getting new TSAN warnings that we'll have to immediately suppress, then perhaps we shouldn't do this. If there are real benefits, it would be interesting to learn about them, obviously. Thanks!
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_rel_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.c... base/tracked_objects.cc:492: auto current_phase_tasks = Almost missed this comment. Will look at.
> https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.c... > base/tracked_objects.cc:215: // off-by-little corruptions to be likely fixed at > the next snapshot. > Everything jar@ said still holds now. And I discussed with him the updated > design. > We still do DeathData snapshotting in a racy way, still possible errors caused > by this are not disastrous and "self-healing" if you repeat a snapshot etc. Our main concern is that data races do sometimes lead to miscompilations which may affect other data in addition to these counters. I agree that this is no worse than the current situation. > Speaking of adding tests that would execute the code in multiple threads, I > don't have a good idea if we need them. If the only outcome would be getting new > TSAN warnings that we'll have to immediately suppress, then perhaps we shouldn't > do this. If there are real benefits, it would be interesting to learn about > them, obviously. The only potential benefit is early detection of data corruptions in the case the compiler tries to take advantage of the undefined behavior here. Given its probability, I think we can live without these tests now. > Thanks!
dvyukov@google.com changed reviewers: + dvyukov@google.com
Haven't read all the code here. But if there are tracked_objects.cc, they can well be harmful today. https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.c... base/tracked_objects.cc:149: ++sample_probability_count_; Can RecordDeath race with itself? If so, sample_probability_count_ can overflow negative values (which is probably bad) and then back to zero (which will cause CHECK failure and/or division by zero again). https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.c... base/tracked_objects.cc:229: sample_probability_count_ = 0; What kind of races does happen on sample_probability_count_? Unless I am missing something, this reset to 0 can cause CHECK failure of division by zero in DeathData::RecordDeath.
https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.c... base/tracked_objects.cc:149: ++sample_probability_count_; RecordDeath can' race. It's always called in the same thread for a given DeathData instance. What can race is RecordDeath vs. snapshotting. https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.c... base/tracked_objects.cc:229: sample_probability_count_ = 0; You are right; thanks for pointing this out before we started to get crash reports! Fixed. https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.c... base/tracked_objects.cc:492: auto current_phase_tasks = On 2015/04/23 20:50:40, vadimt wrote: > Almost missed this comment. Will look at. Done. https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.h File base/tracked_objects.h (right): https://codereview.chromium.org/1021053003/diff/500001/base/tracked_objects.h... base/tracked_objects.h:696: snapshot_thread_checker_; Removing snapshot_thread_checker_. Its checks fail in browser_tests. Presumably, browser_tests starts a new thread per each test, hence the failures. This could have been fixed by deinitializing snapshot_thread_checker_ between tests, but LazyInstance doesn't allow dropping the value to NULL. At this point, it's time to recall that this checker was squeezed in opportunistically, and as far as I understand asvitkine comments, it doesn't provide a huge value in its current form. So, I'm just removing it. We can consider re-adding it, if we decide that it's worth the effort.
The CQ bit was checked by vadimt@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from cpu@chromium.org, jam@chromium.org, dcheng@chromium.org, asvitkine@chromium.org, isherman@chromium.org Link to the patchset: https://codereview.chromium.org/1021053003/#ps520001 (title: "More comments.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1021053003/520001
Message was sent while issue was closed.
Committed patchset #27 (id:520001)
Message was sent while issue was closed.
Patchset 27 (id:??) landed as https://crrev.com/e2de4734b01f37454f939bc493a8a5cb59ce3c42 Cr-Commit-Position: refs/heads/master@{#327143}
Message was sent while issue was closed.
https://codereview.chromium.org/1021053003/diff/520001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/520001/base/tracked_objects.c... base/tracked_objects.cc:149: int sample_probability_count = sample_probability_count_; This is still racy and can cause all the same effects. This needs to use atomic operations. https://codereview.chromium.org/1021053003/diff/520001/base/tracked_objects.c... base/tracked_objects.cc:233: sample_probability_count_ = 0; This store needs to be an atomic store.
Message was sent while issue was closed.
Dmitry, thanks for the comments! https://codereview.chromium.org/1021053003/diff/520001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/520001/base/tracked_objects.c... base/tracked_objects.cc:149: int sample_probability_count = sample_probability_count_; Please note that we use local variable sample_probability_count as the denominator, and I don't see how it could be 0. To be 0, it has to receive -1 value in line 149. I can't see how it can receive a negative value at all, given that we have an overflow check. Speaking about atomicity: are you saying that if another thread zeroes sample_probability_count_, we may get a value with some bytes replaced with zeros, and some bytes still non-zero? On which platforms? https://codereview.chromium.org/1021053003/diff/520001/base/tracked_objects.c... base/tracked_objects.cc:233: sample_probability_count_ = 0; We should try avoiding using barriers. Performance is a big concern here. It's OK to have some rare corruptions from time to time (of course, except zero divide). So, let's try looking at the RecordDeath code and seeing how we can avoid zero divide.
Message was sent while issue was closed.
https://codereview.chromium.org/1021053003/diff/520001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/1021053003/diff/520001/base/tracked_objects.c... base/tracked_objects.cc:149: int sample_probability_count = sample_probability_count_; On 2015/04/28 15:15:37, vadimt wrote: > Please note that we use local variable sample_probability_count as the > denominator, and I don't see how it could be 0. > > To be 0, it has to receive -1 value in line 149. I can't see how it can receive > a negative value at all, given that we have an overflow check. This code contains a data race, sample_probability_count can have any value by definition. > Speaking about atomicity: are you saying that if another thread zeroes > sample_probability_count_, we may get a value with some bytes replaced with > zeros, and some bytes still non-zero? On which platforms? You misinterpret C++ semantics. It is not about "some bytes replaced with zeros". If you have a data race, program has undefined behavior. https://codereview.chromium.org/1021053003/diff/520001/base/tracked_objects.c... base/tracked_objects.cc:233: sample_probability_count_ = 0; On 2015/04/28 15:15:37, vadimt wrote: > We should try avoiding using barriers. Performance is a big concern here. It is you who said about barriers. I did not. > It's > OK to have some rare corruptions from time to time (of course, except zero > divide). So, let's try looking at the RecordDeath code and seeing how we can > avoid zero divide. You misinterpret C++ guarantees. It is not about rare corruptions of the counter. It is about undefined behavior of the whole program.
Message was sent while issue was closed.
This code was racy for years. This racy-ness was known, and apparently, taking into account that it didn't cause glitches, it was safe. Per earlier comments with Alexander Potapenko (https://codereview.chromium.org/1021053003/#msg102), it'll be easy to fix when it breaks. So I wonder if I changed something in this scheme by adding a new member, or the old reasoning that it still safe still applies.
Message was sent while issue was closed.
On 2015/04/28 15:51:43, vadimt wrote: > This code was racy for years. This racy-ness was known, and apparently, taking > into account that it didn't cause glitches, it was safe. To conclude that it is reasonably safe, one needs to know reasons for all Chrome crashes that happen in the wild. We don't know these reasons. > Per earlier comments with Alexander Potapenko > (https://codereview.chromium.org/1021053003/#msg102), it'll be easy to fix when > it breaks. > > So I wonder if I changed something in this scheme by adding a new member, or the > old reasoning that it still safe still applies. Racy code can well be sensitive to "unrelated" code changes around. I understand what you are saying. This code is racy for a long time. But you are adding more races.
Message was sent while issue was closed.
Alright, thanks for the information. Given the importance of the added code for the Chrome-Back-to-Speed project, performance requirements, and the fact that we have no evidence that existing races cause problems, I believe that the best thing we can do now is to wait and see. If we see something unusual, we can reconsider current solution. It would be strange to introduce atomic storage just for the new member, or convert all members to atomic after years of apparently problem-less work without evidence that the current efficient causes glitches. This seems like a justified risk to me. |