|
|
Created:
7 years, 2 months ago by oshima Modified:
7 years, 2 months ago CC:
chromium-reviews Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionDon't commit prefs::kStabilitySessionEndCompleted change immediately on chromeos
Chrome on ChromeOS gets killed and generate dump when hangs or didn't respond in time, so this isn't necessary.
BUG=302578
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=229535
Patch Set 1 #
Total comments: 1
Patch Set 2 : #Patch Set 3 : #
Total comments: 4
Patch Set 4 : #Patch Set 5 : #
Total comments: 2
Patch Set 6 : don't commit prefs::kStabilitySessionEndCompleted change immediately on chromeos #
Total comments: 2
Patch Set 7 : fix define #Messages
Total messages: 31 (0 generated)
I don't think this change is correct. While MetricsService:: RecordStartOfSessionEnd() does force a commit, it sets some prefs *after* that first commit. Hence, I suspect the second one is still needed. Jim knows the subtle details here better than I do, so adding him as a reviewer.
On 2013/10/01 22:40:23, Ilya Sherman wrote: > I don't think this change is correct. While MetricsService:: > RecordStartOfSessionEnd() does force a commit, it sets some prefs *after* that > first commit. Hence, I suspect the second one is still needed. > > Jim knows the subtle details here better than I do, so adding him as a reviewer. I see. Right now, EndSession saves the local state 2 times, which may be contributing the SIGABRTs during shutdown on ChromeOS (ChromeOS kills chrome if chrome doesn't shutdown in time) and I'm wondering if we can eliminate to one save. If this is necessary and no room to improve, I'll abandon this. Jim please let me know.
https://codereview.chromium.org/25484004/diff/1/chrome/browser/browser_proces... File chrome/browser/browser_process_impl.cc (right): https://codereview.chromium.org/25484004/diff/1/chrome/browser/browser_proces... chrome/browser/browser_process_impl.cc:365: // MetricsService::RecordStartOfSessionEnd writes to prefs immediately. Can you point to where this happens? I don't see it. If it is true, then the comment should be between lines 366 and 367, and hence inside curlies.
On 2013/10/02 01:18:30, jar wrote: > https://codereview.chromium.org/25484004/diff/1/chrome/browser/browser_proces... > File chrome/browser/browser_process_impl.cc (right): > > https://codereview.chromium.org/25484004/diff/1/chrome/browser/browser_proces... > chrome/browser/browser_process_impl.cc:365: // > MetricsService::RecordStartOfSessionEnd writes to prefs immediately. > Can you point to where this happens? I don't see it. I was wrong about it. I uploaded new patch which does what I originally intended. Here is what the existing implementation does: kStabilityExitedCleanly -> true CommitPendingWrite clean_shutdown_status_ -> CLEANLY_SHUTDOWN kStabilityExitedCleanly -> true RecordCurrentState(prefs) kStabilitySessionEndCompleted -> false RecordCurrentState(prefs) then caller calls ommitPendingWrite() and here is what new impl does instead: clean_shutdown_status_ -> CLEANLY_SHUTDOWN kStabilitySessionEndCompleted -> false RecordCurrentState(prefs) kStabilityExitedCleanly -> true CommitPendingWrite() The new impl omits the first CommitPendingWrite, therefore if chrome crashes in RecordCurrentState, it will makes the difference. If this is not desirable, I have to abandon this change. Please let me know. > If it is true, then the comment should be between lines 366 and 367, and hence > inside curlies. Done.
ping?
On 2013/10/02 22:59:05, oshima wrote: > On 2013/10/02 01:18:30, jar wrote: > > > https://codereview.chromium.org/25484004/diff/1/chrome/browser/browser_proces... > > File chrome/browser/browser_process_impl.cc (right): > > > > > https://codereview.chromium.org/25484004/diff/1/chrome/browser/browser_proces... > > chrome/browser/browser_process_impl.cc:365: // > > MetricsService::RecordStartOfSessionEnd writes to prefs immediately. > > Can you point to where this happens? I don't see it. > > I was wrong about it. I uploaded new patch which does what I originally > intended. > > Here is what the existing implementation does: > > kStabilityExitedCleanly -> true > CommitPendingWrite > clean_shutdown_status_ -> CLEANLY_SHUTDOWN > kStabilityExitedCleanly -> true > RecordCurrentState(prefs) > kStabilitySessionEndCompleted -> false > RecordCurrentState(prefs) > > then caller calls ommitPendingWrite() > > > and here is what new impl does instead: > > clean_shutdown_status_ -> CLEANLY_SHUTDOWN > kStabilitySessionEndCompleted -> false > RecordCurrentState(prefs) > kStabilityExitedCleanly -> true > CommitPendingWrite() > > The new impl omits the first CommitPendingWrite, therefore if chrome crashes in > RecordCurrentState, > it will makes the difference. If this is not desirable, I have to abandon this > change. > Please let me know. Jim, WDYT? My understanding is that the early CommitPendingWrite was somehow important to try to better capture shutdown metrics, but I'm not super familiar with the shutdown code path.
Some comments on the details, though I don't have enough context to comment on the larger picture: https://codereview.chromium.org/25484004/diff/16001/chrome/browser/metrics/me... File chrome/browser/metrics/metrics_service.cc (left): https://codereview.chromium.org/25484004/diff/16001/chrome/browser/metrics/me... chrome/browser/metrics/metrics_service.cc:1612: // (and that we don't use some alternate path, and not call LogCleanShutdown). What happened to this comment? Is it no longer relevant? https://codereview.chromium.org/25484004/diff/16001/chrome/browser/metrics/me... File chrome/browser/metrics/metrics_service.cc (right): https://codereview.chromium.org/25484004/diff/16001/chrome/browser/metrics/me... chrome/browser/metrics/metrics_service.cc:1604: RecordCurrentState(pref); Btw, why did you inline the call to RecordBooleanPrefVaue()?
https://codereview.chromium.org/25484004/diff/16001/chrome/browser/metrics/me... File chrome/browser/metrics/metrics_service.cc (left): https://codereview.chromium.org/25484004/diff/16001/chrome/browser/metrics/me... chrome/browser/metrics/metrics_service.cc:1612: // (and that we don't use some alternate path, and not call LogCleanShutdown). On 2013/10/04 03:26:59, Ilya Sherman wrote: > What happened to this comment? Is it no longer relevant? Added back. Thank you for pointing out. https://codereview.chromium.org/25484004/diff/16001/chrome/browser/metrics/me... File chrome/browser/metrics/metrics_service.cc (right): https://codereview.chromium.org/25484004/diff/16001/chrome/browser/metrics/me... chrome/browser/metrics/metrics_service.cc:1604: RecordCurrentState(pref); On 2013/10/04 03:26:59, Ilya Sherman wrote: > Btw, why did you inline the call to RecordBooleanPrefVaue()? Because it's calling RecordCurrentState(pref) inside, which results in redundant call to RecordCurrentState(pref). (see my comment in #5). Actually, it is no longer used after this change, so I removed it.
On 2013/10/04 03:22:09, Ilya Sherman wrote: > On 2013/10/02 22:59:05, oshima wrote: > > On 2013/10/02 01:18:30, jar wrote: > > > > > > https://codereview.chromium.org/25484004/diff/1/chrome/browser/browser_proces... > > > File chrome/browser/browser_process_impl.cc (right): > > > > > > > > > https://codereview.chromium.org/25484004/diff/1/chrome/browser/browser_proces... > > > chrome/browser/browser_process_impl.cc:365: // > > > MetricsService::RecordStartOfSessionEnd writes to prefs immediately. > > > Can you point to where this happens? I don't see it. > > > > I was wrong about it. I uploaded new patch which does what I originally > > intended. > > > > Here is what the existing implementation does: > > > > kStabilityExitedCleanly -> true > > CommitPendingWrite > > clean_shutdown_status_ -> CLEANLY_SHUTDOWN > > kStabilityExitedCleanly -> true > > RecordCurrentState(prefs) > > kStabilitySessionEndCompleted -> false > > RecordCurrentState(prefs) > > > > then caller calls ommitPendingWrite() > > > > > > and here is what new impl does instead: > > > > clean_shutdown_status_ -> CLEANLY_SHUTDOWN > > kStabilitySessionEndCompleted -> false > > RecordCurrentState(prefs) > > kStabilityExitedCleanly -> true > > CommitPendingWrite() > > > > The new impl omits the first CommitPendingWrite, therefore if chrome crashes > in > > RecordCurrentState, > > it will makes the difference. If this is not desirable, I have to abandon this > > change. > > Please let me know. > > Jim, WDYT? My understanding is that the early CommitPendingWrite was somehow > important to try to better capture shutdown metrics, Yes, that's the question. The key difference is that this CL writes the file only after RecordCurrentState(pref); while the original code writes the file before and after this call. > but I'm not super familiar > with the shutdown code path.
On 2013/10/04 06:09:21, oshima wrote: > On 2013/10/04 03:22:09, Ilya Sherman wrote: > > On 2013/10/02 22:59:05, oshima wrote: > > > On 2013/10/02 01:18:30, jar wrote: > > > > > > > > > > https://codereview.chromium.org/25484004/diff/1/chrome/browser/browser_proces... > > > > File chrome/browser/browser_process_impl.cc (right): > > > > > > > > > > > > > > https://codereview.chromium.org/25484004/diff/1/chrome/browser/browser_proces... > > > > chrome/browser/browser_process_impl.cc:365: // > > > > MetricsService::RecordStartOfSessionEnd writes to prefs immediately. > > > > Can you point to where this happens? I don't see it. > > > > > > I was wrong about it. I uploaded new patch which does what I originally > > > intended. > > > > > > Here is what the existing implementation does: > > > > > > kStabilityExitedCleanly -> true > > > CommitPendingWrite > > > clean_shutdown_status_ -> CLEANLY_SHUTDOWN > > > kStabilityExitedCleanly -> true > > > RecordCurrentState(prefs) > > > kStabilitySessionEndCompleted -> false > > > RecordCurrentState(prefs) > > > > > > then caller calls ommitPendingWrite() > > > > > > > > > and here is what new impl does instead: > > > > > > clean_shutdown_status_ -> CLEANLY_SHUTDOWN > > > kStabilitySessionEndCompleted -> false > > > RecordCurrentState(prefs) > > > kStabilityExitedCleanly -> true > > > CommitPendingWrite() > > > > > > The new impl omits the first CommitPendingWrite, therefore if chrome crashes > > in > > > RecordCurrentState, > > > it will makes the difference. If this is not desirable, I have to abandon > this > > > change. > > > Please let me know. > > > > Jim, WDYT? My understanding is that the early CommitPendingWrite was somehow > > important to try to better capture shutdown metrics, > > Yes, that's the question. The key difference is that this CL writes the file > only after > > RecordCurrentState(pref); > > while the original code writes the file before and after this call. > > > but I'm not super familiar > > with the shutdown code path. It has been a while since I noodled on this... but the general strategy is indeed to set flags that can be detected on startup during each phase of our shutdown. As a result, we can generally tell where we were when we hung by watching these "shutdown state" markers. Reading your summary, it sure looks like you have significantly re-ordered a number of these settings... and that is very likely a bad thing. Conceptually, we could do fewer disk writes... but we would not have the tracking of what transaction was committed (what state we crashed in). It is a bit disconcerting to look at your summary, and see kStabilityExitedCleanly get assigned to be true twice... but that is probably not a big deal, and may(?) fall out of some other code flow. The call to RecordCurrentState seems to only update a time slot in prefs, so that seems to be no big deal... although not fun to see twice (before and after kStabilitySessionEndCompleted gets set).
On 2013/10/05 01:07:56, jar wrote: > On 2013/10/04 06:09:21, oshima wrote: > > On 2013/10/04 03:22:09, Ilya Sherman wrote: > > > On 2013/10/02 22:59:05, oshima wrote: > > > > On 2013/10/02 01:18:30, jar wrote: > > > > > > > > > > > > > > > https://codereview.chromium.org/25484004/diff/1/chrome/browser/browser_proces... > > > > > File chrome/browser/browser_process_impl.cc (right): > > > > > > > > > > > > > > > > > > > > https://codereview.chromium.org/25484004/diff/1/chrome/browser/browser_proces... > > > > > chrome/browser/browser_process_impl.cc:365: // > > > > > MetricsService::RecordStartOfSessionEnd writes to prefs immediately. > > > > > Can you point to where this happens? I don't see it. > > > > > > > > I was wrong about it. I uploaded new patch which does what I originally > > > > intended. > > > > > > > > Here is what the existing implementation does: > > > > > > > > kStabilityExitedCleanly -> true > > > > CommitPendingWrite > > > > clean_shutdown_status_ -> CLEANLY_SHUTDOWN > > > > kStabilityExitedCleanly -> true > > > > RecordCurrentState(prefs) > > > > kStabilitySessionEndCompleted -> false > > > > RecordCurrentState(prefs) > > > > > > > > then caller calls ommitPendingWrite() > > > > > > > > > > > > and here is what new impl does instead: > > > > > > > > clean_shutdown_status_ -> CLEANLY_SHUTDOWN > > > > kStabilitySessionEndCompleted -> false > > > > RecordCurrentState(prefs) > > > > kStabilityExitedCleanly -> true > > > > CommitPendingWrite() > > > > > > > > The new impl omits the first CommitPendingWrite, therefore if chrome > crashes > > > in > > > > RecordCurrentState, > > > > it will makes the difference. If this is not desirable, I have to abandon > > this > > > > change. > > > > Please let me know. > > > > > > Jim, WDYT? My understanding is that the early CommitPendingWrite was > somehow > > > important to try to better capture shutdown metrics, > > > > Yes, that's the question. The key difference is that this CL writes the file > > only after > > > > RecordCurrentState(pref); > > > > while the original code writes the file before and after this call. > > > > > but I'm not super familiar > > > with the shutdown code path. > > It has been a while since I noodled on this... but the general strategy is > indeed to set flags that can be detected on startup during each phase of our > shutdown. As a result, we can generally tell where we were when we hung by > watching these "shutdown state" markers. Yes I understand that part, although important file writer writes the file in SequencedWorkerPool, so you can't really tell exactly which stage it crashed. > > Reading your summary, it sure looks like you have significantly re-ordered a > number of these settings... and that is very likely a bad thing. I didn't really reorder in terms of these markers. I did change the order several prefs are set, but you can only tells if chrome crashes between markers, not in which calls right? (But if the order is really important, I can change CL and keep the order) The key change I made is the removal of one commit. We used to commit right after kStabilityExitedCleanly -> true and after RecordStartOfSessionEnd It how happens only at the end of ::SessionEnd. The only meaningfull difference is that we used to be able to roughly tell if the crash might happened in RecordCurrentState(prefs), (again, commit happens in another thread), which we should be able to tell from crash stack, if it ever crashed. > Conceptually, > we could do fewer disk writes... but we would not have the tracking of what > transaction was committed (what state we crashed in). > > It is a bit disconcerting to look at your summary, and see > kStabilityExitedCleanly get assigned to be true twice... but that is probably > not a big deal, That's how current code works. kStabilityExitedCleanly isn't big deal, but I think we don't want to do RecordCurrentState(prefs) twice, as you stated below. > and may(?) fall out of some other code flow. These happen in one method call (RecordStartOfSessionEnd), and it wasn't due to other call flow. > The call to > RecordCurrentState seems to only update a time slot in prefs, so that seems to > be no big deal... although not fun to see twice (before and after > kStabilitySessionEndCompleted gets set). Another improvement of this CL is that we weren't committing after RecordCompletedSessionEnd, which means we most likely have been losing kStabilitySessionEndCompleted -> true marker.
On 2013/10/08 23:02:40, oshima wrote: > On 2013/10/05 01:07:56, jar wrote: > > On 2013/10/04 06:09:21, oshima wrote: > > > On 2013/10/04 03:22:09, Ilya Sherman wrote: > > > > On 2013/10/02 22:59:05, oshima wrote: > > > > > On 2013/10/02 01:18:30, jar wrote: > > > > > > > > > > > > > > > > > > > > > https://codereview.chromium.org/25484004/diff/1/chrome/browser/browser_proces... > > > > > > File chrome/browser/browser_process_impl.cc (right): > > > > > > > > > > > > > > > > > > > > > > > > > > > https://codereview.chromium.org/25484004/diff/1/chrome/browser/browser_proces... > > > > > > chrome/browser/browser_process_impl.cc:365: // > > > > > > MetricsService::RecordStartOfSessionEnd writes to prefs immediately. > > > > > > Can you point to where this happens? I don't see it. > > > > > > > > > > I was wrong about it. I uploaded new patch which does what I originally > > > > > intended. > > > > > > > > > > Here is what the existing implementation does: > > > > > > > > > > kStabilityExitedCleanly -> true > > > > > CommitPendingWrite > > > > > clean_shutdown_status_ -> CLEANLY_SHUTDOWN > > > > > kStabilityExitedCleanly -> true > > > > > RecordCurrentState(prefs) > > > > > kStabilitySessionEndCompleted -> false > > > > > RecordCurrentState(prefs) > > > > > > > > > > then caller calls ommitPendingWrite() > > > > > > > > > > > > > > > and here is what new impl does instead: > > > > > > > > > > clean_shutdown_status_ -> CLEANLY_SHUTDOWN > > > > > kStabilitySessionEndCompleted -> false > > > > > RecordCurrentState(prefs) > > > > > kStabilityExitedCleanly -> true > > > > > CommitPendingWrite() > > > > > > > > > > The new impl omits the first CommitPendingWrite, therefore if chrome > > crashes > > > > in > > > > > RecordCurrentState, > > > > > it will makes the difference. If this is not desirable, I have to > abandon > > > this > > > > > change. > > > > > Please let me know. > > > > > > > > Jim, WDYT? My understanding is that the early CommitPendingWrite was > > somehow > > > > important to try to better capture shutdown metrics, > > > > > > Yes, that's the question. The key difference is that this CL writes the file > > > only after > > > > > > RecordCurrentState(pref); > > > > > > while the original code writes the file before and after this call. > > > > > > > but I'm not super familiar > > > > with the shutdown code path. > > > > It has been a while since I noodled on this... but the general strategy is > > indeed to set flags that can be detected on startup during each phase of our > > shutdown. As a result, we can generally tell where we were when we hung by > > watching these "shutdown state" markers. > > Yes I understand that part, although important file writer writes the file in > SequencedWorkerPool, so you can't really tell exactly which stage it crashed. > > > > > Reading your summary, it sure looks like you have significantly re-ordered a > > number of these settings... and that is very likely a bad thing. > > I didn't really reorder in terms of these markers. I did change the order > several prefs are set, but you can only tells if chrome crashes between markers, > not in which calls right? (But if the order is really important, I can change CL > and keep > the order) > > The key change I made is the removal of one commit. We used to commit right > after > > kStabilityExitedCleanly -> true > > and after RecordStartOfSessionEnd > > It how happens only at the end of ::SessionEnd. The only meaningfull difference > is that Sorry for typo. should read: It now happens only at the end of ::RecordStartOfSessionEnd(). > we used to be able to roughly tell if the crash might happened in > RecordCurrentState(prefs), > (again, commit happens in another thread), which we should be able to tell from > crash stack, > if it ever crashed. > > > Conceptually, > > we could do fewer disk writes... but we would not have the tracking of what > > transaction was committed (what state we crashed in). > > > > It is a bit disconcerting to look at your summary, and see > > kStabilityExitedCleanly get assigned to be true twice... but that is probably > > not a big deal, > > That's how current code works. kStabilityExitedCleanly isn't big deal, but I > think > we don't want to do RecordCurrentState(prefs) twice, as you stated below. > > > and may(?) fall out of some other code flow. > > These happen in one method call (RecordStartOfSessionEnd), and it wasn't due to > other call flow. > > > The call to > > RecordCurrentState seems to only update a time slot in prefs, so that seems to > > be no big deal... although not fun to see twice (before and after > > kStabilitySessionEndCompleted gets set). > > Another improvement of this CL is that we weren't committing after > RecordCompletedSessionEnd, > which means we most likely have been losing kStabilitySessionEndCompleted -> > true marker.
IMO, we really need to look at how the prefs settings are "harvested" to deduce crash location. That would make it clearer whether your changes will impact crash stats. I think the plan needs to be documented. If it was possible, it would be very nice to have a unit test that validated the exact shutdown sequence... but there is not (currently) a clean way to inject dependency on test code. https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/me... File chrome/browser/metrics/metrics_service.cc (right): https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/me... chrome/browser/metrics/metrics_service.cc:1608: MarkAppCleanShutdownAndCommit(); I'm trying to understand what you are doing here, vs what the code used to do. In the past, we *first* did a LogCleanShutdown() on lines 740 or 745. This function did a MarkAppCleanShutdownAndCommit(), and *then* set the pref. Now, you first change the prefs, and then call MarkAppCleanShutdownAndCommit(). It appears you have changed the order (setting the state, before the commit). IMO, it would be great to add a file-level comment indicating the exact shutdown sequence, and exactly what state is observable (via prefs) to deduce where (if any) the previous run of the browser crashed (stopped updating prefs files). We could use that as the spec, verify that it is indeed what was currently done.... and then verify that it is preserved by your change. WDYT? https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/me... File chrome/browser/metrics/metrics_service.h (right): https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/me... chrome/browser/metrics/metrics_service.h:371: // |kStatabilitySessionEndCompleted|. nit: I think you didn't finish this sentence. Also, when you use the "|" around a name, it usually means the variable by said name. In this case, you're talking about a pref that has a given name, and setting its value. Clearer might be: It also sets the boolean preference "kStatabilitySessionEndCompleted" to the specified |value|. It is probably critical to say which was done first, especially if one of them commits preference values (writes them to disk).
Maybe my problem is that I don't fully understand why the order (or the commit before kStabilityExitedCleanly) is important. Here is my current understanding. First of all, ImportantFileWrite *always* post task to SequencedWorkerPool, so you can't deduce the crash location using value in prefs. In addition, RecordCompletedSessionEnd doesn't commit immediately, so the condition kStabilitySessionEndCompleted=true can't be used to deduce crash location, even if ImportantFileWriter were to commit the change immediately. Lastly crash location is available in crash reports. It tells exactly where it crashed, not in between location, which makes these flags less useful. The committing kStabilityExitedCleanly as soon as possible is important because we don't want to show crash info bar when chrome crashes after this point. This is best effort and may fail (if chrome crashes after CommitPendingWrite, but before ImportantFileWrite writes the file). Also it's quite unlikely that chrome crashes between them without us noticing it (we can tell from crash reports). With this, I thought this isn't useful. Am I missing something? Is this stats really used now to tell where crash is happening? On 2013/10/11 22:55:43, jar wrote: > IMO, we really need to look at how the prefs settings are "harvested" to deduce > crash location. That would make it clearer whether your changes will impact > crash stats. > > I think the plan needs to be documented. If it was possible, it would be very > nice to have a unit test that validated the exact shutdown sequence... but there > is not (currently) a clean way to inject dependency on test code. > > https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/me... > File chrome/browser/metrics/metrics_service.cc (right): > > https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/me... > chrome/browser/metrics/metrics_service.cc:1608: MarkAppCleanShutdownAndCommit(); > I'm trying to understand what you are doing here, vs what the code used to do. > > In the past, we *first* did a LogCleanShutdown() on lines 740 or 745. This > function did a MarkAppCleanShutdownAndCommit(), and *then* set the pref. > > Now, you first change the prefs, and then call MarkAppCleanShutdownAndCommit(). > > It appears you have changed the order (setting the state, before the commit). > > > IMO, it would be great to add a file-level comment indicating the exact shutdown > sequence, and exactly what state is observable (via prefs) to deduce where (if > any) the previous run of the browser crashed (stopped updating prefs files). We > could use that as the spec, verify that it is indeed what was currently done.... > and then verify that it is preserved by your change. > > WDYT? > > https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/me... > File chrome/browser/metrics/metrics_service.h (right): > > https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/me... > chrome/browser/metrics/metrics_service.h:371: // > |kStatabilitySessionEndCompleted|. > nit: I think you didn't finish this sentence. > > Also, when you use the "|" around a name, it usually means the variable by said > name. In this case, you're talking about a pref that has a given name, and > setting its value. Clearer might be: > > It also sets the boolean preference "kStatabilitySessionEndCompleted" to the > specified |value|. > > It is probably critical to say which was done first, especially if one of them > commits preference values (writes them to disk).
Easy answer with regard to flags vs crash dumps: We don't always get dumps. We often get hangs at various points, and hence no crash dump whatever. We use these flags to get information about where the "dirty shutdown" (a.k.a., crash; hang; process-killed; power off) happened. On Wed, Oct 16, 2013 at 5:21 PM, <oshima@chromium.org> wrote: > > Maybe my problem is that I don't fully understand why the order (or the > commit > before > kStabilityExitedCleanly) is important. Here is my current understanding. > > First of all, ImportantFileWrite *always* post task to > SequencedWorkerPool, so > you can't deduce the crash location using value in prefs. In addition, > RecordCompletedSessionEnd doesn't commit immediately, so the condition > kStabilitySessionEndCompleted=**true can't be used to deduce crash > location, > even if ImportantFileWriter were to commit the change immediately. > Lastly crash location is available in crash reports. It tells exactly > where it > crashed, not in between location, which makes these flags less useful. > > The committing kStabilityExitedCleanly as soon as possible is important > because > we don't want to show crash info bar when chrome crashes after this point. > This > is best effort and may fail (if chrome crashes after CommitPendingWrite, > but > before > ImportantFileWrite writes the file). Also it's quite unlikely that chrome > crashes > between them without us noticing it (we can tell from crash reports). > > With this, I thought this isn't useful. > > Am I missing something? Is this stats really used now to tell where crash > is > happening? > > > On 2013/10/11 22:55:43, jar wrote: > >> IMO, we really need to look at how the prefs settings are "harvested" to >> > deduce > >> crash location. That would make it clearer whether your changes will >> impact >> crash stats. >> > > I think the plan needs to be documented. If it was possible, it would be >> very >> nice to have a unit test that validated the exact shutdown sequence... but >> > there > >> is not (currently) a clean way to inject dependency on test code. >> > > > https://codereview.chromium.**org/25484004/diff/31001/** > chrome/browser/metrics/**metrics_service.cc<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.cc> > >> File chrome/browser/metrics/**metrics_service.cc (right): >> > > > https://codereview.chromium.**org/25484004/diff/31001/** > chrome/browser/metrics/**metrics_service.cc#newcode1608<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.cc#newcode1608> > >> chrome/browser/metrics/**metrics_service.cc:1608: >> > MarkAppCleanShutdownAndCommit(**); > >> I'm trying to understand what you are doing here, vs what the code used >> to do. >> > > In the past, we *first* did a LogCleanShutdown() on lines 740 or 745. >> This >> function did a MarkAppCleanShutdownAndCommit(**), and *then* set the >> pref. >> > > Now, you first change the prefs, and then call >> > MarkAppCleanShutdownAndCommit(**). > > It appears you have changed the order (setting the state, before the >> commit). >> > > > IMO, it would be great to add a file-level comment indicating the exact >> > shutdown > >> sequence, and exactly what state is observable (via prefs) to deduce >> where (if >> any) the previous run of the browser crashed (stopped updating prefs >> files). >> > We > >> could use that as the spec, verify that it is indeed what was currently >> > done.... > >> and then verify that it is preserved by your change. >> > > WDYT? >> > > > https://codereview.chromium.**org/25484004/diff/31001/** > chrome/browser/metrics/**metrics_service.h<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.h> > >> File chrome/browser/metrics/**metrics_service.h (right): >> > > > https://codereview.chromium.**org/25484004/diff/31001/** > chrome/browser/metrics/**metrics_service.h#newcode371<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.h#newcode371> > >> chrome/browser/metrics/**metrics_service.h:371: // >> |**kStatabilitySessionEndComplete**d|. >> nit: I think you didn't finish this sentence. >> > > Also, when you use the "|" around a name, it usually means the variable by >> > said > >> name. In this case, you're talking about a pref that has a given name, >> and >> setting its value. Clearer might be: >> > > It also sets the boolean preference "**kStatabilitySessionEndComplete**d" >> to the >> specified |value|. >> > > It is probably critical to say which was done first, especially if one of >> them >> commits preference values (writes them to disk). >> > > > https://codereview.chromium.**org/25484004/<https://codereview.chromium.org/2... > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
ChromeOS sends SIGFPE/SIGABORT when chrome hangs or didn't respond in time hence we do get crash dump for these cases (and this CL is to address this type of crash). Can we skip the 2nd commit on ChromeOS then? On Thu, Oct 17, 2013 at 9:24 AM, Jim Roskind <jar@chromium.org> wrote: > Easy answer with regard to flags vs crash dumps: > > We don't always get dumps. We often get hangs at various points, and > hence no crash dump whatever. > > We use these flags to get information about where the "dirty shutdown" > (a.k.a., crash; hang; process-killed; power off) happened. > > > On Wed, Oct 16, 2013 at 5:21 PM, <oshima@chromium.org> wrote: > >> >> Maybe my problem is that I don't fully understand why the order (or the >> commit >> before >> kStabilityExitedCleanly) is important. Here is my current understanding. >> >> First of all, ImportantFileWrite *always* post task to >> SequencedWorkerPool, so >> you can't deduce the crash location using value in prefs. In addition, >> RecordCompletedSessionEnd doesn't commit immediately, so the condition >> kStabilitySessionEndCompleted=**true can't be used to deduce crash >> location, >> even if ImportantFileWriter were to commit the change immediately. >> Lastly crash location is available in crash reports. It tells exactly >> where it >> crashed, not in between location, which makes these flags less useful. >> >> The committing kStabilityExitedCleanly as soon as possible is important >> because >> we don't want to show crash info bar when chrome crashes after this >> point. This >> is best effort and may fail (if chrome crashes after CommitPendingWrite, >> but >> before >> ImportantFileWrite writes the file). Also it's quite unlikely that chrome >> crashes >> between them without us noticing it (we can tell from crash reports). >> >> With this, I thought this isn't useful. >> >> Am I missing something? Is this stats really used now to tell where crash >> is >> happening? >> >> >> On 2013/10/11 22:55:43, jar wrote: >> >>> IMO, we really need to look at how the prefs settings are "harvested" to >>> >> deduce >> >>> crash location. That would make it clearer whether your changes will >>> impact >>> crash stats. >>> >> >> I think the plan needs to be documented. If it was possible, it would >>> be very >>> nice to have a unit test that validated the exact shutdown sequence... >>> but >>> >> there >> >>> is not (currently) a clean way to inject dependency on test code. >>> >> >> >> https://codereview.chromium.**org/25484004/diff/31001/** >> chrome/browser/metrics/**metrics_service.cc<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.cc> >> >>> File chrome/browser/metrics/**metrics_service.cc (right): >>> >> >> >> https://codereview.chromium.**org/25484004/diff/31001/** >> chrome/browser/metrics/**metrics_service.cc#newcode1608<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.cc#newcode1608> >> >>> chrome/browser/metrics/**metrics_service.cc:1608: >>> >> MarkAppCleanShutdownAndCommit(**); >> >>> I'm trying to understand what you are doing here, vs what the code used >>> to do. >>> >> >> In the past, we *first* did a LogCleanShutdown() on lines 740 or 745. >>> This >>> function did a MarkAppCleanShutdownAndCommit(**), and *then* set the >>> pref. >>> >> >> Now, you first change the prefs, and then call >>> >> MarkAppCleanShutdownAndCommit(**). >> >> It appears you have changed the order (setting the state, before the >>> commit). >>> >> >> >> IMO, it would be great to add a file-level comment indicating the exact >>> >> shutdown >> >>> sequence, and exactly what state is observable (via prefs) to deduce >>> where (if >>> any) the previous run of the browser crashed (stopped updating prefs >>> files). >>> >> We >> >>> could use that as the spec, verify that it is indeed what was currently >>> >> done.... >> >>> and then verify that it is preserved by your change. >>> >> >> WDYT? >>> >> >> >> https://codereview.chromium.**org/25484004/diff/31001/** >> chrome/browser/metrics/**metrics_service.h<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.h> >> >>> File chrome/browser/metrics/**metrics_service.h (right): >>> >> >> >> https://codereview.chromium.**org/25484004/diff/31001/** >> chrome/browser/metrics/**metrics_service.h#newcode371<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.h#newcode371> >> >>> chrome/browser/metrics/**metrics_service.h:371: // >>> |**kStatabilitySessionEndComplete**d|. >>> nit: I think you didn't finish this sentence. >>> >> >> Also, when you use the "|" around a name, it usually means the variable >>> by >>> >> said >> >>> name. In this case, you're talking about a pref that has a given name, >>> and >>> setting its value. Clearer might be: >>> >> >> It also sets the boolean preference "**kStatabilitySessionEndComplete**d" >>> to the >>> specified |value|. >>> >> >> It is probably critical to say which was done first, especially if one >>> of them >>> commits preference values (writes them to disk). >>> >> >> >> https://codereview.chromium.**org/25484004/<https://codereview.chromium.org/2... >> > > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2013/10/17 16:38:15, oshima wrote: > ChromeOS sends SIGFPE/SIGABORT when chrome hangs or didn't respond in time > hence we do get crash dump for these cases (and this CL is to address this > type of crash). > Can we skip the 2nd commit on ChromeOS then? I uploaded new patch. PTAL. > > On Thu, Oct 17, 2013 at 9:24 AM, Jim Roskind <mailto:jar@chromium.org> wrote: > > > Easy answer with regard to flags vs crash dumps: > > > > We don't always get dumps. We often get hangs at various points, and > > hence no crash dump whatever. > > > > We use these flags to get information about where the "dirty shutdown" > > (a.k.a., crash; hang; process-killed; power off) happened. > > > > > > On Wed, Oct 16, 2013 at 5:21 PM, <mailto:oshima@chromium.org> wrote: > > > >> > >> Maybe my problem is that I don't fully understand why the order (or the > >> commit > >> before > >> kStabilityExitedCleanly) is important. Here is my current understanding. > >> > >> First of all, ImportantFileWrite *always* post task to > >> SequencedWorkerPool, so > >> you can't deduce the crash location using value in prefs. In addition, > >> RecordCompletedSessionEnd doesn't commit immediately, so the condition > >> kStabilitySessionEndCompleted=**true can't be used to deduce crash > >> location, > >> even if ImportantFileWriter were to commit the change immediately. > >> Lastly crash location is available in crash reports. It tells exactly > >> where it > >> crashed, not in between location, which makes these flags less useful. > >> > >> The committing kStabilityExitedCleanly as soon as possible is important > >> because > >> we don't want to show crash info bar when chrome crashes after this > >> point. This > >> is best effort and may fail (if chrome crashes after CommitPendingWrite, > >> but > >> before > >> ImportantFileWrite writes the file). Also it's quite unlikely that chrome > >> crashes > >> between them without us noticing it (we can tell from crash reports). > >> > >> With this, I thought this isn't useful. > >> > >> Am I missing something? Is this stats really used now to tell where crash > >> is > >> happening? > >> > >> > >> On 2013/10/11 22:55:43, jar wrote: > >> > >>> IMO, we really need to look at how the prefs settings are "harvested" to > >>> > >> deduce > >> > >>> crash location. That would make it clearer whether your changes will > >>> impact > >>> crash stats. > >>> > >> > >> I think the plan needs to be documented. If it was possible, it would > >>> be very > >>> nice to have a unit test that validated the exact shutdown sequence... > >>> but > >>> > >> there > >> > >>> is not (currently) a clean way to inject dependency on test code. > >>> > >> > >> > >> https://codereview.chromium.**org/25484004/diff/31001/** > >> > chrome/browser/metrics/**metrics_service.cc<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.cc> > >> > >>> File chrome/browser/metrics/**metrics_service.cc (right): > >>> > >> > >> > >> https://codereview.chromium.**org/25484004/diff/31001/** > >> > chrome/browser/metrics/**metrics_service.cc#newcode1608<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.cc#newcode1608> > >> > >>> chrome/browser/metrics/**metrics_service.cc:1608: > >>> > >> MarkAppCleanShutdownAndCommit(**); > >> > >>> I'm trying to understand what you are doing here, vs what the code used > >>> to do. > >>> > >> > >> In the past, we *first* did a LogCleanShutdown() on lines 740 or 745. > >>> This > >>> function did a MarkAppCleanShutdownAndCommit(**), and *then* set the > >>> pref. > >>> > >> > >> Now, you first change the prefs, and then call > >>> > >> MarkAppCleanShutdownAndCommit(**). > >> > >> It appears you have changed the order (setting the state, before the > >>> commit). > >>> > >> > >> > >> IMO, it would be great to add a file-level comment indicating the exact > >>> > >> shutdown > >> > >>> sequence, and exactly what state is observable (via prefs) to deduce > >>> where (if > >>> any) the previous run of the browser crashed (stopped updating prefs > >>> files). > >>> > >> We > >> > >>> could use that as the spec, verify that it is indeed what was currently > >>> > >> done.... > >> > >>> and then verify that it is preserved by your change. > >>> > >> > >> WDYT? > >>> > >> > >> > >> https://codereview.chromium.**org/25484004/diff/31001/** > >> > chrome/browser/metrics/**metrics_service.h<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.h> > >> > >>> File chrome/browser/metrics/**metrics_service.h (right): > >>> > >> > >> > >> https://codereview.chromium.**org/25484004/diff/31001/** > >> > chrome/browser/metrics/**metrics_service.h#newcode371<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.h#newcode371> > >> > >>> chrome/browser/metrics/**metrics_service.h:371: // > >>> |**kStatabilitySessionEndComplete**d|. > >>> nit: I think you didn't finish this sentence. > >>> > >> > >> Also, when you use the "|" around a name, it usually means the variable > >>> by > >>> > >> said > >> > >>> name. In this case, you're talking about a pref that has a given name, > >>> and > >>> setting its value. Clearer might be: > >>> > >> > >> It also sets the boolean preference "**kStatabilitySessionEndComplete**d" > >>> to the > >>> specified |value|. > >>> > >> > >> It is probably critical to say which was done first, especially if one > >>> of them > >>> commits preference values (writes them to disk). > >>> > >> > >> > >> > https://codereview.chromium.**org/25484004/%3Chttps://codereview.chromium.org...> > >> > > > > > > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org.
I'm fine with skipping the second commit if you feel you have adequate coverage. IMO, we should *really* document what the heck we're doing, and that will help to preserve the semantics, and not be overly constrained (I'm wary that you may be right... and I'm being too conservative... but we really need a clear list of what we're doing here to nail things down. I'd also add that the code is often way too cryptic, and difficult to follow (you should see how bad it was in years of old when we first started fixing it up.). We're always between a rock and a hard place, wanting to clean it up... and not wanting to destroy the current crash signals which are directly compared with "yesterday's" crash signals :-(. If you land a CrOS only patch... be super careful to see if the stats suddenly change in some way... as it is oh so easy to corrupt critical signals as we change metrics. Jim On Thu, Oct 17, 2013 at 9:37 AM, oshima <oshima@chromium.org> wrote: > > ChromeOS sends SIGFPE/SIGABORT when chrome hangs or didn't respond in time > hence we do get crash dump for these cases (and this CL is to address this > type of crash). > Can we skip the 2nd commit on ChromeOS then? > > On Thu, Oct 17, 2013 at 9:24 AM, Jim Roskind <jar@chromium.org> wrote: > >> Easy answer with regard to flags vs crash dumps: >> >> We don't always get dumps. We often get hangs at various points, and >> hence no crash dump whatever. >> >> We use these flags to get information about where the "dirty shutdown" >> (a.k.a., crash; hang; process-killed; power off) happened. >> >> >> On Wed, Oct 16, 2013 at 5:21 PM, <oshima@chromium.org> wrote: >> >>> >>> Maybe my problem is that I don't fully understand why the order (or the >>> commit >>> before >>> kStabilityExitedCleanly) is important. Here is my current understanding. >>> >>> First of all, ImportantFileWrite *always* post task to >>> SequencedWorkerPool, so >>> you can't deduce the crash location using value in prefs. In addition, >>> RecordCompletedSessionEnd doesn't commit immediately, so the condition >>> kStabilitySessionEndCompleted=**true can't be used to deduce crash >>> location, >>> even if ImportantFileWriter were to commit the change immediately. >>> Lastly crash location is available in crash reports. It tells exactly >>> where it >>> crashed, not in between location, which makes these flags less useful. >>> >>> The committing kStabilityExitedCleanly as soon as possible is important >>> because >>> we don't want to show crash info bar when chrome crashes after this >>> point. This >>> is best effort and may fail (if chrome crashes after CommitPendingWrite, >>> but >>> before >>> ImportantFileWrite writes the file). Also it's quite unlikely that chrome >>> crashes >>> between them without us noticing it (we can tell from crash reports). >>> >>> With this, I thought this isn't useful. >>> >>> Am I missing something? Is this stats really used now to tell where >>> crash is >>> happening? >>> >>> >>> On 2013/10/11 22:55:43, jar wrote: >>> >>>> IMO, we really need to look at how the prefs settings are "harvested" to >>>> >>> deduce >>> >>>> crash location. That would make it clearer whether your changes will >>>> impact >>>> crash stats. >>>> >>> >>> I think the plan needs to be documented. If it was possible, it would >>>> be very >>>> nice to have a unit test that validated the exact shutdown sequence... >>>> but >>>> >>> there >>> >>>> is not (currently) a clean way to inject dependency on test code. >>>> >>> >>> >>> https://codereview.chromium.**org/25484004/diff/31001/** >>> chrome/browser/metrics/**metrics_service.cc<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.cc> >>> >>>> File chrome/browser/metrics/**metrics_service.cc (right): >>>> >>> >>> >>> https://codereview.chromium.**org/25484004/diff/31001/** >>> chrome/browser/metrics/**metrics_service.cc#newcode1608<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.cc#newcode1608> >>> >>>> chrome/browser/metrics/**metrics_service.cc:1608: >>>> >>> MarkAppCleanShutdownAndCommit(**); >>> >>>> I'm trying to understand what you are doing here, vs what the code used >>>> to do. >>>> >>> >>> In the past, we *first* did a LogCleanShutdown() on lines 740 or 745. >>>> This >>>> function did a MarkAppCleanShutdownAndCommit(**), and *then* set the >>>> pref. >>>> >>> >>> Now, you first change the prefs, and then call >>>> >>> MarkAppCleanShutdownAndCommit(**). >>> >>> It appears you have changed the order (setting the state, before the >>>> commit). >>>> >>> >>> >>> IMO, it would be great to add a file-level comment indicating the exact >>>> >>> shutdown >>> >>>> sequence, and exactly what state is observable (via prefs) to deduce >>>> where (if >>>> any) the previous run of the browser crashed (stopped updating prefs >>>> files). >>>> >>> We >>> >>>> could use that as the spec, verify that it is indeed what was currently >>>> >>> done.... >>> >>>> and then verify that it is preserved by your change. >>>> >>> >>> WDYT? >>>> >>> >>> >>> https://codereview.chromium.**org/25484004/diff/31001/** >>> chrome/browser/metrics/**metrics_service.h<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.h> >>> >>>> File chrome/browser/metrics/**metrics_service.h (right): >>>> >>> >>> >>> https://codereview.chromium.**org/25484004/diff/31001/** >>> chrome/browser/metrics/**metrics_service.h#newcode371<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.h#newcode371> >>> >>>> chrome/browser/metrics/**metrics_service.h:371: // >>>> |**kStatabilitySessionEndComplete**d|. >>>> nit: I think you didn't finish this sentence. >>>> >>> >>> Also, when you use the "|" around a name, it usually means the variable >>>> by >>>> >>> said >>> >>>> name. In this case, you're talking about a pref that has a given name, >>>> and >>>> setting its value. Clearer might be: >>>> >>> >>> It also sets the boolean preference "**kStatabilitySessionEndComplete**d" >>>> to the >>>> specified |value|. >>>> >>> >>> It is probably critical to say which was done first, especially if one >>>> of them >>>> commits preference values (writes them to disk). >>>> >>> >>> >>> https://codereview.chromium.**org/25484004/<https://codereview.chromium.org/2... >>> >> >> > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
lgtm
On 2013/10/18 01:02:50, jar wrote: > lgtm thanks!
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/oshima@chromium.org/25484004/49001
Retried try job too often on chromium_presubmit for step(s) presubmit http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=chromium_p...
+thestig for OWNERS approval
https://codereview.chromium.org/25484004/diff/49001/chrome/browser/browser_pr... File chrome/browser/browser_process_impl.cc (right): https://codereview.chromium.org/25484004/diff/49001/chrome/browser/browser_pr... chrome/browser/browser_process_impl.cc:376: #if !defined(OP_CHROMEOS) OS_CHROMEOS
https://codereview.chromium.org/25484004/diff/49001/chrome/browser/browser_pr... File chrome/browser/browser_process_impl.cc (right): https://codereview.chromium.org/25484004/diff/49001/chrome/browser/browser_pr... chrome/browser/browser_process_impl.cc:376: #if !defined(OP_CHROMEOS) On 2013/10/18 21:31:02, Lei Zhang wrote: > OS_CHROMEOS doh, fixed. thank you for the catch.
lgtm
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/oshima@chromium.org/25484004/162001
Message was sent while issue was closed.
Change committed as 229535
On Thu, Oct 17, 2013 at 6:01 PM, Jim Roskind <jar@chromium.org> wrote: > I'm fine with skipping the second commit if you feel you have adequate > coverage. > > IMO, we should *really* document what the heck we're doing, and that will > help to preserve the semantics, and not be overly constrained (I'm wary > that you may be right... and I'm being too conservative... but we really > need a clear list of what we're doing here to nail things down. I'd also > add that the code is often way too cryptic, and difficult to follow (you > should see how bad it was in years of old when we first started fixing it > up.). We're always between a rock and a hard place, wanting to clean it > up... and not wanting to destroy the current crash signals which are > directly compared with "yesterday's" crash signals :-(. > Do you know who is in charge of this stats now (in other words, who is monitoring this)? I'd like to know if this is being used, or was useful to find issues during shutdown. I agree that it's bad if signals are different between today and yesterday, but that's true only if it's used in meaningful way. Maybe the better way is to use the approach used in cros (generate dump when it hangs). > > If you land a CrOS only patch... be super careful to see if the stats > suddenly change in some way... as it is oh so easy to corrupt critical > signals as we change metrics. > I'm pretty sure CrOS isn't using this metrics. I'm actually the person who's looking into issues during shutdown and I've never seen or used it, as we always get crash dump for hangs during shutdown. > > Jim > > > On Thu, Oct 17, 2013 at 9:37 AM, oshima <oshima@chromium.org> wrote: > >> >> ChromeOS sends SIGFPE/SIGABORT when chrome hangs or didn't respond in time >> hence we do get crash dump for these cases (and this CL is to address >> this type of crash). >> Can we skip the 2nd commit on ChromeOS then? >> >> On Thu, Oct 17, 2013 at 9:24 AM, Jim Roskind <jar@chromium.org> wrote: >> >>> Easy answer with regard to flags vs crash dumps: >>> >>> We don't always get dumps. We often get hangs at various points, and >>> hence no crash dump whatever. >>> >>> We use these flags to get information about where the "dirty shutdown" >>> (a.k.a., crash; hang; process-killed; power off) happened. >>> >>> >>> On Wed, Oct 16, 2013 at 5:21 PM, <oshima@chromium.org> wrote: >>> >>>> >>>> Maybe my problem is that I don't fully understand why the order (or the >>>> commit >>>> before >>>> kStabilityExitedCleanly) is important. Here is my current understanding. >>>> >>>> First of all, ImportantFileWrite *always* post task to >>>> SequencedWorkerPool, so >>>> you can't deduce the crash location using value in prefs. In addition, >>>> RecordCompletedSessionEnd doesn't commit immediately, so the condition >>>> kStabilitySessionEndCompleted=**true can't be used to deduce crash >>>> location, >>>> even if ImportantFileWriter were to commit the change immediately. >>>> Lastly crash location is available in crash reports. It tells exactly >>>> where it >>>> crashed, not in between location, which makes these flags less useful. >>>> >>>> The committing kStabilityExitedCleanly as soon as possible is important >>>> because >>>> we don't want to show crash info bar when chrome crashes after this >>>> point. This >>>> is best effort and may fail (if chrome crashes after >>>> CommitPendingWrite, but >>>> before >>>> ImportantFileWrite writes the file). Also it's quite unlikely that >>>> chrome >>>> crashes >>>> between them without us noticing it (we can tell from crash reports). >>>> >>>> With this, I thought this isn't useful. >>>> >>>> Am I missing something? Is this stats really used now to tell where >>>> crash is >>>> happening? >>>> >>>> >>>> On 2013/10/11 22:55:43, jar wrote: >>>> >>>>> IMO, we really need to look at how the prefs settings are "harvested" >>>>> to >>>>> >>>> deduce >>>> >>>>> crash location. That would make it clearer whether your changes will >>>>> impact >>>>> crash stats. >>>>> >>>> >>>> I think the plan needs to be documented. If it was possible, it would >>>>> be very >>>>> nice to have a unit test that validated the exact shutdown sequence... >>>>> but >>>>> >>>> there >>>> >>>>> is not (currently) a clean way to inject dependency on test code. >>>>> >>>> >>>> >>>> https://codereview.chromium.**org/25484004/diff/31001/** >>>> chrome/browser/metrics/**metrics_service.cc<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.cc> >>>> >>>>> File chrome/browser/metrics/**metrics_service.cc (right): >>>>> >>>> >>>> >>>> https://codereview.chromium.**org/25484004/diff/31001/** >>>> chrome/browser/metrics/**metrics_service.cc#newcode1608<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.cc#newcode1608> >>>> >>>>> chrome/browser/metrics/**metrics_service.cc:1608: >>>>> >>>> MarkAppCleanShutdownAndCommit(**); >>>> >>>>> I'm trying to understand what you are doing here, vs what the code >>>>> used to do. >>>>> >>>> >>>> In the past, we *first* did a LogCleanShutdown() on lines 740 or 745. >>>>> This >>>>> function did a MarkAppCleanShutdownAndCommit(**), and *then* set the >>>>> pref. >>>>> >>>> >>>> Now, you first change the prefs, and then call >>>>> >>>> MarkAppCleanShutdownAndCommit(**). >>>> >>>> It appears you have changed the order (setting the state, before the >>>>> commit). >>>>> >>>> >>>> >>>> IMO, it would be great to add a file-level comment indicating the exact >>>>> >>>> shutdown >>>> >>>>> sequence, and exactly what state is observable (via prefs) to deduce >>>>> where (if >>>>> any) the previous run of the browser crashed (stopped updating prefs >>>>> files). >>>>> >>>> We >>>> >>>>> could use that as the spec, verify that it is indeed what was currently >>>>> >>>> done.... >>>> >>>>> and then verify that it is preserved by your change. >>>>> >>>> >>>> WDYT? >>>>> >>>> >>>> >>>> https://codereview.chromium.**org/25484004/diff/31001/** >>>> chrome/browser/metrics/**metrics_service.h<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.h> >>>> >>>>> File chrome/browser/metrics/**metrics_service.h (right): >>>>> >>>> >>>> >>>> https://codereview.chromium.**org/25484004/diff/31001/** >>>> chrome/browser/metrics/**metrics_service.h#newcode371<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.h#newcode371> >>>> >>>>> chrome/browser/metrics/**metrics_service.h:371: // >>>>> |**kStatabilitySessionEndComplete**d|. >>>>> nit: I think you didn't finish this sentence. >>>>> >>>> >>>> Also, when you use the "|" around a name, it usually means the >>>>> variable by >>>>> >>>> said >>>> >>>>> name. In this case, you're talking about a pref that has a given >>>>> name, and >>>>> setting its value. Clearer might be: >>>>> >>>> >>>> It also sets the boolean preference "**kStatabilitySessionEndComplete* >>>>> *d" to the >>>>> specified |value|. >>>>> >>>> >>>> It is probably critical to say which was done first, especially if one >>>>> of them >>>>> commits preference values (writes them to disk). >>>>> >>>> >>>> >>>> https://codereview.chromium.**org/25484004/<https://codereview.chromium.org/2... >>>> >>> >>> >> > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
UMA crash stats (dirty shutdown) are the primary metric for estimating stability. It is used on a daily and weekly basis to decide go/no-go for new releases. Stats are backed by crash dumps, which are uploaded (and often conflict). We have code that crashes during hangs, and even code that snapshots the stack (without crashing) when we hang various interesting threads. On Wed, Oct 23, 2013 at 9:44 AM, oshima <oshima@chromium.org> wrote: > > On Thu, Oct 17, 2013 at 6:01 PM, Jim Roskind <jar@chromium.org> wrote: > >> I'm fine with skipping the second commit if you feel you have adequate >> coverage. >> >> IMO, we should *really* document what the heck we're doing, and that will >> help to preserve the semantics, and not be overly constrained (I'm wary >> that you may be right... and I'm being too conservative... but we really >> need a clear list of what we're doing here to nail things down. I'd also >> add that the code is often way too cryptic, and difficult to follow (you >> should see how bad it was in years of old when we first started fixing it >> up.). We're always between a rock and a hard place, wanting to clean it >> up... and not wanting to destroy the current crash signals which are >> directly compared with "yesterday's" crash signals :-(. >> > > Do you know who is in charge of this stats now (in other words, who is > monitoring this)? I'd like to know if this is being used, or was useful to > find issues during shutdown. > I agree that it's bad if signals are different between today and > yesterday, but that's true only if it's used in meaningful way. Maybe the > better way is to use the approach used in cros (generate dump when it > hangs). > > >> >> If you land a CrOS only patch... be super careful to see if the stats >> suddenly change in some way... as it is oh so easy to corrupt critical >> signals as we change metrics. >> > > I'm pretty sure CrOS isn't using this metrics. I'm actually the person > who's looking into issues during shutdown and I've never seen or used it, > as we always get crash dump for hangs during shutdown. > > > >> >> Jim >> >> >> On Thu, Oct 17, 2013 at 9:37 AM, oshima <oshima@chromium.org> wrote: >> >>> >>> ChromeOS sends SIGFPE/SIGABORT when chrome hangs or didn't respond in >>> time >>> hence we do get crash dump for these cases (and this CL is to address >>> this type of crash). >>> Can we skip the 2nd commit on ChromeOS then? >>> >>> On Thu, Oct 17, 2013 at 9:24 AM, Jim Roskind <jar@chromium.org> wrote: >>> >>>> Easy answer with regard to flags vs crash dumps: >>>> >>>> We don't always get dumps. We often get hangs at various points, and >>>> hence no crash dump whatever. >>>> >>>> We use these flags to get information about where the "dirty shutdown" >>>> (a.k.a., crash; hang; process-killed; power off) happened. >>>> >>>> >>>> On Wed, Oct 16, 2013 at 5:21 PM, <oshima@chromium.org> wrote: >>>> >>>>> >>>>> Maybe my problem is that I don't fully understand why the order (or >>>>> the commit >>>>> before >>>>> kStabilityExitedCleanly) is important. Here is my current >>>>> understanding. >>>>> >>>>> First of all, ImportantFileWrite *always* post task to >>>>> SequencedWorkerPool, so >>>>> you can't deduce the crash location using value in prefs. In addition, >>>>> RecordCompletedSessionEnd doesn't commit immediately, so the condition >>>>> kStabilitySessionEndCompleted=**true can't be used to deduce crash >>>>> location, >>>>> even if ImportantFileWriter were to commit the change immediately. >>>>> Lastly crash location is available in crash reports. It tells exactly >>>>> where it >>>>> crashed, not in between location, which makes these flags less useful. >>>>> >>>>> The committing kStabilityExitedCleanly as soon as possible is >>>>> important because >>>>> we don't want to show crash info bar when chrome crashes after this >>>>> point. This >>>>> is best effort and may fail (if chrome crashes after >>>>> CommitPendingWrite, but >>>>> before >>>>> ImportantFileWrite writes the file). Also it's quite unlikely that >>>>> chrome >>>>> crashes >>>>> between them without us noticing it (we can tell from crash reports). >>>>> >>>>> With this, I thought this isn't useful. >>>>> >>>>> Am I missing something? Is this stats really used now to tell where >>>>> crash is >>>>> happening? >>>>> >>>>> >>>>> On 2013/10/11 22:55:43, jar wrote: >>>>> >>>>>> IMO, we really need to look at how the prefs settings are "harvested" >>>>>> to >>>>>> >>>>> deduce >>>>> >>>>>> crash location. That would make it clearer whether your changes will >>>>>> impact >>>>>> crash stats. >>>>>> >>>>> >>>>> I think the plan needs to be documented. If it was possible, it >>>>>> would be very >>>>>> nice to have a unit test that validated the exact shutdown >>>>>> sequence... but >>>>>> >>>>> there >>>>> >>>>>> is not (currently) a clean way to inject dependency on test code. >>>>>> >>>>> >>>>> >>>>> https://codereview.chromium.**org/25484004/diff/31001/** >>>>> chrome/browser/metrics/**metrics_service.cc<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.cc> >>>>> >>>>>> File chrome/browser/metrics/**metrics_service.cc (right): >>>>>> >>>>> >>>>> >>>>> https://codereview.chromium.**org/25484004/diff/31001/** >>>>> chrome/browser/metrics/**metrics_service.cc#newcode1608<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.cc#newcode1608> >>>>> >>>>>> chrome/browser/metrics/**metrics_service.cc:1608: >>>>>> >>>>> MarkAppCleanShutdownAndCommit(**); >>>>> >>>>>> I'm trying to understand what you are doing here, vs what the code >>>>>> used to do. >>>>>> >>>>> >>>>> In the past, we *first* did a LogCleanShutdown() on lines 740 or 745. >>>>>> This >>>>>> function did a MarkAppCleanShutdownAndCommit(**), and *then* set the >>>>>> pref. >>>>>> >>>>> >>>>> Now, you first change the prefs, and then call >>>>>> >>>>> MarkAppCleanShutdownAndCommit(**). >>>>> >>>>> It appears you have changed the order (setting the state, before the >>>>>> commit). >>>>>> >>>>> >>>>> >>>>> IMO, it would be great to add a file-level comment indicating the >>>>>> exact >>>>>> >>>>> shutdown >>>>> >>>>>> sequence, and exactly what state is observable (via prefs) to deduce >>>>>> where (if >>>>>> any) the previous run of the browser crashed (stopped updating prefs >>>>>> files). >>>>>> >>>>> We >>>>> >>>>>> could use that as the spec, verify that it is indeed what was >>>>>> currently >>>>>> >>>>> done.... >>>>> >>>>>> and then verify that it is preserved by your change. >>>>>> >>>>> >>>>> WDYT? >>>>>> >>>>> >>>>> >>>>> https://codereview.chromium.**org/25484004/diff/31001/** >>>>> chrome/browser/metrics/**metrics_service.h<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.h> >>>>> >>>>>> File chrome/browser/metrics/**metrics_service.h (right): >>>>>> >>>>> >>>>> >>>>> https://codereview.chromium.**org/25484004/diff/31001/** >>>>> chrome/browser/metrics/**metrics_service.h#newcode371<https://codereview.chromium.org/25484004/diff/31001/chrome/browser/metrics/metrics_service.h#newcode371> >>>>> >>>>>> chrome/browser/metrics/**metrics_service.h:371: // >>>>>> |**kStatabilitySessionEndComplete**d|. >>>>>> nit: I think you didn't finish this sentence. >>>>>> >>>>> >>>>> Also, when you use the "|" around a name, it usually means the >>>>>> variable by >>>>>> >>>>> said >>>>> >>>>>> name. In this case, you're talking about a pref that has a given >>>>>> name, and >>>>>> setting its value. Clearer might be: >>>>>> >>>>> >>>>> It also sets the boolean preference "**kStatabilitySessionEndComplete >>>>>> **d" to the >>>>>> specified |value|. >>>>>> >>>>> >>>>> It is probably critical to say which was done first, especially if >>>>>> one of them >>>>>> commits preference values (writes them to disk). >>>>>> >>>>> >>>>> >>>>> https://codereview.chromium.**org/25484004/<https://codereview.chromium.org/2... >>>>> >>>> >>>> >>> >> > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org. |