|
|
Created:
10 years, 2 months ago by Simon Que Modified:
9 years, 7 months ago CC:
chromium-os-reviews_chromium.org Visibility:
Public. |
DescriptionCollect suspend and resume info from power manager
Updated the unclean shutdown collector to check and report the
modification times of trace files left by power manager to indicate that
the system was suspended and that it was running on low battery.
These files are interpreted by the unclean shutdown collector to
determine the cause of an unclean shutdown -- either the battery ran
out during suspend or the battery ran out after resuming from a low
battery-induced suspend and it continued to run on battery.
BUG=chromium-os:1472
TEST=Let the battery run below the cutoff, suspend & resume, and then
remove the battery to simulate running dry; check crash reporter
logs. Alternatively/additionally, let it suspend and remove the battery
while suspended. Restart and check crash reporter logs.
Signed-off-by: Simon Que <sque@chromium.org>
Change-Id: I8e6767e8457afb7abf1e7300eac020adda1ebb48
Committed: http://chrome-svn/viewvc/chromeos?view=rev&revision=2d2f2bb
Patch Set 1 #Patch Set 2 : Cleanup of old files #
Total comments: 5
Patch Set 3 : Code cleanup, comments end with period, interpretation of events #
Total comments: 20
Patch Set 4 : Reduced to two files and w/o reading timestamps #
Total comments: 3
Patch Set 5 : Reorganized deletions, added unit tests #
Total comments: 6
Patch Set 6 : Updated unit test, minor fixes #
Total comments: 4
Patch Set 7 : ABC order, ASSERT_EQ order #
Messages
Total messages: 19 (0 generated)
This change is still pending review.
I'm not really excited about this change as is because it's only for diagnostic purposes and the diagnostics are fairly low level. Why not just add the extra logic to actually check if the last unclean shutdown was due to power failure and properly count/not-count it? http://codereview.chromium.org/3644007/diff/3001/unclean_shutdown_collector.cc File unclean_shutdown_collector.cc (right): http://codereview.chromium.org/3644007/diff/3001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:8: #include "base/time.h" abc order http://codereview.chromium.org/3644007/diff/3001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:15: // Files created by power manager used for crash reporting Periods at end of comments please. http://codereview.chromium.org/3644007/diff/3001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:19: static const char kPowerdSuspendFile[] = TRACE_PATH "/powerd_suspending"; Just make this be the filename, make kPowerdTracePath be a const FilePath, and use kPowerdTracePath.Append(kPowerdSuspendFile) http://codereview.chromium.org/3644007/diff/3001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:19: static const char kPowerdSuspendFile[] = TRACE_PATH "/powerd_suspending"; I suggest making this be the filename, make kPowerdTracePath be a const FilePath, and use kPowerdTracePath.Append(kPowerdSuspendFile). http://codereview.chromium.org/3644007/diff/3001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:82: ReadModificationTime(kPowerdSuspendFile); I really would like to see the logic of figuring out if the unclean shutdown was real or not. I think the current diagnostic output would only be valuable to you (and maybe me though I'd have to think through the file times it outputs to figure out what really happened) and not to other developers.
> http://codereview.chromium.org/3644007/diff/3001/unclean_shutdown_collector.c... > unclean_shutdown_collector.cc:8: #include "base/time.h" > abc order Done. > http://codereview.chromium.org/3644007/diff/3001/unclean_shutdown_collector.c... > unclean_shutdown_collector.cc:15: // Files created by power manager used for > crash reporting > Periods at end of comments please. Done. > http://codereview.chromium.org/3644007/diff/3001/unclean_shutdown_collector.c... > unclean_shutdown_collector.cc:19: static const char kPowerdSuspendFile[] = > TRACE_PATH "/powerd_suspending"; > Just make this be the filename, make kPowerdTracePath be a const FilePath, and > use kPowerdTracePath.Append(kPowerdSuspendFile) Done. > http://codereview.chromium.org/3644007/diff/3001/unclean_shutdown_collector.c... > unclean_shutdown_collector.cc:19: static const char kPowerdSuspendFile[] = > TRACE_PATH "/powerd_suspending"; > I suggest making this be the filename, make kPowerdTracePath be a const > FilePath, and use kPowerdTracePath.Append(kPowerdSuspendFile). Done. > http://codereview.chromium.org/3644007/diff/3001/unclean_shutdown_collector.c... > unclean_shutdown_collector.cc:82: ReadModificationTime(kPowerdSuspendFile); > I really would like to see the logic of figuring out if the unclean shutdown was > real or not. I think the current diagnostic output would only be valuable to > you (and maybe me though I'd have to think through the file times it outputs to > figure out what really happened) and not to other developers. Done.
My style nits below. http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.cc File unclean_shutdown_collector.cc (right): http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:15: static const FilePath kPowerdTracePath = FilePath("/var/lib/power_manager"); Please, don't do this. See http://groups.google.com/a/chromium.org/group/chromium-dev/browse_thread/thre... http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:15: static const FilePath kPowerdTracePath = FilePath("/var/lib/power_manager"); Can you use FILE_PATH_LITERAL instead? http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:57: // Check for unclean shutdown due to battery running out. This comment is good. I'd add it in the header file and remove it from here. http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:76: { the { should be in the end of the previous line. http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:77: file_util::FileInfo file_info; Please, move this inside the first if, close where it's used. http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:78: base::Time::Exploded timestamp; Please, move this inside the if too. http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:84: if (file_util::PathExists(file_path)) { You can return early. I think it's better. if (!file_util::Path(file_path)) return; http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:99: { the { should be on the previous line. http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:104: Append(kPowerdSuspendFile), &suspend_time); wrong indentation. Indent 4 spaces. Better, if possible, align Append with kPowerdTracePath, put the &suspend_time on the next line aligned with Append. Same thing below. http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:121: remove this blank line. http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.h File unclean_shutdown_collector.h (right): http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.h... unclean_shutdown_collector.h:38: bool ReadModificationTime(const FilePath &file_path, base::Time *mod_time); the & should be with the type. http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.h... unclean_shutdown_collector.h:43: const char *unclean_shutdown_file_; Since you are here, please fix this too.
I'm not understanding the value in tracking the suspend/resume times or looking at them in crash_reporter. All we care about is "was the battery low and power unplugged before an unclean shutdown", right? If you do want to clean this up and keep track of suspend/resume times, please add tests to unclean_shutdown_collector_test or logging_UncleanShutdown autotest test that really tests all the file states of CheckPowerd... http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.cc File unclean_shutdown_collector.cc (right): http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:15: static const FilePath kPowerdTracePath = FilePath("/var/lib/power_manager"); On 2010/11/19 15:35:43, tfarina wrote: > Please, don't do this. > See > http://groups.google.com/a/chromium.org/group/chromium-dev/browse_thread/thre... Fair enough, I retract my statement about doing this in power manager. http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:57: // Check for unclean shutdown due to battery running out. On 2010/11/19 15:35:43, tfarina wrote: > This comment is good. I'd add it in the header file and remove it from here. I'd name the function to indicate this: CheckForDeadBatteryUncleanShutdown (or something similar) http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:100: logger_->LogInfo("Checking for Power Daemon files"); This seems pretty chatty as we already emit a few messages at every startup and this one is not conditional. http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:105: bool was_resumed = ReadModificationTime(kPowerdTracePath. I would move these down to where they're actually used. Then you probably can even remove this variable. http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:111: if (was_suspended) { Once the Delete calls are factored out you can make this be !was_suspended return false. http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:119: } else if (was_suspended && was_resumed && resume_time > suspend_time && I think the whole first line of this is redundant. Do you need use (was_resumed && resume_time > suspend_time)? It's required to be true by the else if. Well, sort of, it should be resume_time >= suspend_time but I assume that's a bug. What about running_on_low_battery && suspend_time > low_battery_time && was_resumed && resume_time > suspend_time? Doesn't that mean the battery ran out while the device was resumed? Why wouldn't we return true for that case. And that that point, why wouldn't we just replace if (was_suspended) {...} with return running_on_low_battery? http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... unclean_shutdown_collector.cc:128: file_util::Delete(kPowerdTracePath.Append(kPowerdSuspendFile), false); Please separate this out into a function with a name like SignalPowerdTraceFileGeneration() and call it from Collect(). This helps avoid a "Check/Get" function from having a side effect which is always nice. Then you can also call this function instead of duplicating the code in Disable(). ;) http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.h File unclean_shutdown_collector.h (right): http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.h... unclean_shutdown_collector.h:38: bool ReadModificationTime(const FilePath &file_path, base::Time *mod_time); On 2010/11/19 15:35:43, tfarina wrote: > the & should be with the type. Please don't fix. This project does not yet follow the chromium */& style and we can do it all at once.
On Fri, Nov 19, 2010 at 12:02 PM, <kmixter@chromium.org> wrote: > I'm not understanding the value in tracking the suspend/resume times or > looking > at them in crash_reporter. All we care about is "was the battery low and > power > unplugged before an unclean shutdown", right? We want to avoid the false positives as well. For example suppose there was a suspend and resume when battery is not low. And then something in the system crashes. In this case, how do we tell that it was not due to battery running out while suspended? If we see that resume time was after suspend time, we know that it came out of suspend. Otherwise, it means unclean shutdown happened during suspend. I am thinking that we can reduce it to two files: low_battery and suspended. suspended is created on suspend and deleted on resume. This way we can detect whether the last state of the system was in suspend. The logic then becomes: IF suspended THEN battery ran out while suspended ELSE IF low_battery THEN battery ran out after resuming from low battery-induced suspend ELSE crash was not due to battery running out http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... > unclean_shutdown_collector.cc:119: } else if (was_suspended && > was_resumed && resume_time > suspend_time && > I think the whole first line of this is redundant. Do you need use > (was_resumed && resume_time > suspend_time)? It's required to be true > by the else if. Well, sort of, it should be resume_time >= suspend_time > but I assume that's a bug. Good point. The logic here is too complicated and it took me a while to analyze based on your feedback. All the more reason to simplify it down to two variables. > What about running_on_low_battery && suspend_time > low_battery_time && > was_resumed && resume_time > suspend_time? Doesn't that mean the > battery ran out while the device was resumed? Why wouldn't we return > true for that case. > I'm not sure what you're saying here. Isn't that the case captured by "else if..."? (except for the > instead of >= between suspend_time and low_battery_time).
I think two files whose existence implies the state of the power/suspension will simplify things. Thanks. On Fri, Nov 19, 2010 at 12:50 PM, Simon Que <sque@chromium.org> wrote: > > > On Fri, Nov 19, 2010 at 12:02 PM, <kmixter@chromium.org> wrote: > >> I'm not understanding the value in tracking the suspend/resume times or >> looking >> at them in crash_reporter. All we care about is "was the battery low and >> power >> unplugged before an unclean shutdown", right? > > > We want to avoid the false positives as well. For example suppose there > was a suspend and resume when battery is not low. And then something in the > system crashes. In this case, how do we tell that it was not due to battery > running out while suspended? If we see that resume time was after suspend > time, we know that it came out of suspend. Otherwise, it means unclean > shutdown happened during suspend. > > I am thinking that we can reduce it to two files: low_battery and > suspended. suspended is created on suspend and deleted on resume. This way > we can detect whether the last state of the system was in suspend. The > logic then becomes: > > IF suspended THEN > battery ran out while suspended > ELSE IF low_battery THEN > battery ran out after resuming from low battery-induced suspend > ELSE > crash was not due to battery running out > > > >> http://codereview.chromium.org/3644007/diff/9001/unclean_shutdown_collector.c... >> unclean_shutdown_collector.cc:119: } else if (was_suspended && >> was_resumed && resume_time > suspend_time && >> I think the whole first line of this is redundant. Do you need use >> (was_resumed && resume_time > suspend_time)? It's required to be true >> by the else if. Well, sort of, it should be resume_time >= suspend_time >> but I assume that's a bug. > > > Good point. The logic here is too complicated and it took me a while to > analyze based on your feedback. All the more reason to simplify it down to > two variables. > > >> What about running_on_low_battery && suspend_time > low_battery_time && >> was_resumed && resume_time > suspend_time? Doesn't that mean the >> battery ran out while the device was resumed? Why wouldn't we return >> true for that case. >> > > I'm not sure what you're saying here. Isn't that the case captured by > "else if..."? (except for the > instead of >= between suspend_time and > low_battery_time). > >
Much simpler, thanks! Could you also add a unit test for that function? Also, when sending out notifications after making a change to address someone's comments, please put some text in the message (often PTAL) so it's clear that it wasn't just a rietveld spam. http://codereview.chromium.org/3644007/diff/18001/unclean_shutdown_collector.cc File unclean_shutdown_collector.cc (right): http://codereview.chromium.org/3644007/diff/18001/unclean_shutdown_collector.... unclean_shutdown_collector.cc:59: file_util::Delete(powerd_suspend_file_, false); Still would be nice to factor this out. How about putting it into DeleteUncleanShutdownFiles? and calling that above CheckForDeadBatteryUncleanShutdown? Then removing these deletions in Disable. http://codereview.chromium.org/3644007/diff/18001/unclean_shutdown_collector.... unclean_shutdown_collector.cc:86: logger_->LogInfo("Unclean shutdown occurred while resumed after battery " how about "while running with critically low battery" since there wasn't necessarily a suspend/resume cycle. Instead of "The battery probably ran out" here and above how about "not counting towards unclean shutdown statistic." (or something to that effect) http://codereview.chromium.org/3644007/diff/18001/unclean_shutdown_collector.h File unclean_shutdown_collector.h (right): http://codereview.chromium.org/3644007/diff/18001/unclean_shutdown_collector.... unclean_shutdown_collector.h:11: #include "base/time.h" This isn't used here, please move it to the cc.
PTAL. I added unit tests in addition to the below changes. > http://codereview.chromium.org/3644007/diff/18001/unclean_shutdown_collector.... > unclean_shutdown_collector.cc:59: file_util::Delete(powerd_suspend_file_, > false); > Still would be nice to factor this out. How about putting it into > DeleteUncleanShutdownFiles? and calling that above > CheckForDeadBatteryUncleanShutdown? Then removing these deletions in Disable. Done. > http://codereview.chromium.org/3644007/diff/18001/unclean_shutdown_collector.... > unclean_shutdown_collector.cc:86: logger_->LogInfo("Unclean shutdown occurred > while resumed after battery " > how about "while running with critically low battery" since there wasn't > necessarily a suspend/resume cycle. Instead of "The battery probably ran out" > here and above how about "not counting towards unclean shutdown statistic." (or > something to that effect) Done. > http://codereview.chromium.org/3644007/diff/18001/unclean_shutdown_collector.h > File unclean_shutdown_collector.h (right): > > http://codereview.chromium.org/3644007/diff/18001/unclean_shutdown_collector.... > unclean_shutdown_collector.h:11: #include "base/time.h" > This isn't used here, please move it to the cc. Removed altogether as we are no longer looking at timestamps.
http://codereview.chromium.org/3644007/diff/24001/unclean_shutdown_collector.cc File unclean_shutdown_collector.cc (right): http://codereview.chromium.org/3644007/diff/24001/unclean_shutdown_collector.... unclean_shutdown_collector.cc:59: CheckForDeadBatteryUncleanShutdown(); You're not checking the return value, so this really has no effect. You should not count as unclean and return false. http://codereview.chromium.org/3644007/diff/24001/unclean_shutdown_collector_... File unclean_shutdown_collector_test.cc (right): http://codereview.chromium.org/3644007/diff/24001/unclean_shutdown_collector_... unclean_shutdown_collector_test.cc:19: static const char kTestLowBattery[] = "test/low_battery"; nit: abc order unless there's a specific reason not to http://codereview.chromium.org/3644007/diff/24001/unclean_shutdown_collector_... unclean_shutdown_collector_test.cc:90: ASSERT_TRUE(collector_.Collect()); Should return false. http://codereview.chromium.org/3644007/diff/24001/unclean_shutdown_collector_... unclean_shutdown_collector_test.cc:95: "battery critically low")); nit: search for ending '.' like you do in the other unit test http://codereview.chromium.org/3644007/diff/24001/unclean_shutdown_collector_... unclean_shutdown_collector_test.cc:102: ASSERT_TRUE(collector_.Collect()); Should return false. http://codereview.chromium.org/3644007/diff/24001/unclean_shutdown_collector_... unclean_shutdown_collector_test.cc:106: logging_.log().find("Unclean shutdown occurred while suspended.")); Could you add a check here, above and in CollectFalse and CollectTrue for the value of s_crashes? In all functions except CollectTrue it should be 0. In CollectTrue it should be 1. That would have made the bug in the new code more obvious to see in the test.
> unclean_shutdown_collector.cc:59: CheckForDeadBatteryUncleanShutdown(); > You're not checking the return value, so this really has no effect. You should > not count as unclean and return false. Done > unclean_shutdown_collector_test.cc:19: static const char kTestLowBattery[] = > "test/low_battery"; > nit: abc order unless there's a specific reason not to Done > unclean_shutdown_collector_test.cc:90: ASSERT_TRUE(collector_.Collect()); > Should return false. Done > http://codereview.chromium.org/3644007/diff/24001/unclean_shutdown_collector_... > unclean_shutdown_collector_test.cc:95: "battery critically low")); > nit: search for ending '.' like you do in the other unit test Done > http://codereview.chromium.org/3644007/diff/24001/unclean_shutdown_collector_... > unclean_shutdown_collector_test.cc:102: ASSERT_TRUE(collector_.Collect()); > Should return false. Done > unclean_shutdown_collector_test.cc:106: logging_.log().find("Unclean shutdown > occurred while suspended.")); > Could you add a check here, above and in CollectFalse and CollectTrue for the > value of s_crashes? In all functions except CollectTrue it should be 0. In > CollectTrue it should be 1. That would have made the bug in the new code more > obvious to see in the test. Made this change but I had to set s_metrics=true on line 15 in order for s_crashes == 1 in CollectTrue.
getting close http://codereview.chromium.org/3644007/diff/31001/unclean_shutdown_collector_... File unclean_shutdown_collector_test.cc (right): http://codereview.chromium.org/3644007/diff/31001/unclean_shutdown_collector_... unclean_shutdown_collector_test.cc:18: static const char kTestLowBattery[] = "test/low_battery"; still not abc order. please try to be more careful when responding to reviews, it takes time to do these iterations. http://codereview.chromium.org/3644007/diff/31001/unclean_shutdown_collector_... unclean_shutdown_collector_test.cc:78: ASSERT_EQ(s_crashes, 1); for _EQ checks always put the constant first (the failure message makes more sense)
> unclean_shutdown_collector_test.cc:18: static const char kTestLowBattery[] = > "test/low_battery"; > still not abc order. please try to be more careful when responding to reviews, > it takes time to do these iterations. Done. > unclean_shutdown_collector_test.cc:78: ASSERT_EQ(s_crashes, 1); > for _EQ checks always put the constant first (the failure message makes more > sense) Done.
http://codereview.chromium.org/3644007/diff/31001/unclean_shutdown_collector_... File unclean_shutdown_collector_test.cc (right): http://codereview.chromium.org/3644007/diff/31001/unclean_shutdown_collector_... unclean_shutdown_collector_test.cc:18: static const char kTestLowBattery[] = "test/low_battery"; On 2010/11/24 00:22:38, kmixter1 wrote: > still not abc order. please try to be more careful when responding to reviews, > it takes time to do these iterations. Done. http://codereview.chromium.org/3644007/diff/31001/unclean_shutdown_collector_... unclean_shutdown_collector_test.cc:78: ASSERT_EQ(s_crashes, 1); On 2010/11/24 00:22:38, kmixter1 wrote: > for _EQ checks always put the constant first (the failure message makes more > sense) Done.
LGTM |