|
|
Chromium Code Reviews|
Created:
4 years, 7 months ago by aschulman Modified:
4 years, 6 months ago Reviewers:
charliea (OOO until 10-5) CC:
chromium-reviews Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
Description[battor agent] Fix the init sequence so it retries inits instead of resets.
The BattOr will reset if it receives an INIT message after it has
already collected data. The existing implementation would not retry
the INIT message after the first INIT times out. Instead, it would
proactively send a RESET to the BattOr. The problem is if there are
extra bytes on the link the proactive reset may fail.
In this fix, the BattOr agent will retry the INIT message several times
until it succeeds.
BUG=585192
Committed: https://crrev.com/3a21c9f5e01525b8788e260757152883b3ed913c
Cr-Commit-Position: refs/heads/master@{#400045}
Patch Set 1 : Initial draft #
Total comments: 16
Patch Set 2 : Cleaned up and ready for a 2nd review #
Total comments: 18
Patch Set 3 : Final changes made #Patch Set 4 : Fixed unit tests that were broken by this patch #
Total comments: 3
Patch Set 5 : Added StartTracing Success after INIT failure test case #
Total comments: 2
Patch Set 6 : Fixed comment #
Messages
Total messages: 27 (9 generated)
Description was changed from ========== [battor agent] Fix the init sequence so it retries inits instead of resets. The BattOr will reset if it receives an INIT message after it has already collected data. The existing implementation would not retry the INIT message after the first INIT times out. Instead, it would proactively send a RESET to the BattOr. The problem is if there are extra bytes on the link the proactive reset may fail. In this fix, the BattOr agent will retry the INIT message several times until it succeeds. BUG=585192 ========== to ========== [battor agent] Fix the init sequence so it retries inits instead of resets. The BattOr will reset if it receives an INIT message after it has already collected data. The existing implementation would not retry the INIT message after the first INIT times out. Instead, it would proactively send a RESET to the BattOr. The problem is if there are extra bytes on the link the proactive reset may fail. In this fix, the BattOr agent will retry the INIT message several times until it succeeds. BUG=585192 ==========
aschulman@chromium.org changed reviewers: + charliea@chromium.org
(Hitting send on drafts, just in case they're applicable for the next version you're working on) https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_a... File tools/battor_agent/battor_agent.cc (right): https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_a... tools/battor_agent/battor_agent.cc:261: case Action::SEND_INIT: I'm not sure I understand why this part is necessary: could you explain? I also think it might be problematic. All of the other paths through OnMessageRead() either end with CompleteCommand (which finishes the current command and returns control elsewhere) or Perform[Delayed]Action, which executes a follow-up command. This path doesn't do either, though, and seems like it's just going to lead to the program hanging. https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_a... tools/battor_agent/battor_agent.cc:268: if (++num_init_attempts_ < kMaxInitAttempts) nit: any time that there are multiple lines inside of a scope, you need braces around it nit2: any time that an if() clause has braces, the else() clause also needs braces so this should be: if (++num_init_attempts_ < kMaxInitAttempts) { ... } else { ... } https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_a... tools/battor_agent/battor_agent.cc:286: PerformDelayedAction(Action::SEND_INIT, (same not about braces as above) https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_a... tools/battor_agent/battor_agent.cc:413: case Action::SEND_RESET: This can be deleted https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_a... tools/battor_agent/battor_agent.cc:496: switch (last_action_) { nit: when it makes sense, Chromium prefers to "early exit" for special cases and have the lowest indentation level by the "normal path" that would make this: if (last_action_ == Action::READ_INIT_ACK) { if (++num_init_attempts_ < kMaxInitAttempts) { connection_->CancelReadMessage(); PerformAction(Action::SEND_INIT); } else { CompleteCommand(BATTOR_ERROR_INIT_ERROR); } return; } CompleteCommand(BATTOR_ERROR_TIMEOUT); timeout_callback_.Cancel(); https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_a... File tools/battor_agent/battor_agent.h (right): https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_a... tools/battor_agent/battor_agent.h:97: SEND_RESET, This can be deleted https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_c... File tools/battor_agent/battor_connection_impl.cc (right): https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_c... tools/battor_agent/battor_connection_impl.cc:175: io_handler_->CancelRead(device::serial::ReceiveError::NONE); Think it might be more accurate to cancel it w/ ReceiveError=TIMEOUT here https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_e... File tools/battor_agent/battor_error.h (right): https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_e... tools/battor_agent/battor_error.h:20: BATTOR_ERROR_INIT_ERROR, How about BATTOR_ERROR_INIT_TIMEOUT? This seems like just a special case of a timeout where the last command attempted is an init. I think eventually we probably want to get rid of this error and instead have each BattOr error be some tuple of (error code, last command attempted). However, until we have that, I agree that it makes sense to have some obvious way to detect that this is the stage that failed.
For reference I integrated all of your comments into the new patch set. https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_a... File tools/battor_agent/battor_agent.cc (right): https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_a... tools/battor_agent/battor_agent.cc:261: case Action::SEND_INIT: On 2016/05/23 14:08:52, charliea wrote: > I'm not sure I understand why this part is necessary: could you explain? > > I also think it might be problematic. All of the other paths through > OnMessageRead() either end with CompleteCommand (which finishes the current > command and returns control elsewhere) or Perform[Delayed]Action, which executes > a follow-up command. This path doesn't do either, though, and seems like it's > just going to lead to the program hanging. You are right this is confusing, especially since the timer is reset by PerformAction. https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_a... tools/battor_agent/battor_agent.cc:268: if (++num_init_attempts_ < kMaxInitAttempts) On 2016/05/23 14:08:51, charliea wrote: > nit: any time that there are multiple lines inside of a scope, you need braces > around it > > nit2: any time that an if() clause has braces, the else() clause also needs > braces > > so this should be: > > if (++num_init_attempts_ < kMaxInitAttempts) { > ... > } else { > ... > } Done. https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_a... tools/battor_agent/battor_agent.cc:286: PerformDelayedAction(Action::SEND_INIT, On 2016/05/23 14:08:51, charliea wrote: > (same not about braces as above) Done. https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_a... tools/battor_agent/battor_agent.cc:413: case Action::SEND_RESET: On 2016/05/23 14:08:51, charliea wrote: > This can be deleted Done. https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_a... tools/battor_agent/battor_agent.cc:496: switch (last_action_) { On 2016/05/23 14:08:51, charliea wrote: > nit: when it makes sense, Chromium prefers to "early exit" for special cases and > have the lowest indentation level by the "normal path" > > that would make this: > > if (last_action_ == Action::READ_INIT_ACK) { > if (++num_init_attempts_ < kMaxInitAttempts) { > connection_->CancelReadMessage(); > PerformAction(Action::SEND_INIT); > } else { > CompleteCommand(BATTOR_ERROR_INIT_ERROR); > } > > return; > } > > CompleteCommand(BATTOR_ERROR_TIMEOUT); > timeout_callback_.Cancel(); Done. https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_a... File tools/battor_agent/battor_agent.h (right): https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_a... tools/battor_agent/battor_agent.h:97: SEND_RESET, On 2016/05/23 14:08:52, charliea wrote: > This can be deleted Done. https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_c... File tools/battor_agent/battor_connection_impl.cc (right): https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_c... tools/battor_agent/battor_connection_impl.cc:175: io_handler_->CancelRead(device::serial::ReceiveError::NONE); On 2016/05/23 14:08:52, charliea wrote: > Think it might be more accurate to cancel it w/ ReceiveError=TIMEOUT here Done. https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_e... File tools/battor_agent/battor_error.h (right): https://codereview.chromium.org/1991403002/diff/1/tools/battor_agent/battor_e... tools/battor_agent/battor_error.h:20: BATTOR_ERROR_INIT_ERROR, On 2016/05/23 14:08:52, charliea wrote: > How about BATTOR_ERROR_INIT_TIMEOUT? This seems like just a special case of a > timeout where the last command attempted is an init. > > I think eventually we probably want to get rid of this error and instead have > each BattOr error be some tuple of (error code, last command attempted). > However, until we have that, I agree that it makes sense to have some obvious > way to detect that this is the stage that failed. I went with BATTOR_ERROR_INIT_MAX_RETRIES since I think that best matches what caused this error to fire.
Patchset #2 (id:20001) has been deleted
If you're okay with it, I think that I'll approve this once the changes are made and then I'll draft up a CL adding some unit tests. The unit tests take quite a bit of ramping-up on and I'm afraid it would be a huge time suck. https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... File tools/battor_agent/battor_agent.cc (right): https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_agent.cc:24: // The maximum number of times to retry when init'ing a battor. same nit about maybe putting this above the read attempts stuff because the init happens first https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_agent.cc:127: num_init_attempts_(0) { same nit (swap init and read) https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_agent.cc:180: nit: delete extra \n https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_agent.cc:265: if (++num_init_attempts_ < kMaxInitAttempts) { This definitely stems from problematic code that I wrote, but I think that (here and elsewhere) it should actually be if (num_init_attempts++ < kMaxInitAttempts) Logically, it makes sense that "max init attempts" is the maximum number of times that you'd attempt an init before giving up. However, right now, if kMaxInitAttempts = 2, then (++num_init_attempts < kMaxInitAttempts) will return false before the first retry, which is problematic. Could you change all of the ++num_init_attempts to num_init_attempts++? I'll submit a separate CL doing the same for num_read_attempts https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_agent.cc:291: return; I think this return; is indented wrong (by one space). (I'd recommend running `git cl format` to make sure that all of the C++ code in the CL is formatted correctly) https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_agent.cc:492: // Cancel INIT ACK read, OnReadMessage() will fail and retry SEND_INIT. I think the function is OnMessageRead and not OnReadMessage https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_agent.cc:492: // Cancel INIT ACK read, OnReadMessage() will fail and retry SEND_INIT. I'd put this comment inside of the if clause, because we only cancel if the condition is met. I don't think that means you have to put braces around it (because it's just a comment), but I'm honestly not sure. (Also, looking around, there are plenty of places that I don't do this in this file... sorry about that :-/ I think that this is something that I didn't notice I was doing 6 months ago but I now notice.) I'd also recommend scrapping the "Cancel INIT ACK read" part of the comment, because it's just describing what's obvious in the code. I think that Chromium's general attitude is that we prefer to use comments to describing non-intuitive parts of the code being commented rather than for as a second means of documenting what's happening. e.g. // This will cause OnMessageRead() to be called with success=false. connection_->CancelReadMessage(). and not // Cancels the read message. connection_->CancelReadMessage(); https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... File tools/battor_agent/battor_agent.h (right): https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_agent.h:178: // The number of times we've attempted to init the BattOr. Logically, it might make sense to move this above read_attempts (because the init happens first) https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... File tools/battor_agent/battor_error.h (right): https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_error.h:20: BATTOR_ERROR_INIT_MAX_RETRIES, my vote for this error message is "TOO_MANY_INIT_RETRIES", or maybe "EXCEEDED_INIT_MAX_RETRIES" my reasoning is that all of the other error messages describe something that's clearly problematic "TIMEOUT, SEND_ERROR, UNEXPECTED_MESSAGE", whereas "INIT_MAX_RETRIES" sounds like maybe we just had to use the maximum number of init retries, but succeeded on that last attempt
Thanks for the comments! I incorporated everything into the final patch set. There also were some "#include <iostream>" that snuck in from my debugging that I removed too. https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... File tools/battor_agent/battor_agent.cc (right): https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_agent.cc:24: // The maximum number of times to retry when init'ing a battor. On 2016/06/10 17:05:33, charliea wrote: > same nit about maybe putting this above the read attempts stuff because the init > happens first Done. https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_agent.cc:127: num_init_attempts_(0) { On 2016/06/10 17:05:33, charliea wrote: > same nit (swap init and read) Done. https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_agent.cc:180: On 2016/06/10 17:05:33, charliea wrote: > nit: delete extra \n Done. https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_agent.cc:265: if (++num_init_attempts_ < kMaxInitAttempts) { On 2016/06/10 17:05:33, charliea wrote: > This definitely stems from problematic code that I wrote, but I think that (here > and elsewhere) it should actually be > > if (num_init_attempts++ < kMaxInitAttempts) > > Logically, it makes sense that "max init attempts" is the maximum number of > times that you'd attempt an init before giving up. However, right now, if > kMaxInitAttempts = 2, then > > (++num_init_attempts < kMaxInitAttempts) will return false before the first > retry, which is problematic. > > Could you change all of the ++num_init_attempts to num_init_attempts++? I'll > submit a separate CL doing the same for num_read_attempts Done. https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_agent.cc:291: return; On 2016/06/10 17:05:33, charliea wrote: > I think this return; is indented wrong (by one space). > > (I'd recommend running `git cl format` to make sure that all of the C++ code in > the CL is formatted correctly) Done. https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_agent.cc:492: // Cancel INIT ACK read, OnReadMessage() will fail and retry SEND_INIT. On 2016/06/10 17:05:33, charliea wrote: > I think the function is OnMessageRead and not OnReadMessage Done. https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_agent.cc:492: // Cancel INIT ACK read, OnReadMessage() will fail and retry SEND_INIT. On 2016/06/10 17:05:33, charliea wrote: > I'd put this comment inside of the if clause, because we only cancel if the > condition is met. I don't think that means you have to put braces around it > (because it's just a comment), but I'm honestly not sure. > > (Also, looking around, there are plenty of places that I don't do this in this > file... sorry about that :-/ I think that this is something that I didn't notice > I was doing 6 months ago but I now notice.) > > I'd also recommend scrapping the "Cancel INIT ACK read" part of the comment, > because it's just describing what's obvious in the code. I think that Chromium's > general attitude is that we prefer to use comments to describing non-intuitive > parts of the code being commented rather than for as a second means of > documenting what's happening. > > e.g. > // This will cause OnMessageRead() to be called with success=false. > connection_->CancelReadMessage(). > > and not > > // Cancels the read message. > connection_->CancelReadMessage(); Done. https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... File tools/battor_agent/battor_agent.h (right): https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_agent.h:178: // The number of times we've attempted to init the BattOr. On 2016/06/10 17:05:33, charliea wrote: > Logically, it might make sense to move this above read_attempts (because the > init happens first) Done. https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... File tools/battor_agent/battor_error.h (right): https://codereview.chromium.org/1991403002/diff/40001/tools/battor_agent/batt... tools/battor_agent/battor_error.h:20: BATTOR_ERROR_INIT_MAX_RETRIES, On 2016/06/10 17:05:33, charliea wrote: > my vote for this error message is "TOO_MANY_INIT_RETRIES", or maybe > "EXCEEDED_INIT_MAX_RETRIES" > > my reasoning is that all of the other error messages describe something that's > clearly problematic "TIMEOUT, SEND_ERROR, UNEXPECTED_MESSAGE", whereas > "INIT_MAX_RETRIES" sounds like maybe we just had to use the maximum number of > init retries, but succeeded on that last attempt Done.
lgtm
The CQ bit was checked by aschulman@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1991403002/60001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_gn_chromeos_rel on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
https://codereview.chromium.org/1991403002/diff/80001/tools/battor_agent/batt... File tools/battor_agent/battor_agent_unittest.cc (right): https://codereview.chromium.org/1991403002/diff/80001/tools/battor_agent/batt... tools/battor_agent/battor_agent_unittest.cc:347: TEST_F(BattOrAgentTest, StartTracingFailsIfResetSendFails) { I think that this test should be deleted. https://codereview.chromium.org/1991403002/diff/80001/tools/battor_agent/batt... tools/battor_agent/battor_agent_unittest.cc:578: TEST_F(BattOrAgentTest, StopTracingSucceedsWithOneCalibrationFrameReadFailure) { I would add a test (up by the StartTracing tests) that's similar to this one and verifies that the overall StartTracing command can succeed even if there's an init failure.
also, nolgtm due to test failures
On 2016/06/15 14:35:12, charliea wrote: > also, nolgtm due to test failures maybe not lgtm ?
On 2016/06/15 14:36:08, charliea wrote: > On 2016/06/15 14:35:12, charliea wrote: > > also, nolgtm due to test failures > > maybe not lgtm ? are you seeing test failures? I don't see them on my machine.
Added the new test case that you requested. https://codereview.chromium.org/1991403002/diff/80001/tools/battor_agent/batt... File tools/battor_agent/battor_agent_unittest.cc (right): https://codereview.chromium.org/1991403002/diff/80001/tools/battor_agent/batt... tools/battor_agent/battor_agent_unittest.cc:578: TEST_F(BattOrAgentTest, StopTracingSucceedsWithOneCalibrationFrameReadFailure) { On 2016/06/15 14:28:50, charliea wrote: > I would add a test (up by the StartTracing tests) that's similar to this one and > verifies that the overall StartTracing command can succeed even if there's an > init failure. Done.
lgtm w/ nit https://codereview.chromium.org/1991403002/diff/100001/tools/battor_agent/bat... File tools/battor_agent/battor_agent_unittest.cc (right): https://codereview.chromium.org/1991403002/diff/100001/tools/battor_agent/bat... tools/battor_agent/battor_agent_unittest.cc:436: // Instead of an INIT ACK get some samples. This will force an init retry. Maybe "Send some samples instead of an INIT ACK. This will force an init retry."?
The CQ bit was checked by aschulman@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from charliea@chromium.org Link to the patchset: https://codereview.chromium.org/1991403002/#ps120001 (title: "Fixed comment")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1991403002/120001
putting to CQ https://codereview.chromium.org/1991403002/diff/100001/tools/battor_agent/bat... File tools/battor_agent/battor_agent_unittest.cc (right): https://codereview.chromium.org/1991403002/diff/100001/tools/battor_agent/bat... tools/battor_agent/battor_agent_unittest.cc:436: // Instead of an INIT ACK get some samples. This will force an init retry. On 2016/06/15 18:56:34, charliea wrote: > Maybe "Send some samples instead of an INIT ACK. This will force an init > retry."? Done.
Message was sent while issue was closed.
Description was changed from ========== [battor agent] Fix the init sequence so it retries inits instead of resets. The BattOr will reset if it receives an INIT message after it has already collected data. The existing implementation would not retry the INIT message after the first INIT times out. Instead, it would proactively send a RESET to the BattOr. The problem is if there are extra bytes on the link the proactive reset may fail. In this fix, the BattOr agent will retry the INIT message several times until it succeeds. BUG=585192 ========== to ========== [battor agent] Fix the init sequence so it retries inits instead of resets. The BattOr will reset if it receives an INIT message after it has already collected data. The existing implementation would not retry the INIT message after the first INIT times out. Instead, it would proactively send a RESET to the BattOr. The problem is if there are extra bytes on the link the proactive reset may fail. In this fix, the BattOr agent will retry the INIT message several times until it succeeds. BUG=585192 ==========
Message was sent while issue was closed.
Committed patchset #6 (id:120001)
Message was sent while issue was closed.
CQ bit was unchecked
Message was sent while issue was closed.
Description was changed from ========== [battor agent] Fix the init sequence so it retries inits instead of resets. The BattOr will reset if it receives an INIT message after it has already collected data. The existing implementation would not retry the INIT message after the first INIT times out. Instead, it would proactively send a RESET to the BattOr. The problem is if there are extra bytes on the link the proactive reset may fail. In this fix, the BattOr agent will retry the INIT message several times until it succeeds. BUG=585192 ========== to ========== [battor agent] Fix the init sequence so it retries inits instead of resets. The BattOr will reset if it receives an INIT message after it has already collected data. The existing implementation would not retry the INIT message after the first INIT times out. Instead, it would proactively send a RESET to the BattOr. The problem is if there are extra bytes on the link the proactive reset may fail. In this fix, the BattOr agent will retry the INIT message several times until it succeeds. BUG=585192 Committed: https://crrev.com/3a21c9f5e01525b8788e260757152883b3ed913c Cr-Commit-Position: refs/heads/master@{#400045} ==========
Message was sent while issue was closed.
Patchset 6 (id:??) landed as https://crrev.com/3a21c9f5e01525b8788e260757152883b3ed913c Cr-Commit-Position: refs/heads/master@{#400045} |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
