Issue 2826913002: [BattOr] Retry battor_agent commands when they fail.

aschulman

Description was changed from ========== [BattOr] Retry battor_agent commands when they fail. Many battor_agent failures ...

3 years, 8 months ago (2017-04-19 04:52:01 UTC) #1

rnephew (Reviews Here)

rnephew@chromium.org changed reviewers: + nednguyen@google.com, zhenw@chromium.org

3 years, 8 months ago (2017-04-21 22:01:57 UTC) #2

rnephew (Reviews Here)

rnephew@chromium.org changed reviewers: + rnephew@chromium.org

3 years, 8 months ago (2017-04-21 22:10:44 UTC) #3

rnephew (Reviews Here)

Can you use this to run third_party/catapult/common/battor/battor/battor_wrapper_devicetest.py and make sure the battor_wrapper still behaves properly ...

3 years, 8 months ago (2017-04-21 22:10:45 UTC) #4

charliea (OOO until 10-5)

charliea@chromium.org changed reviewers: + charliea@chromium.org

3 years, 7 months ago (2017-05-04 20:35:18 UTC) #5

charliea (OOO until 10-5)

I'm planning to take a look at this tomorrow.

3 years, 7 months ago (2017-05-04 20:36:17 UTC) #7

charliea (OOO until 10-5)

Randy, would you mind prereviewing this and passing it onto me when you think it's ...

3 years, 7 months ago (2017-05-05 19:47:33 UTC) #8

charliea (OOO until 10-5)

charliea@chromium.org changed reviewers: - nednguyen@google.com, zhenw@chromium.org

3 years, 7 months ago (2017-05-05 19:47:46 UTC) #9

rnephew (Reviews Here)

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/battor_agent.cc File tools/battor_agent/battor_agent.cc (left): https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/battor_agent.cc#oldcode530 tools/battor_agent/battor_agent.cc:530: connection_->Flush(); Do we not need to flush the connections ...

3 years, 7 months ago (2017-05-09 15:04:06 UTC) #10

charliea (OOO until 10-5)

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/battor_agent.cc File tools/battor_agent/battor_agent.cc (right): https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/battor_agent.cc#newcode20 tools/battor_agent/battor_agent.cc:20: const uint8_t kMaxCommandAttempts = 10; On 2017/05/09 15:04:05, rnephew ...

3 years, 7 months ago (2017-05-09 19:58:09 UTC) #11

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
File tools/battor_agent/battor_agent.cc (right):

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent.cc:20: const uint8_t kMaxCommandAttempts = 10;
On 2017/05/09 15:04:05, rnephew (Reviews Here) wrote:
> Was 20 chosen for a reason for kMaxInitAttempts/kMaxReadAttempts before
Charlie?
> I'm guessing it was just a best guess, so moving it down to ten shouldn't be a
> big deal.

I haven't seen this code in a long, long time...

Looking back at the blame, it looks like Aaron wrote the initial retry logic, so
I defer to him here :-)

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
File tools/battor_agent/battor_agent_unittest.cc (left):

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent_unittest.cc:182: GetAgent()->StartTracing();
On 2017/05/09 15:04:06, rnephew (Reviews Here) wrote:
> Charlie is there any reason you put these calls here as opposed to where Aaron
> moved them to?

Yeah, there was: before, there was the assumption that we'd only ever "run
through" the command once. A specific step in that command might be retried if
something went amiss, but things never really went backwards. 

Aaron's CL changes this and, instead of retrying individual steps, retries the
whole command over from the start when something goes wrong. As such, it makes
sense to be able to run through the command multiple times.

It seems like it's probably also safe to get rid of is_command_complete = false:
it's being set to false in the SetUp() method, where we're already setting
command_error. In the unlikely instance that either needs to be reused within a
single test case, we could always just manually set them.

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent_unittest.cc:224: is_command_complete_ = false;
On 2017/05/09 15:04:05, rnephew (Reviews Here) wrote:
> Why is this no longer needed to be set to false? 

(See above comment)

aschulman

Thanks for the comments Charlie and Randy. I replied to your comments. It doesn't seem ...

3 years, 7 months ago (2017-05-11 13:13:00 UTC) #12

Thanks for the comments Charlie and Randy. I replied to your comments. It
doesn't seem like any changes  to the CL are needed.

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
File tools/battor_agent/battor_agent.cc (left):

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent.cc:530: connection_->Flush();
On 2017/05/09 15:04:05, rnephew (Reviews Here) wrote:
> Do we not need to flush the connections now with these changes?
> 
> See comment in: tools/battor_agent/battor_connection_impl.cc

see reply in: : tools/battor_agent/battor_connection_impl.cc

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
File tools/battor_agent/battor_agent.cc (right):

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent.cc:20: const uint8_t kMaxCommandAttempts = 10;
On 2017/05/09 19:58:08, charliea wrote:
> On 2017/05/09 15:04:05, rnephew (Reviews Here) wrote:
> > Was 20 chosen for a reason for kMaxInitAttempts/kMaxReadAttempts before
> Charlie?
> > I'm guessing it was just a best guess, so moving it down to ten shouldn't be
a
> > big deal.
> 
> I haven't seen this code in a long, long time...
> 
> Looking back at the blame, it looks like Aaron wrote the initial retry logic,
so
> I defer to him here :-)

20 was due to the super short retry interval. With the much longer retry
interval we need fewer retries. 10 gave a very low failure rate in my testing,
so it seemed like a good choice for now.

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent.cc:27: const uint16_t kCommandRetryDelaySeconds
= 2;
On 2017/04/21 22:10:45, rnephew (Reviews Here) wrote:
> These delays seem to be much larger than the previous
> delays(InitRetryDelayMilliseconds = 100). Is it necessary for it to be so
long?

Yeah the 100 msec retry interval was too aggressive. Many INIT messages were
being dropped.

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
File tools/battor_agent/battor_agent_unittest.cc (left):

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent_unittest.cc:182: GetAgent()->StartTracing();
On 2017/05/09 19:58:08, charliea wrote:
> On 2017/05/09 15:04:06, rnephew (Reviews Here) wrote:
> > Charlie is there any reason you put these calls here as opposed to where
Aaron
> > moved them to?
> 
> Yeah, there was: before, there was the assumption that we'd only ever "run
> through" the command once. A specific step in that command might be retried if
> something went amiss, but things never really went backwards. 
> 
> Aaron's CL changes this and, instead of retrying individual steps, retries the
> whole command over from the start when something goes wrong. As such, it makes
> sense to be able to run through the command multiple times.
> 
> It seems like it's probably also safe to get rid of is_command_complete =
false:
> it's being set to false in the SetUp() method, where we're already setting
> command_error. In the unlikely instance that either needs to be reused within
a
> single test case, we could always just manually set them.

Acknowledged.

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent_unittest.cc:224: is_command_complete_ = false;
On 2017/05/09 19:58:08, charliea wrote:
> On 2017/05/09 15:04:05, rnephew (Reviews Here) wrote:
> > Why is this no longer needed to be set to false? 
> 
> (See above comment)

Acknowledged.

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
File tools/battor_agent/battor_connection_impl.cc (right):

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_connection_impl.cc:84: Flush();
On 2017/05/09 15:04:06, rnephew (Reviews Here) wrote:
> Are the flushes not needed elsewhere because we moved it here?

Correct. Before we had to flush manually between retries of specific messages
(e.g., INIT). Now when a command fails it will reopen the BattOr connection
which will then jump into this condition and Flush() the existing connection.

rnephew (Reviews Here)

looks good to me, but I am not the most practiced at c++ style guides ...

3 years, 7 months ago (2017-05-11 17:19:02 UTC) #13

charliea (OOO until 10-5)

Overall, I think this is pretty close to being ready to go: just have a ...

3 years, 7 months ago (2017-05-12 22:03:43 UTC) #14

Overall, I think this is pretty close to being ready to go: just have a few
relatively minor suggestions

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
File tools/battor_agent/battor_agent.cc (right):

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent.cc:468:
base::TimeDelta::FromSeconds(kBattOrControlMessageTimeoutSeconds));
Could we move the timeout_callback_.Reset() and the PostDelayedTask into a
separate method, SetActionTimeout(seconds)? That would allow us to get rid of
your comment above, which I agree is currently necessary because the code itself
isn't self-documenting.

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent.cc:525: // Start control message respone timeout
timer.
s/respone/response, but see above comment about factoring this out

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent.cc:545:
base::ThreadTaskRunnerHandle::Get()->PostDelayedTask(
One thing that might make this a little cleaner is if you story the task you
want to post in a closure and just post the delayed task after the switch. I
think you can do:

base::Callback<void(int)> next_command;
switch(command_) {
  case Command::START_TRACING:
    next_command = base::Bind(&BattOrAgent::StartTracing, AsWeakPtr());
    return;
  case Command::STOP_TRACING:
    next_command = base::Bind(&BattOrAgent::StopTracing, AsWeakPtr());
    return;
  ...
}

base::ThreadTaskRunnerHandle::Get()->PostDelayedTask(
    next_command, base::TimeDelta::FromSeconds(kCommandRetryDelaySeconds);

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
File tools/battor_agent/battor_agent_unittest.cc (right):

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent_unittest.cc:431: TEST_F(BattOrAgentTest,
StartTracingSucceedsAfterSetGainSendFails) {
o_O is this passing right now? The name of the test is
"StartTracingSucceedsAfter...", but the test asserts that StartTracing fails
after an init send fails.

It seems like EXPECT_TRUE(IsCommandComplete()); should fail because the
BattOrAgent is waiting for more data for its retry attempt.

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent_unittest.cc:519: // Send some samples instead of
an INIT ACK. This will force an command retry.
s/an command/a command

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent_unittest.cc:535: TEST_F(BattOrAgentTest,
StartTracingFailsAfterTooManyFails) {
I think that "StartTracingFailsAfterTooManyCumulativeFailures" would be a better
indicator of what this test is actually testing (both that there's a failure
limit, and that the failure limit works between different stages of the same
command)

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent_unittest.cc:747: RawBattOrSample frame1[] =
{RawBattOrSample{1, 1}};
nit: if there's no frame2, just call it "frame" instead of "frame1"

aschulman

On 2017/05/12 22:03:43, charliea wrote: > Overall, I think this is pretty close to being ...

3 years, 7 months ago (2017-05-15 18:49:06 UTC) #15

On 2017/05/12 22:03:43, charliea wrote:
> Overall, I think this is pretty close to being ready to go: just have a few
> relatively minor suggestions
> 
>
https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
> File tools/battor_agent/battor_agent.cc (right):
> 
>
https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
> tools/battor_agent/battor_agent.cc:468:
> base::TimeDelta::FromSeconds(kBattOrControlMessageTimeoutSeconds));
> Could we move the timeout_callback_.Reset() and the PostDelayedTask into a
> separate method, SetActionTimeout(seconds)? That would allow us to get rid of
> your comment above, which I agree is currently necessary because the code
itself
> isn't self-documenting.
> 
>
https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
> tools/battor_agent/battor_agent.cc:525: // Start control message respone
timeout
> timer.
> s/respone/response, but see above comment about factoring this out
> 
>
https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
> tools/battor_agent/battor_agent.cc:545:
> base::ThreadTaskRunnerHandle::Get()->PostDelayedTask(
> One thing that might make this a little cleaner is if you story the task you
> want to post in a closure and just post the delayed task after the switch. I
> think you can do:
> 
> base::Callback<void(int)> next_command;
> switch(command_) {
>   case Command::START_TRACING:
>     next_command = base::Bind(&BattOrAgent::StartTracing, AsWeakPtr());
>     return;
>   case Command::STOP_TRACING:
>     next_command = base::Bind(&BattOrAgent::StopTracing, AsWeakPtr());
>     return;
>   ...
> }
> 
> base::ThreadTaskRunnerHandle::Get()->PostDelayedTask(
>     next_command, base::TimeDelta::FromSeconds(kCommandRetryDelaySeconds);
> 
>
https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
> File tools/battor_agent/battor_agent_unittest.cc (right):
> 
>
https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
> tools/battor_agent/battor_agent_unittest.cc:431: TEST_F(BattOrAgentTest,
> StartTracingSucceedsAfterSetGainSendFails) {
> o_O is this passing right now? The name of the test is
> "StartTracingSucceedsAfter...", but the test asserts that StartTracing fails
> after an init send fails.
> 
> It seems like EXPECT_TRUE(IsCommandComplete()); should fail because the
> BattOrAgent is waiting for more data for its retry attempt.
> 
>
https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
> tools/battor_agent/battor_agent_unittest.cc:519: // Send some samples instead
of
> an INIT ACK. This will force an command retry.
> s/an command/a command
> 
>
https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
> tools/battor_agent/battor_agent_unittest.cc:535: TEST_F(BattOrAgentTest,
> StartTracingFailsAfterTooManyFails) {
> I think that "StartTracingFailsAfterTooManyCumulativeFailures" would be a
better
> indicator of what this test is actually testing (both that there's a failure
> limit, and that the failure limit works between different stages of the same
> command)
> 
>
https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
> tools/battor_agent/battor_agent_unittest.cc:747: RawBattOrSample frame1[] =
> {RawBattOrSample{1, 1}};
> nit: if there's no frame2, just call it "frame" instead of "frame1"

Thanks Charlie! I addressed all of your comments in latest patch set.

aschulman

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/battor_agent.cc File tools/battor_agent/battor_agent.cc (right): https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/battor_agent.cc#newcode468 tools/battor_agent/battor_agent.cc:468: base::TimeDelta::FromSeconds(kBattOrControlMessageTimeoutSeconds)); On 2017/05/12 22:03:42, charliea wrote: > Could we ...

3 years, 7 months ago (2017-05-15 18:49:17 UTC) #16

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
File tools/battor_agent/battor_agent.cc (right):

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent.cc:468:
base::TimeDelta::FromSeconds(kBattOrControlMessageTimeoutSeconds));
On 2017/05/12 22:03:42, charliea wrote:
> Could we move the timeout_callback_.Reset() and the PostDelayedTask into a
> separate method, SetActionTimeout(seconds)? That would allow us to get rid of
> your comment above, which I agree is currently necessary because the code
itself
> isn't self-documenting.

Done.

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent.cc:525: // Start control message respone timeout
timer.
On 2017/05/12 22:03:42, charliea wrote:
> s/respone/response, but see above comment about factoring this out

Done.

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent.cc:545:
base::ThreadTaskRunnerHandle::Get()->PostDelayedTask(
On 2017/05/12 22:03:42, charliea wrote:
> One thing that might make this a little cleaner is if you story the task you
> want to post in a closure and just post the delayed task after the switch. I
> think you can do:
> 
> base::Callback<void(int)> next_command;
> switch(command_) {
>   case Command::START_TRACING:
>     next_command = base::Bind(&BattOrAgent::StartTracing, AsWeakPtr());
>     return;
>   case Command::STOP_TRACING:
>     next_command = base::Bind(&BattOrAgent::StopTracing, AsWeakPtr());
>     return;
>   ...
> }
> 
> base::ThreadTaskRunnerHandle::Get()->PostDelayedTask(
>     next_command, base::TimeDelta::FromSeconds(kCommandRetryDelaySeconds);

Done.

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
File tools/battor_agent/battor_agent_unittest.cc (right):

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent_unittest.cc:431: TEST_F(BattOrAgentTest,
StartTracingSucceedsAfterSetGainSendFails) {
On 2017/05/12 22:03:42, charliea wrote:
> o_O is this passing right now? The name of the test is
> "StartTracingSucceedsAfter...", but the test asserts that StartTracing fails
> after an init send fails.
> 
> It seems like EXPECT_TRUE(IsCommandComplete()); should fail because the
> BattOrAgent is waiting for more data for its retry attempt.

right, this should say "StartTracingFails..." because the send failed. we fail
immediately if a send fails because there is no other reason for that than a
completely dead serial connection to the BattOr. there is no benefit to
retrying.

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent_unittest.cc:519: // Send some samples instead of
an INIT ACK. This will force an command retry.
On 2017/05/12 22:03:42, charliea wrote:
> s/an command/a command

Done.

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent_unittest.cc:535: TEST_F(BattOrAgentTest,
StartTracingFailsAfterTooManyFails) {
On 2017/05/12 22:03:43, charliea wrote:
> I think that "StartTracingFailsAfterTooManyCumulativeFailures" would be a
better
> indicator of what this test is actually testing (both that there's a failure
> limit, and that the failure limit works between different stages of the same
> command)

Done.

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent_unittest.cc:747: RawBattOrSample frame1[] =
{RawBattOrSample{1, 1}};
On 2017/05/12 22:03:42, charliea wrote:
> nit: if there's no frame2, just call it "frame" instead of "frame1"

Done.

charliea (OOO until 10-5)

lgtm w/ comments https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/battor_agent_unittest.cc File tools/battor_agent/battor_agent_unittest.cc (right): https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/battor_agent_unittest.cc#newcode431 tools/battor_agent/battor_agent_unittest.cc:431: TEST_F(BattOrAgentTest, StartTracingSucceedsAfterSetGainSendFails) { On 2017/05/15 18:49:17, ...

3 years, 7 months ago (2017-05-15 19:03:10 UTC) #17

lgtm w/ comments

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
File tools/battor_agent/battor_agent_unittest.cc (right):

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent_unittest.cc:431: TEST_F(BattOrAgentTest,
StartTracingSucceedsAfterSetGainSendFails) {
On 2017/05/15 18:49:17, aschulman wrote:
> On 2017/05/12 22:03:42, charliea wrote:
> > o_O is this passing right now? The name of the test is
> > "StartTracingSucceedsAfter...", but the test asserts that StartTracing fails
> > after an init send fails.
> > 
> > It seems like EXPECT_TRUE(IsCommandComplete()); should fail because the
> > BattOrAgent is waiting for more data for its retry attempt.
> 
> right, this should say "StartTracingFails..." because the send failed. we fail
> immediately if a send fails because there is no other reason for that than a
> completely dead serial connection to the BattOr. there is no benefit to
> retrying.

Ah, okay. So we only retry the command if a receive fails, not if a send fails?

https://codereview.chromium.org/2826913002/diff/60001/tools/battor_agent/batt...
File tools/battor_agent/battor_agent.cc (right):

https://codereview.chromium.org/2826913002/diff/60001/tools/battor_agent/batt...
tools/battor_agent/battor_agent.cc:526: base::Callback<void()> next_command;
Interesting: I would think that this needs to be void(uint16_t). My
understanding was always that the type before the parentheses was the return
type, and the types inside of the parentheses were the parameters that needed to
be passed in to make the thing a real function call. If this works though, it
looks good to me.

https://codereview.chromium.org/2826913002/diff/60001/tools/battor_agent/batt...
File tools/battor_agent/battor_agent.h (right):

https://codereview.chromium.org/2826913002/diff/60001/tools/battor_agent/batt...
tools/battor_agent/battor_agent.h:147: void setActionTimeout(const uint16_t
timeout_seconds);
s/setActionTimeout/SetActionTimeout

https://codereview.chromium.org/2826913002/diff/60001/tools/battor_agent/batt...
tools/battor_agent/battor_agent.h:147: void setActionTimeout(const uint16_t
timeout_seconds);
nit: for consistency, maybe just use uint16_t (see SendControlMessage above). I
have no idea if there was an original reason that we used the non-const version,
but it seems worth it to not people wonder why one method used const and another
didn't when neither actually modified the parameter

aschulman

Final comments are taken care of in the latest patch set. Randy, after this is ...

3 years, 7 months ago (2017-05-15 22:26:25 UTC) #18

Final comments are taken care of in the latest patch set. Randy, after this is
committed, can you deploy it on the mac bots?

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
File tools/battor_agent/battor_agent_unittest.cc (right):

https://codereview.chromium.org/2826913002/diff/40001/tools/battor_agent/batt...
tools/battor_agent/battor_agent_unittest.cc:431: TEST_F(BattOrAgentTest,
StartTracingSucceedsAfterSetGainSendFails) {
On 2017/05/15 19:03:10, charliea wrote:
> On 2017/05/15 18:49:17, aschulman wrote:
> > On 2017/05/12 22:03:42, charliea wrote:
> > > o_O is this passing right now? The name of the test is
> > > "StartTracingSucceedsAfter...", but the test asserts that StartTracing
fails
> > > after an init send fails.
> > > 
> > > It seems like EXPECT_TRUE(IsCommandComplete()); should fail because the
> > > BattOrAgent is waiting for more data for its retry attempt.
> > 
> > right, this should say "StartTracingFails..." because the send failed. we
fail
> > immediately if a send fails because there is no other reason for that than a
> > completely dead serial connection to the BattOr. there is no benefit to
> > retrying.
> 
> Ah, okay. So we only retry the command if a receive fails, not if a send
fails?

correct

https://codereview.chromium.org/2826913002/diff/60001/tools/battor_agent/batt...
File tools/battor_agent/battor_agent.cc (right):

https://codereview.chromium.org/2826913002/diff/60001/tools/battor_agent/batt...
tools/battor_agent/battor_agent.cc:526: base::Callback<void()> next_command;
On 2017/05/15 19:03:10, charliea wrote:
> Interesting: I would think that this needs to be void(uint16_t). My
> understanding was always that the type before the parentheses was the return
> type, and the types inside of the parentheses were the parameters that needed
to
> be passed in to make the thing a real function call. If this works though, it
> looks good to me.

Done.

https://codereview.chromium.org/2826913002/diff/60001/tools/battor_agent/batt...
File tools/battor_agent/battor_agent.h (right):

https://codereview.chromium.org/2826913002/diff/60001/tools/battor_agent/batt...
tools/battor_agent/battor_agent.h:147: void setActionTimeout(const uint16_t
timeout_seconds);
On 2017/05/15 19:03:10, charliea wrote:
> nit: for consistency, maybe just use uint16_t (see SendControlMessage above).
I
> have no idea if there was an original reason that we used the non-const
version,
> but it seems worth it to not people wonder why one method used const and
another
> didn't when neither actually modified the parameter

Done.

https://codereview.chromium.org/2826913002/diff/60001/tools/battor_agent/batt...
tools/battor_agent/battor_agent.h:147: void setActionTimeout(const uint16_t
timeout_seconds);
On 2017/05/15 19:03:10, charliea wrote:
> s/setActionTimeout/SetActionTimeout

Done.

aschulman

The patchset sent to the CQ was uploaded after l-g-t-m from charliea@chromium.org Link to the ...

3 years, 7 months ago (2017-05-15 22:26:31 UTC) #20

commit-bot: I haz the power

CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2826913002/100001

3 years, 7 months ago (2017-05-15 22:28:23 UTC) #21

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 7 months ago (2017-05-15 23:14:56 UTC) #22

commit-bot: I haz the power

Try jobs failed on following builders: linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_rel_ng/builds/455029) win_chromium_compile_dbg_ng on master.tryserver.chromium.win (JOB_FAILED, ...

3 years, 7 months ago (2017-05-15 23:14:57 UTC) #23

charliea (OOO until 10-5)

https://codereview.chromium.org/2826913002/diff/100001/tools/battor_agent/battor_agent.cc File tools/battor_agent/battor_agent.cc (right): https://codereview.chromium.org/2826913002/diff/100001/tools/battor_agent/battor_agent.cc#newcode542 tools/battor_agent/battor_agent.cc:542: base::ThreadTaskRunnerHandle::Get()->PostDelayedTask( Ah, looks like the presubmit caught a bug ...

3 years, 7 months ago (2017-05-16 16:13:08 UTC) #24

aschulman

On 2017/05/16 16:13:08, charliea wrote: > https://codereview.chromium.org/2826913002/diff/100001/tools/battor_agent/battor_agent.cc > File tools/battor_agent/battor_agent.cc (right): > > https://codereview.chromium.org/2826913002/diff/100001/tools/battor_agent/battor_agent.cc#newcode542 > ...

3 years, 7 months ago (2017-05-17 16:49:57 UTC) #25

aschulman

The patchset sent to the CQ was uploaded after l-g-t-m from charliea@chromium.org Link to the ...

3 years, 7 months ago (2017-05-17 20:15:37 UTC) #27

commit-bot: I haz the power

CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2826913002/120001

3 years, 7 months ago (2017-05-17 20:16:46 UTC) #28

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 7 months ago (2017-05-18 02:56:27 UTC) #29

commit-bot: I haz the power

Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/447583)

3 years, 7 months ago (2017-05-18 02:56:28 UTC) #30

commit-bot: I haz the power

CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2826913002/120001

3 years, 7 months ago (2017-05-18 07:41:37 UTC) #32

commit-bot: I haz the power

CQ is committing da patch. Bot data: {"patchset_id": 120001, "attempt_start_ts": 1495093256217880, "parent_rev": "2a03a5de9275d45001da304069313dc7810a63ec", "commit_rev": "0406975eadca3fc0ac6435dd18718b8bc6f9fc5e"}

3 years, 7 months ago (2017-05-18 07:46:51 UTC) #33

commit-bot: I haz the power

CQ is committing da patch. Bot data: {"patchset_id": 120001, "attempt_start_ts": 1495093256217880, "parent_rev": "b6d4c52370c0b1106e1bff3cdba63172950bd563", "commit_rev": "b8e9b5f2237d0fad703d20fb0a22d45b6ebdb047"}

3 years, 7 months ago (2017-05-18 07:47:06 UTC) #34

commit-bot: I haz the power

Description was changed from ========== [BattOr] Retry battor_agent commands when they fail. Many battor_agent failures ...

3 years, 7 months ago (2017-05-18 07:47:21 UTC) #35

commit-bot: I haz the power

3 years, 7 months ago (2017-05-18 07:47:22 UTC) #36

Message was sent while issue was closed.

Committed patchset #7 (id:120001) as
https://chromium.googlesource.com/chromium/src/+/b8e9b5f2237d0fad703d20fb0a22...

Issue 2826913002: [BattOr] Retry battor_agent commands when they fail. (Closed)

Description

Patch Set 1 #

Patch Set 2 : Stored samples needed to be cleared before each retry of StopTracing. #

Patch Set 3 : Doubled retry attempts (each StopTracing attempt requires two retries) #

Patch Set 4 : Addressed Charlie's comments #

Patch Set 5 : [BattOr] Retry battor_agent commands when they fail. #

Patch Set 6 : Final comments #

Patch Set 7 : Swapped return with break #

Messages