|
|
Created:
6 years, 6 months ago by Nicolas Zea Modified:
6 years, 6 months ago CC:
chromium-reviews, cbentzel+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Project:
chromium Visibility:
Public. |
DescriptionBackoff: Respect maximum delay in face of float overflow
Due to the 11 bit limitations for exponents in double precision, it's possible
to overflow when calculating the backoff amount. Check for this overflow and
drop back to the max backoff delay if necessary.
BUG=373181
R=cbentzel@chromium.org
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=278153
Patch Set 1 #
Total comments: 6
Patch Set 2 : Use CheckedNumeric and don't rely on policy max #
Total comments: 9
Patch Set 3 : Check max release time too #
Messages
Total messages: 16 (0 generated)
+Chris, please take a look
Adding rsleevi because he's better about thinking through overflow concerns than I am. https://codereview.chromium.org/337803004/diff/1/net/base/backoff_entry.cc File net/base/backoff_entry.cc (right): https://codereview.chromium.org/337803004/diff/1/net/base/backoff_entry.cc#ne... net/base/backoff_entry.cc:138: delay = std::max(policy_->maximum_backoff_ms, maximum_backoff_ms is not always set - so it seems like a bad idea to go to initial_delay_ms here. Instead, I think this could be written like ----- delay *= pow(policy...); const int64 kMaxInt64 = std::numeric_limits<int64>::max(); int64 delay_int = 0; if (isinf(delay)) { delay_int = kMaxInt64; } else { delay -= base::RandDouble() * policy_->jitter_factory * delay; delay_int = (delay > kMaxInt64) ? kMaxInt64 : static_cast<int64>(delay + 0.5); } if (policy_->maximum_backoff_ms >= 0) delay_int = std::min(delay_int, policy_->maximum_backoff_ms); ------ In the overflow case this means that we will either be the maximum_backoff_ms or kMaxInt64. Other case stays the same. ------- However - I am worried that we may get integral overflow issues at some point if exponential_backoff_release_time_ is kMaxInt64. Inside this class it does not look like that will be a problem, but we do expose it (via GetReleaseTime) out to callers and they may run into this issue. https://codereview.chromium.org/337803004/diff/1/net/base/backoff_entry_unitt... File net/base/backoff_entry_unittest.cc (right): https://codereview.chromium.org/337803004/diff/1/net/base/backoff_entry_unitt... net/base/backoff_entry_unittest.cc:300: // to represent the exponential backoff intermediate values. Could you use a different policy for this test with a larger multiplication_factor to reduce the number of cycles needed to reach overflow?
https://codereview.chromium.org/337803004/diff/1/net/base/backoff_entry.cc File net/base/backoff_entry.cc (right): https://codereview.chromium.org/337803004/diff/1/net/base/backoff_entry.cc#ne... net/base/backoff_entry.cc:138: delay = std::max(policy_->maximum_backoff_ms, On 2014/06/14 15:16:07, cbentzel wrote: > maximum_backoff_ms is not always set - so it seems like a bad idea to go to > initial_delay_ms here. > > Instead, I think this could be written like > > ----- > > delay *= pow(policy...); > const int64 kMaxInt64 = std::numeric_limits<int64>::max(); > int64 delay_int = 0; > if (isinf(delay)) { > delay_int = kMaxInt64; > } else { > delay -= base::RandDouble() * policy_->jitter_factory * delay; > delay_int = (delay > kMaxInt64) ? kMaxInt64 : static_cast<int64>(delay + 0.5); > } > if (policy_->maximum_backoff_ms >= 0) > delay_int = std::min(delay_int, policy_->maximum_backoff_ms); > > ------ > > In the overflow case this means that we will either be the maximum_backoff_ms or > kMaxInt64. Other case stays the same. > > ------- > > However - I am worried that we may get integral overflow issues at some point if > exponential_backoff_release_time_ is kMaxInt64. Inside this class it does not > look like that will be a problem, but we do expose it (via GetReleaseTime) out > to callers and they may run into this issue. Actually, we'd reach integral overflow with this approach below when doing ImplGetTimeNow() + delay_int
On 2014/06/14 15:30:47, cbentzel wrote: > https://codereview.chromium.org/337803004/diff/1/net/base/backoff_entry.cc > File net/base/backoff_entry.cc (right): > > https://codereview.chromium.org/337803004/diff/1/net/base/backoff_entry.cc#ne... > net/base/backoff_entry.cc:138: delay = std::max(policy_->maximum_backoff_ms, > On 2014/06/14 15:16:07, cbentzel wrote: > > maximum_backoff_ms is not always set - so it seems like a bad idea to go to > > initial_delay_ms here. > > > > Instead, I think this could be written like > > > > ----- > > > > delay *= pow(policy...); > > const int64 kMaxInt64 = std::numeric_limits<int64>::max(); > > int64 delay_int = 0; > > if (isinf(delay)) { > > delay_int = kMaxInt64; > > } else { > > delay -= base::RandDouble() * policy_->jitter_factory * delay; > > delay_int = (delay > kMaxInt64) ? kMaxInt64 : static_cast<int64>(delay + > 0.5); > > } > > if (policy_->maximum_backoff_ms >= 0) > > delay_int = std::min(delay_int, policy_->maximum_backoff_ms); > > > > ------ > > > > In the overflow case this means that we will either be the maximum_backoff_ms > or > > kMaxInt64. Other case stays the same. > > > > ------- > > > > However - I am worried that we may get integral overflow issues at some point > if > > exponential_backoff_release_time_ is kMaxInt64. Inside this class it does not > > look like that will be a problem, but we do expose it (via GetReleaseTime) out > > to callers and they may run into this issue. > > Actually, we'd reach integral overflow with this approach below when doing > ImplGetTimeNow() + delay_int You're right, initial delay isn't the right thing to fall back on. That said, what is a developer intending by not having a maximum backoff delay? Does it make sense to have it required, or default to some max value (e.g. 2 weeks)?
On 2014/06/16 06:24:40, Nicolas Zea wrote: > On 2014/06/14 15:30:47, cbentzel wrote: > > https://codereview.chromium.org/337803004/diff/1/net/base/backoff_entry.cc > > File net/base/backoff_entry.cc (right): > > > > > https://codereview.chromium.org/337803004/diff/1/net/base/backoff_entry.cc#ne... > > net/base/backoff_entry.cc:138: delay = std::max(policy_->maximum_backoff_ms, > > On 2014/06/14 15:16:07, cbentzel wrote: > > > maximum_backoff_ms is not always set - so it seems like a bad idea to go to > > > initial_delay_ms here. > > > > > > Instead, I think this could be written like > > > > > > ----- > > > > > > delay *= pow(policy...); > > > const int64 kMaxInt64 = std::numeric_limits<int64>::max(); > > > int64 delay_int = 0; > > > if (isinf(delay)) { > > > delay_int = kMaxInt64; > > > } else { > > > delay -= base::RandDouble() * policy_->jitter_factory * delay; > > > delay_int = (delay > kMaxInt64) ? kMaxInt64 : static_cast<int64>(delay + > > 0.5); > > > } > > > if (policy_->maximum_backoff_ms >= 0) > > > delay_int = std::min(delay_int, policy_->maximum_backoff_ms); > > > > > > ------ > > > > > > In the overflow case this means that we will either be the > maximum_backoff_ms > > or > > > kMaxInt64. Other case stays the same. > > > > > > ------- > > > > > > However - I am worried that we may get integral overflow issues at some > point > > if > > > exponential_backoff_release_time_ is kMaxInt64. Inside this class it does > not > > > look like that will be a problem, but we do expose it (via GetReleaseTime) > out > > > to callers and they may run into this issue. > > > > Actually, we'd reach integral overflow with this approach below when doing > > ImplGetTimeNow() + delay_int > > You're right, initial delay isn't the right thing to fall back on. That said, > what is a developer intending by not having a maximum backoff delay? Does it > make sense to have it required, or default to some max value (e.g. 2 weeks)? We may want a better default value but let's not address that in this CL. Ultimately, I want to make sure this one: a) Does not run into additional overflow issues b) Clamps to maximum delay time when reaching float overflow Also - could you change the title to not reflect "1k failures"? That happens in the gcm case since the multiplicaion factor is 2, but if the factor was 4 it would reach the same problem after 512 failures. Thanks.
When I think overflow concerns, I think "Gee, base/numerics/safe_math.h sure is great" ;) https://codereview.chromium.org/337803004/diff/1/net/base/backoff_entry.cc File net/base/backoff_entry.cc (right): https://codereview.chromium.org/337803004/diff/1/net/base/backoff_entry.cc#ne... net/base/backoff_entry.cc:138: delay = std::max(policy_->maximum_backoff_ms, On 2014/06/14 15:16:07, cbentzel wrote: > maximum_backoff_ms is not always set - so it seems like a bad idea to go to > initial_delay_ms here. > > Instead, I think this could be written like > > ----- > > delay *= pow(policy...); > const int64 kMaxInt64 = std::numeric_limits<int64>::max(); ^ base/basictypes.h has kint64max / kint64min > int64 delay_int = 0; > if (isinf(delay)) { > delay_int = kMaxInt64; > } else { > delay -= base::RandDouble() * policy_->jitter_factory * delay; > delay_int = (delay > kMaxInt64) ? kMaxInt64 : static_cast<int64>(delay + 0.5); ^ Pretty sure it could still overflow here > } > if (policy_->maximum_backoff_ms >= 0) > delay_int = std::min(delay_int, policy_->maximum_backoff_ms); > > ------ > > In the overflow case this means that we will either be the maximum_backoff_ms or > kMaxInt64. Other case stays the same. > > ------- > > However - I am worried that we may get integral overflow issues at some point if > exponential_backoff_release_time_ is kMaxInt64. Inside this class it does not > look like that will be a problem, but we do expose it (via GetReleaseTime) out > to callers and they may run into this issue.
PTAL. I've gone ahead and tried to do all the overflow checking at once using CheckedNumeric. https://codereview.chromium.org/337803004/diff/1/net/base/backoff_entry.cc File net/base/backoff_entry.cc (right): https://codereview.chromium.org/337803004/diff/1/net/base/backoff_entry.cc#ne... net/base/backoff_entry.cc:138: delay = std::max(policy_->maximum_backoff_ms, On 2014/06/16 17:20:47, Ryan Sleevi wrote: > On 2014/06/14 15:16:07, cbentzel wrote: > > maximum_backoff_ms is not always set - so it seems like a bad idea to go to > > initial_delay_ms here. > > > > Instead, I think this could be written like > > > > ----- > > > > delay *= pow(policy...); > > const int64 kMaxInt64 = std::numeric_limits<int64>::max(); > > ^ base/basictypes.h has kint64max / kint64min > > > int64 delay_int = 0; > > if (isinf(delay)) { > > delay_int = kMaxInt64; > > } else { > > delay -= base::RandDouble() * policy_->jitter_factory * delay; > > delay_int = (delay > kMaxInt64) ? kMaxInt64 : static_cast<int64>(delay + > 0.5); > > ^ Pretty sure it could still overflow here > > > } > > if (policy_->maximum_backoff_ms >= 0) > > delay_int = std::min(delay_int, policy_->maximum_backoff_ms); > > > > ------ > > > > In the overflow case this means that we will either be the maximum_backoff_ms > or > > kMaxInt64. Other case stays the same. > > > > ------- > > > > However - I am worried that we may get integral overflow issues at some point > if > > exponential_backoff_release_time_ is kMaxInt64. Inside this class it does not > > look like that will be a problem, but we do expose it (via GetReleaseTime) out > > to callers and they may run into this issue. > Switched to doing all overflow checking via CheckedNumeric and in microseconds (to ensure the TimeTicks conversion doesn't overflow). https://codereview.chromium.org/337803004/diff/1/net/base/backoff_entry_unitt... File net/base/backoff_entry_unittest.cc (right): https://codereview.chromium.org/337803004/diff/1/net/base/backoff_entry_unitt... net/base/backoff_entry_unittest.cc:300: // to represent the exponential backoff intermediate values. On 2014/06/14 15:16:07, cbentzel wrote: > Could you use a different policy for this test with a larger > multiplication_factor to reduce the number of cycles needed to reach overflow? Done.
https://codereview.chromium.org/337803004/diff/100001/net/base/backoff_entry.cc File net/base/backoff_entry.cc (right): https://codereview.chromium.org/337803004/diff/100001/net/base/backoff_entry.... net/base/backoff_entry.cc:148: base::internal::CheckedNumeric<int64> checked_release_time_us = NaN to integer conversion is undefined, not sure if we want to do this. I _think_ what is happening here from reading CheckedNumeric is that first the double operation (delay_ms + 0.5) happens, followed by the implicit integral conversion (which is potentially undefined), followed by the CheckedNumeric constructor call.
https://codereview.chromium.org/337803004/diff/100001/net/base/backoff_entry.cc File net/base/backoff_entry.cc (right): https://codereview.chromium.org/337803004/diff/100001/net/base/backoff_entry.... net/base/backoff_entry.cc:157: policy_->maximum_backoff_ms Can't this still overflow? https://codereview.chromium.org/337803004/diff/100001/net/base/backoff_entry.... net/base/backoff_entry.cc:166: base::TimeTicks() + base::TimeDelta::FromMicroseconds(release_time_us), Can't this still overflow?
https://codereview.chromium.org/337803004/diff/100001/net/base/backoff_entry.cc File net/base/backoff_entry.cc (right): https://codereview.chromium.org/337803004/diff/100001/net/base/backoff_entry.... net/base/backoff_entry.cc:148: base::internal::CheckedNumeric<int64> checked_release_time_us = On 2014/06/16 21:08:32, cbentzel wrote: > NaN to integer conversion is undefined, not sure if we want to do this. I > _think_ what is happening here from reading CheckedNumeric is that first the > double operation (delay_ms + 0.5) happens, followed by the implicit integral > conversion (which is potentially undefined), followed by the CheckedNumeric > constructor call. CheckedNumeric, from my understanding of the code, uses template specialization to handle the casts from float to integer and vice versa (see the comment about implicitly converting and overloading arithmetic). Looking at the code, the double(NaN) assignment is handled by the copy constructor for CheckedNumericState, which is itself templated based on the src type, and therefore invokes the DstRangeRelationToSrcRange functions, which are themselves specialized. https://codereview.chromium.org/337803004/diff/100001/net/base/backoff_entry.... net/base/backoff_entry.cc:157: policy_->maximum_backoff_ms On 2014/06/16 22:12:47, Ryan Sleevi wrote: > Can't this still overflow? Hmm, I suppose if a developer gives a massive/invalid maximum backoff. That seems like an error in the policy though, and better enforced by having constraints on what valid max backoff values are allowed, rather than here. WDYT? https://codereview.chromium.org/337803004/diff/100001/net/base/backoff_entry.... net/base/backoff_entry.cc:166: base::TimeTicks() + base::TimeDelta::FromMicroseconds(release_time_us), On 2014/06/16 22:12:47, Ryan Sleevi wrote: > Can't this still overflow? base::TimeTicks() is a 0 value, this is just for creating a TimeTicks object, not doing any actual arithmetic. Similarly, TimeTicks/TimeDelta use int64 microseconds internally, so this isn't converting to different units.
https://codereview.chromium.org/337803004/diff/100001/net/base/backoff_entry.cc File net/base/backoff_entry.cc (right): https://codereview.chromium.org/337803004/diff/100001/net/base/backoff_entry.... net/base/backoff_entry.cc:148: base::internal::CheckedNumeric<int64> checked_release_time_us = On 2014/06/17 00:33:29, Nicolas Zea wrote: > On 2014/06/16 21:08:32, cbentzel wrote: > > NaN to integer conversion is undefined, not sure if we want to do this. I > > _think_ what is happening here from reading CheckedNumeric is that first the > > double operation (delay_ms + 0.5) happens, followed by the implicit integral > > conversion (which is potentially undefined), followed by the CheckedNumeric > > constructor call. > > CheckedNumeric, from my understanding of the code, uses template specialization > to handle the casts from float to integer and vice versa (see the comment about > implicitly converting and overloading arithmetic). I believe Nicolas is correct. That is, it invokes template <typename Src = double> CheckedNumeric<int64>::CheckedNumeric(double arg); which initializes template <typename Src = double> CheckedNumericState<int64>::CheckedNumericState(double arg); It then will static_cast<int64>(arg), and set validity_ to DstRangeRelationToSrcRange<int64, float>(arg) > > Looking at the code, the double(NaN) assignment is handled by the copy > constructor for CheckedNumericState, which is itself templated based on the src > type, and therefore invokes the DstRangeRelationToSrcRange functions, which are > themselves specialized. https://codereview.chromium.org/337803004/diff/100001/net/base/backoff_entry.... net/base/backoff_entry.cc:157: policy_->maximum_backoff_ms On 2014/06/17 00:33:29, Nicolas Zea wrote: > On 2014/06/16 22:12:47, Ryan Sleevi wrote: > > Can't this still overflow? > > Hmm, I suppose if a developer gives a massive/invalid maximum backoff. That > seems like an error in the policy though, and better enforced by having > constraints on what valid max backoff values are allowed, rather than here. > WDYT? *shrug* I agree, it's likely an error in policy, but that policy is dependent upon what you do with maximum_backoff_ms (which, in this case, is added to kTimeTicksNowUs), so it feels like the check should be here.
PTAL https://codereview.chromium.org/337803004/diff/100001/net/base/backoff_entry.cc File net/base/backoff_entry.cc (right): https://codereview.chromium.org/337803004/diff/100001/net/base/backoff_entry.... net/base/backoff_entry.cc:157: policy_->maximum_backoff_ms On 2014/06/17 20:02:12, Ryan Sleevi wrote: > On 2014/06/17 00:33:29, Nicolas Zea wrote: > > On 2014/06/16 22:12:47, Ryan Sleevi wrote: > > > Can't this still overflow? > > > > Hmm, I suppose if a developer gives a massive/invalid maximum backoff. That > > seems like an error in the policy though, and better enforced by having > > constraints on what valid max backoff values are allowed, rather than here. > > WDYT? > > *shrug* I agree, it's likely an error in policy, but that policy is dependent > upon what you do with maximum_backoff_ms (which, in this case, is added to > kTimeTicksNowUs), so it feels like the check should be here. Done.
LGTM Thanks for discovering the issue and cleaning up BackoffEntry.
The CQ bit was checked by zea@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/zea@chromium.org/337803004/160001
Message was sent while issue was closed.
Committed patchset #3 manually as r278153 (presubmit successful). |