|
|
Created:
7 years, 1 month ago by haitaol1 Modified:
7 years, 1 month ago CC:
chromium-reviews, tim+watch_chromium.org, cbentzel+watch_chromium.org, rsimha+watch_chromium.org, haitaol+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionFix one case that can make sync backend stuck in auth error state.
In case like server outage/bug, sync backend can receive same invalid
token in a row and stay in auth error state without frontend knowing
it. This change makes backend report auth error again if this happens.
Frontend will request token again but also use exponetial backoff to
protect againest hammering token server too often.
BUG=311420
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=233938
Patch Set 1 #Patch Set 2 : #
Total comments: 13
Patch Set 3 : #Patch Set 4 : #Patch Set 5 : #
Total comments: 4
Patch Set 6 : #
Messages
Total messages: 10 (0 generated)
https://codereview.chromium.org/62423002/diff/20001/chrome/browser/sync/profi... File chrome/browser/sync/profile_sync_service.cc (right): https://codereview.chromium.org/62423002/diff/20001/chrome/browser/sync/profi... chrome/browser/sync/profile_sync_service.cc:1113: // invalid, there may be some issues with token server. In that case, we In fact, if the access token is repeatedly reported invalid there may be *no issue* with the token server. It's worth pointing out the subtlety that we need to make additional (exponentially backed-off) requests to GAIA to work around their token caching that causes us to get the same token if we ask twice in quick succession. We should also have a comment explaining that we take the somewhat unorthodox approach of employing a single backoff state and timer for failures talking to *either* GAIA or Sync. Therefore, we're more likely to reach the backoff ceiling more quickly than you would expect from looking at the BackoffPolicy, if (for example) we get an auth error from sync and then lose connectivity and fail to reach GAIA. We don't expect them to ping-pong (and crank the TimeUntilRelease up even faster) because we won't make a sync request if we hit an error until GAIA succeeds at sending a new token, and we won't request a new token (and risk failure of request) unless sync reports a token failure. (Correct me if I'm wrong). https://codereview.chromium.org/62423002/diff/20001/chrome/browser/sync/profi... chrome/browser/sync/profile_sync_service.cc:1127: // Reset backoff time after successful connection. This is where it gets dangerous and bug prone in my opinion to only have a single backoff tracker. If we're backed-off due to failures talking to GAIA, and sync reports a connection success (either legitimately or spuriously due to a bug in the client! A delayed / racy notification, or something.) we'll wind up bypassing the mechanism intended to protect GAIA communication. https://codereview.chromium.org/62423002/diff/20001/sync/engine/net/server_co... File sync/engine/net/server_connection_manager.cc (right): https://codereview.chromium.org/62423002/diff/20001/sync/engine/net/server_co... sync/engine/net/server_connection_manager.cc:237: // This could happen in case like server outage/bug. Need to notify sync What is server outage/bug? Either explain the outage, reference the bug#, or both. https://codereview.chromium.org/62423002/diff/20001/sync/engine/net/server_co... sync/engine/net/server_connection_manager.cc:241: SetServerStatus(HttpResponse::SYNC_AUTH_ERROR); I guess the only case that is implicitly in the "else" clause of the above if is if we call SetAuthToken with an empty token; now we'll always send up a SYNC_AUTH_ERROR if we did that. I distinctly recall us changing this a while back to not always bubble up the error, although it may have been because we were trying to avoid UI updates. Are you sure that has no implications? Does your approach actually require tracking previously_invalidated_token? https://codereview.chromium.org/62423002/diff/20001/sync/engine/net/server_co... sync/engine/net/server_connection_manager.cc:264: server_status_ == server_status) { Pavel and Richard should review this part carefully, we've had several bugs in this area that they have diagnosed.
https://codereview.chromium.org/62423002/diff/20001/chrome/browser/sync/profi... File chrome/browser/sync/profile_sync_service.cc (right): https://codereview.chromium.org/62423002/diff/20001/chrome/browser/sync/profi... chrome/browser/sync/profile_sync_service.cc:1113: // invalid, there may be some issues with token server. In that case, we On 2013/11/06 19:35:32, timsteele wrote: > In fact, if the access token is repeatedly reported invalid there may be *no > issue* with the token server. It's worth pointing out the subtlety that we need > to make additional (exponentially backed-off) requests to GAIA to work around > their token caching that causes us to get the same token if we ask twice in > quick succession. I think the root cause is still on server. If server works fine, sync should succeed on first token. Added comment that this is due to inconsistency between authentication server and issue server. > > We should also have a comment explaining that we take the somewhat unorthodox > approach of employing a single backoff state and timer for failures talking to > *either* GAIA or Sync. Therefore, we're more likely to reach the backoff > ceiling more quickly than you would expect from looking at the BackoffPolicy, if > (for example) we get an auth error from sync and then lose connectivity and fail > to reach GAIA. We don't expect them to ping-pong (and crank the TimeUntilRelease > up even faster) because we won't make a sync request if we hit an error until > GAIA succeeds at sending a new token, and we won't request a new token (and risk > failure of request) unless sync reports a token failure. (Correct me if I'm > wrong). done https://codereview.chromium.org/62423002/diff/20001/chrome/browser/sync/profi... chrome/browser/sync/profile_sync_service.cc:1127: // Reset backoff time after successful connection. On 2013/11/06 19:35:32, timsteele wrote: > > This is where it gets dangerous and bug prone in my opinion to only have a > single backoff tracker. If we're backed-off due to failures talking to GAIA, > and sync reports a connection success (either legitimately or spuriously due to > a bug in the client! A delayed / racy notification, or something.) we'll wind up > bypassing the mechanism intended to protect GAIA communication. Good point. Changed to only reset if no request is scheduled. https://codereview.chromium.org/62423002/diff/20001/sync/engine/net/server_co... File sync/engine/net/server_connection_manager.cc (right): https://codereview.chromium.org/62423002/diff/20001/sync/engine/net/server_co... sync/engine/net/server_connection_manager.cc:237: // This could happen in case like server outage/bug. Need to notify sync On 2013/11/06 19:35:32, timsteele wrote: > What is server outage/bug? Either explain the outage, reference the bug#, or > both. Done. https://codereview.chromium.org/62423002/diff/20001/sync/engine/net/server_co... sync/engine/net/server_connection_manager.cc:241: SetServerStatus(HttpResponse::SYNC_AUTH_ERROR); On 2013/11/06 19:35:32, timsteele wrote: > I guess the only case that is implicitly in the "else" clause of the above if is > if we call SetAuthToken with an empty token; now we'll always send up a > SYNC_AUTH_ERROR if we did that. I distinctly recall us changing this a while > back to not always bubble up the error, although it may have been because we > were trying to avoid UI updates. Are you sure that has no implications? Empty token shouldn't be possible because ProfileSyncService::GetCredentials() overwrites empty token with "credentials_lost". > > Does your approach actually require tracking previously_invalidated_token? Without previous token, it takes one trip to sync server to find out if token is valid or not. https://codereview.chromium.org/62423002/diff/20001/sync/engine/net/server_co... sync/engine/net/server_connection_manager.cc:264: server_status_ == server_status) { On 2013/11/06 19:35:32, timsteele wrote: > Pavel and Richard should review this part carefully, we've had several bugs in > this area that they have diagnosed. > acked
https://codereview.chromium.org/62423002/diff/20001/sync/engine/net/server_co... File sync/engine/net/server_connection_manager.cc (right): https://codereview.chromium.org/62423002/diff/20001/sync/engine/net/server_co... sync/engine/net/server_connection_manager.cc:264: server_status_ == server_status) { Another option is to leverage CONNECTION_NOT_ATTEMPTED added in 60703002. So SetAuthToken() always resets status to CONNECTION_NOT_ATTEMPTED at beginning, then exception can be removed here. On 2013/11/06 22:37:30, haitaol1 wrote: > On 2013/11/06 19:35:32, timsteele wrote: > > Pavel and Richard should review this part carefully, we've had several bugs in > > this area that they have diagnosed. > > > > acked
On 2013/11/07 15:23:52, haitaol1 wrote: > https://codereview.chromium.org/62423002/diff/20001/sync/engine/net/server_co... > File sync/engine/net/server_connection_manager.cc (right): > > https://codereview.chromium.org/62423002/diff/20001/sync/engine/net/server_co... > sync/engine/net/server_connection_manager.cc:264: server_status_ == > server_status) { > Another option is to leverage CONNECTION_NOT_ATTEMPTED added in 60703002. So > SetAuthToken() always resets status to CONNECTION_NOT_ATTEMPTED at beginning, > then exception can be removed here. > > On 2013/11/06 22:37:30, haitaol1 wrote: > > On 2013/11/06 19:35:32, timsteele wrote: > > > Pavel and Richard should review this part carefully, we've had several bugs > in > > > this area that they have diagnosed. > > > > > > > acked lgtm
No LGTM from a valid reviewer yet. Only full committers are accepted. Even if an LGTM may have been provided, it was from a non-committer or a lowly provisional committer, _not_ a full super star committer. See http://www.chromium.org/getting-involved/become-a-committer Note that this has nothing to do with OWNERS files.
LGTM with nits. When you rebase over Raghu's recent CL you'll see a new IsRetryingAccessTokenFetchForTest() method on ProfileSyncService... make sure that still works as intended. I think it does, because even though we InformOfRequest on the first SYNC_AUTH_ERROR, you call R.A.T immediately vs starting the timer. https://codereview.chromium.org/62423002/diff/20001/chrome/browser/sync/profi... File chrome/browser/sync/profile_sync_service.cc (right): https://codereview.chromium.org/62423002/diff/20001/chrome/browser/sync/profi... chrome/browser/sync/profile_sync_service.cc:1113: // invalid, there may be some issues with token server. In that case, we On 2013/11/06 22:37:30, haitaol1 wrote: > On 2013/11/06 19:35:32, timsteele wrote: > > In fact, if the access token is repeatedly reported invalid there may be *no > > issue* with the token server. It's worth pointing out the subtlety that we > need > > to make additional (exponentially backed-off) requests to GAIA to work around > > their token caching that causes us to get the same token if we ask twice in > > quick succession. > > I think the root cause is still on server. If server works fine, sync should > succeed on first token. Added comment that this is due to inconsistency between > authentication server and issue server. > > > > > We should also have a comment explaining that we take the somewhat unorthodox > > approach of employing a single backoff state and timer for failures talking to > > *either* GAIA or Sync. Therefore, we're more likely to reach the backoff > > ceiling more quickly than you would expect from looking at the BackoffPolicy, > if > > (for example) we get an auth error from sync and then lose connectivity and > fail > > to reach GAIA. We don't expect them to ping-pong (and crank the > TimeUntilRelease > > up even faster) because we won't make a sync request if we hit an error until > > GAIA succeeds at sending a new token, and we won't request a new token (and > risk > > failure of request) unless sync reports a token failure. (Correct me if I'm > > wrong). > > done > But there could be *no issue* with the *token* server if the sync server is erroneously saying invalid token. https://codereview.chromium.org/62423002/diff/100002/sync/engine/net/server_c... File sync/engine/net/server_connection_manager.cc (right): https://codereview.chromium.org/62423002/diff/100002/sync/engine/net/server_c... sync/engine/net/server_connection_manager.cc:238: // first request is considered invalid by authentication server and because invalid by 'sync server' ? https://codereview.chromium.org/62423002/diff/100002/sync/engine/net/server_c... sync/engine/net/server_connection_manager.cc:241: // otherwise backend will stay in SYNC_AUTH_ERROR state while fronend thinks nit - while frontend*
https://codereview.chromium.org/62423002/diff/20001/chrome/browser/sync/profi... File chrome/browser/sync/profile_sync_service.cc (right): https://codereview.chromium.org/62423002/diff/20001/chrome/browser/sync/profi... chrome/browser/sync/profile_sync_service.cc:1113: // invalid, there may be some issues with token server. In that case, we Removed token On 2013/11/07 22:07:13, timsteele wrote: > On 2013/11/06 22:37:30, haitaol1 wrote: > > On 2013/11/06 19:35:32, timsteele wrote: > > > In fact, if the access token is repeatedly reported invalid there may be *no > > > issue* with the token server. It's worth pointing out the subtlety that we > > need > > > to make additional (exponentially backed-off) requests to GAIA to work > around > > > their token caching that causes us to get the same token if we ask twice in > > > quick succession. > > > > I think the root cause is still on server. If server works fine, sync should > > succeed on first token. Added comment that this is due to inconsistency > between > > authentication server and issue server. > > > > > > > > We should also have a comment explaining that we take the somewhat > unorthodox > > > approach of employing a single backoff state and timer for failures talking > to > > > *either* GAIA or Sync. Therefore, we're more likely to reach the backoff > > > ceiling more quickly than you would expect from looking at the > BackoffPolicy, > > if > > > (for example) we get an auth error from sync and then lose connectivity and > > fail > > > to reach GAIA. We don't expect them to ping-pong (and crank the > > TimeUntilRelease > > > up even faster) because we won't make a sync request if we hit an error > until > > > GAIA succeeds at sending a new token, and we won't request a new token (and > > risk > > > failure of request) unless sync reports a token failure. (Correct me if I'm > > > wrong). > > > > done > > > > But there could be *no issue* with the *token* server if the sync server is > erroneously saying invalid token. https://codereview.chromium.org/62423002/diff/100002/sync/engine/net/server_c... File sync/engine/net/server_connection_manager.cc (right): https://codereview.chromium.org/62423002/diff/100002/sync/engine/net/server_c... sync/engine/net/server_connection_manager.cc:238: // first request is considered invalid by authentication server and because On 2013/11/07 22:07:13, timsteele wrote: > invalid by 'sync server' ? Done. https://codereview.chromium.org/62423002/diff/100002/sync/engine/net/server_c... sync/engine/net/server_connection_manager.cc:241: // otherwise backend will stay in SYNC_AUTH_ERROR state while fronend thinks On 2013/11/07 22:07:13, timsteele wrote: > nit - while frontend* Done.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/haitaol@chromium.org/62423002/240001
Message was sent while issue was closed.
Change committed as 233938 |