Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(467)

Issue 1853433002: LogDog: Handle archive failures. (Closed)

Created:
4 years, 8 months ago by dnj
Modified:
4 years, 8 months ago
Reviewers:
Vadim Sh., estaab, iannucci
CC:
chromium-reviews, infra-reviews+luci-go_chromium.org, andrew.wang, todd, tandrii+luci-go_chromium.org, M-A Ruel
Base URL:
https://github.com/luci/luci-go@logdog-gs-update
Target Ref:
refs/heads/master
Project:
luci-go
Visibility:
Public.

Description

LogDog: Handle archive failures. The LogDog Coordinator now tracks failed stream archival attempts. If a stream fails to archive, it will retry the archival task a specific number of times, then ultimately accept the stream as archived. Failed archival is only permitted when a log stream has had its completeness requirement dropped. This handles the case where a data corruption occurs. Currently, such a log stream would stay in archival limbo forever, having incomplete archival tasked but having the archival task fail each time. This allows the stream to fail if it truly cannot be archived. BUG=

Patch Set 1 #

Total comments: 8

Patch Set 2 : Added ArchiveState, moved to dual-state w/ intent to migrate. #

Patch Set 3 : Dual-state in Get endpoint. #

Total comments: 2

Patch Set 4 : Better comment. #

Patch Set 5 : Regenerate protobufs. #

Unified diffs Side-by-side diffs Delta from patch set Stats (+986 lines, -786 lines) Patch
M appengine/logdog/coordinator/backend/archiveCron.go View 1 1 chunk +4 lines, -0 lines 0 comments Download
M appengine/logdog/coordinator/endpoints/logs/get_test.go View 1 2 1 chunk +1 line, -0 lines 0 comments Download
M appengine/logdog/coordinator/endpoints/logs/query_test.go View 1 2 1 chunk +1 line, -0 lines 0 comments Download
M appengine/logdog/coordinator/endpoints/services/archiveStream.go View 1 5 chunks +50 lines, -0 lines 0 comments Download
M appengine/logdog/coordinator/endpoints/services/archiveStream_test.go View 1 6 chunks +55 lines, -0 lines 0 comments Download
M appengine/logdog/coordinator/logStream.go View 1 5 chunks +49 lines, -4 lines 0 comments Download
M common/api/logdog_coordinator/services/v1/pb.discovery.go View 1 2 3 4 1 chunk +746 lines, -740 lines 0 comments Download
M common/api/logdog_coordinator/services/v1/service.proto View 1 chunk +3 lines, -0 lines 0 comments Download
M common/api/logdog_coordinator/services/v1/service.pb.go View 1 2 3 4 2 chunks +42 lines, -38 lines 0 comments Download
M server/internal/logdog/archivist/archivist.go View 1 2 3 4 chunks +18 lines, -2 lines 0 comments Download
M server/internal/logdog/archivist/archivist_test.go View 3 chunks +17 lines, -2 lines 0 comments Download

Messages

Total messages: 10 (3 generated)
dnj
PTAL
4 years, 8 months ago (2016-03-31 21:23:28 UTC) #2
dnj
+vadimsh@, thanks!
4 years, 8 months ago (2016-03-31 22:13:41 UTC) #4
Vadim Sh.
lgtm with questions https://codereview.chromium.org/1853433002/diff/1/appengine/logdog/coordinator/endpoints/services/archiveStream.go File appengine/logdog/coordinator/endpoints/services/archiveStream.go (right): https://codereview.chromium.org/1853433002/diff/1/appengine/logdog/coordinator/endpoints/services/archiveStream.go#newcode102 appengine/logdog/coordinator/endpoints/services/archiveStream.go:102: log.Fields{ maybe mark the entity with ...
4 years, 8 months ago (2016-03-31 22:29:59 UTC) #5
dnj
Updated, PTAL. I added a formal "ArchiveState" constant that can be queried against for streams ...
4 years, 8 months ago (2016-04-01 22:57:04 UTC) #6
Vadim Sh.
https://codereview.chromium.org/1853433002/diff/40001/appengine/logdog/coordinator/endpoints/services/archiveStream.go File appengine/logdog/coordinator/endpoints/services/archiveStream.go (right): https://codereview.chromium.org/1853433002/diff/40001/appengine/logdog/coordinator/endpoints/services/archiveStream.go#newcode126 appengine/logdog/coordinator/endpoints/services/archiveStream.go:126: case !req.Complete: shouldn't this two cases be swapped? If ...
4 years, 8 months ago (2016-04-01 23:04:01 UTC) #7
dnj
https://codereview.chromium.org/1853433002/diff/40001/appengine/logdog/coordinator/endpoints/services/archiveStream.go File appengine/logdog/coordinator/endpoints/services/archiveStream.go (right): https://codereview.chromium.org/1853433002/diff/40001/appengine/logdog/coordinator/endpoints/services/archiveStream.go#newcode126 appengine/logdog/coordinator/endpoints/services/archiveStream.go:126: case !req.Complete: On 2016/04/01 23:04:01, Vadim Sh. wrote: > ...
4 years, 8 months ago (2016-04-01 23:05:38 UTC) #8
Vadim Sh.
4 years, 8 months ago (2016-04-01 23:09:19 UTC) #9
lgtm

On 2016/04/01 23:05:38, dnj wrote:
>
https://codereview.chromium.org/1853433002/diff/40001/appengine/logdog/coordi...
> File appengine/logdog/coordinator/endpoints/services/archiveStream.go (right):
> 
>
https://codereview.chromium.org/1853433002/diff/40001/appengine/logdog/coordi...
> appengine/logdog/coordinator/endpoints/services/archiveStream.go:126: case
> !req.Complete:
> On 2016/04/01 23:04:01, Vadim Sh. wrote:
> > shouldn't this two cases be swapped?
> > 
> > If req.Error is true and req.Complete is false, we want to get ArchiveState
==
> > ArchivedPartially, since it's more severe condition compared to
> > ArchivedWithErrors (or not?).
> 
> ArchivedWithErrors is the most severe, as it means the archival straight up
> didn't work (e.g., data corruption). !Complete means that everything was
great,
> but some logs were missing.

Ok, I misread the enum comments then. I interpreted ArchivedWithErrors as "some
errors did happen, but we retried and succeeded, so everything is great".

Powered by Google App Engine
This is Rietveld 408576698