Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(17)

Issue 2535803004: isolate: give up and die on file unavailability (Closed)

Created:
4 years ago by djd-OOO-Apr2017
Modified:
4 years ago
CC:
chromium-reviews, infra-reviews+luci-go_chromium.org, maruel+w_chromium.org, mcgreevy, tandrii+luci-go_chromium.org
Target Ref:
refs/heads/master
Project:
luci-go
Visibility:
Public.

Description

isolate: give up and die on file unavailability Don't try to gracefully die if a file is unavailable during the initial stat / walk; it's much safer to just kill binary. BUG=666047 Committed: https://github.com/luci/luci-go/commit/d1e2b7c4a43da90682972424bd0745fa0217a814

Patch Set 1 #

Total comments: 4
Unified diffs Side-by-side diffs Delta from patch set Stats (+6 lines, -26 lines) Patch
M client/archiver/directory.go View 2 chunks +3 lines, -2 lines 4 comments Download
M client/archiver/directory_test.go View 2 chunks +0 lines, -23 lines 0 comments Download
M client/isolate/isolate.go View 1 chunk +3 lines, -1 line 0 comments Download

Messages

Total messages: 16 (5 generated)
djd-OOO-Apr2017
4 years ago (2016-11-29 02:51:51 UTC) #2
mithro
I think we should definitely fail fast here. The current code hides to many issues. ...
4 years ago (2016-11-29 03:43:44 UTC) #3
djd-OOO-Apr2017
https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.go File client/archiver/directory.go (right): https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.go#newcode72 client/archiver/directory.go:72: return fmt.Errorf("walk(%q): %v", p, err) On 2016/11/29 at 03:43:44, ...
4 years ago (2016-11-29 03:51:56 UTC) #4
mithro
lgtm LGTM.
4 years ago (2016-11-30 02:51:50 UTC) #6
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2535803004/1
4 years ago (2016-11-30 02:52:00 UTC) #7
commit-bot: I haz the power
Committed patchset #1 (id:1) as https://github.com/luci/luci-go/commit/d1e2b7c4a43da90682972424bd0745fa0217a814
4 years ago (2016-11-30 02:58:19 UTC) #10
tandrii(chromium)
Sorry, I forgot to look yesterday. Please consider https://bugs.chromium.org/p/chromium/issues/detail?id=480558#c5 - it'd be really useful for ...
4 years ago (2016-11-30 14:35:08 UTC) #11
M-A Ruel
https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.go File client/archiver/directory.go (right): https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.go#newcode102 client/archiver/directory.go:102: log.Fatalf("Unable to walk %q: %v", root, err) On 2016/11/30 ...
4 years ago (2016-11-30 15:39:14 UTC) #13
djd-OOO-Apr2017
On 2016/11/30 at 15:39:14, maruel wrote: > https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.go > File client/archiver/directory.go (right): > > https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.go#newcode102 ...
4 years ago (2016-12-05 02:46:41 UTC) #14
tandrii(chromium)
On 2016/12/05 02:46:41, djd wrote: > On 2016/11/30 at 15:39:14, maruel wrote: > > https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.go ...
4 years ago (2016-12-05 10:42:38 UTC) #15
M-A Ruel
4 years ago (2016-12-05 16:24:27 UTC) #16
Message was sent while issue was closed.
On 2016/12/05 10:42:38, tandrii(chromium) wrote:
> On 2016/12/05 02:46:41, djd wrote:
> > On 2016/11/30 at 15:39:14, maruel wrote:
> > >
> https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.go
> > > File client/archiver/directory.go (right):
> > > 
> > >
> >
>
https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.g...
> > > client/archiver/directory.go:102: log.Fatalf("Unable to walk %q: %v",
root,
> > err)
> > > On 2016/11/30 14:35:08, tandrii(chromium) wrote:
> > > > 1. a test for this would be nice anyway - panic can be caught...
> > > > 2. have you seen this
> > > > https://bugs.chromium.org/p/chromium/issues/detail?id=480558#c5? It'd be
> > really
> > > > awesome to exit with the right error code.
> > > 
> > > I'm not enthusiastic about hard failing vs cleanly propagating the error
up,
> > but I'm not concerned enough to block the CL as propagating the error would
> > require a fair amount of refactoring.
> > > 
> > > log.Fatal() calls os.Exit(1) and doesn't panic. It could be worth calling
> > log.Printf() + os.Exit(3) or something.
> > > 
> > > (Sorry for not commenting about this earlier)
> > 
> > Looking at that bug, it seems that you were advocating for an exit code of 2
> (or
> > I guess, anything != 1) for an infrastructure failure?
> > 
> > AFAICT, these deaths are more likely to be caused by a malformed .isolate
> which
> > points at missing files. So we should stick with the normal (1) exit code
> here?
> Yes, Exit(1) is perfectly good here, you are right.
> Exit(2) should be elsewhere if we abort because of say network errors.
> 
> 
> > Also, from what I can see the code already tries to propagate the error up.
> > However, because of complex layering above that it looks like that
propagation
> > sometimes leaves the binary in a weird state where it deadlocks. In simple
> cases
> > today, the error already causes the binary to exit cleanly.
> Yep, seemingly deadlocks with lots of goroutines, but Go runtime doesn't think
> it deadlocks.

Err you were right; I meant exit != 1 for infrastructure error, in this change
here exit == 1 is the right course of action.

I also agree that terminating safely with the current concurrency is tricky so I
don't mind the current approach.

Thanks

Powered by Google App Engine
This is Rietveld 408576698