|
|
Chromium Code Reviews|
Created:
4 years ago by djd-OOO-Apr2017 Modified:
4 years ago CC:
chromium-reviews, infra-reviews+luci-go_chromium.org, maruel+w_chromium.org, mcgreevy, tandrii+luci-go_chromium.org Target Ref:
refs/heads/master Project:
luci-go Visibility:
Public. |
Descriptionisolate: give up and die on file unavailability
Don't try to gracefully die if a file is unavailable during the initial
stat / walk; it's much safer to just kill binary.
BUG=666047
Committed: https://github.com/luci/luci-go/commit/d1e2b7c4a43da90682972424bd0745fa0217a814
Patch Set 1 #
Total comments: 4
Messages
Total messages: 16 (5 generated)
djd@chromium.org changed reviewers: + tandrii@chromium.org, tansell@chromium.org
I think we should definitely fail fast here. The current code hides to many issues. https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.go File client/archiver/directory.go (right): https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.g... client/archiver/directory.go:72: return fmt.Errorf("walk(%q): %v", p, err) This will cause the binary to fail if it tries to walk a directory with a permission denied error... I think that is probably the right solution, but we should just double check?
https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.go File client/archiver/directory.go (right): https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.g... client/archiver/directory.go:72: return fmt.Errorf("walk(%q): %v", p, err) On 2016/11/29 at 03:43:44, mithro wrote: > This will cause the binary to fail if it tries to walk a directory with a permission denied error... > > I think that is probably the right solution, but we should just double check? Do you have a suggestion on how to check that? Also, it doesn't look like this change would affect the ultimate behaviour for this type of failure: such an error is already been reported from walk and ultimately/hopefully cancelling the upload.
The CQ bit was checked by tansell@chromium.org
lgtm LGTM.
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 1, "attempt_start_ts": 1480474309547090, "parent_rev":
"fbe3a910aa126bc2120e371250a88a24a2ce7b5d", "commit_rev":
"d1e2b7c4a43da90682972424bd0745fa0217a814"}
Message was sent while issue was closed.
Description was changed from ========== isolate: give up and die on file unavailability Don't try to gracefully die if a file is unavailable during the initial stat / walk; it's much safer to just kill binary. BUG=666047 ========== to ========== isolate: give up and die on file unavailability Don't try to gracefully die if a file is unavailable during the initial stat / walk; it's much safer to just kill binary. BUG=666047 Committed: https://github.com/luci/luci-go/commit/d1e2b7c4a43da90682972424bd0745fa0217a814 ==========
Message was sent while issue was closed.
Committed patchset #1 (id:1) as https://github.com/luci/luci-go/commit/d1e2b7c4a43da90682972424bd0745fa0217a814
Message was sent while issue was closed.
Sorry, I forgot to look yesterday. Please consider https://bugs.chromium.org/p/chromium/issues/detail?id=480558#c5 - it'd be really useful for us to differentiate infra failure from bad patch failure. https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.go File client/archiver/directory.go (right): https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.g... client/archiver/directory.go:102: log.Fatalf("Unable to walk %q: %v", root, err) 1. a test for this would be nice anyway - panic can be caught... 2. have you seen this https://bugs.chromium.org/p/chromium/issues/detail?id=480558#c5? It'd be really awesome to exit with the right error code.
Message was sent while issue was closed.
maruel@chromium.org changed reviewers: + maruel@chromium.org
Message was sent while issue was closed.
https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.go File client/archiver/directory.go (right): https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.g... client/archiver/directory.go:102: log.Fatalf("Unable to walk %q: %v", root, err) On 2016/11/30 14:35:08, tandrii(chromium) wrote: > 1. a test for this would be nice anyway - panic can be caught... > 2. have you seen this > https://bugs.chromium.org/p/chromium/issues/detail?id=480558#c5? It'd be really > awesome to exit with the right error code. I'm not enthusiastic about hard failing vs cleanly propagating the error up, but I'm not concerned enough to block the CL as propagating the error would require a fair amount of refactoring. log.Fatal() calls os.Exit(1) and doesn't panic. It could be worth calling log.Printf() + os.Exit(3) or something. (Sorry for not commenting about this earlier)
Message was sent while issue was closed.
On 2016/11/30 at 15:39:14, maruel wrote: > https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.go > File client/archiver/directory.go (right): > > https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.g... > client/archiver/directory.go:102: log.Fatalf("Unable to walk %q: %v", root, err) > On 2016/11/30 14:35:08, tandrii(chromium) wrote: > > 1. a test for this would be nice anyway - panic can be caught... > > 2. have you seen this > > https://bugs.chromium.org/p/chromium/issues/detail?id=480558#c5? It'd be really > > awesome to exit with the right error code. > > I'm not enthusiastic about hard failing vs cleanly propagating the error up, but I'm not concerned enough to block the CL as propagating the error would require a fair amount of refactoring. > > log.Fatal() calls os.Exit(1) and doesn't panic. It could be worth calling log.Printf() + os.Exit(3) or something. > > (Sorry for not commenting about this earlier) Looking at that bug, it seems that you were advocating for an exit code of 2 (or I guess, anything != 1) for an infrastructure failure? AFAICT, these deaths are more likely to be caused by a malformed .isolate which points at missing files. So we should stick with the normal (1) exit code here? Also, from what I can see the code already tries to propagate the error up. However, because of complex layering above that it looks like that propagation sometimes leaves the binary in a weird state where it deadlocks. In simple cases today, the error already causes the binary to exit cleanly.
Message was sent while issue was closed.
On 2016/12/05 02:46:41, djd wrote: > On 2016/11/30 at 15:39:14, maruel wrote: > > https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.go > > File client/archiver/directory.go (right): > > > > > https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.g... > > client/archiver/directory.go:102: log.Fatalf("Unable to walk %q: %v", root, > err) > > On 2016/11/30 14:35:08, tandrii(chromium) wrote: > > > 1. a test for this would be nice anyway - panic can be caught... > > > 2. have you seen this > > > https://bugs.chromium.org/p/chromium/issues/detail?id=480558#c5? It'd be > really > > > awesome to exit with the right error code. > > > > I'm not enthusiastic about hard failing vs cleanly propagating the error up, > but I'm not concerned enough to block the CL as propagating the error would > require a fair amount of refactoring. > > > > log.Fatal() calls os.Exit(1) and doesn't panic. It could be worth calling > log.Printf() + os.Exit(3) or something. > > > > (Sorry for not commenting about this earlier) > > Looking at that bug, it seems that you were advocating for an exit code of 2 (or > I guess, anything != 1) for an infrastructure failure? > > AFAICT, these deaths are more likely to be caused by a malformed .isolate which > points at missing files. So we should stick with the normal (1) exit code here? Yes, Exit(1) is perfectly good here, you are right. Exit(2) should be elsewhere if we abort because of say network errors. > Also, from what I can see the code already tries to propagate the error up. > However, because of complex layering above that it looks like that propagation > sometimes leaves the binary in a weird state where it deadlocks. In simple cases > today, the error already causes the binary to exit cleanly. Yep, seemingly deadlocks with lots of goroutines, but Go runtime doesn't think it deadlocks.
Message was sent while issue was closed.
On 2016/12/05 10:42:38, tandrii(chromium) wrote: > On 2016/12/05 02:46:41, djd wrote: > > On 2016/11/30 at 15:39:14, maruel wrote: > > > > https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.go > > > File client/archiver/directory.go (right): > > > > > > > > > https://codereview.chromium.org/2535803004/diff/1/client/archiver/directory.g... > > > client/archiver/directory.go:102: log.Fatalf("Unable to walk %q: %v", root, > > err) > > > On 2016/11/30 14:35:08, tandrii(chromium) wrote: > > > > 1. a test for this would be nice anyway - panic can be caught... > > > > 2. have you seen this > > > > https://bugs.chromium.org/p/chromium/issues/detail?id=480558#c5? It'd be > > really > > > > awesome to exit with the right error code. > > > > > > I'm not enthusiastic about hard failing vs cleanly propagating the error up, > > but I'm not concerned enough to block the CL as propagating the error would > > require a fair amount of refactoring. > > > > > > log.Fatal() calls os.Exit(1) and doesn't panic. It could be worth calling > > log.Printf() + os.Exit(3) or something. > > > > > > (Sorry for not commenting about this earlier) > > > > Looking at that bug, it seems that you were advocating for an exit code of 2 > (or > > I guess, anything != 1) for an infrastructure failure? > > > > AFAICT, these deaths are more likely to be caused by a malformed .isolate > which > > points at missing files. So we should stick with the normal (1) exit code > here? > Yes, Exit(1) is perfectly good here, you are right. > Exit(2) should be elsewhere if we abort because of say network errors. > > > > Also, from what I can see the code already tries to propagate the error up. > > However, because of complex layering above that it looks like that propagation > > sometimes leaves the binary in a weird state where it deadlocks. In simple > cases > > today, the error already causes the binary to exit cleanly. > Yep, seemingly deadlocks with lots of goroutines, but Go runtime doesn't think > it deadlocks. Err you were right; I meant exit != 1 for infrastructure error, in this change here exit == 1 is the right course of action. I also agree that terminating safely with the current concurrency is tricky so I don't mind the current approach. Thanks |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
