|
|
Chromium Code Reviews
DescriptionImplement device watchdog.
BUG=632895
Committed: https://chromium.googlesource.com/infra/infra/+/5800afbf62d889d3db5a261e45da7c1c956fba8e
Patch Set 1 : WIP #Patch Set 2 : More stuff #Patch Set 3 : Bump sleep duration. #Patch Set 4 : Added doc strings + license header #
Total comments: 24
Patch Set 5 : Comments #
Total comments: 29
Patch Set 6 : comments #Patch Set 7 : Commentsss #Patch Set 8 : commentsss #
Total comments: 6
Patch Set 9 : comments #Messages
Total messages: 18 (4 generated)
bpastene@chromium.org changed reviewers: + dnj@chromium.org
lgtm https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... File go/src/infra/tools/device_watchdog/main.go (right): https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:38: func logcatInfo(msg string) { Consider making this accept a format string: func logcatInfo(f string, args ...interface{}) { msg = fmt.Sprintf(f, args...) } Then you can use more idiomatic log strings in your code: logcatInfo("weird number %d in %s", mynum, mything) https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:39: C.__android_log_write(C.ANDROID_LOG_INFO, logHeader, C.CString(msg)) You need to free the string that you create: cmsg := C.CString(msg) defer C.free(unsafe.Pointer(cmsg)) C.__android_log_write(...) (and below) https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:48: func daemonize() (int, error) { Consider using a formal package for daemonizing: https://github.com/sevlyar/go-daemon Unless you've already tried that and noted that it doesn't work so well on Android. https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:52: return -1, errno nit: when returning an error code, you can assume the caller is going to check for error condition. In this case, return the zero value of other parameters, not "-1" (here and elsewhere). https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:62: os.Chdir("/") (Note this is normally not part of daemonizing, maybe do this at the beginning of your daemon code). https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:69: syscall.Dup2(int(fd), int(os.Stdin.Fd())) Because "os.Stdin" can change, I would just hardcode this to 0, 1, and 2 respectively. Maybe give them some const values: const ( STDIN_FD = 0 STDOUT_FD = 1 STDERR_FD = 2 ) https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:84: uptimeStr := strings.Fields(string(bytes))[0] I would assert that the field size is what you expect: fields := strings.Fields(...) if len(fields) == 0 { return 0, fmt.Errorf("failed to parse /proc/uptime") } https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:87: return -1, err nit: surround with context: return 0, fmt.Errorf("failed to parse uptime from %q: %s", uptimeStr, err) https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:89: return uptime, nil nit: You might want to make this a little more go idiomatic by converting the uptime to a "time.Duration": time.Duration(uptime * float64(time.Second)) This will make further manipulation and comparison a lot nicer. https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:95: err := ioutil.WriteFile("/proc/sysrq-trigger", []byte("b"), 0644) I think rather than WriteFile (which is generally used when creating files), you should open the existing file and write to it: fd, err := os.OpenFile("/proc/sysrq-trigger", os.O_WRONLY) if err != nil { ... } defer fd.Close() _, err = fd.Write([]byte("b")) if err != nil { ... } https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:100: logcatError("I just rebooted. How am I still alive?!?\n") Is the reboot that immediate? Or could this string be printed in between the reboot request being received and the process terminating? https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:134: time.Sleep(60 * time.Second) Any reason not to just sleep the difference? e.g., time.Sleep(maxUptime - uptime)
not lgtm, misclick.
https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... File go/src/infra/tools/device_watchdog/main.go (right): https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:38: func logcatInfo(msg string) { On 2016/08/15 19:43:29, dnj wrote: > Consider making this accept a format string: > > func logcatInfo(f string, args ...interface{}) { > msg = fmt.Sprintf(f, args...) > } > > Then you can use more idiomatic log strings in your code: > logcatInfo("weird number %d in %s", mynum, mything) Done. https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:39: C.__android_log_write(C.ANDROID_LOG_INFO, logHeader, C.CString(msg)) On 2016/08/15 19:43:30, dnj wrote: > You need to free the string that you create: > > cmsg := C.CString(msg) > defer C.free(unsafe.Pointer(cmsg)) > C.__android_log_write(...) > > (and below) Done. https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:48: func daemonize() (int, error) { On 2016/08/15 19:43:29, dnj wrote: > Consider using a formal package for daemonizing: > https://github.com/sevlyar/go-daemon > > Unless you've already tried that and noted that it doesn't work so well on > Android. Yeah, already tried that exact package. I was seeing "ExecPath not implemented for android" errors. https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:52: return -1, errno On 2016/08/15 19:43:29, dnj wrote: > nit: when returning an error code, you can assume the caller is going to check > for error condition. In this case, return the zero value of other parameters, > not "-1" (here and elsewhere). Done. https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:62: os.Chdir("/") On 2016/08/15 19:43:29, dnj wrote: > (Note this is normally not part of daemonizing, maybe do this at the beginning > of your daemon code). Done. https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:69: syscall.Dup2(int(fd), int(os.Stdin.Fd())) On 2016/08/15 19:43:30, dnj wrote: > Because "os.Stdin" can change, I would just hardcode this to 0, 1, and 2 > respectively. Maybe give them some const values: > > const ( > STDIN_FD = 0 > STDOUT_FD = 1 > STDERR_FD = 2 > ) Done. https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:84: uptimeStr := strings.Fields(string(bytes))[0] On 2016/08/15 19:43:29, dnj wrote: > I would assert that the field size is what you expect: > fields := strings.Fields(...) > if len(fields) == 0 { > return 0, fmt.Errorf("failed to parse /proc/uptime") > } Done. https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:87: return -1, err On 2016/08/15 19:43:29, dnj wrote: > nit: surround with context: > > return 0, fmt.Errorf("failed to parse uptime from %q: %s", uptimeStr, err) Done. https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:89: return uptime, nil On 2016/08/15 19:43:30, dnj wrote: > nit: You might want to make this a little more go idiomatic by converting the > uptime to a "time.Duration": > > time.Duration(uptime * float64(time.Second)) > > This will make further manipulation and comparison a lot nicer. Done. https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:95: err := ioutil.WriteFile("/proc/sysrq-trigger", []byte("b"), 0644) On 2016/08/15 19:43:29, dnj wrote: > I think rather than WriteFile (which is generally used when creating files), you > should open the existing file and write to it: > > fd, err := os.OpenFile("/proc/sysrq-trigger", os.O_WRONLY) > if err != nil { ... } > defer fd.Close() > > _, err = fd.Write([]byte("b")) > if err != nil { ... } Done. https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:100: logcatError("I just rebooted. How am I still alive?!?\n") On 2016/08/15 19:43:30, dnj wrote: > Is the reboot that immediate? Or could this string be printed in between the > reboot request being received and the process terminating? For all local testing, it's been instantaneous. https://codereview.chromium.org/2241963002/diff/60001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:134: time.Sleep(60 * time.Second) On 2016/08/15 19:43:30, dnj wrote: > Any reason not to just sleep the difference? e.g., > > time.Sleep(maxUptime - uptime) No reason currently; changed it to sleep the difference. (I might be interested in adding other types of checks other than uptime in the future.)
https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... File go/src/infra/tools/device_watchdog/main.go (right): https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:74: os.Chdir("/") (I meant move this to realMain) https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:107: return 0, fmt.Errorf("unable to open /proc/uptime: %s", err.Error()) nit: two spaces https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:124: fd, err := os.OpenFile("/proc/sysrq-trigger", os.O_WRONLY, 0644) nit: get rid of 0644 since you're not creating a file. https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:130: _, err = fd.Write([]byte("b")) nit, oneline: if _, err := fd.Write(...); err != nil { https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:133: os.Exit(1) WDYT about having this function actually return an error, and then dealing with the os.Exit part in realMain? In general I like to consolidate exit points in a single place if possible. https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:139: func realMain() { Just a thought, but if you made "realMain" return an int, all those "os.Exit(#)" could become "return #", and you could os.Exit in main(). https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:141: maxUptimeFlag := flag.Int("max-uptime", 120, "Maximum uptime in minutes before a reboot is triggered.") If you wanted to, you could use a clockflag.Duration to actually express a full duration string: e.g., -max-uptime 1h,30m https://godoc.org/github.com/luci/luci-go/common/clock/clockflag https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:154: maxUptime := time.Duration(int64(*maxUptimeFlag) * int64(time.Minute)) Make this: maxUptime := time.Duration(*maxUptimeFlag) * time.Minute https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:164: rebootDevice() Suggestion: break this into two loops: // Wait until we've hit our max uptime. for { ... if uptime > maxUptime { break } time.Sleep(maxUptime - uptime) } // Reboot the device. This should work immediately, so we will // keep trying until it succeeds. for { rebootDevice() logcatLog(logError, "THIS SHOUDL HAVE WORKED") time.Sleep(time.Second) } https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:168: maxSleep := math.Max(float64(maxUptime-uptime), float64(time.Second)) Confused: you know "maxUptime" is >= "uptime", so you should be able to get rid of this "math.Max" thing here and directly time.Sleep(maxUptime - uptime).
https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... File go/src/infra/tools/device_watchdog/main.go (right): https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:74: os.Chdir("/") On 2016/08/15 23:02:14, dnj wrote: > (I meant move this to realMain) Done. https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:107: return 0, fmt.Errorf("unable to open /proc/uptime: %s", err.Error()) On 2016/08/15 23:02:14, dnj wrote: > nit: two spaces Done. https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:124: fd, err := os.OpenFile("/proc/sysrq-trigger", os.O_WRONLY, 0644) On 2016/08/15 23:02:14, dnj wrote: > nit: get rid of 0644 since you're not creating a file. Needs it: https://golang.org/pkg/os/#OpenFile https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:130: _, err = fd.Write([]byte("b")) On 2016/08/15 23:02:14, dnj wrote: > nit, oneline: > if _, err := fd.Write(...); err != nil { That makes it more readable? I'm not sure I agree :( https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:133: os.Exit(1) On 2016/08/15 23:02:14, dnj wrote: > WDYT about having this function actually return an error, and then dealing with > the os.Exit part in realMain? > > In general I like to consolidate exit points in a single place if possible. Done. https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:139: func realMain() { On 2016/08/15 23:02:14, dnj wrote: > Just a thought, but if you made "realMain" return an int, all those "os.Exit(#)" > could become "return #", and you could os.Exit in main(). Good idea; done. https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:141: maxUptimeFlag := flag.Int("max-uptime", 120, "Maximum uptime in minutes before a reboot is triggered.") On 2016/08/15 23:02:14, dnj wrote: > If you wanted to, you could use a clockflag.Duration to actually express a full > duration string: > > e.g., -max-uptime 1h,30m > > https://godoc.org/github.com/luci/luci-go/common/clock/clockflag I think that would add a lot more time.Duration-type-conversion logic than I can handle https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:154: maxUptime := time.Duration(int64(*maxUptimeFlag) * int64(time.Minute)) On 2016/08/15 23:02:14, dnj wrote: > Make this: > maxUptime := time.Duration(*maxUptimeFlag) * time.Minute Done. https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:164: rebootDevice() On 2016/08/15 23:02:14, dnj wrote: > Suggestion: break this into two loops: > > // Wait until we've hit our max uptime. > for { > ... > if uptime > maxUptime { > break > } > time.Sleep(maxUptime - uptime) > } > > // Reboot the device. This should work immediately, so we will > // keep trying until it succeeds. > for { > rebootDevice() > logcatLog(logError, "THIS SHOUDL HAVE WORKED") > time.Sleep(time.Second) > } If that first reboot attempt doesn't work, why would the second? I'd rather it just try once and immediately fail loudly if it doesn't work. https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:168: maxSleep := math.Max(float64(maxUptime-uptime), float64(time.Second)) On 2016/08/15 23:02:14, dnj wrote: > Confused: you know "maxUptime" is >= "uptime", so you should be able to get rid > of this "math.Max" thing here and directly time.Sleep(maxUptime - uptime). Thanks to obscure floating point precision issues, that makes time.Sleep() get called ~10 times in the last few seconds. Insuring it sleeps for at least one second avoids that.
lgtm w/ comments/nits https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... File go/src/infra/tools/device_watchdog/main.go (right): https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:124: fd, err := os.OpenFile("/proc/sysrq-trigger", os.O_WRONLY, 0644) On 2016/08/15 23:39:28, bpastene wrote: > On 2016/08/15 23:02:14, dnj wrote: > > nit: get rid of 0644 since you're not creating a file. > > Needs it: https://golang.org/pkg/os/#OpenFile I was thinking pass 0 to be clear that you're not using the flag, but it doesn't really matter. https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:130: _, err = fd.Write([]byte("b")) On 2016/08/15 23:39:28, bpastene wrote: > On 2016/08/15 23:02:14, dnj wrote: > > nit, oneline: > > if _, err := fd.Write(...); err != nil { > > That makes it more readable? I'm not sure I agree :( It's a coding preference we've exhibited in LUCI project, it doesn't really matter though. If you don't like it, you can leave as-is. https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:164: rebootDevice() On 2016/08/15 23:39:28, bpastene wrote: > On 2016/08/15 23:02:14, dnj wrote: > > Suggestion: break this into two loops: > > > > // Wait until we've hit our max uptime. > > for { > > ... > > if uptime > maxUptime { > > break > > } > > time.Sleep(maxUptime - uptime) > > } > > > > // Reboot the device. This should work immediately, so we will > > // keep trying until it succeeds. > > for { > > rebootDevice() > > logcatLog(logError, "THIS SHOUDL HAVE WORKED") > > time.Sleep(time.Second) > > } > > If that first reboot attempt doesn't work, why would the second? I'd rather it > just try once and immediately fail loudly if it doesn't work. My thought was that this is a terminal point, so you have three options: 1) Exit w/ error/panic 2) Try rebooting again. 3) Infinite loop. The advantages of 2 and 3 is that the application sticks around trying to do what it's supposed to do. With 1, it just silently disappears. If that's fine, then go for it, but I prefer stricter terminal boundaries. https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:168: maxSleep := math.Max(float64(maxUptime-uptime), float64(time.Second)) On 2016/08/15 23:39:28, bpastene wrote: > On 2016/08/15 23:02:14, dnj wrote: > > Confused: you know "maxUptime" is >= "uptime", so you should be able to get > rid > > of this "math.Max" thing here and directly time.Sleep(maxUptime - uptime). > > Thanks to obscure floating point precision issues, that makes time.Sleep() get > called ~10 times in the last few seconds. Insuring it sleeps for at least one > second avoids that. That's weird. Maybe just have it sleep fro (maxUptime - uptime + time.Second) to push past that issue altogether? If you don't like that and want to continue with the current approach, I'd ditch 'math' and all the float64 conversions and just do it manually: maxSleep := maxUptime - uptime if maxSleep < time.Second { maxSleep = time.Second } That way you're sticking with duration-level stuff the entire time instead of converting around types.
https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... File go/src/infra/tools/device_watchdog/main.go (right): https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:124: fd, err := os.OpenFile("/proc/sysrq-trigger", os.O_WRONLY, 0644) On 2016/08/15 23:50:03, dnj wrote: > On 2016/08/15 23:39:28, bpastene wrote: > > On 2016/08/15 23:02:14, dnj wrote: > > > nit: get rid of 0644 since you're not creating a file. > > > > Needs it: https://golang.org/pkg/os/#OpenFile > > I was thinking pass 0 to be clear that you're not using the flag, but it doesn't > really matter. Done. https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:164: rebootDevice() On 2016/08/15 23:50:02, dnj wrote: > My thought was that this is a terminal point, so you have three options: > 1) Exit w/ error/panic > 2) Try rebooting again. > 3) Infinite loop. > > The advantages of 2 and 3 is that the application sticks around trying to do > what it's supposed to do. With 1, it just silently disappears. If that's fine, > then go for it, but I prefer stricter terminal boundaries. With 1), it doesn't silently disappear, its error + panic gets dumped to logcat. If it turns out that the rebooting can sometimes fail, or if the original error gets pushed out of logcat too quickly, I'll definitely add some retries around it. But I think I'd like to stick to the one-and-done approach for now https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:168: maxSleep := math.Max(float64(maxUptime-uptime), float64(time.Second)) On 2016/08/15 23:50:03, dnj wrote: > That's weird. Maybe just have it sleep fro (maxUptime - uptime + time.Second) to > push past that issue altogether? > > If you don't like that and want to continue with the current approach, I'd ditch > 'math' and all the float64 conversions and just do it manually: > > maxSleep := maxUptime - uptime > if maxSleep < time.Second { > maxSleep = time.Second > } > > That way you're sticking with duration-level stuff the entire time instead of > converting around types. Ahh, the + 1 second is good idea. I'll do that.
https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... File go/src/infra/tools/device_watchdog/main.go (right): https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:164: rebootDevice() On 2016/08/16 19:47:26, bpastene wrote: > On 2016/08/15 23:50:02, dnj wrote: > > My thought was that this is a terminal point, so you have three options: > > 1) Exit w/ error/panic > > 2) Try rebooting again. > > 3) Infinite loop. > > > > The advantages of 2 and 3 is that the application sticks around trying to do > > what it's supposed to do. With 1, it just silently disappears. If that's fine, > > then go for it, but I prefer stricter terminal boundaries. > > With 1), it doesn't silently disappear, its error + panic gets dumped to logcat. > > If it turns out that the rebooting can sometimes fail, or if the original error > gets pushed out of logcat too quickly, I'll definitely add some retries around > it. But I think I'd like to stick to the one-and-done approach for now Okay seems fair. Then I would recommend still breaking this into a serial flow: // Wait until we've hit our max uptime. for { uptime, err := getDeviceUptime() if err != nil { (log) // What should you do if this fails? Maybe reboot anyway, or try again?: break / continue } if uptime > maxUptime { break } time.Sleep(maxUptime - uptime + time.Second) } if err := rebootDevice(); err != nil { log("failed to reboot") return 1 } return 0
https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... File go/src/infra/tools/device_watchdog/main.go (right): https://codereview.chromium.org/2241963002/diff/80001/go/src/infra/tools/devi... go/src/infra/tools/device_watchdog/main.go:164: rebootDevice() On 2016/08/16 19:51:47, dnj wrote: > On 2016/08/16 19:47:26, bpastene wrote: > > On 2016/08/15 23:50:02, dnj wrote: > > > My thought was that this is a terminal point, so you have three options: > > > 1) Exit w/ error/panic > > > 2) Try rebooting again. > > > 3) Infinite loop. > > > > > > The advantages of 2 and 3 is that the application sticks around trying to do > > > what it's supposed to do. With 1, it just silently disappears. If that's > fine, > > > then go for it, but I prefer stricter terminal boundaries. > > > > With 1), it doesn't silently disappear, its error + panic gets dumped to > logcat. > > > > If it turns out that the rebooting can sometimes fail, or if the original > error > > gets pushed out of logcat too quickly, I'll definitely add some retries around > > it. But I think I'd like to stick to the one-and-done approach for now > > Okay seems fair. Then I would recommend still breaking this into a serial flow: > > // Wait until we've hit our max uptime. > for { > uptime, err := getDeviceUptime() > if err != nil { > (log) > // What should you do if this fails? Maybe reboot anyway, or try again?: > break / continue > } > > if uptime > maxUptime { > break > } > > time.Sleep(maxUptime - uptime + time.Second) > } > > if err := rebootDevice(); err != nil { > log("failed to reboot") > return 1 > } > return 0 Done.
lgtm w/ nits https://codereview.chromium.org/2241963002/diff/140001/go/src/infra/tools/dev... File go/src/infra/tools/device_watchdog/main.go (right): https://codereview.chromium.org/2241963002/diff/140001/go/src/infra/tools/dev... go/src/infra/tools/device_watchdog/main.go:157: logcatLog(logInfo, "Max uptime exceeded: (%.2f > %.2f)\n", float64(uptime)/float64(time.Minute), float64(maxUptime)/float64(time.Minute)) nit: render time as "%s" and pass the duration directly. The string rendering of a time.Duration looks pretty decent. e.g., https://play.golang.org/p/RfW_QrL3BC https://codereview.chromium.org/2241963002/diff/140001/go/src/infra/tools/dev... go/src/infra/tools/device_watchdog/main.go:159: } else { nit: You don't need this "else" clause, since you break above. https://codereview.chromium.org/2241963002/diff/140001/go/src/infra/tools/dev... go/src/infra/tools/device_watchdog/main.go:160: logcatLog(logInfo, "No need to reboot, uptime < max_uptime: (%.2f < %.2f)\n", float64(uptime)/float64(time.Minute), float64(maxUptime)/float64(time.Minute)) (same here)
https://codereview.chromium.org/2241963002/diff/140001/go/src/infra/tools/dev... File go/src/infra/tools/device_watchdog/main.go (right): https://codereview.chromium.org/2241963002/diff/140001/go/src/infra/tools/dev... go/src/infra/tools/device_watchdog/main.go:157: logcatLog(logInfo, "Max uptime exceeded: (%.2f > %.2f)\n", float64(uptime)/float64(time.Minute), float64(maxUptime)/float64(time.Minute)) On 2016/08/17 00:42:36, dnj wrote: > nit: render time as "%s" and pass the duration directly. The string rendering of > a time.Duration looks pretty decent. > > e.g., https://play.golang.org/p/RfW_QrL3BC Ahhhh, that's so much better. Done https://codereview.chromium.org/2241963002/diff/140001/go/src/infra/tools/dev... go/src/infra/tools/device_watchdog/main.go:159: } else { On 2016/08/17 00:42:36, dnj wrote: > nit: You don't need this "else" clause, since you break above. Done. https://codereview.chromium.org/2241963002/diff/140001/go/src/infra/tools/dev... go/src/infra/tools/device_watchdog/main.go:160: logcatLog(logInfo, "No need to reboot, uptime < max_uptime: (%.2f < %.2f)\n", float64(uptime)/float64(time.Minute), float64(maxUptime)/float64(time.Minute)) On 2016/08/17 00:42:36, dnj wrote: > (same here) Done.
The CQ bit was checked by bpastene@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from dnj@chromium.org Link to the patchset: https://codereview.chromium.org/2241963002/#ps160001 (title: "comments")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Implement device watchdog. BUG=632895 ========== to ========== Implement device watchdog. BUG=632895 Committed: https://chromium.googlesource.com/infra/infra/+/5800afbf62d889d3db5a261e45da7... ==========
Message was sent while issue was closed.
Committed patchset #9 (id:160001) as https://chromium.googlesource.com/infra/infra/+/5800afbf62d889d3db5a261e45da7... |
