|
|
Chromium Code Reviews|
Created:
3 years, 5 months ago by Vadim Sh. Modified:
3 years, 4 months ago Reviewers:
tandrii(chromium) CC:
chromium-reviews, infra-reviews+luci-go_chromium.org, maruel+w_chromium.org, tandrii+luci-go_chromium.org Target Ref:
refs/heads/master Project:
luci-go Visibility:
Public. |
DescriptionAdd cron.Machine state machine.
It is subset of engine.StateMachine that deals exclusively with time. Unlike
engine.StateMachine, it doesn't try to keep track of how individual invocations
travel through task queue.
Will be eventually used to power cron jobs by periodically emitting trigger
events based on some schedule.
R=tandrii@chromium.org
BUG=
Review-Url: https://codereview.chromium.org/2980943002
Committed: https://github.com/luci/luci-go/commit/e295526f5ff8e4023422d895477f27974e27642f
Patch Set 1 : add-cron-machine-construct #
Total comments: 13
Patch Set 2 : nit #Patch Set 3 : add tests #Patch Set 4 : typo #
Total comments: 5
Patch Set 5 : typos #
Messages
Total messages: 14 (4 generated)
PTAL Will add tests before committing. This is what I called "CronMachine construct" in my email (please read my email...). It is extracted from existing engine.StateMachine class. The big change is that it no longer tries to keep track to what happens to StartInvocationAction. Instead it requires the host to call RewindIfNecessary at some appropriate time (== when running invocation finishes, it is handled by StateMachine.OnInvocationDone in current code). The main rationale is that in the new model there may be many concurrent invocations running, so we'll need to split the code that "watches" how invocation travels through task queues into some per-invocation process (which cron job is NOT, it is global thing). It is not a large amount of code, but it is sort of tricky, so I tried to heavily document it. I hope it all makes sense....
https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... File scheduler/appengine/engine/cron/demo/main.go (right): https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... scheduler/appengine/engine/cron/demo/main.go:40: _extra datastore.PropertyMap `gae:"-,extra"` what's this for? https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... scheduler/appengine/engine/cron/demo/main.go:86: Delay: time.Second, // give the transaction time to land I don't get why 1s is necessary. Suppose Delay is 0, then are you implying that handleInvocation might fail to load up-to-date CronState entity? If so, then how does 1s solve this? TBH, until you wrote this comment, I was certain that the fact that we do {task.put() and entity.put()} in a transaction imply that handleInvocation is guaranteed to load the state at least as recent as the one we put *right now*. https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... File scheduler/appengine/engine/cron/machine.go (right): https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... scheduler/appengine/engine/cron/machine.go:207: if delay < 0 { nit: combine two lines? https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... scheduler/appengine/engine/cron/machine.go:236: if nextTickTime != schedule.DistantFuture { is this actually necessary for a cron job? AFAIR, schedule.New returns DistanctFuture only for triggered jobs. Now, you also wrote above that for Paused jobs, nextTickTime can be set to DistanceFuture too, but I don't see this happening (yet?).
https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... File scheduler/appengine/engine/cron/demo/main.go (right): https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... scheduler/appengine/engine/cron/demo/main.go:40: _extra datastore.PropertyMap `gae:"-,extra"` On 2017/07/14 18:17:01, tandrii(chromium) wrote: > what's this for? This is not strictly necessary in this code... In general, without this line, if datastore entity has some other fields not specified in the struct, it will fail to load. This can happen if datastore model evolves with time (some fields get removed from source code but stay present in DS). 'extra' is a map that collect all "undefined" fields. https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... scheduler/appengine/engine/cron/demo/main.go:86: Delay: time.Second, // give the transaction time to land On 2017/07/14 18:17:01, tandrii(chromium) wrote: > I don't get why 1s is necessary. > > Suppose Delay is 0, then are you implying that handleInvocation might fail to > load up-to-date CronState entity? If so, then how does 1s solve this? > > TBH, until you wrote this comment, I was certain that the fact that we do > {task.put() and entity.put()} in a transaction imply that handleInvocation is > guaranteed to load the state at least as recent as the one we put *right now*. This is opportunistic attempt to avoid hitting transaction congestion: a second transaction right after the first one has a higher chance to fail, since the first one is still "replicating". This is 1 QPS transaction limit I was talking about. Setting Delay to 0 is correct, everything will continue to work. I can remove it from this adhoc demo code, it's not essential. https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... File scheduler/appengine/engine/cron/machine.go (right): https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... scheduler/appengine/engine/cron/machine.go:207: if delay < 0 { On 2017/07/14 18:17:02, tandrii(chromium) wrote: > nit: combine two lines? Ack, will do. https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... scheduler/appengine/engine/cron/machine.go:236: if nextTickTime != schedule.DistantFuture { On 2017/07/14 18:17:02, tandrii(chromium) wrote: > is this actually necessary for a cron job? Maybe not... I'm not sure yet. I'm keeping it so for to not deviate from existing code too much. AFAIR, schedule.New returns > DistanctFuture only for triggered > jobs. Now, you also wrote above that for Paused jobs, nextTickTime can be set to > DistanceFuture too, but I don't see this happening (yet?). Paused jobs are identical to triggered jobs: https://github.com/luci/luci-go/blob/master/scheduler/appengine/engine/engine... (I acknowledge this decision may be seen as both "clever" and "horrible"... But it unifies handling of pause/unpause and schedule changes in single code path).
https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... File scheduler/appengine/engine/cron/demo/main.go (right): https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... scheduler/appengine/engine/cron/demo/main.go:40: _extra datastore.PropertyMap `gae:"-,extra"` On 2017/07/14 at 18:29:12, Vadim Sh. wrote: > On 2017/07/14 18:17:01, tandrii(chromium) wrote: > > what's this for? > > This is not strictly necessary in this code... > > In general, without this line, if datastore entity has some other fields not specified in the struct, it will fail to load. This can happen if datastore model evolves with time (some fields get removed from source code but stay present in DS). > > 'extra' is a map that collect all "undefined" fields. i guess so, but was confused since this is new field :) https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... scheduler/appengine/engine/cron/demo/main.go:86: Delay: time.Second, // give the transaction time to land On 2017/07/14 at 18:29:12, Vadim Sh. wrote: > On 2017/07/14 18:17:01, tandrii(chromium) wrote: > > I don't get why 1s is necessary. > > > > Suppose Delay is 0, then are you implying that handleInvocation might fail to > > load up-to-date CronState entity? If so, then how does 1s solve this? > > > > TBH, until you wrote this comment, I was certain that the fact that we do > > {task.put() and entity.put()} in a transaction imply that handleInvocation is > > guaranteed to load the state at least as recent as the one we put *right now*. > > This is opportunistic attempt to avoid hitting transaction congestion: a second transaction right after the first one has a higher chance to fail, since the first one is still "replicating". This is 1 QPS transaction limit I was talking about. > > Setting Delay to 0 is correct, everything will continue to work. I can remove it from this adhoc demo code, it's not essential. Got it. So, if Delay was 0 AND handleInvocation hits "replicating" error, then handleInvocation will be replied again by task queue, and things will work anyway. Am I right? And please keep it 1s. https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... File scheduler/appengine/engine/cron/machine.go (right): https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... scheduler/appengine/engine/cron/machine.go:236: if nextTickTime != schedule.DistantFuture { On 2017/07/14 at 18:29:12, Vadim Sh. wrote: > On 2017/07/14 18:17:02, tandrii(chromium) wrote: > > is this actually necessary for a cron job? > Maybe not... I'm not sure yet. I'm keeping it so for to not deviate from existing code too much. > > AFAIR, schedule.New returns > > DistanctFuture only for triggered > > jobs. Now, you also wrote above that for Paused jobs, nextTickTime can be set to > > DistanceFuture too, but I don't see this happening (yet?). > > Paused jobs are identical to triggered jobs: https://github.com/luci/luci-go/blob/master/scheduler/appengine/engine/engine... oops, that was my cache miss. Thanks for reminder. > (I acknowledge this decision may be seen as both "clever" and "horrible"... But it unifies handling of pause/unpause and schedule changes in single code path).
https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... File scheduler/appengine/engine/cron/demo/main.go (right): https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... scheduler/appengine/engine/cron/demo/main.go:86: Delay: time.Second, // give the transaction time to land On 2017/07/14 18:36:08, tandrii(chromium) wrote: > On 2017/07/14 at 18:29:12, Vadim Sh. wrote: > > On 2017/07/14 18:17:01, tandrii(chromium) wrote: > > > I don't get why 1s is necessary. > > > > > > Suppose Delay is 0, then are you implying that handleInvocation might fail > to > > > load up-to-date CronState entity? If so, then how does 1s solve this? > > > > > > TBH, until you wrote this comment, I was certain that the fact that we do > > > {task.put() and entity.put()} in a transaction imply that handleInvocation > is > > > guaranteed to load the state at least as recent as the one we put *right > now*. > > > > This is opportunistic attempt to avoid hitting transaction congestion: a > second transaction right after the first one has a higher chance to fail, since > the first one is still "replicating". This is 1 QPS transaction limit I was > talking about. > > > > Setting Delay to 0 is correct, everything will continue to work. I can remove > it from this adhoc demo code, it's not essential. > > Got it. So, if Delay was 0 AND handleInvocation hits "replicating" error, then > handleInvocation will be replied again by task queue, and things will work > anyway. Am I right? Yes. > > And please keep it 1s. https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... File scheduler/appengine/engine/cron/machine.go (right): https://codereview.chromium.org/2980943002/diff/1/scheduler/appengine/engine/... scheduler/appengine/engine/cron/machine.go:207: if delay < 0 { On 2017/07/14 18:29:12, Vadim Sh. wrote: > On 2017/07/14 18:17:02, tandrii(chromium) wrote: > > nit: combine two lines? > > Ack, will do. Done.
I want land this change now. Added tests. PTAL!
lgtm https://codereview.chromium.org/2980943002/diff/60001/scheduler/appengine/eng... File scheduler/appengine/engine/cron/machine_test.go (right): https://codereview.chromium.org/2980943002/diff/60001/scheduler/appengine/eng... scheduler/appengine/engine/cron/machine_test.go:136: // Some time later (when invocation has presumably finishes), rewind the finisheD https://codereview.chromium.org/2980943002/diff/60001/scheduler/appengine/eng... scheduler/appengine/engine/cron/machine_test.go:216: // The operation is idempotent. No new tick are scheduled when we try again. s/are/is https://codereview.chromium.org/2980943002/diff/60001/scheduler/appengine/eng... scheduler/appengine/engine/cron/machine_test.go:316: // Just to get 100% code coverage... :)
https://codereview.chromium.org/2980943002/diff/60001/scheduler/appengine/eng... File scheduler/appengine/engine/cron/machine_test.go (right): https://codereview.chromium.org/2980943002/diff/60001/scheduler/appengine/eng... scheduler/appengine/engine/cron/machine_test.go:136: // Some time later (when invocation has presumably finishes), rewind the On 2017/07/27 11:57:24, tandrii(chromium) wrote: > finisheD Done. https://codereview.chromium.org/2980943002/diff/60001/scheduler/appengine/eng... scheduler/appengine/engine/cron/machine_test.go:216: // The operation is idempotent. No new tick are scheduled when we try again. On 2017/07/27 11:57:24, tandrii(chromium) wrote: > s/are/is Done.
The CQ bit was checked by vadimsh@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from tandrii@chromium.org Link to the patchset: https://codereview.chromium.org/2980943002/#ps80001 (title: "typos")
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 80001, "attempt_start_ts": 1501175428185560,
"parent_rev": "88d9e961390254aa5a732d56f83eef40e25db59e", "commit_rev":
"e295526f5ff8e4023422d895477f27974e27642f"}
Message was sent while issue was closed.
Description was changed from ========== Add cron.Machine state machine. It is subset of engine.StateMachine that deals exclusively with time. Unlike engine.StateMachine, it doesn't try to keep track of how individual invocations travel through task queue. Will be eventually used to power cron jobs by periodically emitting trigger events based on some schedule. R=tandrii@chromium.org BUG= ========== to ========== Add cron.Machine state machine. It is subset of engine.StateMachine that deals exclusively with time. Unlike engine.StateMachine, it doesn't try to keep track of how individual invocations travel through task queue. Will be eventually used to power cron jobs by periodically emitting trigger events based on some schedule. R=tandrii@chromium.org BUG= Review-Url: https://codereview.chromium.org/2980943002 Committed: https://github.com/luci/luci-go/commit/e295526f5ff8e4023422d895477f27974e27642f ==========
Message was sent while issue was closed.
Committed patchset #5 (id:80001) as https://github.com/luci/luci-go/commit/e295526f5ff8e4023422d895477f27974e27642f |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
