Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(23)

Issue 2386793002: Reimplement Machine Provider integration (Closed)

Created:
4 years, 2 months ago by smut
Modified:
4 years, 2 months ago
Reviewers:
Vadim Sh.
CC:
chromium-reviews, infra-reviews+luci-py_chromium.org
Target Ref:
refs/heads/master
Project:
luci-py
Visibility:
Public.

Description

Reimplement Machine Provider integration BUG=627560 Committed: https://github.com/luci/luci-py/commit/4ad6fd1e59b4095a6876974a93b30d1bec5c9649

Patch Set 1 #

Total comments: 25

Patch Set 2 #

Patch Set 3 : Fix handlers_test.py #

Patch Set 4 #

Unified diffs Side-by-side diffs Delta from patch set Stats (+543 lines, -915 lines) Patch
M appengine/components/components/machine_provider/utils.py View 1 chunk +15 lines, -0 lines 0 comments Download
M appengine/swarming/cron.yaml View 2 chunks +6 lines, -10 lines 0 comments Download
M appengine/swarming/handlers_backend.py View 8 chunks +28 lines, -58 lines 0 comments Download
M appengine/swarming/handlers_test.py View 1 2 2 chunks +2 lines, -7 lines 0 comments Download
M appengine/swarming/queue.yaml View 1 chunk +1 line, -1 line 0 comments Download
M appengine/swarming/server/lease_management.py View 1 2 3 4 chunks +321 lines, -341 lines 0 comments Download
M appengine/swarming/server/lease_management_test.py View 1 chunk +170 lines, -498 lines 0 comments Download

Messages

Total messages: 23 (10 generated)
smut
Design doc: https://docs.google.com/document/d/1Bvsu4J0sMziUii3ELBHCLyLq7nfjIiGYDHs3O8BsEXU/edit Next steps: Create BotInfo as soon as hostname is known, ensure it ...
4 years, 2 months ago (2016-09-30 23:57:10 UTC) #2
Vadim Sh.
looks good, mostly nits https://codereview.chromium.org/2386793002/diff/1/appengine/swarming/server/lease_management.py File appengine/swarming/server/lease_management.py (right): https://codereview.chromium.org/2386793002/diff/1/appengine/swarming/server/lease_management.py#newcode140 appengine/swarming/server/lease_management.py:140: ndb.Future.wait_any(futures) this doesn't check status ...
4 years, 2 months ago (2016-10-03 20:26:32 UTC) #3
smut
https://codereview.chromium.org/2386793002/diff/1/appengine/swarming/server/lease_management.py File appengine/swarming/server/lease_management.py (right): https://codereview.chromium.org/2386793002/diff/1/appengine/swarming/server/lease_management.py#newcode140 appengine/swarming/server/lease_management.py:140: ndb.Future.wait_any(futures) On 2016/10/03 20:26:31, Vadim Sh. wrote: > this ...
4 years, 2 months ago (2016-10-03 21:18:08 UTC) #5
Vadim Sh.
lgtm https://codereview.chromium.org/2386793002/diff/1/appengine/swarming/server/lease_management.py File appengine/swarming/server/lease_management.py (right): https://codereview.chromium.org/2386793002/diff/1/appengine/swarming/server/lease_management.py#newcode331 appengine/swarming/server/lease_management.py:331: duration=machine_lease.lease_duration_secs, On 2016/10/03 21:18:07, smut wrote: > On ...
4 years, 2 months ago (2016-10-03 21:34:18 UTC) #6
smut
https://codereview.chromium.org/2386793002/diff/1/appengine/swarming/server/lease_management.py File appengine/swarming/server/lease_management.py (right): https://codereview.chromium.org/2386793002/diff/1/appengine/swarming/server/lease_management.py#newcode331 appengine/swarming/server/lease_management.py:331: duration=machine_lease.lease_duration_secs, On 2016/10/03 21:34:18, Vadim Sh. wrote: > On ...
4 years, 2 months ago (2016-10-03 21:41:50 UTC) #7
Vadim Sh.
https://codereview.chromium.org/2386793002/diff/1/appengine/swarming/server/lease_management.py File appengine/swarming/server/lease_management.py (right): https://codereview.chromium.org/2386793002/diff/1/appengine/swarming/server/lease_management.py#newcode331 appengine/swarming/server/lease_management.py:331: duration=machine_lease.lease_duration_secs, On 2016/10/03 21:41:50, smut wrote: > On 2016/10/03 ...
4 years, 2 months ago (2016-10-03 21:46:43 UTC) #8
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2386793002/40001
4 years, 2 months ago (2016-10-03 23:03:06 UTC) #10
commit-bot: I haz the power
Try jobs failed on following builders: Luci-py Presubmit on luci.infra.try (JOB_FAILED, https://luci-milo.appspot.com/swarming/task/31a59dc084619410)
4 years, 2 months ago (2016-10-03 23:09:56 UTC) #12
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2386793002/60001
4 years, 2 months ago (2016-10-03 23:11:47 UTC) #15
commit-bot: I haz the power
Try jobs failed on following builders: Luci-py Presubmit on luci.infra.try (JOB_FAILED, https://luci-milo.appspot.com/swarming/task/31a5a2edd9244310)
4 years, 2 months ago (2016-10-03 23:19:17 UTC) #17
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2386793002/80001
4 years, 2 months ago (2016-10-03 23:25:18 UTC) #20
commit-bot: I haz the power
Committed patchset #4 (id:80001) as https://github.com/luci/luci-py/commit/4ad6fd1e59b4095a6876974a93b30d1bec5c9649
4 years, 2 months ago (2016-10-03 23:29:15 UTC) #22
smut
4 years, 2 months ago (2016-10-06 00:13:10 UTC) #23
Message was sent while issue was closed.
On 2016/10/03 21:34:18, Vadim Sh. wrote:
> lgtm
> 
>
https://codereview.chromium.org/2386793002/diff/1/appengine/swarming/server/l...
> File appengine/swarming/server/lease_management.py (right):
> 
>
https://codereview.chromium.org/2386793002/diff/1/appengine/swarming/server/l...
> appengine/swarming/server/lease_management.py:331:
> duration=machine_lease.lease_duration_secs,
> On 2016/10/03 21:18:07, smut wrote:
> > On 2016/10/03 20:26:32, Vadim Sh. wrote:
> > > I really really wish we can randomize this a bit to avoid periodic
> avalanches
> > of
> > > lease requests. E.g. make duration be something like  (0.9 + 0.2 *
random())
> *
> > > duration. It will make lease refreshes even out in time eventually
(changes
> to
> > > the config would still suck though, but they are relatively infrequent).
> > 
> > Your suggestion of varying with +/- 10% seems reasonable to me, but I think
> that
> > lease duration is part of deduplication logic in MP (it just hashes the
whole
> > received proto), so we actually need to retransmit the same
> lease_duration_secs
> > for the same client_request_id. Will do something in another CL.
> 
> Instead of random() you can use some hash-like function of client_request_id.
It
> will be effectively "as random", but deterministic.
> 
>
https://codereview.chromium.org/2386793002/diff/1/appengine/swarming/server/l...
> appengine/swarming/server/lease_management.py:388: # TODO(smut): Create
BotInfo,
> associate lease information.
> On 2016/10/03 21:18:07, smut wrote:
> > On 2016/10/03 20:26:32, Vadim Sh. wrote:
> > > yep, in the same transaction
> > 
> > xg in log_lease_fulfillment?
> 
> Yes. Assuming 'manage_lease' is idempotent and will successfully call
> 'log_lease_fulfillment' again with exact same parameters if it fails the first
> time. (I think usage of 'client_request_id' ensures that?).
> 
> xg transactions are not big deal if they are infrequent, like in this case

Just realized it's hard to do in an xg because of the way BotInfos are normally
created. What happens everywhere else is we call bot_management.bot_event() to
log a bot-related event. This calls datastore_utils.store_new_version(), which
asserts that not ndb.in_transaction() before storing the BotRoot, BotInfo, and
BotEvent together in a transaction. We could modify it to have a code path which
skips the assertion and then stores MachineLease, BotRoot, BotInfo, and BotEvent
in a cross-group. But then we need to write code to circumvent the whole
bot_event() route which is normally the only way to create a BotInfo.

How about we don't do an xg on fulfillment and instead do a check for each
fulfilled machine in lease_management.manage_lease(), like:

if machine_lease.hostname and machine_lease.lease_expiration_ts > now:
  # Request is fulfilled and unexpired.
  if BotInfo doesn't exist or exists but doesn't have MP-related fields:
    bot_event('lease_fulfilled', machine_lease.hostname,
machine_lease.lease_expiration_ts, ...)

bot_event() will take care of logging as a BotEvent that a lease was fulfilled
and an MP bot is now expected, as well as creating the BotRoot and BotInfo or
adding the MP-related info if the BotRoot/BotInfo are already created.

I don't think there's any harm in not doing an xg. It's not really a weird state
if MachineLease has hostname and lease_expiration_ts but BotInfo hasn't been
created yet.

What do you think?

Powered by Google App Engine
This is Rietveld 408576698