|
|
Chromium Code Reviews
Descriptionswarming: Add server-side implementation for supplemental bot_config
This enable subteams to implement hooks (get_dimensions, on_after_task,
etc) that are specific to their groups of bots via bots.cfg, without risking
interference with bots from other pools.
This is leveraged by fetching the files from luci-config. Only one file can be
specified per groups, this is not additive at the moment.
Migrate from rebooting the host to restarting the bot when a config change
occurs, which is much more efficient.
R=vadimsh@chromium.org
BUG=
Review-Url: https://codereview.chromium.org/2689483004
Committed: https://github.com/luci/luci-py/commit/22b12591c9b7ee36fa8f94570ffc40e547ed7257
Patch Set 1 : rebase #
Total comments: 7
Patch Set 2 : Removed check, it doesn't work #Patch Set 3 : Bot side fix #
Total comments: 1
Patch Set 4 : Redone to actually work #
Total comments: 2
Patch Set 5 : Rebase #Patch Set 6 : include fix and test case to test the fix #
Total comments: 2
Patch Set 7 : Rebased #
Messages
Total messages: 29 (13 generated)
Description was changed from ========== Add server-side implementation for supplemental bot_config. This enable subteams to implement hooks (get_dimensions, on_after_task, etc) that are specific to their pool of bots, without risking interference with bots from other pools. R=vadimsh@chromium.org BUG= ========== to ========== swarming: Add server-side implementation for supplemental bot_config. This enable subteams to implement hooks (get_dimensions, on_after_task, etc) that are specific to their groups of bots via bots.cfg, without risking interference with bots from other pools. This is leveraged by fetching the files from luci-config. Only one file can be specified per groups, this is not additive at the moment. R=vadimsh@chromium.org BUG= ==========
Description was changed from ========== swarming: Add server-side implementation for supplemental bot_config. This enable subteams to implement hooks (get_dimensions, on_after_task, etc) that are specific to their groups of bots via bots.cfg, without risking interference with bots from other pools. This is leveraged by fetching the files from luci-config. Only one file can be specified per groups, this is not additive at the moment. R=vadimsh@chromium.org BUG= ========== to ========== swarming: Add server-side implementation for supplemental bot_config This enable subteams to implement hooks (get_dimensions, on_after_task, etc) that are specific to their groups of bots via bots.cfg, without risking interference with bots from other pools. This is leveraged by fetching the files from luci-config. Only one file can be specified per groups, this is not additive at the moment. R=vadimsh@chromium.org BUG= ==========
Patchset #1 (id:1) has been deleted
Patchset #1 (id:20001) has been deleted
Patchset #1 (id:40001) has been deleted
I'll test it live on staging (not done yet) before committing.
vadimsh@chromium.org changed reviewers: + nodir@chromium.org
+nodir https://codereview.chromium.org/2689483004/diff/60001/appengine/swarming/hand... File appengine/swarming/handlers_bot.py (right): https://codereview.chromium.org/2689483004/diff/60001/appengine/swarming/hand... appengine/swarming/handlers_bot.py:452: msg = 'Failed to find %s' % res.bot_group_cfg.bot_config_script these will most probably happen when adding completely new configs :( https://codereview.chromium.org/2689483004/diff/60001/appengine/swarming/serv... File appengine/swarming/server/bot_groups_config.py (right): https://codereview.chromium.org/2689483004/diff/60001/appengine/swarming/serv... appengine/swarming/server/bot_groups_config.py:415: if not config.get_self_config(path)[1]: I think this will not generally work. Luci config has a notion of "good config set", a config at HEAD is promoted to "good config set" only after it passes all the validations. get_self_config returns configs from "good config set". When bot_config_script and the actual scrip are added in the same commit, the script will not be visible yet from the validation hook, because the commit (which is being validated right here) is not yet "good". Consider just checking the name and be done with it. I think there's also a way to tell luci-config to pass all "scripts/*.py" to Swarming for validation (to check their syntax). It will be independent validation rule.
https://codereview.chromium.org/2689483004/diff/60001/appengine/swarming/serv... File appengine/swarming/server/bot_groups_config.py (right): https://codereview.chromium.org/2689483004/diff/60001/appengine/swarming/serv... appengine/swarming/server/bot_groups_config.py:415: if not config.get_self_config(path)[1]: On 2017/02/09 22:22:00, Vadim Sh. wrote: > I think this will not generally work. Luci config has a notion of "good config > set", a config at HEAD is promoted to "good config set" only after it passes all > the validations. get_self_config returns configs from "good config set". > > When bot_config_script and the actual scrip are added in the same commit, the > script will not be visible yet from the validation hook, because the commit > (which is being validated right here) is not yet "good". > > Consider just checking the name and be done with it. Nodir can confirm if what I do is wrong. It's not a big deal, I just wanted to help users that do a typo in their config file name but this won't be changed frequently so it is not a big deal. > I think there's also a way to tell luci-config to pass all "scripts/*.py" to > Swarming for validation (to check their syntax). It will be independent > validation rule. It was already implemented, it lives there: https://github.com/luci/luci-py/blob/master/appengine/swarming/server/bot_cod...
https://codereview.chromium.org/2689483004/diff/60001/appengine/swarming/hand... File appengine/swarming/handlers_bot.py (right): https://codereview.chromium.org/2689483004/diff/60001/appengine/swarming/hand... appengine/swarming/handlers_bot.py:452: msg = 'Failed to find %s' % res.bot_group_cfg.bot_config_script On 2017/02/09 22:22:00, Vadim Sh. wrote: > these will most probably happen when adding completely new configs :( At least we'll get errors about these and the time to deploy the fix is only a few minutes. https://codereview.chromium.org/2689483004/diff/60001/appengine/swarming/serv... File appengine/swarming/server/bot_groups_config.py (right): https://codereview.chromium.org/2689483004/diff/60001/appengine/swarming/serv... appengine/swarming/server/bot_groups_config.py:415: if not config.get_self_config(path)[1]: On 2017/02/09 22:22:00, Vadim Sh. wrote: > I think this will not generally work. I tested and confirmed it doesn't. Just removed this code and added a comment.
https://codereview.chromium.org/2689483004/diff/60001/appengine/swarming/serv... File appengine/swarming/server/bot_groups_config.py (right): https://codereview.chromium.org/2689483004/diff/60001/appengine/swarming/serv... appengine/swarming/server/bot_groups_config.py:415: if not config.get_self_config(path)[1]: On 2017/02/10 18:33:26, M-A Ruel wrote: > On 2017/02/09 22:22:00, Vadim Sh. wrote: > > I think this will not generally work. > > I tested and confirmed it doesn't. Just removed this code and added a comment. Sorry for being late, Vadim's reasoning is correct. The essential problem seems to be that config validation is per file, cannot validate a set of config files
https://codereview.chromium.org/2689483004/diff/60001/appengine/swarming/serv... File appengine/swarming/server/bot_groups_config.py (right): https://codereview.chromium.org/2689483004/diff/60001/appengine/swarming/serv... appengine/swarming/server/bot_groups_config.py:415: if not config.get_self_config(path)[1]: On 2017/02/10 18:41:43, nodir wrote: > On 2017/02/10 18:33:26, M-A Ruel wrote: > > On 2017/02/09 22:22:00, Vadim Sh. wrote: > > > I think this will not generally work. > > > > I tested and confirmed it doesn't. Just removed this code and added a comment. > > Sorry for being late, Vadim's reasoning is correct. The essential problem seems > to be that config validation is per file, cannot validate a set of config files No big deal, the latest patchset is confirmed to work. I had to do a small fix on the bot side due to unicode + coding header. <3 python.
https://codereview.chromium.org/2689483004/diff/100001/appengine/swarming/han... File appengine/swarming/handlers_bot.py (right): https://codereview.chromium.org/2689483004/diff/100001/appengine/swarming/han... appengine/swarming/handlers_bot.py:442: if res.bot_group_cfg.bot_config_script: I've realized there's no mechanism that triggers a reload of config script changes. The existing mechanism based on 'bot_group_cfg_version' will trigger a reload if _name_ of bot_config_script changes, not its content. And even if we extend to include bot_config_script body into bot_group_cfg_version hash, this mechanism currently reboots bots, which is too heavyweight for config changes (e.g. we don't want to reboot the entire Chrome pool when changing configs).
Description was changed from ========== swarming: Add server-side implementation for supplemental bot_config This enable subteams to implement hooks (get_dimensions, on_after_task, etc) that are specific to their groups of bots via bots.cfg, without risking interference with bots from other pools. This is leveraged by fetching the files from luci-config. Only one file can be specified per groups, this is not additive at the moment. R=vadimsh@chromium.org BUG= ========== to ========== swarming: Add server-side implementation for supplemental bot_config This enable subteams to implement hooks (get_dimensions, on_after_task, etc) that are specific to their groups of bots via bots.cfg, without risking interference with bots from other pools. This is leveraged by fetching the files from luci-config. Only one file can be specified per groups, this is not additive at the moment. Migrate from rebooting the host to restarting the bot when a config change occurs, which is much more efficient. R=vadimsh@chromium.org BUG= ==========
https://codereview.chromium.org/2689483004/diff/120001/appengine/swarming/pro... File appengine/swarming/proto/bots.proto (right): https://codereview.chromium.org/2689483004/diff/120001/appengine/swarming/pro... appengine/swarming/proto/bots.proto:121: optional string bot_config_script = 23; can you extract *.proto change (and perhaps validator) into a separate CL to fix https://bugs.chromium.org/p/chromium/issues/detail?id=558655#c34? It will also help in case we need to revert this CL.
https://codereview.chromium.org/2689483004/diff/120001/appengine/swarming/pro... File appengine/swarming/proto/bots.proto (right): https://codereview.chromium.org/2689483004/diff/120001/appengine/swarming/pro... appengine/swarming/proto/bots.proto:121: optional string bot_config_script = 23; On 2017/02/14 22:51:16, Vadim Sh. wrote: > can you extract *.proto change (and perhaps validator) into a separate CL to fix > https://bugs.chromium.org/p/chromium/issues/detail?id=558655#c34? > > It will also help in case we need to revert this CL. This was done in 80856c2cca3fb44796a66ed5bc50e3d16168033b. Rebased.
Started to test it live.
lgtm https://codereview.chromium.org/2689483004/diff/160001/appengine/swarming/han... File appengine/swarming/handlers_bot.py (right): https://codereview.chromium.org/2689483004/diff/160001/appengine/swarming/han... appengine/swarming/handlers_bot.py:530: bot_event('request_restart') should we use a different event name here?
https://codereview.chromium.org/2689483004/diff/160001/appengine/swarming/han... File appengine/swarming/handlers_bot.py (right): https://codereview.chromium.org/2689483004/diff/160001/appengine/swarming/han... appengine/swarming/handlers_bot.py:530: bot_event('request_restart') On 2017/02/16 18:31:16, Vadim Sh. wrote: > should we use a different event name here? I won't change it here as this would mean scope creeping this CL but I agree the ALLOWED_EVENTS set will have to be increased/changed.
The CQ bit was checked by maruel@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Failed to apply patch for appengine/swarming/server/bot_groups_config.py:
While running git apply --index -p1;
error: patch failed: appengine/swarming/server/bot_groups_config.py:71
error: appengine/swarming/server/bot_groups_config.py: patch does not apply
Patch: appengine/swarming/server/bot_groups_config.py
Index: appengine/swarming/server/bot_groups_config.py
diff --git a/appengine/swarming/server/bot_groups_config.py
b/appengine/swarming/server/bot_groups_config.py
index
962a36087159b45a6dc360a5e357410a9ba8def0..518044b56000b378b534452f0083c654ed052370
100644
--- a/appengine/swarming/server/bot_groups_config.py
+++ b/appengine/swarming/server/bot_groups_config.py
@@ -7,6 +7,7 @@
import collections
import hashlib
import logging
+import os
import re
from components import auth
@@ -47,6 +48,14 @@ BotGroupConfig = collections.namedtuple('BotGroupConfig', [
# dimension from that set, the list of value for it will be empty. Key and
# values are unicode strings.
'dimensions',
+
+ # Name of the supplemental bot_config.py to inject to the bot during
+ # handshake.
+ 'bot_config_script',
+
+ # Content of the supplemental bot_config.py to inject to the bot during
+ # handshake.
+ 'bot_config_script_content',
])
@@ -71,7 +80,9 @@ _DEFAULT_BOT_GROUPS = _BotGroups(
require_service_account=None,
ip_whitelist=auth.BOTS_IP_WHITELIST,
owners=(),
- dimensions={}))
+ dimensions={},
+ bot_config_script='',
+ bot_config_script_content=''))
def _gen_version(fields):
@@ -140,12 +151,26 @@ def _bot_group_proto_to_tuple(msg, trusted_dimensions):
auth_cfg = msg.auth or bots_pb2.BotAuth()
+ content = ''
+ if msg.bot_config_script:
+ rev, content = config.get_self_config(
+ 'scripts/' + msg.bot_config_script,
+ store_last_good=True)
+ if not rev or not content:
+ # The entry is invalid. It points to a non existing file. It could be
+ # because of a typo in the file name. An empty file is an invalid file,
+ # log an error to alert the admins.
+ logging.error(
+ 'Configuration referenced non existing bot_config file %r\n%s',
+ msg.bot_config_script, msg)
return _make_bot_group_config(
require_luci_machine_token=auth_cfg.require_luci_machine_token,
require_service_account=auth_cfg.require_service_account,
ip_whitelist=auth_cfg.ip_whitelist,
owners=tuple(msg.owners),
- dimensions={k: sorted(v) for k, v in dimensions.iteritems()})
+ dimensions={k: sorted(v) for k, v in dimensions.iteritems()},
+ bot_config_script=msg.bot_config_script or '',
+ bot_config_script_content=content or '')
def _expand_bot_id_expr(expr):
@@ -392,6 +417,19 @@ def validate_settings(cfg, ctx):
if not local_config.validate_flat_dimension(dim):
ctx.error('bad dimension %r', dim)
+ # Validate 'bot_config_script': the supplemental bot_config.py.
+ if entry.bot_config_script:
+ # Another check in bot_code.py confirms that the script itself is valid
+ # python.
+ if not entry.bot_config_script.endswith('.py'):
+ ctx.error('Invalid bot_config_script name: must end with .py')
+ if os.path.basename(entry.bot_config_script) !=
entry.bot_config_script:
+ ctx.error(
+ 'Invalid bot_config_script name: must not contain path entry')
+ # We can't validate that the file exists here. It'll fail in
+ # _bot_group_proto_to_tuple() which is called by _fetch_bot_groups()
and
+ # cached for 60 seconds.
+
# Now verify bot_id_prefix is never a prefix of other prefix. It causes
# ambiguities.
for smaller, s_idx in bot_id_prefixes.iteritems():
The CQ bit was checked by maruel@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from vadimsh@chromium.org Link to the patchset: https://codereview.chromium.org/2689483004/#ps180001 (title: "Rebased")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 180001, "attempt_start_ts": 1487272514971990,
"parent_rev": "d05a29e0442f8ef22d345ae8a2eb9d3b2d2ef90f", "commit_rev":
"22b12591c9b7ee36fa8f94570ffc40e547ed7257"}
Message was sent while issue was closed.
Description was changed from ========== swarming: Add server-side implementation for supplemental bot_config This enable subteams to implement hooks (get_dimensions, on_after_task, etc) that are specific to their groups of bots via bots.cfg, without risking interference with bots from other pools. This is leveraged by fetching the files from luci-config. Only one file can be specified per groups, this is not additive at the moment. Migrate from rebooting the host to restarting the bot when a config change occurs, which is much more efficient. R=vadimsh@chromium.org BUG= ========== to ========== swarming: Add server-side implementation for supplemental bot_config This enable subteams to implement hooks (get_dimensions, on_after_task, etc) that are specific to their groups of bots via bots.cfg, without risking interference with bots from other pools. This is leveraged by fetching the files from luci-config. Only one file can be specified per groups, this is not additive at the moment. Migrate from rebooting the host to restarting the bot when a config change occurs, which is much more efficient. R=vadimsh@chromium.org BUG= Review-Url: https://codereview.chromium.org/2689483004 Committed: https://github.com/luci/luci-py/commit/22b12591c9b7ee36fa8f94570ffc40e547ed7257 ==========
Message was sent while issue was closed.
Committed patchset #7 (id:180001) as https://github.com/luci/luci-py/commit/22b12591c9b7ee36fa8f94570ffc40e547ed7257 |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
