|
|
Chromium Code Reviews|
Created:
5 years, 7 months ago by ghost stip (do not use) Modified:
5 years, 7 months ago CC:
chromium-reviews Base URL:
https://chromium.googlesource.com/infra/infra.git@master Target Ref:
refs/heads/master Project:
infra Visibility:
Public. |
DescriptionAdd master_manager_launch script which launches master_manager scripts for each master on a host.
BUG=479059
Committed: https://chromium.googlesource.com/infra/infra/+/51f4a0dc0f645c4ade146178112d8b504cc5f7b9
Patch Set 1 #Patch Set 2 : Add coverage. #
Total comments: 38
Patch Set 3 : Semi-address comments. #Patch Set 4 : Fix typo. #Patch Set 5 : Some more cleanup. #Patch Set 6 : Move to infra/services. #
Total comments: 16
Patch Set 7 : Address agable's comments. #Messages
Total messages: 15 (2 generated)
stip@chromium.org changed reviewers: + agable@chromium.org, iannucci@chromium.org
ptal. this version doesn't have multiprocessing or gitiles support yet, those will be added in subsequent CLs
https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... File infra/tools/master_manager_launcher/__main__.py (right): https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/__main__.py:1: #!/usr/bin/python infra/tools is meant to hold things for people to run one-off. infra/services is meant to hold things that will be invoked in an automated fashion. Since I'm pretty sure the whole point of this is be run automatically, shouldn't this live in services/? https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/__main__.py:37: default=os.path.join(SCRIPT_DIR, 'desired_master_state.json'), Seems like an odd default. Why would the json be in this directory? https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/__main__.py:57: def synthesize_master_manager_cli(master_dict, hostname, prod=False): It's not a CLI, it's a command. https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/__main__.py:77: def log_triggered_ignored(triggered, ignored, hostname, logger): When/how is this covered? https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/__main__.py:99: logger = logging.getLogger(__name__) This shouldn't be necessary for the top-level __main__ script, only for libraries. __main__ can just set up the logging argparse args, and then use 'logging.foo('a string')' iirc. https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/__main__.py:101: if desired_state is None: Why is not having a desired state an error? The log_triggered_ignored function above just thinks the same error (afaict) is only worthy of an info-level log. https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/__main__.py:112: for m in triggered] nit: outdent closing brace on next line (like you did for synthesize_master_command) https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/__main__.py:119: subprocess.call(command) Yeah we'll definitely need parallelism here. https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... File infra/tools/master_manager_launcher/desired_state_parser.py (right): https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/desired_state_parser.py:21: def load_desired_state_file(filename): # pragma: no cover Feels odd. Not finding the file is exception-worthy. Finding the file and having it totally empty is fine. Finding the file and having it invalid json results in None, which must then be handled by the caller. https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/desired_state_parser.py:42: sorted_states = sorted( Why does it matter if they're sorted in the file when they have timestamps anyway? https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/desired_state_parser.py:67: if not get_master_state(states, now=now): Why? Why can't I create the file containing just a single future state that I'd like it to end up in? https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/desired_state_parser.py:95: Returns two lists: triggered_masters and ignored_masters. Any reason they're sets instead of lists? Ordering isn't important here afaict. https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/desired_state_parser.py:102: ignored_masters is merely a list of 'dirname' strings (ex: master.chromium). Docstring should explain why they're ignored
some more minor bits https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/libs/build... File infra/libs/buildbot/master.py (right): https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/libs/build... infra/libs/buildbot/master.py:110: return json.loads(subprocess.check_output( if this gets called a bunch in the same process, it may be worth caching it. https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/libs/build... infra/libs/buildbot/master.py:123: build_dir = os.path.join(directory, os.pardir, os.pardir, os.pardir, 'build') would much rather use dirname than pardir (assuming 'directory' is an abspath already) https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/tools/mast... File infra/tools/master_manager_launcher/__main__.py (right): https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/tools/mast... infra/tools/master_manager_launcher/__main__.py:99: logger = logging.getLogger(__name__) On 2015/05/06 23:29:30, agable wrote: > This shouldn't be necessary for the top-level __main__ script, only for > libraries. __main__ can just set up the logging argparse args, and then use > 'logging.foo('a string')' iirc. Additionally, logger is actually one of the (very few) things which should be a module-level global. https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/tools/mast... File infra/tools/master_manager_launcher/desired_state_parser.py (right): https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/tools/mast... infra/tools/master_manager_launcher/desired_state_parser.py:67: if not get_master_state(states, now=now): On 2015/05/06 23:29:30, agable wrote: > Why? Why can't I create the file containing just a single future state that I'd > like it to end up in? Because then the script doesn't know what state it's supposed to be in in the mean time. timestamp / state 0 / offline 1 / online is different than 0 / drained 1 / online If you just gave it `1 / online`, it wouldn't be able to take any action until '1', because it doesn't know what the pre-1 state should be.
https://codereview.chromium.org/1128783003/diff/20001/infra/libs/buildbot/mas... File infra/libs/buildbot/master.py (right): https://codereview.chromium.org/1128783003/diff/20001/infra/libs/buildbot/mas... infra/libs/buildbot/master.py:110: return json.loads(subprocess.check_output( On 2015/05/06 23:48:05, iannucci wrote: > if this gets called a bunch in the same process, it may be worth caching it. only twice. I'd like to introduce a @memoize decorator at some point https://codereview.chromium.org/1128783003/diff/20001/infra/libs/buildbot/mas... infra/libs/buildbot/master.py:123: build_dir = os.path.join(directory, os.pardir, os.pardir, os.pardir, 'build') On 2015/05/06 23:48:05, iannucci wrote: > would much rather use dirname than pardir (assuming 'directory' is an abspath > already) Done. https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... File infra/tools/master_manager_launcher/__main__.py (right): https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/__main__.py:77: def log_triggered_ignored(triggered, ignored, hostname, logger): On 2015/05/06 23:29:30, agable wrote: > When/how is this covered? actually, the whole file doesn't have a test (since I moved most of the logic to desired_state_parser.py, which *is* tested. I've moved the pragma: no cover to show humans this fact. do we have a better way to signify 'whole file is untested'? https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/__main__.py:99: logger = logging.getLogger(__name__) On 2015/05/06 23:29:30, agable wrote: > This shouldn't be necessary for the top-level __main__ script, only for > libraries. __main__ can just set up the logging argparse args, and then use > 'logging.foo('a string')' iirc. Done. https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/__main__.py:101: if desired_state is None: On 2015/05/06 23:29:30, agable wrote: > Why is not having a desired state an error? The log_triggered_ignored function > above just thinks the same error (afaict) is only worthy of an info-level log. there's a difference between 'I've haven't written anything about this master, ignore it' versus 'you wrote a configuration file that is invalid.' load_desired_state_file() now raises an exception on a bad config, so these two lines go away https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/__main__.py:119: subprocess.call(command) On 2015/05/06 23:29:30, agable wrote: > Yeah we'll definitely need parallelism here. Acknowledged. https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... File infra/tools/master_manager_launcher/desired_state_parser.py (right): https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/desired_state_parser.py:21: def load_desired_state_file(filename): # pragma: no cover On 2015/05/06 23:29:30, agable wrote: > Feels odd. Not finding the file is exception-worthy. Finding the file and having > it totally empty is fine. Finding the file and having it invalid json results in > None, which must then be handled by the caller. Done. https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/desired_state_parser.py:42: sorted_states = sorted( On 2015/05/06 23:29:30, agable wrote: > Why does it matter if they're sorted in the file when they have timestamps > anyway? Because a human editing / reading the file would be *very* confused if a later state was added in the middle of the file somewhere. The parser is very strict here to strongly discourage people from doing that. https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/desired_state_parser.py:67: if not get_master_state(states, now=now): On 2015/05/06 23:48:05, iannucci wrote: > On 2015/05/06 23:29:30, agable wrote: > > Why? Why can't I create the file containing just a single future state that > I'd > > like it to end up in? > > Because then the script doesn't know what state it's supposed to be in in the > mean time. > > timestamp / state > 0 / offline > 1 / online > > is different than > 0 / drained > 1 / online > > If you just gave it `1 / online`, it wouldn't be able to take any action until > '1', because it doesn't know what the pre-1 state should be. Right, this is a discussion Robbie and I had in mountain view last tuesday. Basically, leave master_manager as a very simple script which only has a single, fully defined goal. It raises an exception if you give it an action to perform in the future (as that leaves its current goal underdefined). master_manager_launcher determines what goal to give master_manager based on a list of goal states over time. this has a hidden benefit: we can now have a script which generates CLs to update a rolling 'restart schedule' for each master, similar to the sheriff rotation system. a human can even edit the file to postpone or remove actions in the future. https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/desired_state_parser.py:95: Returns two lists: triggered_masters and ignored_masters. On 2015/05/06 23:29:30, agable wrote: > Any reason they're sets instead of lists? Ordering isn't important here afaict. TypeError: unhashable type: 'dict' ignored_masters has been made a set (since it's just strings) https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/desired_state_parser.py:102: ignored_masters is merely a list of 'dirname' strings (ex: master.chromium). On 2015/05/06 23:29:30, agable wrote: > Docstring should explain why they're ignored Done.
ok, addressed the comments. ptal! to make things a bit easier to follow, the first few patchsets address your comments while the last one moves the code to a new location (infra/services) https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... File infra/tools/master_manager_launcher/__main__.py (right): https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/__main__.py:1: #!/usr/bin/python On 2015/05/06 23:29:30, agable wrote: > infra/tools is meant to hold things for people to run one-off. infra/services is > meant to hold things that will be invoked in an automated fashion. Since I'm > pretty sure the whole point of this is be run automatically, shouldn't this live > in services/? Done. https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/__main__.py:37: default=os.path.join(SCRIPT_DIR, 'desired_master_state.json'), On 2015/05/06 23:29:30, agable wrote: > Seems like an odd default. Why would the json be in this directory? made it just the current dir. https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/__main__.py:57: def synthesize_master_manager_cli(master_dict, hostname, prod=False): On 2015/05/06 23:29:30, agable wrote: > It's not a CLI, it's a command. Done. https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/__main__.py:112: for m in triggered] On 2015/05/06 23:29:30, agable wrote: > nit: outdent closing brace on next line (like you did for > synthesize_master_command) Done.
https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/tools/mast... File infra/tools/master_manager_launcher/__main__.py (right): https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/tools/mast... infra/tools/master_manager_launcher/__main__.py:77: def log_triggered_ignored(triggered, ignored, hostname, logger): On 2015/05/07 at 06:50:57, stip wrote: > On 2015/05/06 23:29:30, agable wrote: > > When/how is this covered? > > actually, the whole file doesn't have a test (since I moved most of the logic to desired_state_parser.py, which *is* tested. > > I've moved the pragma: no cover to show humans this fact. do we have a better way to signify 'whole file is untested'? Nope, there's not a better way to show "the whole file is uncovered". Unfortnately, putting "#pragma: no cover" on the first line doesn't apply to the entire file :( https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/tools/mast... File infra/tools/master_manager_launcher/desired_state_parser.py (right): https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/tools/mast... infra/tools/master_manager_launcher/desired_state_parser.py:42: sorted_states = sorted( On 2015/05/07 at 06:50:57, stip wrote: > On 2015/05/06 23:29:30, agable wrote: > > Why does it matter if they're sorted in the file when they have timestamps > > anyway? > > Because a human editing / reading the file would be *very* confused if a later state was added in the middle of the file somewhere. The parser is very strict here to strongly discourage people from doing that. Fair enough, that makes sense. https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/tools/mast... infra/tools/master_manager_launcher/desired_state_parser.py:67: if not get_master_state(states, now=now): On 2015/05/07 at 06:50:57, stip wrote: > On 2015/05/06 23:48:05, iannucci wrote: > > On 2015/05/06 23:29:30, agable wrote: > > > Why? Why can't I create the file containing just a single future state that > > I'd > > > like it to end up in? > > > > Because then the script doesn't know what state it's supposed to be in in the > > mean time. > > > > timestamp / state > > 0 / offline > > 1 / online > > > > is different than > > 0 / drained > > 1 / online > > > > If you just gave it `1 / online`, it wouldn't be able to take any action until > > '1', because it doesn't know what the pre-1 state should be. > > Right, this is a discussion Robbie and I had in mountain view last tuesday. Basically, leave master_manager as a very simple script which only has a single, fully defined goal. It raises an exception if you give it an action to perform in the future (as that leaves its current goal underdefined). master_manager_launcher determines what goal to give master_manager based on a list of goal states over time. > > this has a hidden benefit: we can now have a script which generates CLs to update a rolling 'restart schedule' for each master, similar to the sheriff rotation system. a human can even edit the file to postpone or remove actions in the future. I don't see how my proposal goes against your discussion with Robbie. The script is still simple, and has a single fully defined goal: "Oh, the only timestamp I care about is in the future, so I do nothing." I think it is ugly for a person who wants to force a master to do something to have to put *two* entries in a file, one of which contains a meaningless timestamp arbitrarily far (or recent) in the past. https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... File infra/services/master_manager_launcher/__main__.py (right): https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... infra/services/master_manager_launcher/__main__.py:7: # pragma: no cover I don't think this actually works. Does it? I swear I've tried this and had it not actually do what I want in the past. https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... infra/services/master_manager_launcher/__main__.py:117: logging.info('running %s' % command) This log line will be prettier (and copy-pasteable) if you ' '.join(command). Otherwise it's gonna print the full list of tokens, which is just annoying. https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... File infra/services/master_manager_launcher/desired_state_parser.py (right): https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... infra/services/master_manager_launcher/desired_state_parser.py:25: def load_desired_state_file(filename): # pragma: no cover Seems like this could be tested with a couple valid/invalid json files checked into test/data/*.json https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... infra/services/master_manager_launcher/desired_state_parser.py:54: # Verify desired_state and timestamp are valid. Seems like the first "check for presence" loop could be inlined in this loop to save slightly on the constant factors. https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... infra/services/master_manager_launcher/desired_state_parser.py:91: if index: bisect_left is guaranteed to return an int, so please test for inequality against 0 for readability. https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... infra/services/master_manager_launcher/desired_state_parser.py:99: Returns triggered_masters and ignored_masters (a set and a list). (a list and a set, respectively). https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... File infra/services/master_manager_launcher/test/desired_state_parser_test.py (right): https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... infra/services/master_manager_launcher/test/desired_state_parser_test.py:116: nit: two blank lines
https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/tools/mast... File infra/tools/master_manager_launcher/__main__.py (right): https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/tools/mast... infra/tools/master_manager_launcher/__main__.py:77: def log_triggered_ignored(triggered, ignored, hostname, logger): On 2015/05/07 at 06:50:57, stip wrote: > On 2015/05/06 23:29:30, agable wrote: > > When/how is this covered? > > actually, the whole file doesn't have a test (since I moved most of the logic to desired_state_parser.py, which *is* tested. > > I've moved the pragma: no cover to show humans this fact. do we have a better way to signify 'whole file is untested'? Nope, there's not a better way to show "the whole file is uncovered". Unfortnately, putting "#pragma: no cover" on the first line doesn't apply to the entire file :( https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/tools/mast... File infra/tools/master_manager_launcher/desired_state_parser.py (right): https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/tools/mast... infra/tools/master_manager_launcher/desired_state_parser.py:42: sorted_states = sorted( On 2015/05/07 at 06:50:57, stip wrote: > On 2015/05/06 23:29:30, agable wrote: > > Why does it matter if they're sorted in the file when they have timestamps > > anyway? > > Because a human editing / reading the file would be *very* confused if a later state was added in the middle of the file somewhere. The parser is very strict here to strongly discourage people from doing that. Fair enough, that makes sense. https://chromiumcodereview.appspot.com/1128783003/diff/20001/infra/tools/mast... infra/tools/master_manager_launcher/desired_state_parser.py:67: if not get_master_state(states, now=now): On 2015/05/07 at 06:50:57, stip wrote: > On 2015/05/06 23:48:05, iannucci wrote: > > On 2015/05/06 23:29:30, agable wrote: > > > Why? Why can't I create the file containing just a single future state that > > I'd > > > like it to end up in? > > > > Because then the script doesn't know what state it's supposed to be in in the > > mean time. > > > > timestamp / state > > 0 / offline > > 1 / online > > > > is different than > > 0 / drained > > 1 / online > > > > If you just gave it `1 / online`, it wouldn't be able to take any action until > > '1', because it doesn't know what the pre-1 state should be. > > Right, this is a discussion Robbie and I had in mountain view last tuesday. Basically, leave master_manager as a very simple script which only has a single, fully defined goal. It raises an exception if you give it an action to perform in the future (as that leaves its current goal underdefined). master_manager_launcher determines what goal to give master_manager based on a list of goal states over time. > > this has a hidden benefit: we can now have a script which generates CLs to update a rolling 'restart schedule' for each master, similar to the sheriff rotation system. a human can even edit the file to postpone or remove actions in the future. I don't see how my proposal goes against your discussion with Robbie. The script is still simple, and has a single fully defined goal: "Oh, the only timestamp I care about is in the future, so I do nothing." I think it is ugly for a person who wants to force a master to do something to have to put *two* entries in a file, one of which contains a meaningless timestamp arbitrarily far (or recent) in the past. https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... File infra/services/master_manager_launcher/__main__.py (right): https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... infra/services/master_manager_launcher/__main__.py:7: # pragma: no cover I don't think this actually works. Does it? I swear I've tried this and had it not actually do what I want in the past. https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... infra/services/master_manager_launcher/__main__.py:117: logging.info('running %s' % command) This log line will be prettier (and copy-pasteable) if you ' '.join(command). Otherwise it's gonna print the full list of tokens, which is just annoying. https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... File infra/services/master_manager_launcher/desired_state_parser.py (right): https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... infra/services/master_manager_launcher/desired_state_parser.py:25: def load_desired_state_file(filename): # pragma: no cover Seems like this could be tested with a couple valid/invalid json files checked into test/data/*.json https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... infra/services/master_manager_launcher/desired_state_parser.py:54: # Verify desired_state and timestamp are valid. Seems like the first "check for presence" loop could be inlined in this loop to save slightly on the constant factors. https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... infra/services/master_manager_launcher/desired_state_parser.py:91: if index: bisect_left is guaranteed to return an int, so please test for inequality against 0 for readability. https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... infra/services/master_manager_launcher/desired_state_parser.py:99: Returns triggered_masters and ignored_masters (a set and a list). (a list and a set, respectively). https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... File infra/services/master_manager_launcher/test/desired_state_parser_test.py (right): https://chromiumcodereview.appspot.com/1128783003/diff/100001/infra/services/... infra/services/master_manager_launcher/test/desired_state_parser_test.py:116: nit: two blank lines
ptal with: 1 rebuttal (past states) 1 half-rebuttal (printing command lines raw vs. space-joined) 1 ambiguity (pragma no:cover) thanks a ton! https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... File infra/tools/master_manager_launcher/desired_state_parser.py (right): https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/desired_state_parser.py:67: if not get_master_state(states, now=now): On 2015/05/07 17:59:31, agable wrote: > On 2015/05/07 at 06:50:57, stip wrote: > > On 2015/05/06 23:48:05, iannucci wrote: > > > On 2015/05/06 23:29:30, agable wrote: > > > > Why? Why can't I create the file containing just a single future state > that > > > I'd > > > > like it to end up in? > > > > > > Because then the script doesn't know what state it's supposed to be in in > the > > > mean time. > > > > > > timestamp / state > > > 0 / offline > > > 1 / online > > > > > > is different than > > > 0 / drained > > > 1 / online > > > > > > If you just gave it `1 / online`, it wouldn't be able to take any action > until > > > '1', because it doesn't know what the pre-1 state should be. > > > > Right, this is a discussion Robbie and I had in mountain view last tuesday. > Basically, leave master_manager as a very simple script which only has a single, > fully defined goal. It raises an exception if you give it an action to perform > in the future (as that leaves its current goal underdefined). > master_manager_launcher determines what goal to give master_manager based on a > list of goal states over time. > > > > this has a hidden benefit: we can now have a script which generates CLs to > update a rolling 'restart schedule' for each master, similar to the sheriff > rotation system. a human can even edit the file to postpone or remove actions in > the future. > > I don't see how my proposal goes against your discussion with Robbie. The script > is still simple, and has a single fully defined goal: "Oh, the only timestamp I > care about is in the future, so I do nothing." I think it is ugly for a person > who wants to force a master to do something to have to put *two* entries in a > file, one of which contains a meaningless timestamp arbitrarily far (or recent) > in the past. There is only one case where a person who wants to restart a master *in the future* has to put two entries. As you'll see, this makes sense: Case 1: The master already has entries in the desired_master_state file and the person wants to do something in the future. The person only adds one entry, in the future, because the present case is already taken care of by previous entries. Case 2: The master is not in the desired_master_state file (perhaps it is new) and wants to do an action now. The person only adds one entry with a timestamp of 'slightly in the past', and the action is immediately performed. Case 3: The master is not in the desired_master_state file (perhaps it is new) and wants to do an action in the future. This is particularly rare, as it means that someone has made a new master and wants to do something to it later. The person adds two entries: the 'meaningless timestamp' describing the current state, and the action in the future. The reason this makes sense is that it obviates all kinds of special-case logic for when the present is undefined. What if someone says 'offline' in the future, the master is running right now, and you kill it? Do we keep the script running and *then* bring it offline in the future? Do we keep it offline because "that's what they want later anyway?" What if someone says 'drained' in the future, and it's offline now. Do we start it just to keep it drained, or keep it offline? All these kinds of weird edge cases go away if you just *tell* it what you want for the present state (most likely 'running' or 'offline'), and you only have to do this once for when the master isn't in the system yet *and* you want something in the future. https://codereview.chromium.org/1128783003/diff/100001/infra/services/master_... File infra/services/master_manager_launcher/__main__.py (right): https://codereview.chromium.org/1128783003/diff/100001/infra/services/master_... infra/services/master_manager_launcher/__main__.py:7: # pragma: no cover On 2015/05/07 17:59:31, agable wrote: > I don't think this actually works. Does it? I swear I've tried this and had it > not actually do what I want in the past. again -- since there is no test file associated with this, the entire file is not tested for coverage, period. the #pragma here is for humans https://codereview.chromium.org/1128783003/diff/100001/infra/services/master_... infra/services/master_manager_launcher/__main__.py:117: logging.info('running %s' % command) On 2015/05/07 17:59:31, agable wrote: > This log line will be prettier (and copy-pasteable) if you ' '.join(command). > Otherwise it's gonna print the full list of tokens, which is just annoying. I always worry about doing that because you might have spaces in the command. we don't right now, but I don't like the idea of people copying/pasting things which may be subtly broken https://codereview.chromium.org/1128783003/diff/100001/infra/services/master_... File infra/services/master_manager_launcher/desired_state_parser.py (right): https://codereview.chromium.org/1128783003/diff/100001/infra/services/master_... infra/services/master_manager_launcher/desired_state_parser.py:25: def load_desired_state_file(filename): # pragma: no cover On 2015/05/07 17:59:31, agable wrote: > Seems like this could be tested with a couple valid/invalid json files checked > into test/data/*.json Done. https://codereview.chromium.org/1128783003/diff/100001/infra/services/master_... infra/services/master_manager_launcher/desired_state_parser.py:54: # Verify desired_state and timestamp are valid. On 2015/05/07 17:59:31, agable wrote: > Seems like the first "check for presence" loop could be inlined in this loop to > save slightly on the constant factors. Done. https://codereview.chromium.org/1128783003/diff/100001/infra/services/master_... infra/services/master_manager_launcher/desired_state_parser.py:91: if index: On 2015/05/07 17:59:31, agable wrote: > bisect_left is guaranteed to return an int, so please test for inequality > against 0 for readability. Done. https://codereview.chromium.org/1128783003/diff/100001/infra/services/master_... infra/services/master_manager_launcher/desired_state_parser.py:99: Returns triggered_masters and ignored_masters (a set and a list). On 2015/05/07 17:59:31, agable wrote: > (a list and a set, respectively). Done. https://codereview.chromium.org/1128783003/diff/100001/infra/services/master_... File infra/services/master_manager_launcher/test/desired_state_parser_test.py (right): https://codereview.chromium.org/1128783003/diff/100001/infra/services/master_... infra/services/master_manager_launcher/test/desired_state_parser_test.py:116: On 2015/05/07 17:59:31, agable wrote: > nit: two blank lines my guess is you meant 'one blank line,' which I've done. let me know if this isn't what you intended
LGTM % my rebuttal to your rebuttal :) I don't actually expect you to change the behavior at this point, but I really honestly believe that there are no corner cases or crazy logic necessary. It is very simple and makes tons of sense. I would see this as an easy way to bring a master online: land the new master config and simultaneously land a file which says "turn it on over the weekend". https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... File infra/tools/master_manager_launcher/desired_state_parser.py (right): https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/desired_state_parser.py:67: if not get_master_state(states, now=now): On 2015/05/07 at 19:49:39, stip wrote: > Case 1: The master already has entries in the desired_master_state file and the person wants to do something in the future. The person only adds one entry, in the future, because the present case is already taken care of by previous entries. Case 1.5: The file used to contain a bunch of entries, but has been cleaned out for the sake of readability. Goto Case 2 or 3. On 2015/05/07 at 19:49:39, stip wrote: > What if someone says 'offline' in the future, the master is running right now, and you kill it? Do we keep the script running and *then* bring it offline in the future? Do we keep it offline because "that's what they want later anyway?" > > What if someone says 'drained' in the future, and it's offline now. Do we start it just to keep it drained, or keep it offline? > > All these kinds of weird edge cases go away if you just *tell* it what you want for the present state (most likely 'running' or 'offline'), and you only have to do this once for when the master isn't in the system yet *and* you want something in the future. I think these "what do we do" objections are contrived. The answer is always "nothing". If the only action is in the future, we *do nothing*, exactly as though the file were empty and completely unmanaged by the master-manager. There are no corner cases here. https://codereview.chromium.org/1128783003/diff/100001/infra/services/master_... File infra/services/master_manager_launcher/__main__.py (right): https://codereview.chromium.org/1128783003/diff/100001/infra/services/master_... infra/services/master_manager_launcher/__main__.py:7: # pragma: no cover On 2015/05/07 at 19:49:39, stip wrote: > On 2015/05/07 17:59:31, agable wrote: > > I don't think this actually works. Does it? I swear I've tried this and had it > > not actually do what I want in the past. > > again -- since there is no test file associated with this, the entire file is not tested for coverage, period. the #pragma here is for humans Ah, right. Balls :( https://codereview.chromium.org/1128783003/diff/100001/infra/services/master_... infra/services/master_manager_launcher/__main__.py:117: logging.info('running %s' % command) On 2015/05/07 at 19:49:39, stip wrote: > On 2015/05/07 17:59:31, agable wrote: > > This log line will be prettier (and copy-pasteable) if you ' '.join(command). > > Otherwise it's gonna print the full list of tokens, which is just annoying. > > I always worry about doing that because you might have spaces in the command. we don't right now, but I don't like the idea of people copying/pasting things which may be subtly broken Hmm. That's fair, I suppose. But also makes me want to cry.
https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... File infra/tools/master_manager_launcher/desired_state_parser.py (right): https://codereview.chromium.org/1128783003/diff/20001/infra/tools/master_mana... infra/tools/master_manager_launcher/desired_state_parser.py:67: if not get_master_state(states, now=now): On 2015/05/08 00:14:53, agable wrote: > On 2015/05/07 at 19:49:39, stip wrote: > > Case 1: The master already has entries in the desired_master_state file and > the person wants to do something in the future. The person only adds one entry, > in the future, because the present case is already taken care of by previous > entries. > > Case 1.5: The file used to contain a bunch of entries, but has been cleaned out > for the sake of readability. Goto Case 2 or 3. You can't clean it out for readability fully, because the script (and presubmit) won't allow you to delete all entries -- you must have at least one describing a state in the past. > > On 2015/05/07 at 19:49:39, stip wrote: > > What if someone says 'offline' in the future, the master is running right now, > and you kill it? Do we keep the script running and *then* bring it offline in > the future? Do we keep it offline because "that's what they want later anyway?" > > > > What if someone says 'drained' in the future, and it's offline now. Do we > start it just to keep it drained, or keep it offline? > > > > All these kinds of weird edge cases go away if you just *tell* it what you > want for the present state (most likely 'running' or 'offline'), and you only > have to do this once for when the master isn't in the system yet *and* you want > something in the future. > > I think these "what do we do" objections are contrived. The answer is always > "nothing". If the only action is in the future, we *do nothing*, exactly as > though the file were empty and completely unmanaged by the master-manager. There > are no corner cases here. we'll see how it plays out, but we can always tear out logic in the future if it's not working out
The CQ bit was checked by stip@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1128783003/110001
Message was sent while issue was closed.
Committed patchset #7 (id:110001) as https://chromium.googlesource.com/infra/infra/+/51f4a0dc0f645c4ade146178112d8...
Message was sent while issue was closed.
On 2015/05/08 at 00:22:13, stip wrote: > On 2015/05/08 00:14:53, agable wrote: > > Case 1.5: The file used to contain a bunch of entries, but has been cleaned out > > for the sake of readability. Goto Case 2 or 3. > > You can't clean it out for readability fully, because the script (and presubmit) won't allow you to delete all entries -- you must have at least one describing a state in the past. /facepalm, yes, that was my point exactly -- you *should* be able to clean it out fully, which is one of the reasons that the script *shouldn't* enforce describing at least one state in the past. |
