|
|
Chromium Code Reviews|
Created:
4 years ago by Paweł Hajdan Jr. Modified:
4 years ago CC:
chromium-reviews, infra-reviews+luci-py_chromium.org Target Ref:
refs/heads/master Project:
luci-py Visibility:
Public. |
DescriptionUpload basic info about completed swarming tasks to event_mon
More CLs will follow to fill in more details.
BUG=660014
Committed: https://github.com/luci/luci-py/commit/62d9a940b73d2d1497d8dc07b36417be136eb8b1
Patch Set 1 #
Total comments: 9
Patch Set 2 : fixes #
Total comments: 5
Patch Set 3 : fixes #
Messages
Total messages: 30 (22 generated)
phajdan.jr@chromium.org changed reviewers: + maruel@chromium.org
The CQ bit was checked by phajdan.jr@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Luci-py Presubmit on luci.infra.try (JOB_FAILED, https://luci-milo.appspot.com/swarming/task/32c4963a564d3110)
lgtm with caveat. https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/event_mo... File appengine/swarming/event_mon_metrics.py (right): https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/event_mo... appengine/swarming/event_mon_metrics.py:15: """Sends an event_mon event about a swarming task. Is this synchronous or asynchronous? Please specify. https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/server/t... File appengine/swarming/server/task_scheduler.py (right): https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/server/t... appengine/swarming/server/task_scheduler.py:816: event_mon_metrics.send_task_event(smry) I'm concerned in adding more synchronous I/O in the hot path. I was concerned about ts_mon_metrics too and it did cause a fair amount of failure once deployed to prod. I'm fine with testing on prod and looking at the failure rate.
vadimsh@chromium.org changed reviewers: + vadimsh@chromium.org
https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/event_mo... File appengine/swarming/event_mon_metrics.py (right): https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/event_mo... appengine/swarming/event_mon_metrics.py:23: event.send() should we wrap this in try: except: and log errors instead of failing? Otherwise an outage of monitoring pipeline would cause a Swarming outage.
The CQ bit was checked by phajdan.jr@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/event_mo... File appengine/swarming/event_mon_metrics.py (right): https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/event_mo... appengine/swarming/event_mon_metrics.py:15: """Sends an event_mon event about a swarming task. On 2016/11/28 16:56:32, M-A Ruel wrote: > Is this synchronous or asynchronous? Please specify. Done. Ideally we'd have an async mode available, but as far as I know that's not the case yet. https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/event_mo... appengine/swarming/event_mon_metrics.py:23: event.send() On 2016/11/28 19:19:31, Vadim Sh. wrote: > should we wrap this in try: except: and log errors instead of failing? Otherwise > an outage of monitoring pipeline would cause a Swarming outage. Good point. Done. Marc-Antoine, what do you think? I'm somewhat worried about possible silent failures because of this. Compared to a complete swarming outage it may be a reasonable tradeoff. https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/server/t... File appengine/swarming/server/task_scheduler.py (right): https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/server/t... appengine/swarming/server/task_scheduler.py:816: event_mon_metrics.send_task_event(smry) On 2016/11/28 16:56:32, M-A Ruel wrote: > I'm concerned in adding more synchronous I/O in the hot path. I was concerned > about ts_mon_metrics too and it did cause a fair amount of failure once deployed > to prod. > > I'm fine with testing on prod and looking at the failure rate. I'm also concerned. How about proactively requesting async interfaces for both monitoring pipelines?
lgtm https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/event_mo... File appengine/swarming/event_mon_metrics.py (right): https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/event_mo... appengine/swarming/event_mon_metrics.py:15: """Sends an event_mon event about a swarming task. On 2016/11/29 14:37:33, Paweł Hajdan Jr. wrote: > On 2016/11/28 16:56:32, M-A Ruel wrote: > > Is this synchronous or asynchronous? Please specify. > > Done. Ideally we'd have an async mode available, but as far as I know that's not > the case yet. Ok then can you document what are the side effects? Is it an urlfetch, is it a DB write? https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/server/t... File appengine/swarming/server/task_scheduler.py (right): https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/server/t... appengine/swarming/server/task_scheduler.py:816: event_mon_metrics.send_task_event(smry) On 2016/11/29 14:37:33, Paweł Hajdan Jr. wrote: > On 2016/11/28 16:56:32, M-A Ruel wrote: > > I'm concerned in adding more synchronous I/O in the hot path. I was concerned > > about ts_mon_metrics too and it did cause a fair amount of failure once > deployed > > to prod. > > > > I'm fine with testing on prod and looking at the failure rate. > > I'm also concerned. How about proactively requesting async interfaces for both > monitoring pipelines? As long as it is quick to disable the functionality by adding an early return in send_task_event(), it's not worth overdesigning it. https://codereview.chromium.org/2534943002/diff/20001/appengine/swarming/even... File appengine/swarming/event_mon_metrics.py (right): https://codereview.chromium.org/2534943002/diff/20001/appengine/swarming/even... appengine/swarming/event_mon_metrics.py:22: logging.info('Sending event: %s', event.proto) I'd prefer not to log. https://codereview.chromium.org/2534943002/diff/20001/appengine/swarming/even... appengine/swarming/event_mon_metrics.py:28: except Exception as e: Not a fan of trapping everything but I'll let it for now. https://codereview.chromium.org/2534943002/diff/20001/appengine/swarming/even... appengine/swarming/event_mon_metrics.py:29: logging.error('Caught exception while sending event: %s', e) logging.exception() would probably be useful, then you don't need e.
The CQ bit was checked by phajdan.jr@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by phajdan.jr@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/event_mo... File appengine/swarming/event_mon_metrics.py (right): https://codereview.chromium.org/2534943002/diff/1/appengine/swarming/event_mo... appengine/swarming/event_mon_metrics.py:15: """Sends an event_mon event about a swarming task. On 2016/11/29 14:48:40, M-A Ruel wrote: > Ok then can you document what are the side effects? Is it an urlfetch, is it a > DB write? Done. Please note this comment might get out of sync with future implementation of gae_event_mon. https://codereview.chromium.org/2534943002/diff/20001/appengine/swarming/even... File appengine/swarming/event_mon_metrics.py (right): https://codereview.chromium.org/2534943002/diff/20001/appengine/swarming/even... appengine/swarming/event_mon_metrics.py:22: logging.info('Sending event: %s', event.proto) On 2016/11/29 14:48:40, M-A Ruel wrote: > I'd prefer not to log. Done. https://codereview.chromium.org/2534943002/diff/20001/appengine/swarming/even... appengine/swarming/event_mon_metrics.py:29: logging.error('Caught exception while sending event: %s', e) On 2016/11/29 14:48:41, M-A Ruel wrote: > logging.exception() would probably be useful, then you don't need e. Done.
The CQ bit was checked by phajdan.jr@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from maruel@chromium.org Link to the patchset: https://codereview.chromium.org/2534943002/#ps40001 (title: "fixes")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 40001, "attempt_start_ts": 1480432467374450,
"parent_rev": "9694fdbdb1c317242d30cc8a6b3547be27a52096", "commit_rev":
"62d9a940b73d2d1497d8dc07b36417be136eb8b1"}
Message was sent while issue was closed.
Description was changed from ========== Upload basic info about completed swarming tasks to event_mon More CLs will follow to fill in more details. BUG=660014 ========== to ========== Upload basic info about completed swarming tasks to event_mon More CLs will follow to fill in more details. BUG=660014 Committed: https://github.com/luci/luci-py/commit/62d9a940b73d2d1497d8dc07b36417be136eb8b1 ==========
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://github.com/luci/luci-py/commit/62d9a940b73d2d1497d8dc07b36417be136eb8b1 |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
