|
|
Chromium Code Reviews|
Created:
4 years, 2 months ago by Yoshisato Yanagisawa Modified:
4 years, 1 month ago CC:
chromium-reviews, infra-reviews+build_chromium.org, kjellander-cc_chromium.org Target Ref:
refs/heads/master Project:
build Visibility:
Public. |
DescriptionMake os.waitpid timeout on Windows.
If a process does not finish, os.waitpid won't also finish, and
stop cloudtail blocks until recipe timeout. To avoid the situation,
let me make os.waitpid timeout in 10 seconds.
BUG=656846
Patch Set 1 #
Total comments: 2
Patch Set 2 : fix indent. #Patch Set 3 : rebased. #
Total comments: 1
Patch Set 4 : starts with lower case. #Messages
Total messages: 17 (3 generated)
yyanagisawa@chromium.org changed reviewers: + shinyak@chromium.org, tikuta@chromium.org, ukai@chromium.org
lgtm https://codereview.chromium.org/2431223008/diff/1/scripts/slave/recipe_module... File scripts/slave/recipe_modules/goma/resources/cloudtail_utils.py (right): https://codereview.chromium.org/2431223008/diff/1/scripts/slave/recipe_module... scripts/slave/recipe_modules/goma/resources/cloudtail_utils.py:61: Exception: if something raised in os.waitpid. nit: I prefer to use 'If' here or use 'otherwise' below.
https://codereview.chromium.org/2431223008/diff/1/scripts/slave/recipe_module... File scripts/slave/recipe_modules/goma/resources/cloudtail_utils.py (right): https://codereview.chromium.org/2431223008/diff/1/scripts/slave/recipe_module... scripts/slave/recipe_modules/goma/resources/cloudtail_utils.py:61: Exception: if something raised in os.waitpid. On 2016/10/21 02:35:04, tikuta wrote: > nit: I prefer to use 'If' here or use 'otherwise' below. good catch.
lgtm
lgtm
vadimsh@chromium.org changed reviewers: + vadimsh@chromium.org
I've got access to Windows machine for development. It appears this script is mostly broken on Windows :( I'll rewrite it using Win32 API in a separate CL. https://codereview.chromium.org/2431223008/diff/40001/scripts/slave/recipe_mo... File scripts/slave/recipe_modules/goma/resources/cloudtail_utils.py (right): https://codereview.chromium.org/2431223008/diff/40001/scripts/slave/recipe_mo... scripts/slave/recipe_modules/goma/resources/cloudtail_utils.py:89: os.kill(pid, signal.SIGINT) I've just discovered that on Windows this doesn't actually work. It appears to be killing the process right away (at least that's what I see when I try os.kill(pid, signal.SIGINT) on Windows). See http://stackoverflow.com/questions/7085604/sending-c-to-python-subprocess-obj... So I don't understand how waitpid can get stuck. The only possibility is that the PID got assigned to some other process immediately. Actually, I've just realized that os.waitpid() required to pass a handle to the process, not PID (as the doc clearly states). I've confirmed it by grabbing a handle via win32api.OpenProcess(win32con.PROCESS_QUERY_INFORMATION | win32con.SYNCHRONIZE, False, <pid>) and passing it to os.waitpid. It blocks (as it should). When I try to pass PID directly, it just says "No child process".
Alternatively, we can just kill cloudtail with SIGKILL (and possibly loose some logs, usually their tails) to avoid all this complexity. Wdyt?
Goal of using cloudtail is keeping on tracking compiler_proxy.INFO under sudden death of buildbot slave. Question might be SIGKILL meets the goal or not. I guess that in sudden death situation, "cloudtail stop" step might not be executed for a while, and we might not need to care for the way to kill cloudtail. Maybe ok with SIGKILL. (note that since windows does not have signal.SIGKILL, we might need to use something else)
On 2016/10/21 04:33:32, Yoshisato Yanagisawa wrote: > Goal of using cloudtail is keeping on tracking compiler_proxy.INFO under sudden > death of buildbot slave. > Question might be SIGKILL meets the goal or not. > > I guess that in sudden death situation, "cloudtail stop" step might not be > executed for a while, and we might not need to care for the way to kill > cloudtail. Maybe ok with SIGKILL. (note that since windows does not have > signal.SIGKILL, we might need to use something else) On Windows SIGINT is the same thing as SIGKILL if I understand correctly. To make possible number of lost logs smaller I can expose https://chromium.googlesource.com/infra/infra/+/master/go/src/infra/tools/clo... via CLI. It is 5 sec currently, i.e. the cloudtail can buffer logs up to 5 sec before trying to flush them. Making this value smaller will reduce amount of potentially lost data, but it will increase QPS to Cloud Logging :-/
If there is strong QPS requirement, we can only choose graceful shutdown. Is it possible to make the interval 1 seconds or so?
s/requirement/limitation/.
On 2016/10/21 06:04:23, Yoshisato Yanagisawa wrote: > s/requirement/limitation/. There's a requirement of 500 QPS globally for a cloud project. I think it would be better to implement crbug.com/642299 before changing QPS here :(
On 2016/10/21 17:20:07, Vadim Sh. wrote: > There's a requirement of 500 QPS globally for a cloud project. I think it would > be better to implement crbug.com/642299 before changing QPS here :( Mmm, then graceful shutdown might be the way to go. I will implement it tomorrow. (I thought I should not use win32api or ctypes in this repo, but can I think it allowed to use that?)
This change has been taken over by https://codereview.chromium.org/2444233002/.
Description was changed from ========== Make os.waitpid timeout on Windows. If a process does not finish, os.waitpid won't also finish, and stop cloudtail blocks until recipe timeout. To avoid the situation, let me make os.waitpid timeout in 10 seconds. BUG=656846 ========== to ========== Make os.waitpid timeout on Windows. If a process does not finish, os.waitpid won't also finish, and stop cloudtail blocks until recipe timeout. To avoid the situation, let me make os.waitpid timeout in 10 seconds. BUG=656846 ========== |
