Chromium Code Reviews

Unified Diff: client/run_isolated.py

Issue 2924283002: Improve zombie process error message to be actionable. (Closed)
Patch Set: Created 3 years, 6 months ago
Index: client/run_isolated.py
diff --git a/client/run_isolated.py b/client/run_isolated.py
index 00edf14f6ddd770e7def2a4c692638523dfed8ec..ba5fb26f8b5b0fb6b56d92e0c0d87afbd8287722 100755
--- a/client/run_isolated.py
+++ b/client/run_isolated.py
@@ -105,6 +105,35 @@ ISOLATED_OUT_DIR = u'io'
ISOLATED_TMP_DIR = u'it'
+OUTLIVING_ZOMBIE_MSG = """\
+*** Swarming tried multiple times to delete the %s directory and failed ***
+*** Hard failing the task ***
+
+Swarming detected that your testing script ran an executable, which may have
+started a child executable, and the main script returned early, leaving the
+children executables playing around unguided.
+
+You don't want to leave children processes outliving the task on the Swarming
+bot, don't you? The Swarming bot doesn't.
tandrii(chromium) 2017/06/08 14:27:51 don't you? => do you?
M-A Ruel 2017/06/08 15:02:48 Done.
+
+How to fix?
+- For any process that starts children processes, make sure all children
+ processes have terminated properly before each parent process exits. This is
+ especially important in very deep process trees.
+- To achieve this, you MUST handle signals in each executable / python script.
+- When your test script (python or binary) receives a signal (SIGTERM, or
+ CTRL_BREAK_EVENT on Windows), send it to all children processes and wait for
+ them to terminate before quitting.
+- You have %s seconds to comply after the signal was sent to the process.
tandrii(chromium) 2017/06/08 14:27:50 perhaps you imply here that this signal was sent..
M-A Ruel 2017/06/08 15:02:48 Reworded, please reread.
tandrii(chromium) 2017/06/08 15:16:11 Really good. Thank you.
+
+See
+https://github.com/luci/luci-py/blob/master/appengine/swarming/doc/Bot.md#graceful-termination-aka-the-sigterm-and-sigkill-dance
+for more information.
+
+*** May the SIGKILL force be with you ***
tandrii(chromium) 2017/06/08 14:27:50 :)
+"""
+
+
def get_as_zip_package(executable=True):
"""Returns ZipPackage with this module and all its dependencies.
@@ -547,11 +576,7 @@ def map_and_run(
logging.error('Failure with %s', e)
success = False
if not success:
- print >> sys.stderr, (
- 'Failed to delete the run directory, thus failing the task.\n'
- 'This may be due to a subprocess outliving the main task\n'
- 'process, holding on to resources. Please fix the task so\n'
- 'that it releases resources and cleans up subprocesses.')
+ sys.stderr.write(OUTLIVING_ZOMBIE_MSG % ('run', grace_period))
if result['exit_code'] == 0:
result['exit_code'] = 1
if fs.isdir(tmp_dir):
@@ -561,11 +586,7 @@ def map_and_run(
logging.error('Failure with %s', e)
success = False
if not success:
- print >> sys.stderr, (
- 'Failed to delete the temp directory, thus failing the task.\n'
- 'This may be due to a subprocess outliving the main task\n'
- 'process, holding on to resources. Please fix the task so\n'
- 'that it releases resources and cleans up subprocesses.')
+ sys.stderr.write(OUTLIVING_ZOMBIE_MSG % ('temp', grace_period))
if result['exit_code'] == 0:
result['exit_code'] = 1
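
For illustration only, the "How to fix?" advice in OUTLIVING_ZOMBIE_MSG could look roughly like the sketch below on the task side. It is not part of this change or of run_isolated.py. The ChildSupervisor class, its method names, and the timeout value are hypothetical; the sketch assumes Python 3 and that the timeout chosen is shorter than the grace period Swarming grants.

```python
import signal
import subprocess
import sys
import time


class ChildSupervisor(object):
  """Hypothetical helper: tracks children and forwards termination to them."""

  def __init__(self):
    self._children = []

  def spawn(self, cmd):
    kwargs = {}
    if sys.platform == 'win32':
      # CTRL_BREAK_EVENT can only be sent to a child that was started in its
      # own process group.
      kwargs['creationflags'] = subprocess.CREATE_NEW_PROCESS_GROUP
    proc = subprocess.Popen(cmd, **kwargs)
    self._children.append(proc)
    return proc

  def terminate_all(self, timeout):
    """Signals every live child, then waits for all of them to exit."""
    if sys.platform == 'win32':
      sig = signal.CTRL_BREAK_EVENT
    else:
      sig = signal.SIGTERM
    for proc in self._children:
      if proc.poll() is None:
        proc.send_signal(sig)
    deadline = time.monotonic() + timeout
    for proc in self._children:
      remaining = max(0.0, deadline - time.monotonic())
      try:
        proc.wait(timeout=remaining)
      except subprocess.TimeoutExpired:
        # The child ignored the polite request; do not leave it behind.
        proc.kill()
        proc.wait()

  def install_handlers(self, timeout):
    """Makes this process clean up its children before exiting on a signal."""
    def handler(_signum, _frame):
      self.terminate_all(timeout)
      sys.exit(1)
    signal.signal(signal.SIGTERM, handler)
    if hasattr(signal, 'SIGBREAK'):
      # SIGBREAK is how CTRL_BREAK_EVENT surfaces in Python on Windows.
      signal.signal(signal.SIGBREAK, handler)
```

A test script would call install_handlers() once at startup and start every child through spawn(). Each intermediate process in a deep tree needs the same treatment, since a signal delivered to the top-level script is not automatically forwarded further down.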