Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(63)

Side by Side Diff: client/run_isolated.py

Issue 2924283002: Improve zombie process error message to be actionable. (Closed)
Patch Set: Created 3 years, 6 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View unified diff | Download patch
« no previous file with comments | « no previous file | no next file » | no next file with comments »
Toggle Intra-line Diffs ('i') | Expand Comments ('e') | Collapse Comments ('c') | Show Comments Hide Comments ('s')
OLDNEW
1 #!/usr/bin/env python 1 #!/usr/bin/env python
2 # Copyright 2012 The LUCI Authors. All rights reserved. 2 # Copyright 2012 The LUCI Authors. All rights reserved.
3 # Use of this source code is governed under the Apache License, Version 2.0 3 # Use of this source code is governed under the Apache License, Version 2.0
4 # that can be found in the LICENSE file. 4 # that can be found in the LICENSE file.
5 5
6 """Runs a command with optional isolated input/output. 6 """Runs a command with optional isolated input/output.
7 7
8 Despite name "run_isolated", can run a generic non-isolated command specified as 8 Despite name "run_isolated", can run a generic non-isolated command specified as
9 args. 9 args.
10 10
(...skipping 87 matching lines...) Expand 10 before | Expand all | Expand 10 after
98 # It is recommended to start the script with a `root_dir` as short as 98 # It is recommended to start the script with a `root_dir` as short as
99 # possible. 99 # possible.
100 # - ir stands for isolated_run 100 # - ir stands for isolated_run
101 # - io stands for isolated_out 101 # - io stands for isolated_out
102 # - it stands for isolated_tmp 102 # - it stands for isolated_tmp
103 ISOLATED_RUN_DIR = u'ir' 103 ISOLATED_RUN_DIR = u'ir'
104 ISOLATED_OUT_DIR = u'io' 104 ISOLATED_OUT_DIR = u'io'
105 ISOLATED_TMP_DIR = u'it' 105 ISOLATED_TMP_DIR = u'it'
106 106
107 107
108 OUTLIVING_ZOMBIE_MSG = """\
109 *** Swarming tried multiple times to delete the %s directory and failed ***
110 *** Hard failing the task ***
111
112 Swarming detected that your testing script ran an executable, which may have
113 started a child executable, and the main script returned early, leaving the
114 children executables playing around unguided.
115
116 You don't want to leave children processes outliving the task on the Swarming
117 bot, don't you? The Swarming bot doesn't.
tandrii(chromium) 2017/06/08 14:27:51 don't you? => do you?
M-A Ruel 2017/06/08 15:02:48 Done.
118
119 How to fix?
120 - For any process that starts children processes, make sure all children
121 processes terminated properly before each parent process exits. This is
122 especially important in very deep process trees.
123 - To achieve this, you MUST handle signals in each executable / python script.
124 - When your test script (python or binary) receives a signal like SIGTERM or
125 CTRL_BREAK_EVENT on Windows), send it to all children processes and wait for
126 them to terminate before quitting.
127 - You have %s seconds to comply after the signal was sent to the process.
tandrii(chromium) 2017/06/08 14:27:50 perhaps you imply here that this signal was sent..
M-A Ruel 2017/06/08 15:02:48 Reworded, please reread.
tandrii(chromium) 2017/06/08 15:16:11 Really good. Thank you.
128
129 See
130 https://github.com/luci/luci-py/blob/master/appengine/swarming/doc/Bot.md#gracef ul-termination-aka-the-sigterm-and-sigkill-dance
131 for more information.
132
133 *** May the SIGKILL force be with you ***
tandrii(chromium) 2017/06/08 14:27:50 :)
134 """
135
136
108 def get_as_zip_package(executable=True): 137 def get_as_zip_package(executable=True):
109 """Returns ZipPackage with this module and all its dependencies. 138 """Returns ZipPackage with this module and all its dependencies.
110 139
111 If |executable| is True will store run_isolated.py as __main__.py so that 140 If |executable| is True will store run_isolated.py as __main__.py so that
112 zip package is directly executable be python. 141 zip package is directly executable be python.
113 """ 142 """
114 # Building a zip package when running from another zip package is 143 # Building a zip package when running from another zip package is
115 # unsupported and probably unneeded. 144 # unsupported and probably unneeded.
116 assert not zip_package.is_zipped_module(sys.modules[__name__]) 145 assert not zip_package.is_zipped_module(sys.modules[__name__])
117 assert THIS_FILE_PATH 146 assert THIS_FILE_PATH
(...skipping 422 matching lines...) Expand 10 before | Expand all | Expand 10 after
540 # process locks *.exe file). Examine out_dir only after that call 569 # process locks *.exe file). Examine out_dir only after that call
541 # completes (since child processes may write to out_dir too and we need 570 # completes (since child processes may write to out_dir too and we need
542 # to wait for them to finish). 571 # to wait for them to finish).
543 if fs.isdir(run_dir): 572 if fs.isdir(run_dir):
544 try: 573 try:
545 success = file_path.rmtree(run_dir) 574 success = file_path.rmtree(run_dir)
546 except OSError as e: 575 except OSError as e:
547 logging.error('Failure with %s', e) 576 logging.error('Failure with %s', e)
548 success = False 577 success = False
549 if not success: 578 if not success:
550 print >> sys.stderr, ( 579 sys.stderr.write(OUTLIVING_ZOMBIE_MSG % ('run', grace_period))
551 'Failed to delete the run directory, thus failing the task.\n'
552 'This may be due to a subprocess outliving the main task\n'
553 'process, holding on to resources. Please fix the task so\n'
554 'that it releases resources and cleans up subprocesses.')
555 if result['exit_code'] == 0: 580 if result['exit_code'] == 0:
556 result['exit_code'] = 1 581 result['exit_code'] = 1
557 if fs.isdir(tmp_dir): 582 if fs.isdir(tmp_dir):
558 try: 583 try:
559 success = file_path.rmtree(tmp_dir) 584 success = file_path.rmtree(tmp_dir)
560 except OSError as e: 585 except OSError as e:
561 logging.error('Failure with %s', e) 586 logging.error('Failure with %s', e)
562 success = False 587 success = False
563 if not success: 588 if not success:
564 print >> sys.stderr, ( 589 sys.stderr.write(OUTLIVING_ZOMBIE_MSG % ('run', grace_period))
565 'Failed to delete the temp directory, thus failing the task.\n'
566 'This may be due to a subprocess outliving the main task\n'
567 'process, holding on to resources. Please fix the task so\n'
568 'that it releases resources and cleans up subprocesses.')
569 if result['exit_code'] == 0: 590 if result['exit_code'] == 0:
570 result['exit_code'] = 1 591 result['exit_code'] = 1
571 592
572 # This deletes out_dir if leak_temp_dir is not set. 593 # This deletes out_dir if leak_temp_dir is not set.
573 if out_dir: 594 if out_dir:
574 isolated_stats = result['stats'].setdefault('isolated', {}) 595 isolated_stats = result['stats'].setdefault('isolated', {})
575 result['outputs_ref'], success, isolated_stats['upload'] = ( 596 result['outputs_ref'], success, isolated_stats['upload'] = (
576 delete_and_upload(storage, out_dir, leak_temp_dir)) 597 delete_and_upload(storage, out_dir, leak_temp_dir))
577 if not success and result['exit_code'] == 0: 598 if not success and result['exit_code'] == 0:
578 result['exit_code'] = 1 599 result['exit_code'] = 1
(...skipping 489 matching lines...) Expand 10 before | Expand all | Expand 10 after
1068 return 1 1089 return 1
1069 1090
1070 1091
1071 if __name__ == '__main__': 1092 if __name__ == '__main__':
1072 subprocess42.inhibit_os_error_reporting() 1093 subprocess42.inhibit_os_error_reporting()
1073 # Ensure that we are always running with the correct encoding. 1094 # Ensure that we are always running with the correct encoding.
1074 fix_encoding.fix_encoding() 1095 fix_encoding.fix_encoding()
1075 file_path.enable_symlink() 1096 file_path.enable_symlink()
1076 1097
1077 sys.exit(main(sys.argv[1:])) 1098 sys.exit(main(sys.argv[1:]))
OLDNEW
« no previous file with comments | « no previous file | no next file » | no next file with comments »

Powered by Google App Engine
This is Rietveld 408576698