DescriptionFix silent failures in test runner that lead to long timeouts
Previously the test runner was using pexpect with a timeout to run the
test script. The problem with this is that it can cause the test runner
to hang if the child test script exits while still having live child
processes of its own. The exact sequence of operations leading to the
failure is:
1. TestRunner calls pexpect.run('test_script', timeout, ...).
2. The test script spawns at least one sub process (e.g., the host
forwarder), which is attached to the same stdout as the test script
itself.
3. Pexpect calls select on the test script's stdout file descriptor to
receive output from the script. This is a blocking call.
4. The test script dies for some reason (e.g., disconnected device),
failing to clean up all the sub processes it created in step 2.
5. Because the sub processes still have a reference to the
(now-dead) test script's stdout, the select call in step #3 is not
unblocked, so it hangs until the timeout fires (1.5h by default).
The root cause of the problem is that the test script's sub processes
outlive their parent (which is standard Unix behavior). In order to fix
this we would have to change all the known daemons spawned by every test
script to be proper daemons that release the controlling terminal and
make the test script kill all of its subprocesses when exiting
(i.e., prctl(PR_SET_PDEATHSIG, TERM)).
Because the above fix involves a large number of changes and is a little
fragile in the sense that we may miss updating some scripts by mistake,
this patch takes a defensive approach. Namely, we fix the problem by
replacing the use of pexpect with a custom loop which combines the call
to select() as well as polling for the exit status of the test script.
The call to select uses a smaller timeout (1s), so we can detect a dying
test script almost immediately.
This patch also simplifies some existing code in cmd_helper.
BUG=355952
TEST=src/build/android/test_runner.py perf -v --release --single-step \
--src/tools/perf/run_benchmark -v --output-format=buildbot \
--report-root-metrics --browser=android-chromium-testshell \
rasterize_and_record_micro.key_mobile_sites
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=273530
Patch Set 1 #
Total comments: 1
Patch Set 2 : Also capture stderr. Read output before polling for completion. #Patch Set 3 : Unbreak windows. #
Messages
Total messages: 15 (0 generated)
|