Issue 726563003: Shut down browser processes on Windows by sending WM_CLOSE messages.

Issue 726563003: Shut down browser processes on Windows by sending WM_CLOSE messages. (Closed)

Created:
6 years, 1 month ago by Ken Russell (switch to Gerrit)

Modified:
6 years, 1 month ago

Reviewers:
nednguyen, dtu, tonyg

CC:
chromium-reviews, telemetry+watch_chromium.org, hendrikw, jbauman

Base URL:
https://chromium.googlesource.com/chromium/src.git@master

Project:
chromium

Visibility:
Public.

More Reviews

Description

Shut down browser processes on Windows by sending WM_CLOSE messages. This avoids calling TerminateProcess against the browser during the window when doing so would leak a suspended sub-process. BUG=424024 Committed: https://crrev.com/1c02272b77cede124c7985da598b9ff51e1696cd Cr-Commit-Position: refs/heads/master@{#304342}

Patch Set 1 #

Patch Set 2 : Added back in robustness fixes. #

Total comments: 4

Patch Set 3 : Check specifically for Chrome's windows. #

Total comments: 2

Patch Set 4 : Addressed review feedback from nedngyuen. Pass app_name to CooperativelyShutdown. #

Patch Set 5 : Removed tab character #

Created: 6 years, 1 month ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+101 lines, -2 lines)			Patch
M	tools/telemetry/telemetry/core/backends/chrome/desktop_browser_backend.py	View	1 2 3	1 chunk	+19 lines, -0 lines	0 comments	Download
M	tools/telemetry/telemetry/core/platform/__init__.py	View	1 2 3	1 chunk	+21 lines, -0 lines	0 comments	Download
M	tools/telemetry/telemetry/core/platform/platform_backend.py	View	1 2 3	2 chunks	+23 lines, -2 lines	0 comments	Download
M	tools/telemetry/telemetry/core/platform/win_platform_backend.py	View	1 2 3 4	4 chunks	+38 lines, -0 lines	0 comments	Download

Messages

Total messages: 28 (6 generated)

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages

Ken Russell (switch to Gerrit)

Please review. Thanks. This has been tested locally. I'm running a stress test overnight to ...

6 years, 1 month ago (2014-11-14 03:05:44 UTC) #2

nednguyen

https://codereview.chromium.org/726563003/diff/20001/tools/telemetry/telemetry/core/platform/platform_backend.py File tools/telemetry/telemetry/core/platform/platform_backend.py (right): https://codereview.chromium.org/726563003/diff/20001/tools/telemetry/telemetry/core/platform/platform_backend.py#newcode130 tools/telemetry/telemetry/core/platform/platform_backend.py:130: self._running_browser_backends.discard(browser_backend) Getting around the weakset look up problem with ...

6 years, 1 month ago (2014-11-14 03:56:43 UTC) #3

Ken Russell (switch to Gerrit)

https://codereview.chromium.org/726563003/diff/20001/tools/telemetry/telemetry/core/platform/platform_backend.py File tools/telemetry/telemetry/core/platform/platform_backend.py (right): https://codereview.chromium.org/726563003/diff/20001/tools/telemetry/telemetry/core/platform/platform_backend.py#newcode130 tools/telemetry/telemetry/core/platform/platform_backend.py:130: self._running_browser_backends.discard(browser_backend) On 2014/11/14 03:56:42, nednguyen wrote: > Getting around ...

6 years, 1 month ago (2014-11-14 04:03:58 UTC) #4

nednguyen

On 2014/11/14 04:03:58, Ken Russell wrote: > https://codereview.chromium.org/726563003/diff/20001/tools/telemetry/telemetry/core/platform/platform_backend.py > File tools/telemetry/telemetry/core/platform/platform_backend.py (right): > > https://codereview.chromium.org/726563003/diff/20001/tools/telemetry/telemetry/core/platform/platform_backend.py#newcode130 ...

6 years, 1 month ago (2014-11-14 04:10:50 UTC) #5

Ken Russell (switch to Gerrit)

Please add your responses inline in the code review. It makes it easier to track ...

6 years, 1 month ago (2014-11-14 04:52:29 UTC) #6

Please add your responses inline in the code review. It makes it easier to track
the conversations about individual changes.

On 2014/11/14 04:10:50, nednguyen wrote:
> On 2014/11/14 04:03:58, Ken Russell wrote:
> >
>
https://codereview.chromium.org/726563003/diff/20001/tools/telemetry/telemetr...
> > File tools/telemetry/telemetry/core/platform/platform_backend.py (right):
> > 
> >
>
https://codereview.chromium.org/726563003/diff/20001/tools/telemetry/telemetr...
> > tools/telemetry/telemetry/core/platform/platform_backend.py:130:
> > self._running_browser_backends.discard(browser_backend)
> > On 2014/11/14 03:56:42, nednguyen wrote:
> > > Getting around the weakset look up problem with this is not a good idea. I
> > want
> > > us to understand the root cause of it.
> > 
> > Qualifying this fix is very expensive -- it takes overnight to really
confirm
> > the fix is working. I have already seen that without this workaround, the
weak
> > set lookup occasionally throws an exception, causing tests to fail -- and
with
> > it, the tests run reliably. I am strongly in favor of putting this
workaround
> in
> > now, getting the bots reliable again, and debugging the root cause of the
> > weakset problem separately.
> This is fair. Can you please file a bug and add me and other telemetry guys to
> keep track of this?
> 
> > 
> >
>
https://codereview.chromium.org/726563003/diff/20001/tools/telemetry/telemetr...
> > tools/telemetry/telemetry/core/platform/platform_backend.py:268: def
> > IsCooperativeShutdownSupported(self):
> > On 2014/11/14 03:56:42, nednguyen wrote:
> > > Instead of this, can we make a function called StopApplication(pid). Posix
> > > platform do "kill -9 ...", win platform do what you want below.
> > 
> > I don't want to change the current behavior of non-Windows platforms with
this
> > fix. Popen.terminate, which is currently used, seems to work fine on all
> > platforms except Windows.
> 
> Yes, then please use Popen.terminate for StopApplication(process_handle) for
> other platform.

I think this is not a good direction. Doing this will require *all* of the
platform backends to change for this fix, not just the Windows backend. This
carries the risk of breaking other platforms. Also, the Android backend (for
example) works very differently than the other ports. This abstraction will not
work for that platform backend.

Also, the try job result
http://build.chromium.org/p/tryserver.chromium.gpu/builders/win_gpu_triggered...
indicates there might be remaining problems with this fix. I am trying to solve
the immediate flakiness problem that has been plaguing the GPU waterfall for
three weeks. Please help toward this goal.

Ken Russell (switch to Gerrit)

It looks like this change ends up killing the buildbot slave process when it's sent ...

6 years, 1 month ago (2014-11-14 04:59:28 UTC) #7

nednguyen

On 2014/11/14 04:52:29, Ken Russell wrote: > Please add your responses inline in the code ...

6 years, 1 month ago (2014-11-14 05:56:20 UTC) #8

On 2014/11/14 04:52:29, Ken Russell wrote:
> Please add your responses inline in the code review. It makes it easier to
track
> the conversations about individual changes.
> 
> On 2014/11/14 04:10:50, nednguyen wrote:
> > On 2014/11/14 04:03:58, Ken Russell wrote:
> > >
> >
>
https://codereview.chromium.org/726563003/diff/20001/tools/telemetry/telemetr...
> > > File tools/telemetry/telemetry/core/platform/platform_backend.py (right):
> > > 
> > >
> >
>
https://codereview.chromium.org/726563003/diff/20001/tools/telemetry/telemetr...
> > > tools/telemetry/telemetry/core/platform/platform_backend.py:130:
> > > self._running_browser_backends.discard(browser_backend)
> > > On 2014/11/14 03:56:42, nednguyen wrote:
> > > > Getting around the weakset look up problem with this is not a good idea.
I
> > > want
> > > > us to understand the root cause of it.
> > > 
> > > Qualifying this fix is very expensive -- it takes overnight to really
> confirm
> > > the fix is working. I have already seen that without this workaround, the
> weak
> > > set lookup occasionally throws an exception, causing tests to fail -- and
> with
> > > it, the tests run reliably. I am strongly in favor of putting this
> workaround
> > in
> > > now, getting the bots reliable again, and debugging the root cause of the
> > > weakset problem separately.
> > This is fair. Can you please file a bug and add me and other telemetry guys
to
> > keep track of this?
> > 
> > > 
> > >
> >
>
https://codereview.chromium.org/726563003/diff/20001/tools/telemetry/telemetr...
> > > tools/telemetry/telemetry/core/platform/platform_backend.py:268: def
> > > IsCooperativeShutdownSupported(self):
> > > On 2014/11/14 03:56:42, nednguyen wrote:
> > > > Instead of this, can we make a function called StopApplication(pid).
Posix
> > > > platform do "kill -9 ...", win platform do what you want below.
> > > 
> > > I don't want to change the current behavior of non-Windows platforms with
> this
> > > fix. Popen.terminate, which is currently used, seems to work fine on all
> > > platforms except Windows.
> > 
> > Yes, then please use Popen.terminate for StopApplication(process_handle) for
> > other platform.
> 
> I think this is not a good direction. Doing this will require *all* of the
> platform backends to change for this fix, not just the Windows backend. This
> carries the risk of breaking other platforms. Also, the Android backend (for
> example) works very differently than the other ports. This abstraction will
not
> work for that platform backend.
> 
> Also, the try job result
>
http://build.chromium.org/p/tryserver.chromium.gpu/builders/win_gpu_triggered...
> indicates there might be remaining problems with this fix. I am trying to
solve
> the immediate flakiness problem that has been plaguing the GPU waterfall for
> three weeks. Please help toward this goal.

I understand that you are in rush to fix this. But the way this code is
structured adds more complexity to browser_backend unnecessarily. Imagine
someone implements a new application backend implementation (yes, this is the
goal of telemetry v2), how would they know that they should use
CooperativeShutDown? It seems clear to me that the details of shutting down the
application is platform dependent, then the code should be structured that way
so that it's clear what's the right way to shut down the application.

I can lg2m this but I want to a bug cc'ed to me to keep track of the technical
debt to be cleaned.

nednguyen

On 2014/11/14 04:59:28, Ken Russell wrote: > It looks like this change ends up killing ...

6 years, 1 month ago (2014-11-14 06:22:10 UTC) #9

nednguyen

On 2014/11/14 06:22:10, nednguyen wrote: > On 2014/11/14 04:59:28, Ken Russell wrote: > > It ...

6 years, 1 month ago (2014-11-14 17:54:08 UTC) #10

Ken Russell (switch to Gerrit)

On 2014/11/14 17:54:08, nednguyen wrote: > On 2014/11/14 06:22:10, nednguyen wrote: > > On 2014/11/14 ...

6 years, 1 month ago (2014-11-14 19:56:20 UTC) #11

Ken Russell (switch to Gerrit)

Please re-review. Thanks. I specialized the EnumWindows callback to find only Chrome windows. One of ...

6 years, 1 month ago (2014-11-14 21:43:02 UTC) #12

nednguyen

https://codereview.chromium.org/726563003/diff/40001/tools/telemetry/telemetry/core/platform/win_platform_backend.py File tools/telemetry/telemetry/core/platform/win_platform_backend.py (right): https://codereview.chromium.org/726563003/diff/40001/tools/telemetry/telemetry/core/platform/win_platform_backend.py#newcode391 tools/telemetry/telemetry/core/platform/win_platform_backend.py:391: win32gui.GetClassName(hwnd).lower().startswith('chrome')): Can you please find another way that does ...

6 years, 1 month ago (2014-11-14 22:14:13 UTC) #13

Ken Russell (switch to Gerrit)

6 years, 1 month ago (2014-11-15 00:34:27 UTC) #14

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/726563003/60001

6 years, 1 month ago (2014-11-15 01:30:40 UTC) #17

commit-bot: I haz the power

Try jobs failed on following builders: chromium_presubmit on tryserver.chromium.linux (http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presubmit/builds/24426)

6 years, 1 month ago (2014-11-15 01:37:36 UTC) #19

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/726563003/80001

6 years, 1 month ago (2014-11-15 01:42:09 UTC) #21

commit-bot: I haz the power

Try jobs failed on following builders: chromium_presubmit on tryserver.chromium.linux (http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presubmit/builds/24430)

6 years, 1 month ago (2014-11-15 01:47:57 UTC) #23

Ken Russell (switch to Gerrit)

On 2014/11/15 01:47:57, I haz the power (commit-bot) wrote: > Try jobs failed on following ...

6 years, 1 month ago (2014-11-15 02:19:17 UTC) #24

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/726563003/80001

6 years, 1 month ago (2014-11-15 02:20:26 UTC) #26

commit-bot: I haz the power

6 years, 1 month ago (2014-11-15 03:56:47 UTC) #28

Message was sent while issue was closed.

Patchset 5 (id:??) landed as
https://crrev.com/1c02272b77cede124c7985da598b9ff51e1696cd
Cr-Commit-Position: refs/heads/master@{#304342}

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages