Issue 661339: Fixed a startup race condition....

Issue 661339: Fixed a startup race condition.... (Closed)

Created:
10 years, 9 months ago by MAD

Modified:
8 years, 8 months ago

Reviewers:
cpu_(ooo_6.6-7.5), Paweł Hajdan Jr., robertshield, Nico, Sigurður Ásgeirsson

CC:
chromium-reviews, ben+cc_chromium.org

Base URL:
svn://chrome-svn/chrome/trunk/src/

Visibility:
Public.

Description

Fixed a startup race condition. Although the new test was written in a platform independent way, it is only added to the Widows specific portion of the ui_test target in the gyp file because it wasn't tried yet on the other platforms. The bug was found and the fix was written in the windows specific version of the process singleton anyway... But if people working on the other platforms would like to try the test there, that would be great. :-) BUG=9593 TEST=A new test have been created to validate this. Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=40629

Patch Set 1 #

Total comments: 12

Patch Set 2 : '' #

Total comments: 16

Patch Set 3 : '' #

Patch Set 4 : '' #

Total comments: 6

Patch Set 5 : '' #

Patch Set 6 : '' #

Patch Set 7 : '' #

Patch Set 8 : '' #

Patch Set 9 : '' #

Total comments: 2

Created: 10 years, 9 months ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+304 lines, -6 lines)			Patch
M	chrome/browser/process_singleton_win.cc	View	1 2 3 4 5 6 7 8	3 chunks	+34 lines, -6 lines	1 comment	Download
A	chrome/browser/process_singleton_win_uitest.cc	View	1 2 3 4 5 6 7 8	1 chunk	+266 lines, -0 lines	1 comment	Download
M	chrome/chrome_tests.gypi	View	1 2 3 4 5 6 7 8	2 chunks	+4 lines, -0 lines	0 comments	Download

Messages

Total messages: 17 (0 generated)

Expand Messages | Collapse Messages

Paweł Hajdan Jr.

Drive-by with some test comments. http://codereview.chromium.org/661339/diff/1/3 File chrome/browser/process_singleton_win_uitest.cc (right): http://codereview.chromium.org/661339/diff/1/3#newcode63 chrome/browser/process_singleton_win_uitest.cc:63: EXPECT_NE(static_cast<base::WaitableEvent*>(NULL), start_event); This should ...

10 years, 9 months ago (2010-03-02 09:24:37 UTC) #2

MAD

Thanks for the drive by... Changes uploaded... BYE MAD http://codereview.chromium.org/661339/diff/1/3 File chrome/browser/process_singleton_win_uitest.cc (right): http://codereview.chromium.org/661339/diff/1/3#newcode63 chrome/browser/process_singleton_win_uitest.cc:63: ...

10 years, 9 months ago (2010-03-02 14:33:21 UTC) #3

Thanks for the drive by...

Changes uploaded...

BYE
MAD

http://codereview.chromium.org/661339/diff/1/3
File chrome/browser/process_singleton_win_uitest.cc (right):

http://codereview.chromium.org/661339/diff/1/3#newcode63
chrome/browser/process_singleton_win_uitest.cc:63:
EXPECT_NE(static_cast<base::WaitableEvent*>(NULL), start_event);
On 2010/03/02 09:24:37, Paweł Hajdan Jr. wrote:
> This should be definitely ASSERT_NE, otherwise we're going to kill the entire
> ui_tests binary on a NULL ptr dereference.

Done.
Thanks, I didn't think about that, so I changed a few more where ASSERT makes
more sense then EXPECT...

http://codereview.chromium.org/661339/diff/1/3#newcode64
chrome/browser/process_singleton_win_uitest.cc:64:
EXPECT_TRUE(start_event->Wait());
On 2010/03/02 09:24:37, Paweł Hajdan Jr. wrote:
> Shouldn't this be ASSERT_TRUE?

Done.

http://codereview.chromium.org/661339/diff/1/3#newcode108
chrome/browser/process_singleton_win_uitest.cc:108:
chrome_starter_threads_[i]->Start();
On 2010/03/02 09:24:37, Paweł Hajdan Jr. wrote:
> Shouldn't this check the return value?

Done.

http://codereview.chromium.org/661339/diff/1/3#newcode138
chrome/browser/process_singleton_win_uitest.cc:138: for (size_t attempt = 0;
attempt < kNbAttempts && !failed; ++attempt) {
On 2010/03/02 09:24:37, Paweł Hajdan Jr. wrote:
> Please use SCOPED_TRACE for this loop.

Done.
Cool, I didn't know about this... Thanks again! :-)

http://codereview.chromium.org/661339/diff/1/3#newcode145
chrome/browser/process_singleton_win_uitest.cc:145: for (size_t i = 0; i <
kNbThreads; ++i) {
On 2010/03/02 09:24:37, Paweł Hajdan Jr. wrote:
> Please use SCOPED_TRACE for this loop.

Done.

http://codereview.chromium.org/661339/diff/1/3#newcode161
chrome/browser/process_singleton_win_uitest.cc:161: for (size_t i = 0; i <
kNbThreads; ++i)
On 2010/03/02 09:24:37, Paweł Hajdan Jr. wrote:
> Please use SCOPED_TRACE or just print more info in case of failure via <<.

Done.

Paweł Hajdan Jr.

Thanks, the fixes look good. I just spotted the Sleep I overlooked before. Could you ...

10 years, 9 months ago (2010-03-02 14:42:44 UTC) #4

MAD

http://codereview.chromium.org/661339/diff/6/8 File chrome/browser/process_singleton_win_uitest.cc (right): http://codereview.chromium.org/661339/diff/6/8#newcode221 chrome/browser/process_singleton_win_uitest.cc:221: PlatformThread::Sleep(100); On 2010/03/02 14:42:44, Paweł Hajdan Jr. wrote: > ...

10 years, 9 months ago (2010-03-02 14:59:11 UTC) #5

robertshield

http://codereview.chromium.org/661339/diff/6/7 File chrome/browser/process_singleton_win.cc (right): http://codereview.chromium.org/661339/diff/6/7#newcode46 chrome/browser/process_singleton_win.cc:46: HANDLE only_me = ::CreateMutex(NULL, FALSE, L"Global\\ProcessSingleton"); Shouldn't the object ...

10 years, 9 months ago (2010-03-02 15:06:14 UTC) #6

http://codereview.chromium.org/661339/diff/6/7
File chrome/browser/process_singleton_win.cc (right):

http://codereview.chromium.org/661339/diff/6/7#newcode46
chrome/browser/process_singleton_win.cc:46: HANDLE only_me = ::CreateMutex(NULL,
FALSE, L"Global\\ProcessSingleton");
Shouldn't the object be created in the "Local" session namespace? Is there any
reason to do this globally?

http://codereview.chromium.org/661339/diff/6/7#newcode48
chrome/browser/process_singleton_win.cc:48: DWORD result =
::WaitForSingleObject(only_me, INFINITE);
Please remove the :: here and elsewhere to be consistent with the other calls to
win32 apis in this file.

http://codereview.chromium.org/661339/diff/6/7#newcode63
chrome/browser/process_singleton_win.cc:63: success = ::CloseHandle(only_me);
If you like, ScopedHandle (included via base/scoped_handle.h) could be used
here.

http://codereview.chromium.org/661339/diff/6/8
File chrome/browser/process_singleton_win_uitest.cc (right):

http://codereview.chromium.org/661339/diff/6/8#newcode53
chrome/browser/process_singleton_win_uitest.cc:53: // TODO(port): For some
reasons the LaunchApp call below fail even though
micro-nit: reasons -> reason, fail -> fails

http://codereview.chromium.org/661339/diff/6/8#newcode109
chrome/browser/process_singleton_win_uitest.cc:109: chrome_starters_[i] = new
ChromeStarter;
If you wanted to get fancy, you could look into SetThreadAffinityMask() to get
the threads scheduled on different cores. Don't bother if this test reliably
reproduces the problem pre-patch though.

http://codereview.chromium.org/661339/diff/6/8#newcode221
chrome/browser/process_singleton_win_uitest.cc:221: PlatformThread::Sleep(100);
On 2010/03/02 14:59:11, mad1 wrote:
> On 2010/03/02 14:42:44, Paweł Hajdan Jr. wrote:
> > I missed this the first time, but it's going to be flaky. Are there some
other
> > choices here (e.g. wait for an event)?
> 
> Yeah, I know, I don't like it either... I couldn't find any event that
confirms
> every process that we tried creating independently have be properly
restarted...
> 
> If I don't sleep here, when the test is running with the release build, some
> chrome.exe processes were left dangling. I could see them in the task manager
> (and thus, I couldn't relink chrome.exe without killing them). When I tried to
> attach them to the debugger, they terminated right away with a
> STATUS_DLL_NOT_FOUND error printed in the output window.
> 
> If the test machines always kill all (potentially dangling) chrome.exe
processes
> after execution, then we should be fine and I could remove it... Otherwise,
I'm
> all ears for suggestions of other ways to avoid this...
> 
> If I had a way of getting the new handles of the processes that are being
> spawned from the main browser process, then I could try to kill them, but I
> don't know if I have access to this from outside...
> 
> Any suggestions?

My vote: just kill them all by name. There are some killall functions in base
somewhere.

MAD

I'll send an update of the patch once I confirm if it is worth adding ...

10 years, 9 months ago (2010-03-02 16:45:57 UTC) #7

I'll send an update of the patch once I confirm if it is worth adding some
windows specific code to set the thread affinity.

http://codereview.chromium.org/661339/diff/6/7
File chrome/browser/process_singleton_win.cc (right):

http://codereview.chromium.org/661339/diff/6/7#newcode46
chrome/browser/process_singleton_win.cc:46: HANDLE only_me = ::CreateMutex(NULL,
FALSE, L"Global\\ProcessSingleton");
On 2010/03/02 15:06:14, robertshield wrote:
> Shouldn't the object be created in the "Local" session namespace? Is there any
> reason to do this globally?

Hum... Good question... I used Global to mimic what was done in
Upgrade::IsBrowserAlreadyRunning(), but I realize now how it is important for
them to use Global but not for us...

Thanks!

http://codereview.chromium.org/661339/diff/6/7#newcode48
chrome/browser/process_singleton_win.cc:48: DWORD result =
::WaitForSingleObject(only_me, INFINITE);
On 2010/03/02 15:06:14, robertshield wrote:
> Please remove the :: here and elsewhere to be consistent with the other calls
to
> win32 apis in this file.

Done.

http://codereview.chromium.org/661339/diff/6/7#newcode63
chrome/browser/process_singleton_win.cc:63: success = ::CloseHandle(only_me);
On 2010/03/02 15:06:14, robertshield wrote:
> If you like, ScopedHandle (included via base/scoped_handle.h) could be used
> here.

Done.

http://codereview.chromium.org/661339/diff/6/8
File chrome/browser/process_singleton_win_uitest.cc (right):

http://codereview.chromium.org/661339/diff/6/8#newcode53
chrome/browser/process_singleton_win_uitest.cc:53: // TODO(port): For some
reasons the LaunchApp call below fail even though
On 2010/03/02 15:06:14, robertshield wrote:
> micro-nit: reasons -> reason, fail -> fails

Done.

http://codereview.chromium.org/661339/diff/6/8#newcode109
chrome/browser/process_singleton_win_uitest.cc:109: chrome_starters_[i] = new
ChromeStarter;
On 2010/03/02 15:06:14, robertshield wrote:
> If you wanted to get fancy, you could look into SetThreadAffinityMask() to get
> the threads scheduled on different cores. Don't bother if this test reliably
> reproduces the problem pre-patch though.

That would be cool, because the the test doesn't unfortunately "reliably"
reproduce the problem pre-patch...
But I don't think we have a platform independent way of doing this... Unless I
missed it somewhere...

I'll try it and see if it is worth adding Windows Specific code in here... If it
helps reproducing the problem more reliably... Maybe...

http://codereview.chromium.org/661339/diff/6/8#newcode221
chrome/browser/process_singleton_win_uitest.cc:221: PlatformThread::Sleep(100);
On 2010/03/02 15:06:14, robertshield wrote:
> On 2010/03/02 14:59:11, mad1 wrote:
> > On 2010/03/02 14:42:44, Paweł Hajdan Jr. wrote:
> > > I missed this the first time, but it's going to be flaky. Are there some
> other
> > > choices here (e.g. wait for an event)?
> > 
> > Yeah, I know, I don't like it either... I couldn't find any event that
> confirms
> > every process that we tried creating independently have be properly
> restarted...
> > 
> > If I don't sleep here, when the test is running with the release build, some
> > chrome.exe processes were left dangling. I could see them in the task
manager
> > (and thus, I couldn't relink chrome.exe without killing them). When I tried
to
> > attach them to the debugger, they terminated right away with a
> > STATUS_DLL_NOT_FOUND error printed in the output window.
> > 
> > If the test machines always kill all (potentially dangling) chrome.exe
> processes
> > after execution, then we should be fine and I could remove it... Otherwise,
> I'm
> > all ears for suggestions of other ways to avoid this...
> > 
> > If I had a way of getting the new handles of the processes that are being
> > spawned from the main browser process, then I could try to kill them, but I
> > don't know if I have access to this from outside...
> > 
> > Any suggestions?
> 
> My vote: just kill them all by name. There are some killall functions in base
> somewhere.

The only KillAll function I found was in the Chrome Frame tests stuff:
chrome_frame\test_utils.cc...

And I wasn't sure if a ui_test was allowed to kill all running chrome.exe
process... I know I wouldn't like it to kill all MY chromes when I run it on my
machine...

robertshield

Thanks, everything looks good, I think we just need better clean up of the lingering ...

10 years, 9 months ago (2010-03-02 17:03:15 UTC) #8

Thanks, everything looks good, I think we just need better clean up of the
lingering Chrome processes that Pawel mentioned. One idea on how to do it below:

http://codereview.chromium.org/661339/diff/6/8
File chrome/browser/process_singleton_win_uitest.cc (right):

http://codereview.chromium.org/661339/diff/6/8#newcode109
chrome/browser/process_singleton_win_uitest.cc:109: chrome_starters_[i] = new
ChromeStarter;
On 2010/03/02 16:45:57, mad1 wrote:
> On 2010/03/02 15:06:14, robertshield wrote:
> > If you wanted to get fancy, you could look into SetThreadAffinityMask() to
get
> > the threads scheduled on different cores. Don't bother if this test reliably
> > reproduces the problem pre-patch though.
> 
> That would be cool, because the the test doesn't unfortunately "reliably"
> reproduce the problem pre-patch...
> But I don't think we have a platform independent way of doing this... Unless I
> missed it somewhere...
> 
> I'll try it and see if it is worth adding Windows Specific code in here... If
it
> helps reproducing the problem more reliably... Maybe...

ok, np. just a suggestion ;-)

http://codereview.chromium.org/661339/diff/6/8#newcode221
chrome/browser/process_singleton_win_uitest.cc:221: PlatformThread::Sleep(100);
On 2010/03/02 16:45:57, mad1 wrote:
> On 2010/03/02 15:06:14, robertshield wrote:
> > On 2010/03/02 14:59:11, mad1 wrote:
> > > On 2010/03/02 14:42:44, Paweł Hajdan Jr. wrote:
> > > > I missed this the first time, but it's going to be flaky. Are there some
> > other
> > > > choices here (e.g. wait for an event)?
> > > 
> > > Yeah, I know, I don't like it either... I couldn't find any event that
> > confirms
> > > every process that we tried creating independently have be properly
> > restarted...
> > > 
> > > If I don't sleep here, when the test is running with the release build,
some
> > > chrome.exe processes were left dangling. I could see them in the task
> manager
> > > (and thus, I couldn't relink chrome.exe without killing them). When I
tried
> to
> > > attach them to the debugger, they terminated right away with a
> > > STATUS_DLL_NOT_FOUND error printed in the output window.
> > > 
> > > If the test machines always kill all (potentially dangling) chrome.exe
> > processes
> > > after execution, then we should be fine and I could remove it...
Otherwise,
> > I'm
> > > all ears for suggestions of other ways to avoid this...
> > > 
> > > If I had a way of getting the new handles of the processes that are being
> > > spawned from the main browser process, then I could try to kill them, but
I
> > > don't know if I have access to this from outside...
> > > 
> > > Any suggestions?
> > 
> > My vote: just kill them all by name. There are some killall functions in
base
> > somewhere.
> 
> The only KillAll function I found was in the Chrome Frame tests stuff:
> chrome_frame\test_utils.cc...
> 
> And I wasn't sure if a ui_test was allowed to kill all running chrome.exe
> process... I know I wouldn't like it to kill all MY chromes when I run it on
my
> machine...

One alternative is to run Chrome with a custom --user-data-dir flag (or any
other flag that gets propagated to child processes) and then use
KillAllNamedProcessesWithArgument() defined in test_utils.cc. CF uses that to
kill all CF Chromes during its tests without being so rude as to kill all
Chromes on the system.

cpu_(ooo_6.6-7.5)

I am a little bit lost here. http://codereview.chromium.org/661339/diff/1015/19 File chrome/browser/process_singleton_win.cc (right): http://codereview.chromium.org/661339/diff/1015/19#newcode47 chrome/browser/process_singleton_win.cc:47: ScopedHandle only_me(CreateMutex(NULL, ...

10 years, 9 months ago (2010-03-02 19:15:58 UTC) #9

MAD

Uploaded an updated version and answered comments. Thanks! BYE MAD http://codereview.chromium.org/661339/diff/1015/19 File chrome/browser/process_singleton_win.cc (right): http://codereview.chromium.org/661339/diff/1015/19#newcode47 ...

10 years, 9 months ago (2010-03-02 19:57:10 UTC) #10

Uploaded an updated version and answered comments.

Thanks!

BYE
MAD

http://codereview.chromium.org/661339/diff/1015/19
File chrome/browser/process_singleton_win.cc (right):

http://codereview.chromium.org/661339/diff/1015/19#newcode47
chrome/browser/process_singleton_win.cc:47: ScopedHandle
only_me(CreateMutex(NULL, FALSE, L"Local\\ProcessSingleton"));
On 2010/03/02 19:15:58, cpu wrote:
> ProcessSingleton feels awfully generic to me. Maybe it clashes with some other
> product. How about mixing chrome into it?
> 
> I don't think you need 'Local\\' to create an object in this session.

Good point about the name, I added the AppId in there...
As for the local, I thought it would ensure that it is only for our session in
case a multi-user remote desktop session is trying to start chrome at the same
time we are trying to do it. But I'm not completely sure here, the documentation
says that the current session is the Default, but it also says that:
'client processes can use the "Local\" prefix to explicitly create an object in
their session namespace'

So I'm open to remove the "Local\" from there if you really want me to remove
it, but I don't think it hurts to have it there. Right?

Thanks!

http://codereview.chromium.org/661339/diff/1015/19#newcode48
chrome/browser/process_singleton_win.cc:48: DCHECK(only_me.Get() != NULL) <<
"GetLastError = " << GetLastError();
On 2010/03/02 19:15:58, cpu wrote:
> CreateMutex(.., FALSE, ..)  that means you don't own the mutex , which in turn
> means that maybe you did not created it but just open it.

Yes, this is on purpose as suggested in the MSDN doc. When you use a named
Mutex, it is clearer to not try to get the initial ownership, since it isn't
guaranteed. CreateMutex will return a non-owned mutex if it was already created
and still owned.

http://codereview.chromium.org/661339/diff/1015/19#newcode62
chrome/browser/process_singleton_win.cc:62: BOOL success =
ReleaseMutex(only_me);
On 2010/03/02 19:15:58, cpu wrote:
> I don't see you acquiring the mutex...?

You acquire a Mutex by waiting on it... Again...
As documented :-)

I added comments to make this clearer...

Thanks!

MAD

Cool Thanks! BYE MAD On Tue, Mar 2, 2010 at 5:15 PM, <cpu@chromium.org> wrote: > ...

10 years, 9 months ago (2010-03-02 22:56:57 UTC) #12

MAD

OK, can you guys (Robert and/or Pawel) take one more look at the unit test? ...

10 years, 9 months ago (2010-03-03 22:46:14 UTC) #13

Paweł Hajdan Jr.

LGTM, thanks for the work on making this solid!

10 years, 9 months ago (2010-03-04 07:22:17 UTC) #14

Nico

http://codereview.chromium.org/661339/diff/2006/chrome/browser/process_singleton_win_uitest.cc File chrome/browser/process_singleton_win_uitest.cc (right): http://codereview.chromium.org/661339/diff/2006/chrome/browser/process_singleton_win_uitest.cc#newcode258 chrome/browser/process_singleton_win_uitest.cc:258: pending_starters.empty(); Vectors are cleared with "clear()", empty() just returns ...

8 years, 8 months ago (2012-04-11 03:18:59 UTC) #16

Sigurður Ásgeirsson

8 years, 8 months ago (2012-04-11 12:17:40 UTC) #17

Drive-by, did not review in detail.

http://codereview.chromium.org/661339/diff/2006/chrome/browser/process_single...
File chrome/browser/process_singleton_win.cc (right):

http://codereview.chromium.org/661339/diff/2006/chrome/browser/process_single...
chrome/browser/process_singleton_win.cc:58: DWORD result =
WaitForSingleObject(only_me, INFINITE);
WFSO can also return WAIT_ABANDONED for success. This is in the case when the
previous owner of the mutex died or was killed before releasing it.
See
http://msdn.microsoft.com/en-us/library/windows/desktop/ms687032(v=vs.85).aspx

I believe you can test this condition by grabbing the mutex, then closing that
handle. This should implicitly release it into the abandoned state.

Expand Messages | Collapse Messages