|
|
Created:
10 years, 9 months ago by MAD Modified:
8 years, 8 months ago CC:
chromium-reviews, ben+cc_chromium.org Base URL:
svn://chrome-svn/chrome/trunk/src/ Visibility:
Public. |
DescriptionFixed a startup race condition.
Although the new test was written in a platform independent way, it is only added to the Widows specific portion of the ui_test target in the gyp file because it wasn't tried yet on the other platforms.
The bug was found and the fix was written in the windows specific version of the process singleton anyway... But if people working on the other platforms would like to try the test there, that would be great. :-)
BUG=9593
TEST=A new test have been created to validate this.
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=40629
Patch Set 1 #
Total comments: 12
Patch Set 2 : '' #
Total comments: 16
Patch Set 3 : '' #Patch Set 4 : '' #
Total comments: 6
Patch Set 5 : '' #Patch Set 6 : '' #Patch Set 7 : '' #Patch Set 8 : '' #Patch Set 9 : '' #
Total comments: 2
Messages
Total messages: 17 (0 generated)
Drive-by with some test comments. http://codereview.chromium.org/661339/diff/1/3 File chrome/browser/process_singleton_win_uitest.cc (right): http://codereview.chromium.org/661339/diff/1/3#newcode63 chrome/browser/process_singleton_win_uitest.cc:63: EXPECT_NE(static_cast<base::WaitableEvent*>(NULL), start_event); This should be definitely ASSERT_NE, otherwise we're going to kill the entire ui_tests binary on a NULL ptr dereference. http://codereview.chromium.org/661339/diff/1/3#newcode64 chrome/browser/process_singleton_win_uitest.cc:64: EXPECT_TRUE(start_event->Wait()); Shouldn't this be ASSERT_TRUE? http://codereview.chromium.org/661339/diff/1/3#newcode108 chrome/browser/process_singleton_win_uitest.cc:108: chrome_starter_threads_[i]->Start(); Shouldn't this check the return value? http://codereview.chromium.org/661339/diff/1/3#newcode138 chrome/browser/process_singleton_win_uitest.cc:138: for (size_t attempt = 0; attempt < kNbAttempts && !failed; ++attempt) { Please use SCOPED_TRACE for this loop. http://codereview.chromium.org/661339/diff/1/3#newcode145 chrome/browser/process_singleton_win_uitest.cc:145: for (size_t i = 0; i < kNbThreads; ++i) { Please use SCOPED_TRACE for this loop. http://codereview.chromium.org/661339/diff/1/3#newcode161 chrome/browser/process_singleton_win_uitest.cc:161: for (size_t i = 0; i < kNbThreads; ++i) Please use SCOPED_TRACE or just print more info in case of failure via <<.
Thanks for the drive by... Changes uploaded... BYE MAD http://codereview.chromium.org/661339/diff/1/3 File chrome/browser/process_singleton_win_uitest.cc (right): http://codereview.chromium.org/661339/diff/1/3#newcode63 chrome/browser/process_singleton_win_uitest.cc:63: EXPECT_NE(static_cast<base::WaitableEvent*>(NULL), start_event); On 2010/03/02 09:24:37, Paweł Hajdan Jr. wrote: > This should be definitely ASSERT_NE, otherwise we're going to kill the entire > ui_tests binary on a NULL ptr dereference. Done. Thanks, I didn't think about that, so I changed a few more where ASSERT makes more sense then EXPECT... http://codereview.chromium.org/661339/diff/1/3#newcode64 chrome/browser/process_singleton_win_uitest.cc:64: EXPECT_TRUE(start_event->Wait()); On 2010/03/02 09:24:37, Paweł Hajdan Jr. wrote: > Shouldn't this be ASSERT_TRUE? Done. http://codereview.chromium.org/661339/diff/1/3#newcode108 chrome/browser/process_singleton_win_uitest.cc:108: chrome_starter_threads_[i]->Start(); On 2010/03/02 09:24:37, Paweł Hajdan Jr. wrote: > Shouldn't this check the return value? Done. http://codereview.chromium.org/661339/diff/1/3#newcode138 chrome/browser/process_singleton_win_uitest.cc:138: for (size_t attempt = 0; attempt < kNbAttempts && !failed; ++attempt) { On 2010/03/02 09:24:37, Paweł Hajdan Jr. wrote: > Please use SCOPED_TRACE for this loop. Done. Cool, I didn't know about this... Thanks again! :-) http://codereview.chromium.org/661339/diff/1/3#newcode145 chrome/browser/process_singleton_win_uitest.cc:145: for (size_t i = 0; i < kNbThreads; ++i) { On 2010/03/02 09:24:37, Paweł Hajdan Jr. wrote: > Please use SCOPED_TRACE for this loop. Done. http://codereview.chromium.org/661339/diff/1/3#newcode161 chrome/browser/process_singleton_win_uitest.cc:161: for (size_t i = 0; i < kNbThreads; ++i) On 2010/03/02 09:24:37, Paweł Hajdan Jr. wrote: > Please use SCOPED_TRACE or just print more info in case of failure via <<. Done.
Thanks, the fixes look good. I just spotted the Sleep I overlooked before. Could you comment on that, or preferably remove it? (it's probably non-trivial). http://codereview.chromium.org/661339/diff/6/8 File chrome/browser/process_singleton_win_uitest.cc (right): http://codereview.chromium.org/661339/diff/6/8#newcode221 chrome/browser/process_singleton_win_uitest.cc:221: PlatformThread::Sleep(100); I missed this the first time, but it's going to be flaky. Are there some other choices here (e.g. wait for an event)?
http://codereview.chromium.org/661339/diff/6/8 File chrome/browser/process_singleton_win_uitest.cc (right): http://codereview.chromium.org/661339/diff/6/8#newcode221 chrome/browser/process_singleton_win_uitest.cc:221: PlatformThread::Sleep(100); On 2010/03/02 14:42:44, Paweł Hajdan Jr. wrote: > I missed this the first time, but it's going to be flaky. Are there some other > choices here (e.g. wait for an event)? Yeah, I know, I don't like it either... I couldn't find any event that confirms every process that we tried creating independently have be properly restarted... If I don't sleep here, when the test is running with the release build, some chrome.exe processes were left dangling. I could see them in the task manager (and thus, I couldn't relink chrome.exe without killing them). When I tried to attach them to the debugger, they terminated right away with a STATUS_DLL_NOT_FOUND error printed in the output window. If the test machines always kill all (potentially dangling) chrome.exe processes after execution, then we should be fine and I could remove it... Otherwise, I'm all ears for suggestions of other ways to avoid this... If I had a way of getting the new handles of the processes that are being spawned from the main browser process, then I could try to kill them, but I don't know if I have access to this from outside... Any suggestions?
http://codereview.chromium.org/661339/diff/6/7 File chrome/browser/process_singleton_win.cc (right): http://codereview.chromium.org/661339/diff/6/7#newcode46 chrome/browser/process_singleton_win.cc:46: HANDLE only_me = ::CreateMutex(NULL, FALSE, L"Global\\ProcessSingleton"); Shouldn't the object be created in the "Local" session namespace? Is there any reason to do this globally? http://codereview.chromium.org/661339/diff/6/7#newcode48 chrome/browser/process_singleton_win.cc:48: DWORD result = ::WaitForSingleObject(only_me, INFINITE); Please remove the :: here and elsewhere to be consistent with the other calls to win32 apis in this file. http://codereview.chromium.org/661339/diff/6/7#newcode63 chrome/browser/process_singleton_win.cc:63: success = ::CloseHandle(only_me); If you like, ScopedHandle (included via base/scoped_handle.h) could be used here. http://codereview.chromium.org/661339/diff/6/8 File chrome/browser/process_singleton_win_uitest.cc (right): http://codereview.chromium.org/661339/diff/6/8#newcode53 chrome/browser/process_singleton_win_uitest.cc:53: // TODO(port): For some reasons the LaunchApp call below fail even though micro-nit: reasons -> reason, fail -> fails http://codereview.chromium.org/661339/diff/6/8#newcode109 chrome/browser/process_singleton_win_uitest.cc:109: chrome_starters_[i] = new ChromeStarter; If you wanted to get fancy, you could look into SetThreadAffinityMask() to get the threads scheduled on different cores. Don't bother if this test reliably reproduces the problem pre-patch though. http://codereview.chromium.org/661339/diff/6/8#newcode221 chrome/browser/process_singleton_win_uitest.cc:221: PlatformThread::Sleep(100); On 2010/03/02 14:59:11, mad1 wrote: > On 2010/03/02 14:42:44, Paweł Hajdan Jr. wrote: > > I missed this the first time, but it's going to be flaky. Are there some other > > choices here (e.g. wait for an event)? > > Yeah, I know, I don't like it either... I couldn't find any event that confirms > every process that we tried creating independently have be properly restarted... > > If I don't sleep here, when the test is running with the release build, some > chrome.exe processes were left dangling. I could see them in the task manager > (and thus, I couldn't relink chrome.exe without killing them). When I tried to > attach them to the debugger, they terminated right away with a > STATUS_DLL_NOT_FOUND error printed in the output window. > > If the test machines always kill all (potentially dangling) chrome.exe processes > after execution, then we should be fine and I could remove it... Otherwise, I'm > all ears for suggestions of other ways to avoid this... > > If I had a way of getting the new handles of the processes that are being > spawned from the main browser process, then I could try to kill them, but I > don't know if I have access to this from outside... > > Any suggestions? My vote: just kill them all by name. There are some killall functions in base somewhere.
I'll send an update of the patch once I confirm if it is worth adding some windows specific code to set the thread affinity. http://codereview.chromium.org/661339/diff/6/7 File chrome/browser/process_singleton_win.cc (right): http://codereview.chromium.org/661339/diff/6/7#newcode46 chrome/browser/process_singleton_win.cc:46: HANDLE only_me = ::CreateMutex(NULL, FALSE, L"Global\\ProcessSingleton"); On 2010/03/02 15:06:14, robertshield wrote: > Shouldn't the object be created in the "Local" session namespace? Is there any > reason to do this globally? Hum... Good question... I used Global to mimic what was done in Upgrade::IsBrowserAlreadyRunning(), but I realize now how it is important for them to use Global but not for us... Thanks! http://codereview.chromium.org/661339/diff/6/7#newcode48 chrome/browser/process_singleton_win.cc:48: DWORD result = ::WaitForSingleObject(only_me, INFINITE); On 2010/03/02 15:06:14, robertshield wrote: > Please remove the :: here and elsewhere to be consistent with the other calls to > win32 apis in this file. Done. http://codereview.chromium.org/661339/diff/6/7#newcode63 chrome/browser/process_singleton_win.cc:63: success = ::CloseHandle(only_me); On 2010/03/02 15:06:14, robertshield wrote: > If you like, ScopedHandle (included via base/scoped_handle.h) could be used > here. Done. http://codereview.chromium.org/661339/diff/6/8 File chrome/browser/process_singleton_win_uitest.cc (right): http://codereview.chromium.org/661339/diff/6/8#newcode53 chrome/browser/process_singleton_win_uitest.cc:53: // TODO(port): For some reasons the LaunchApp call below fail even though On 2010/03/02 15:06:14, robertshield wrote: > micro-nit: reasons -> reason, fail -> fails Done. http://codereview.chromium.org/661339/diff/6/8#newcode109 chrome/browser/process_singleton_win_uitest.cc:109: chrome_starters_[i] = new ChromeStarter; On 2010/03/02 15:06:14, robertshield wrote: > If you wanted to get fancy, you could look into SetThreadAffinityMask() to get > the threads scheduled on different cores. Don't bother if this test reliably > reproduces the problem pre-patch though. That would be cool, because the the test doesn't unfortunately "reliably" reproduce the problem pre-patch... But I don't think we have a platform independent way of doing this... Unless I missed it somewhere... I'll try it and see if it is worth adding Windows Specific code in here... If it helps reproducing the problem more reliably... Maybe... http://codereview.chromium.org/661339/diff/6/8#newcode221 chrome/browser/process_singleton_win_uitest.cc:221: PlatformThread::Sleep(100); On 2010/03/02 15:06:14, robertshield wrote: > On 2010/03/02 14:59:11, mad1 wrote: > > On 2010/03/02 14:42:44, Paweł Hajdan Jr. wrote: > > > I missed this the first time, but it's going to be flaky. Are there some > other > > > choices here (e.g. wait for an event)? > > > > Yeah, I know, I don't like it either... I couldn't find any event that > confirms > > every process that we tried creating independently have be properly > restarted... > > > > If I don't sleep here, when the test is running with the release build, some > > chrome.exe processes were left dangling. I could see them in the task manager > > (and thus, I couldn't relink chrome.exe without killing them). When I tried to > > attach them to the debugger, they terminated right away with a > > STATUS_DLL_NOT_FOUND error printed in the output window. > > > > If the test machines always kill all (potentially dangling) chrome.exe > processes > > after execution, then we should be fine and I could remove it... Otherwise, > I'm > > all ears for suggestions of other ways to avoid this... > > > > If I had a way of getting the new handles of the processes that are being > > spawned from the main browser process, then I could try to kill them, but I > > don't know if I have access to this from outside... > > > > Any suggestions? > > My vote: just kill them all by name. There are some killall functions in base > somewhere. The only KillAll function I found was in the Chrome Frame tests stuff: chrome_frame\test_utils.cc... And I wasn't sure if a ui_test was allowed to kill all running chrome.exe process... I know I wouldn't like it to kill all MY chromes when I run it on my machine...
Thanks, everything looks good, I think we just need better clean up of the lingering Chrome processes that Pawel mentioned. One idea on how to do it below: http://codereview.chromium.org/661339/diff/6/8 File chrome/browser/process_singleton_win_uitest.cc (right): http://codereview.chromium.org/661339/diff/6/8#newcode109 chrome/browser/process_singleton_win_uitest.cc:109: chrome_starters_[i] = new ChromeStarter; On 2010/03/02 16:45:57, mad1 wrote: > On 2010/03/02 15:06:14, robertshield wrote: > > If you wanted to get fancy, you could look into SetThreadAffinityMask() to get > > the threads scheduled on different cores. Don't bother if this test reliably > > reproduces the problem pre-patch though. > > That would be cool, because the the test doesn't unfortunately "reliably" > reproduce the problem pre-patch... > But I don't think we have a platform independent way of doing this... Unless I > missed it somewhere... > > I'll try it and see if it is worth adding Windows Specific code in here... If it > helps reproducing the problem more reliably... Maybe... ok, np. just a suggestion ;-) http://codereview.chromium.org/661339/diff/6/8#newcode221 chrome/browser/process_singleton_win_uitest.cc:221: PlatformThread::Sleep(100); On 2010/03/02 16:45:57, mad1 wrote: > On 2010/03/02 15:06:14, robertshield wrote: > > On 2010/03/02 14:59:11, mad1 wrote: > > > On 2010/03/02 14:42:44, Paweł Hajdan Jr. wrote: > > > > I missed this the first time, but it's going to be flaky. Are there some > > other > > > > choices here (e.g. wait for an event)? > > > > > > Yeah, I know, I don't like it either... I couldn't find any event that > > confirms > > > every process that we tried creating independently have be properly > > restarted... > > > > > > If I don't sleep here, when the test is running with the release build, some > > > chrome.exe processes were left dangling. I could see them in the task > manager > > > (and thus, I couldn't relink chrome.exe without killing them). When I tried > to > > > attach them to the debugger, they terminated right away with a > > > STATUS_DLL_NOT_FOUND error printed in the output window. > > > > > > If the test machines always kill all (potentially dangling) chrome.exe > > processes > > > after execution, then we should be fine and I could remove it... Otherwise, > > I'm > > > all ears for suggestions of other ways to avoid this... > > > > > > If I had a way of getting the new handles of the processes that are being > > > spawned from the main browser process, then I could try to kill them, but I > > > don't know if I have access to this from outside... > > > > > > Any suggestions? > > > > My vote: just kill them all by name. There are some killall functions in base > > somewhere. > > The only KillAll function I found was in the Chrome Frame tests stuff: > chrome_frame\test_utils.cc... > > And I wasn't sure if a ui_test was allowed to kill all running chrome.exe > process... I know I wouldn't like it to kill all MY chromes when I run it on my > machine... One alternative is to run Chrome with a custom --user-data-dir flag (or any other flag that gets propagated to child processes) and then use KillAllNamedProcessesWithArgument() defined in test_utils.cc. CF uses that to kill all CF Chromes during its tests without being so rude as to kill all Chromes on the system.
I am a little bit lost here. http://codereview.chromium.org/661339/diff/1015/19 File chrome/browser/process_singleton_win.cc (right): http://codereview.chromium.org/661339/diff/1015/19#newcode47 chrome/browser/process_singleton_win.cc:47: ScopedHandle only_me(CreateMutex(NULL, FALSE, L"Local\\ProcessSingleton")); ProcessSingleton feels awfully generic to me. Maybe it clashes with some other product. How about mixing chrome into it? I don't think you need 'Local\\' to create an object in this session. http://codereview.chromium.org/661339/diff/1015/19#newcode48 chrome/browser/process_singleton_win.cc:48: DCHECK(only_me.Get() != NULL) << "GetLastError = " << GetLastError(); CreateMutex(.., FALSE, ..) that means you don't own the mutex , which in turn means that maybe you did not created it but just open it. http://codereview.chromium.org/661339/diff/1015/19#newcode62 chrome/browser/process_singleton_win.cc:62: BOOL success = ReleaseMutex(only_me); I don't see you acquiring the mutex...?
Uploaded an updated version and answered comments. Thanks! BYE MAD http://codereview.chromium.org/661339/diff/1015/19 File chrome/browser/process_singleton_win.cc (right): http://codereview.chromium.org/661339/diff/1015/19#newcode47 chrome/browser/process_singleton_win.cc:47: ScopedHandle only_me(CreateMutex(NULL, FALSE, L"Local\\ProcessSingleton")); On 2010/03/02 19:15:58, cpu wrote: > ProcessSingleton feels awfully generic to me. Maybe it clashes with some other > product. How about mixing chrome into it? > > I don't think you need 'Local\\' to create an object in this session. Good point about the name, I added the AppId in there... As for the local, I thought it would ensure that it is only for our session in case a multi-user remote desktop session is trying to start chrome at the same time we are trying to do it. But I'm not completely sure here, the documentation says that the current session is the Default, but it also says that: 'client processes can use the "Local\" prefix to explicitly create an object in their session namespace' So I'm open to remove the "Local\" from there if you really want me to remove it, but I don't think it hurts to have it there. Right? Thanks! http://codereview.chromium.org/661339/diff/1015/19#newcode48 chrome/browser/process_singleton_win.cc:48: DCHECK(only_me.Get() != NULL) << "GetLastError = " << GetLastError(); On 2010/03/02 19:15:58, cpu wrote: > CreateMutex(.., FALSE, ..) that means you don't own the mutex , which in turn > means that maybe you did not created it but just open it. Yes, this is on purpose as suggested in the MSDN doc. When you use a named Mutex, it is clearer to not try to get the initial ownership, since it isn't guaranteed. CreateMutex will return a non-owned mutex if it was already created and still owned. http://codereview.chromium.org/661339/diff/1015/19#newcode62 chrome/browser/process_singleton_win.cc:62: BOOL success = ReleaseMutex(only_me); On 2010/03/02 19:15:58, cpu wrote: > I don't see you acquiring the mutex...? You acquire a Mutex by waiting on it... Again... As documented :-) I added comments to make this clearer... Thanks!
lgtm
Cool Thanks! BYE MAD On Tue, Mar 2, 2010 at 5:15 PM, <cpu@chromium.org> wrote: > lgtm > > > http://codereview.chromium.org/661339 >
OK, can you guys (Robert and/or Pawel) take one more look at the unit test? I think I addressed all your issues now... Thanks! BYE MAD
LGTM, thanks for the work on making this solid!
LGTM
http://codereview.chromium.org/661339/diff/2006/chrome/browser/process_single... File chrome/browser/process_singleton_win_uitest.cc (right): http://codereview.chromium.org/661339/diff/2006/chrome/browser/process_single... chrome/browser/process_singleton_win_uitest.cc:258: pending_starters.empty(); Vectors are cleared with "clear()", empty() just returns a bool if the vector is empty or not. Fix here: https://chromiumcodereview.appspot.com/10034006/
Drive-by, did not review in detail. http://codereview.chromium.org/661339/diff/2006/chrome/browser/process_single... File chrome/browser/process_singleton_win.cc (right): http://codereview.chromium.org/661339/diff/2006/chrome/browser/process_single... chrome/browser/process_singleton_win.cc:58: DWORD result = WaitForSingleObject(only_me, INFINITE); WFSO can also return WAIT_ABANDONED for success. This is in the case when the previous owner of the mutex died or was killed before releasing it. See http://msdn.microsoft.com/en-us/library/windows/desktop/ms687032(v=vs.85).aspx I believe you can test this condition by grabbing the mutex, then closing that handle. This should implicitly release it into the abandoned state. |