|
|
Chromium Code Reviews|
Created:
11 years, 7 months ago by Antoine Labour Modified:
9 years, 7 months ago CC:
chromium-reviews_googlegroups.com Visibility:
Public. |
Descriptionlinux: fix main loop issues with windowless plugins
In certain conditions (e.g. NPN_InvalidateRect taking long), the flash plugin runs its own event loop before returning. It looks roughly like this:
while (gtk_events_pending()) {
gtk_main_iteration();
}
The problem is that our worker source used to force gtk_events_pending() to be always true, because the Check handler always returned TRUE (and that also made HandleDispatch to always run). That caused flash animations to stop and other bad behavior (100% CPU).
This CL changes the Check function to return TRUE only if HandleDispatch should be run (i.e., more_work_is_plausible set to true), while also checking whether the loop was woken up, or the delayed work timer expired.
HandlePrepare was forcing more_work_is_plausible to be always true apparently to avoid starvation. I removed this, I'm not sure why it is needed. If we get starvation issue, we should raise the priority of WorkSource (for reference, most events run at prio 0, except redraws that run at prio 120. WorkSource runs at prio 200). Another possibility is to force more_work_is_plausible but only every n calls to HandlePrepare.
BUG=8202, 11843, 12278
Patch Set 1 #
Total comments: 6
Patch Set 2 : Make message loop follow more closely the Windows one #
Total comments: 12
Patch Set 3 : Fixing comments #
Messages
Total messages: 13 (0 generated)
Adding Darin because he's the messageloop ninja, and estade because something like this may fix our file pickers (!?)
I will think on this some more. If we have a good repro case (for example file picker), that would be useful. DSH originally wrote this code. I did something similar to you, found a problem, debugged into it, and changed a bunch of stuff. That lead to another 5 super ugly message loop bugs. This code is really complicated, and from looking at your change I'd guess you have a 90% chance of introducing misbehavior. http://codereview.chromium.org/115812/diff/1/2 File base/message_pump_glib.cc (left): http://codereview.chromium.org/115812/diff/1/2#oldcode171 Line 171: state_->more_work_is_plausible = true; In general you should be very skeptical of removing code that has a 6 line comment above it. I vaguely remember writing this, and I remember it be very important.
This is probably more work than you wanted, but we need some tests. We need a lot of tests for the tricky corners that exist now, a test that can show the current problem, and that your change fixes it without regressing any of the other tricky bits. Thanks -- dean On 2009/05/27 06:52:57, Dean McNamee wrote: > I will think on this some more. If we have a good repro case (for example file > picker), that would be useful. > > DSH originally wrote this code. I did something similar to you, found a > problem, debugged into it, and changed a bunch of stuff. That lead to another 5 > super ugly message loop bugs. This code is really complicated, and from looking > at your change I'd guess you have a 90% chance of introducing misbehavior. > > http://codereview.chromium.org/115812/diff/1/2 > File base/message_pump_glib.cc (left): > > http://codereview.chromium.org/115812/diff/1/2#oldcode171 > Line 171: state_->more_work_is_plausible = true; > In general you should be very skeptical of removing code that has a 6 line > comment above it. I vaguely remember writing this, and I remember it be very > important.
After a deeper look at the code, the previous changes, and the GTK/Glib sources, I think this approach is workable. I think |more_work_is_plausible| is now sort of a bogus name, since we're using it for more state than that. Maybe a name like |do_dispatch|, |needs_dispatch|, etc, would better reflect what a bunch of this code is doing. We are trying to force Dispatch() to happen in the future, even if it's not happening now. This is very important logic to be clear since it's the key to not starving / getting stuck. I appreciate you debugging the problems with Flash, and I now understand exactly what's happening. It's likely this is related to our file picker problems too. I would update the description to be very clear about what this change does, and why. Basically: Sometimes foreign code (such as plugins) will manually pump the loop dry of all pending events. The way the event loop was previously structured, we always had a "pending event", and relied on the timeout value to keep the pump from spinning. This doesn't work in the manually pumping case since the timeout value is ignored. Previously we already returned TRUE from Check(), which indicates that there is an event on the queue. When you are manually pumping the queue, this means we're indicating that there is always a event. We must return FALSE sometime from our Check() to indicate that manual pumping can stop. Thinking and writing about this more makes me a bit worried, that previously we always tried to DoWork() no matter what. Now we are deciding whether we DoWork based on whether it's plausible that there is work. Previously this was only a hint, but now it's actually an enforcement. We don't really have any good tests for this code. http://codereview.chromium.org/115812/diff/1/2 File base/message_pump_glib.cc (left): http://codereview.chromium.org/115812/diff/1/2#oldcode171 Line 171: state_->more_work_is_plausible = true; Ok, my memory is a bit stale, but I think this is more or less what happened. Originally I committed this code, along with using the builtin glib wakeup pipe for ScheduleWork(). When we did that, there was no guarantee that our Check/Dispatch would be called. This is because entire priority levels are processed at once, the highest level first. So basically what would happen is that we wouldn't get a chance to have |more_work_is_plausible| reset. By forcing it to true, we would keep trying until it was properly set in Dispatch(). Basically we just ensured that if Prepare was called but Dispatch wasn't, then we keep returning true from Prepare until eventually Dispatch was called. However, in a later changelist (because of some other bugs), I moved us to our own dedicated wakeup pipe. I thought this through a few times, and it seems like it's possible this isn't necessary anymore. However, I would just keep it, it's not really hurting anything. The only case it will make a difference is if Prepare is called multiple times in a row without a Dispatch. It seems like that shouldn't happen anymore because the wakeup pipe will still have data on it, but protecting against it is still a good idea. I would leave this code in. http://codereview.chromium.org/115812/diff/1/2 File base/message_pump_glib.cc (right): http://codereview.chromium.org/115812/diff/1/2#newcode54 Line 54: // timeout, and check. It will not call dispatch. I've since dug a bit into this code. What you're saying is exactly correct, but we should elaborate more here to explain what's going on for the next guy. In particular, the GTK docs give this style code as an example: while (gtk_events_pending ()) gtk_main_iteration (); gtk_events_pending just calls g_main_context_pending(). this calls an internal method (g_main_context_iterate) and tells it not to block and not to dispatch. not-blocking just sets timeout to 0. This means Prepare will be called, and then Check. The return of Check is used as whether there are "pending" sources. http://codereview.chromium.org/115812/diff/1/2#newcode71 Line 71: // Only return TRUE if Dispatch should be called. I would move this comment up to above the WorkSourceCheck line. It would also be nice just to return, it should be ok to implicit boolean -> gboolean, but whatever, maybe ? : is better. http://codereview.chromium.org/115812/diff/1/2#newcode182 Line 182: state_->more_work_is_plausible = true; You need a comment here. You need to be clear that we're handling the case that calling Check does not mean calling Dispatch. So since we're reading from the pipe and taking away the signaled state, we need to make sure that even if Dispatch is not, and Prepare is called again, that future calls to Prepare and Check will always return TRUE until Dispatch has been called. http://codereview.chromium.org/115812/diff/1/2#newcode183 Line 183: } I would add a: // The |more_work_is_plausible| flag is basically telling us // we need to do the processing in Dispatch(). It could // have been set above because of the wakeup pipe, or // because a previous call to Check() and we have // not yet made it to Dispatch(). if (state_->more_work_is_plausible) return TRUE; (This is just to shortcut GetTimeIntervalMilliseconds).
With this patch, the file picker trouble goes away (at least on the laptop I've been using to test). Antoine you are my hero.
This looks better. We still need some tests :( I am a bit confused, it seems like now we well call DoWork() twice, once from our sources, and again in our own pump iteration? Should we only return TRUE from Check when we are running a nested loop (we can test this with a g_main_depth?). http://codereview.chromium.org/115812/diff/1005/1006 File base/message_pump_glib.cc (right): http://codereview.chromium.org/115812/diff/1005/1006#newcode66 Line 66: // - return true iff any of prepare() or check() returned true if not iff (I realize it's not a typo, but we don't ever say "if and only if" when we just mean if). http://codereview.chromium.org/115812/diff/1005/1006#newcode78 Line 78: // For the GLib pump we try to follow the Windows UI pump model: Just a note, the Windows UI pump model is totally wack for a bunch of Windows specific reasons. I agree we should match Windows if it makes sense for us, but be very weary of that code, it is completely insane. http://codereview.chromium.org/115812/diff/1005/1006#newcode81 Line 81: // event pumps period / capitalization. http://codereview.chromium.org/115812/diff/1005/1006#newcode83 Line 83: // loop, around event handling. capitalization http://codereview.chromium.org/115812/diff/1005/1007 File base/message_pump_glib.h (right): http://codereview.chromium.org/115812/diff/1005/1007#newcode74 Line 74: bool has_work; This is probably not entirely true that we "have work", before it was "work is plausible", and now it's probably "should try to see if there is work".
btw, ignore my comment about has_work, I wrote it before reading the rest of the review. On 2009/05/29 09:48:22, Dean McNamee wrote: > This looks better. We still need some tests :( > > I am a bit confused, it seems like now we well call DoWork() twice, once from > our sources, and again in our own pump iteration? > > Should we only return TRUE from Check when we are running a nested loop (we can > test this with a g_main_depth?). > > http://codereview.chromium.org/115812/diff/1005/1006 > File base/message_pump_glib.cc (right): > > http://codereview.chromium.org/115812/diff/1005/1006#newcode66 > Line 66: // - return true iff any of prepare() or check() returned true > if not iff (I realize it's not a typo, but we don't ever say "if and only if" > when we just mean if). > > http://codereview.chromium.org/115812/diff/1005/1006#newcode78 > Line 78: // For the GLib pump we try to follow the Windows UI pump model: > Just a note, the Windows UI pump model is totally wack for a bunch of Windows > specific reasons. I agree we should match Windows if it makes sense for us, but > be very weary of that code, it is completely insane. > > http://codereview.chromium.org/115812/diff/1005/1006#newcode81 > Line 81: // event pumps > period / capitalization. > > http://codereview.chromium.org/115812/diff/1005/1006#newcode83 > Line 83: // loop, around event handling. > capitalization > > http://codereview.chromium.org/115812/diff/1005/1007 > File base/message_pump_glib.h (right): > > http://codereview.chromium.org/115812/diff/1005/1007#newcode74 > Line 74: bool has_work; > This is probably not entirely true that we "have work", before it was "work is > plausible", and now it's probably "should try to see if there is work".
I should say that I think I like the simplification of moving a bunch of logic around the iteration, and controlling whether it blocks. I think that was pretty smart and cleans things up. Now we just need to make sure that we understand what the sources are supposed to do, what our own pumping does, and that they don't overlap, or overlap on purpose in a clear, correct, and documented way. Thanks again
Looks good to me, but I'm no expert on the glib runloop stuff. http://codereview.chromium.org/115812/diff/1005/1006 File base/message_pump_glib.cc (right): http://codereview.chromium.org/115812/diff/1005/1006#newcode200 Line 200: nit: might be nice to eliminate this newline here for consistency with the code above. might help readability a tad. http://codereview.chromium.org/115812/diff/1005/1006#newcode257 Line 257: // NOTE: on Windows at this point we would call ScheduleWork. But here, Windows does that in the corresponding code to support being called on a background thread. I'm guessing this function can never be called on a background thread, then?
I'd like to make progress on this patch. I'm not sure if Dean's last comments are positive or not. He's on vacation until tomorrow. Antoine: Do you have another version that addresses Dean's comments cooking? Does it pass all the tests?
http://codereview.chromium.org/115812/diff/1005/1006 File base/message_pump_glib.cc (right): http://codereview.chromium.org/115812/diff/1005/1006#newcode66 Line 66: // - return true iff any of prepare() or check() returned true On 2009/05/29 09:48:22, Dean McNamee wrote: > if not iff (I realize it's not a typo, but we don't ever say "if and only if" > when we just mean if). Done. http://codereview.chromium.org/115812/diff/1005/1006#newcode81 Line 81: // event pumps On 2009/05/29 09:48:22, Dean McNamee wrote: > period / capitalization. Done. http://codereview.chromium.org/115812/diff/1005/1006#newcode83 Line 83: // loop, around event handling. On 2009/05/29 09:48:22, Dean McNamee wrote: > capitalization Done. http://codereview.chromium.org/115812/diff/1005/1006#newcode200 Line 200: On 2009/06/01 14:07:43, darin wrote: > nit: might be nice to eliminate this newline here for consistency with the code > above. might help readability a tad. Done. http://codereview.chromium.org/115812/diff/1005/1006#newcode257 Line 257: // NOTE: on Windows at this point we would call ScheduleWork. But here, On 2009/06/01 14:07:43, darin wrote: > Windows does that in the corresponding code to support being called on a > background thread. I'm guessing this function can never be called on a > background thread, then? This function can only be called from the UI thread. has_work is only accessed from that thread. I added a reference to what I was talking about: MessagePumpForUI::HandleWorkMessage. I'm doubtful that it can be called from another thread though (since it calls DoWork, that would be a problem).
LGTM w/ the following conditions: - You're on the hook for some more / better tests. - When you write the tests, make sure they behave the same before and after this patch. It is easy to write tests for the "fixed" code that hide regressions. On 2009/06/01 21:15:17, Antoine Labour wrote: > http://codereview.chromium.org/115812/diff/1005/1006 > File base/message_pump_glib.cc (right): > > http://codereview.chromium.org/115812/diff/1005/1006#newcode66 > Line 66: // - return true iff any of prepare() or check() returned true > On 2009/05/29 09:48:22, Dean McNamee wrote: > > if not iff (I realize it's not a typo, but we don't ever say "if and only if" > > when we just mean if). > > Done. > > http://codereview.chromium.org/115812/diff/1005/1006#newcode81 > Line 81: // event pumps > On 2009/05/29 09:48:22, Dean McNamee wrote: > > period / capitalization. > > Done. > > http://codereview.chromium.org/115812/diff/1005/1006#newcode83 > Line 83: // loop, around event handling. > On 2009/05/29 09:48:22, Dean McNamee wrote: > > capitalization > > Done. > > http://codereview.chromium.org/115812/diff/1005/1006#newcode200 > Line 200: > On 2009/06/01 14:07:43, darin wrote: > > nit: might be nice to eliminate this newline here for consistency with the > code > > above. might help readability a tad. > > Done. > > http://codereview.chromium.org/115812/diff/1005/1006#newcode257 > Line 257: // NOTE: on Windows at this point we would call ScheduleWork. But > here, > On 2009/06/01 14:07:43, darin wrote: > > Windows does that in the corresponding code to support being called on a > > background thread. I'm guessing this function can never be called on a > > background thread, then? > > This function can only be called from the UI thread. has_work is only accessed > from that thread. > I added a reference to what I was talking about: > MessagePumpForUI::HandleWorkMessage. I'm doubtful that it can be called from > another thread though (since it calls DoWork, that would be a problem). |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
