Description
Process pending delayed tasks in kMsgHaveWork
With one Win32 timer we can only get a callback every 10 ms, and since the callback only processes one message, the queue can keep growing if delayed tasks are posted faster than that rate, even if they are processed very quickly. To prevent that, schedule a kMsgHaveWork if there's 0 ms until the next delayed task should run, and run a delayed task in the kMsgHaveWork handler. (A rough sketch of the intended shape follows below.)
BUG=454333
TEST=chrome resizes smoothly
Committed: https://crrev.com/9316d3b983530169e6c50d61fa87eeebecd8dcfd
Cr-Commit-Position: refs/heads/master@{#330816}
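The change described above boils down to roughly the following shape. This is a sketch only, assuming the usual structure of base/message_loop/message_pump_win.cc at the time; the RescheduleTimer helper and the exact call sites are assumptions, not the literal patch.

  // Sketch: run one delayed task in the kMsgHaveWork handler, and when the
  // next delayed task is already due, fall back to kMsgHaveWork instead of
  // the coarse (>= 10 ms) WM_TIMER.
  void MessagePumpForUI::HandleWorkMessage() {
    // ... existing guard and ProcessPumpReplacementMessage() call ...

    // Run one regular task; if more remain, keep the pump ticking.
    if (state_->delegate->DoWork())
      ScheduleWork();

    // New: also run one delayed task whose run time has arrived, then arm
    // either the native timer or another kMsgHaveWork as appropriate.
    state_->delegate->DoDelayedWork(&delayed_work_time_);
    RescheduleTimer();  // assumed helper; see the patch set 3 discussion below
  }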
Patch Set 1
Patch Set 2
Total comments: 3
Patch Set 3
Messages
Total messages: 30 (6 generated)
jbauman@chromium.org changed reviewers: + thakis@chromium.org
This seems to work, but I'm not certain it's the correct approach. Also let me know if there are any reviewers you'd want for this who aren't currently out of office.
danakj@chromium.org changed reviewers: + cpu@chromium.org
+cpu
cpu@chromium.org changed reviewers: + brucedawson@chromium.org, jamesr@chromium.org
iiinteresting. Let's see if james wants to opine here. Also bruce was lamenting something around here today. This can dramatically change when the tasks are dispatched.
One risk here is if we end up always having a posted message in the queue we could completely starve on-demand messages like WM_MOUSEMOVE/WM_TIMER/WM_PAINT. Starving mouse moves could break UI in some cases and starving WM_PAINT might bust things up when not using Aero. Could you check if we still get these messages (or if anything is obviously broken if we don't) in the case where there's always a delayed task ready to go with and without Aero?
While Windows sets uElapse to a minimum of 10, we will sometimes get even lower resolution (if the system timer interval is at its default of 15.625 ms, which really does happen in some cases). FWIW.

We already have anti-starving logic in the message pump IIRC? Presumably this anti-starving logic can now get blocked for a bit longer if several tasks come due at the same time, but there should still be time for WM_PAINT to sneak through between each batch of messages.

My concern (that cpu@ alluded to) was that MessagePumpDefault::Run ends up getting called when 'delay' is between 0.0 and 1.0 ms. Because the delay is non-zero it doesn't do the work immediately, but because the delay is less than 1.0 it gets truncated to 0 ms, leading to a busy-wait spin that burns CPU cycles. It's unclear whether rounding up or down is safest in this context. Separate change.
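A tiny illustration of the rounding hazard described above (not the actual MessagePumpDefault code; the names and the wait call here are placeholders):

  // A 0.4 ms delay is non-zero, so the task isn't run immediately, but a
  // wait expressed in whole milliseconds truncates it to 0 and returns at
  // once, so the loop spins until the task is finally due.
  base::TimeDelta delay = base::TimeDelta::FromMicroseconds(400);
  if (delay > base::TimeDelta()) {
    DWORD wait_ms = static_cast<DWORD>(delay.InMilliseconds());  // == 0
    ::WaitForSingleObject(event_handle, wait_ms);  // returns immediately
  }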
On 2015/05/13 17:09:18, jamesr wrote: > One risk here is if we end up always having a posted message in the queue we > could completely starve on-demand messages like WM_MOUSEMOVE/WM_TIMER/WM_PAINT. > Starving mouse moves could break UI in some cases and starving WM_PAINT might > bust things up when not using Aero. Could you check if we still get these > messages (or if anything is obviously broken if we don't) in the case where > there's always a delayed task ready to go with and without Aero? ProcessPumpReplacementMessage() already handles non-starvation of WM_TIMER and WM_PAINT, so that should be fine. There may be issues with WM_MOUSEMOVE, so I'll check and see.
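For context, the non-starvation mechanism referred to above looks roughly like this (a simplified sketch from memory of message_pump_win.cc, details approximate): after the kMsgHaveWork that woke us is consumed, one replacement message is peeked and dispatched, which lets low-priority messages such as WM_PAINT and WM_TIMER through even when kMsgHaveWork is being posted continuously.

  // Simplified sketch of ProcessPumpReplacementMessage().
  bool MessagePumpForUI::ProcessPumpReplacementMessage() {
    MSG msg;
    const bool have_message = PeekMessage(&msg, NULL, 0, 0, PM_REMOVE) != FALSE;

    // The kMsgHaveWork that got us here has been consumed, so clear the flag
    // that limits us to one outstanding kMsgHaveWork at a time.
    InterlockedExchange(&have_work_, 0);

    if (!have_message)
      return false;

    // Post another kMsgHaveWork before dispatching, so we still get a time
    // slice even if the message below enters a native modal loop.
    ScheduleWork();
    return ProcessMessageHelper(msg);
  }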
On 2015/05/13 18:42:02, jbauman wrote: > On 2015/05/13 17:09:18, jamesr wrote: > > One risk here is if we end up always having a posted message in the queue we > > could completely starve on-demand messages like > WM_MOUSEMOVE/WM_TIMER/WM_PAINT. > > Starving mouse moves could break UI in some cases and starving WM_PAINT might > > bust things up when not using Aero. Could you check if we still get these > > messages (or if anything is obviously broken if we don't) in the case where > > there's always a delayed task ready to go with and without Aero? > > ProcessPumpReplacementMessage() already handles non-starvation of WM_TIMER and > WM_PAINT, so that should be fine. There may be issues with WM_MOUSEMOVE, so I'll > check and see. The MessageLoop::os_modal_loop logic looks really busted - it's only set by NPAPI plugin code but we enter os modal loops all the time in the browser due to us or someone else calling SendMessage(.., WM_SYSCOMMAND, ...) etc
On 2015/05/13 19:24:49, jamesr wrote: > On 2015/05/13 18:42:02, jbauman wrote: > > On 2015/05/13 17:09:18, jamesr wrote: > > > One risk here is if we end up always having a posted message in the queue we > > > could completely starve on-demand messages like > > WM_MOUSEMOVE/WM_TIMER/WM_PAINT. > > > Starving mouse moves could break UI in some cases and starving WM_PAINT > might > > > bust things up when not using Aero. Could you check if we still get these > > > messages (or if anything is obviously broken if we don't) in the case where > > > there's always a delayed task ready to go with and without Aero? > > > > ProcessPumpReplacementMessage() already handles non-starvation of WM_TIMER and > > WM_PAINT, so that should be fine. There may be issues with WM_MOUSEMOVE, so > I'll > > check and see. > > The MessageLoop::os_modal_loop logic looks really busted - it's only set by > NPAPI plugin code but we enter os modal loops all the time in the browser due to > us or someone else calling SendMessage(.., WM_SYSCOMMAND, ...) etc Based on https://chromium.googlesource.com/chromium/src/+/2ec01fe90944d9489cc3949c80fc... that's really only supposed to apply to nested loops due to modal dialogs, which AFAIK only NPAPI plugins would do in chrome. In resize loops or otherwise, Chrome should handle every event.
On 2015/05/13 19:41:19, jbauman wrote: > On 2015/05/13 19:24:49, jamesr wrote: > > On 2015/05/13 18:42:02, jbauman wrote: > > > On 2015/05/13 17:09:18, jamesr wrote: > > > > One risk here is if we end up always having a posted message in the queue > we > > > > could completely starve on-demand messages like > > > WM_MOUSEMOVE/WM_TIMER/WM_PAINT. > > > > Starving mouse moves could break UI in some cases and starving WM_PAINT > > might > > > > bust things up when not using Aero. Could you check if we still get these > > > > messages (or if anything is obviously broken if we don't) in the case > where > > > > there's always a delayed task ready to go with and without Aero? > > > > > > ProcessPumpReplacementMessage() already handles non-starvation of WM_TIMER > and > > > WM_PAINT, so that should be fine. There may be issues with WM_MOUSEMOVE, so > > I'll > > > check and see. > > > > The MessageLoop::os_modal_loop logic looks really busted - it's only set by > > NPAPI plugin code but we enter os modal loops all the time in the browser due > to > > us or someone else calling SendMessage(.., WM_SYSCOMMAND, ...) etc > > Based on > https://chromium.googlesource.com/chromium/src/+/2ec01fe90944d9489cc3949c80fc... > that's really only supposed to apply to nested loops due to modal dialogs, which > AFAIK only NPAPI plugins would do in chrome. In resize loops or otherwise, > Chrome should handle every event. OK - in that case the ProcessPumpReplacementMessage() should pull all messages out. We could end up peeking out way more WM_MOUSEMOVEs than before, I guess, or a lot more WM_PAINTs if somebody is sending invalidations
jar@chromium.org changed reviewers: + jar@chromium.org
Drive-by comment (invited by cpu). YMMV.

https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag...
File base/message_loop/message_pump_win.cc (right):

https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag...
base/message_loop/message_pump_win.cc:325: state_->delegate->DoDelayedWork(&delayed_work_time_);

Note that you *might* effectively double the amount of work (number of tasks) done during each HandleWorkMessage. Worse yet, it might actually swell up the regular task queue during an "overload," rather than "do less work" when we fall behind. The cases where this is crucial are indeed very sensitive to resulting jank (due to CPU usage).

The only reason (I think) for this HandleWorkMessage method to run is that we're in a nested (native?) message loop, and not getting to run our traditional list of chores (which do include both a shot at regular tasks, as well as delayed tasks). In that nested-loop situation, it is commonly the case that the OS really wants to tightly control responsiveness (which is why they ran a tight native loop)... so we only very intermittently run a Chrome Task. We deliberately allow them to get progressively more cycles (proportionately) if they are processing a pile of native queue messages.

Perhaps we should change lines 323/324 to:

  if (state_->delegate->DoWork()) {
    ScheduleWork();
    return;  // We did absolutely minimal amount of work.
  }
  // Fall through to see if DelayedWork needs to be done.

You could argue that we should randomly try DoDelayedWork vs DoWork... and perhaps that could be crafted in here. IMO, a more common case would be that a delayed task fires, and it posts a bunch of tasks (or tasks that post tasks), and it is probably better to handle most all of those "first" (or at least ASAP), rather than doing some more delayed tasks "sooner" and perhaps REALLY growing our regular task queue.

https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag...
base/message_loop/message_pump_win.cc:343: // A bit gratuitous to set delayed_work_time_ again, but oh well.

The CL is plausible.... but I'm not a fan of replicated code (line 327 and 343). I'm not sure of how or why to refactor... but certainly copying a confusing comment is questionable. Can you clarify the meaning of the comment?
On 2015/05/13 20:01:21, jar (doing other things) wrote: > Drive-by comment (invited by cpu). YMMV. > > https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag... > File base/message_loop/message_pump_win.cc (right): > > https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag... > base/message_loop/message_pump_win.cc:325: > state_->delegate->DoDelayedWork(&delayed_work_time_); > Note that you *might* effectively double the amount of work (number of tasks) > done during each HandleWorkMessage. Worse yet, it might actually swell up the > regular task queue during an "overload," rather than "do less work" when we fall > behind. The cases where this is crucial are indeed very sensitive to resulting > jank (due to CPU usage). > > The only reason (I think) for this HandleWorkMessage method to run is that we're > in a nested (native?) message loop, and not getting to run our traditional list > of chores (which do include both a shot at regular tasks, as well as delayed > tasks). In that nested-loop situation, it is commonly the case that the OS > really wants to tightly control responsiveness (which is why they ran a tight > native loop)... so we only very intermittently run a Chrome Task. I'm not sure I buy this premise - we end up running a native nested message loop for very simple situations like resizing/dragging a window or showing certain system UI dialogs (like the certificate details dialog, iirc) or showing system menus. These aren't particularly tight loops.
On 2015/05/13 20:01:21, jar (doing other things) wrote: > Drive-by comment (invited by cpu). YMMV. > > https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag... > File base/message_loop/message_pump_win.cc (right): > > https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag... > base/message_loop/message_pump_win.cc:325: > state_->delegate->DoDelayedWork(&delayed_work_time_); > Note that you *might* effectively double the amount of work (number of tasks) > done during each HandleWorkMessage. Worse yet, it might actually swell up the > regular task queue during an "overload," rather than "do less work" when we fall > behind. The cases where this is crucial are indeed very sensitive to resulting > jank (due to CPU usage). The behavior of ScheduleWork is the same (and ScheduleDelayedWork is just calling into that), it's just that we can process one delayed task in addition to the regular task. I don't think this will otherwise cause the windows native message queue to swell up, as we're still limited to one kMsgHaveWork outstanding at a time. > > The only reason (I think) for this HandleWorkMessage method to run is that we're > in a nested (native?) message loop, and not getting to run our traditional list > of chores (which do include both a shot at regular tasks, as well as delayed > tasks). In that nested-loop situation, it is commonly the case that the OS > really wants to tightly control responsiveness (which is why they ran a tight > native loop)... so we only very intermittently run a Chrome Task. We > deliberately allow them to get progressively more cycles (proportionately) if > they are processing a pile of native queue messages. > Fair enough, I just think that delayed tasks that are scheduled to occur at or before the current time should have approximately the same priority as chrome tasks that are scheduled to run, and shouldn't be starved into practical nonexistence.
for completeness https://code.google.com/p/chromium/issues/detail?id=487724
On 2015/05/13 20:14:28, jbauman wrote: > On 2015/05/13 20:01:21, jar (doing other things) wrote: > > Drive-by comment (invited by cpu). YMMV. > > > > > https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag... > > File base/message_loop/message_pump_win.cc (right): > > > > > https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag... > > base/message_loop/message_pump_win.cc:325: > > state_->delegate->DoDelayedWork(&delayed_work_time_); > > Note that you *might* effectively double the amount of work (number of tasks) > > done during each HandleWorkMessage. Worse yet, it might actually swell up the > > regular task queue during an "overload," rather than "do less work" when we > fall > > behind. The cases where this is crucial are indeed very sensitive to > resulting > > jank (due to CPU usage). > > The behavior of ScheduleWork is the same (and ScheduleDelayedWork is just > calling into that), it's just that we can process one delayed task in addition > to the regular task. I don't think this will otherwise cause the windows native > message queue to swell up, as we're still limited to one kMsgHaveWork > outstanding at a time. > > > > The only reason (I think) for this HandleWorkMessage method to run is that > we're > > in a nested (native?) message loop, and not getting to run our traditional > list > > of chores (which do include both a shot at regular tasks, as well as delayed > > tasks). In that nested-loop situation, it is commonly the case that the OS > > really wants to tightly control responsiveness (which is why they ran a tight > > native loop)... so we only very intermittently run a Chrome Task. We > > deliberately allow them to get progressively more cycles (proportionately) if > > they are processing a pile of native queue messages. > > > Fair enough, I just think that delayed tasks that are scheduled to occur at or > before the current time should have approximately the same priority as chrome > tasks that are scheduled to run, and shouldn't be starved into practical > nonexistence. As I understand the problem-case, there is some sub-system posting a LOT of delayed tasks, but it isn't worried about whether the delayed tasks are "falling behind." This action causes the delayed task queue to grow without bound <sigh>. IMO, when posting a series of delayed tasks, it is better to post-one, and when that runs, make a decision of what to post next (and when!!). It is a "bad idea" to "just post" an endless series of delayed tasks. That seems to be the case you are handling.... and I don't think we should (or can) handle such sub-systems well. Said another way, this sub-system is a bad citizen, responding improperly (or not at all?) to congestion (growing queues). Congestion avoidance can't be resolved in the message loop... it must be resolved in the source (sub-system). I agree, that we shouldn't "starve into practical nonexistence." ...but the contract with a delayed task is that it will be run "at or after" the delay has expired. We are surely fulfilling that. As a result, tasks can take a while to run... and sub-systems that keep posting constantly, independent of when the "similar" or "related" tasks are run, are probably at fault.
On 2015/05/13 21:10:01, jar (doing other things) wrote: > On 2015/05/13 20:14:28, jbauman wrote: > > On 2015/05/13 20:01:21, jar (doing other things) wrote: > > > Drive-by comment (invited by cpu). YMMV. > > > > > > > > > https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag... > > > File base/message_loop/message_pump_win.cc (right): > > > > > > > > > https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag... > > > base/message_loop/message_pump_win.cc:325: > > > state_->delegate->DoDelayedWork(&delayed_work_time_); > > > Note that you *might* effectively double the amount of work (number of > tasks) > > > done during each HandleWorkMessage. Worse yet, it might actually swell up > the > > > regular task queue during an "overload," rather than "do less work" when we > > fall > > > behind. The cases where this is crucial are indeed very sensitive to > > resulting > > > jank (due to CPU usage). > > > > The behavior of ScheduleWork is the same (and ScheduleDelayedWork is just > > calling into that), it's just that we can process one delayed task in addition > > to the regular task. I don't think this will otherwise cause the windows > native > > message queue to swell up, as we're still limited to one kMsgHaveWork > > outstanding at a time. > > > > > > The only reason (I think) for this HandleWorkMessage method to run is that > > we're > > > in a nested (native?) message loop, and not getting to run our traditional > > list > > > of chores (which do include both a shot at regular tasks, as well as delayed > > > tasks). In that nested-loop situation, it is commonly the case that the OS > > > really wants to tightly control responsiveness (which is why they ran a > tight > > > native loop)... so we only very intermittently run a Chrome Task. We > > > deliberately allow them to get progressively more cycles (proportionately) > if > > > they are processing a pile of native queue messages. > > > > > Fair enough, I just think that delayed tasks that are scheduled to occur at or > > before the current time should have approximately the same priority as chrome > > tasks that are scheduled to run, and shouldn't be starved into practical > > nonexistence. > > > As I understand the problem-case, there is some sub-system posting a LOT of > delayed tasks, but it isn't worried about whether the delayed tasks are "falling > behind." This action causes the delayed task queue to grow without bound <sigh>. > > > IMO, when posting a series of delayed tasks, it is better to post-one, and when > that runs, make a decision of what to post next (and when!!). It is a "bad > idea" to "just post" an endless series of delayed tasks. That seems to be the > case you are handling.... and I don't think we should (or can) handle such > sub-systems well. Said another way, this sub-system is a bad citizen, responding > improperly (or not at all?) to congestion (growing queues). Congestion > avoidance can't be resolved in the message loop... it must be resolved in the > source (sub-system). > I don't think there's any system that's posting a growing number of tasks - as you keep resizing, the delay stays approximately constant. But if there are 10 subsystems that each post 1 task a frame, then a frame will take at least 100ms to execute, even if each task itself takes a microsecond to execute. Each task is even backing off in a sensible way, as it seems to take 100ms between calls into it. That extra delay hardly seems like a sensible way to execute things.
On 2015/05/13 21:17:11, jbauman wrote: > On 2015/05/13 21:10:01, jar (doing other things) wrote: > > On 2015/05/13 20:14:28, jbauman wrote: > > > On 2015/05/13 20:01:21, jar (doing other things) wrote: > > > > Drive-by comment (invited by cpu). YMMV. > > > > > > > > > > > > > > https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag... > > > > File base/message_loop/message_pump_win.cc (right): > > > > > > > > > > > > > > https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag... > > > > base/message_loop/message_pump_win.cc:325: > > > > state_->delegate->DoDelayedWork(&delayed_work_time_); > > > > Note that you *might* effectively double the amount of work (number of > > tasks) > > > > done during each HandleWorkMessage. Worse yet, it might actually swell up > > the > > > > regular task queue during an "overload," rather than "do less work" when > we > > > fall > > > > behind. The cases where this is crucial are indeed very sensitive to > > > resulting > > > > jank (due to CPU usage). > > > > > > The behavior of ScheduleWork is the same (and ScheduleDelayedWork is just > > > calling into that), it's just that we can process one delayed task in > addition > > > to the regular task. I don't think this will otherwise cause the windows > > native > > > message queue to swell up, as we're still limited to one kMsgHaveWork > > > outstanding at a time. > > > > > > > > The only reason (I think) for this HandleWorkMessage method to run is that > > > we're > > > > in a nested (native?) message loop, and not getting to run our traditional > > > list > > > > of chores (which do include both a shot at regular tasks, as well as > delayed > > > > tasks). In that nested-loop situation, it is commonly the case that the > OS > > > > really wants to tightly control responsiveness (which is why they ran a > > tight > > > > native loop)... so we only very intermittently run a Chrome Task. We > > > > deliberately allow them to get progressively more cycles (proportionately) > > if > > > > they are processing a pile of native queue messages. > > > > > > > Fair enough, I just think that delayed tasks that are scheduled to occur at > or > > > before the current time should have approximately the same priority as > chrome > > > tasks that are scheduled to run, and shouldn't be starved into practical > > > nonexistence. > > > > > > As I understand the problem-case, there is some sub-system posting a LOT of > > delayed tasks, but it isn't worried about whether the delayed tasks are > "falling > > behind." This action causes the delayed task queue to grow without bound > <sigh>. > > > > > > IMO, when posting a series of delayed tasks, it is better to post-one, and > when > > that runs, make a decision of what to post next (and when!!). It is a "bad > > idea" to "just post" an endless series of delayed tasks. That seems to be the > > case you are handling.... and I don't think we should (or can) handle such > > sub-systems well. Said another way, this sub-system is a bad citizen, > responding > > improperly (or not at all?) to congestion (growing queues). Congestion > > avoidance can't be resolved in the message loop... it must be resolved in the > > source (sub-system). > > > I don't think there's any system that's posting a growing number of tasks - as > you keep resizing, the delay stays approximately constant. 
> But if there are 10 subsystems that each post 1 task a frame, then a frame
> will take at least 100ms to execute, even if each task itself takes a
> microsecond to execute. Each task is even backing off in a sensible way, as
> it seems to take 100ms between calls into it. That extra delay hardly seems
> like a sensible way to execute things.

OK.... that is a better motivation (aggregate delay of several sub-systems, each of which "properly" responded to CPU congestion). I was reading the CL description which said "the queue could keep growing," and assumed you were handling such a scenario.

FWIW: Long ago, one of the requirements(?) was to be sure the "message box resizing" was smooth as silk (or as smooth as the OS could make it). That was what drove the effort to minimize task run times (running only one task) when we got a time slice. We surely "could" have run more (2 tasks? 5 tasks?), but the system messages (mouse movement responses) were deliberately given priority.

I do think it conceptually makes sense that we should be willing to run tasks, including DelayedWork, as often as standard work... so that seems rightish. Your point that we currently restrict DelayedWork to no more than a total of 100 (delayed) tasks per second (maybe only 60ish if the interrupt time is 16ms), seems wrongish.

IMO, it would be nice if we ran no more than one task (either DoWork, or DoDelayedWork) when we got a time-slice via these tickler messages. Your code seems to do pairs of calls (potentially)... and it would be nicer if it could (fairly? starving neither task queue?) do only one task per slice... but perhaps two tasks is not "the end of the world" either.
Things are also different now that we draw from delayed tasks and not from WM_PAINT.
On 2015/05/14 02:08:29, jar (doing other things) wrote: > On 2015/05/13 21:17:11, jbauman wrote: > > On 2015/05/13 21:10:01, jar (doing other things) wrote: > > > On 2015/05/13 20:14:28, jbauman wrote: > > > > On 2015/05/13 20:01:21, jar (doing other things) wrote: > > > > > Drive-by comment (invited by cpu). YMMV. > > > > > > > > > > > > > > > > > > > > https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag... > > > > > File base/message_loop/message_pump_win.cc (right): > > > > > > > > > > > > > > > > > > > > https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag... > > > > > base/message_loop/message_pump_win.cc:325: > > > > > state_->delegate->DoDelayedWork(&delayed_work_time_); > > > > > Note that you *might* effectively double the amount of work (number of > > > tasks) > > > > > done during each HandleWorkMessage. Worse yet, it might actually swell > up > > > the > > > > > regular task queue during an "overload," rather than "do less work" when > > we > > > > fall > > > > > behind. The cases where this is crucial are indeed very sensitive to > > > > resulting > > > > > jank (due to CPU usage). > > > > > > > > The behavior of ScheduleWork is the same (and ScheduleDelayedWork is just > > > > calling into that), it's just that we can process one delayed task in > > addition > > > > to the regular task. I don't think this will otherwise cause the windows > > > native > > > > message queue to swell up, as we're still limited to one kMsgHaveWork > > > > outstanding at a time. > > > > > > > > > > The only reason (I think) for this HandleWorkMessage method to run is > that > > > > we're > > > > > in a nested (native?) message loop, and not getting to run our > traditional > > > > list > > > > > of chores (which do include both a shot at regular tasks, as well as > > delayed > > > > > tasks). In that nested-loop situation, it is commonly the case that the > > OS > > > > > really wants to tightly control responsiveness (which is why they ran a > > > tight > > > > > native loop)... so we only very intermittently run a Chrome Task. We > > > > > deliberately allow them to get progressively more cycles > (proportionately) > > > if > > > > > they are processing a pile of native queue messages. > > > > > > > > > Fair enough, I just think that delayed tasks that are scheduled to occur > at > > or > > > > before the current time should have approximately the same priority as > > chrome > > > > tasks that are scheduled to run, and shouldn't be starved into practical > > > > nonexistence. > > > > > > > > > As I understand the problem-case, there is some sub-system posting a LOT of > > > delayed tasks, but it isn't worried about whether the delayed tasks are > > "falling > > > behind." This action causes the delayed task queue to grow without bound > > <sigh>. > > > > > > > > > IMO, when posting a series of delayed tasks, it is better to post-one, and > > when > > > that runs, make a decision of what to post next (and when!!). It is a "bad > > > idea" to "just post" an endless series of delayed tasks. That seems to be > the > > > case you are handling.... and I don't think we should (or can) handle such > > > sub-systems well. Said another way, this sub-system is a bad citizen, > > responding > > > improperly (or not at all?) to congestion (growing queues). Congestion > > > avoidance can't be resolved in the message loop... it must be resolved in > the > > > source (sub-system). 
> > > > > I don't think there's any system that's posting a growing number of tasks - as > > you keep resizing, the delay stays approximately constant. But if there are 10 > > subsystems that each post 1 task a frame, then a frame will take at least > 100ms > > to execute, even if each task itself takes a microsecond to execute. Each task > > is even backing off in a sensible way, as it seems to take 100ms between calls > > into it. That extra delay hardly seems like a sensible way to execute things. > > OK.... that is a better motivation (aggregate delay of several sub-systems, each > of which "properly" responded to CPU congestion). I was reading the CL > description which said "the queue could keep growing," and assumed you were > handling such a scenario. > > FWIW: Long ago, one of the requirements(?) was to be sure the "message box > resizing" was smooth as silk (or as smooth as the OS could make it). That was > what drove the effort to minimize task run times (running only one task) when we > got a time slice. We surely "could" have run more (2 task? 5 tasks?), but the > system messages (mouse movement responses) were deliberately given priority. > > I do think it conceptually makes sense that we should be willing to run tasks, > including DelayedWork, as often as standard work... so that seems rightish. > Your point that we currently restrict DelayedWork to no more than a total of 100 > (delayed) tasks pers second (maybe only 60ish if the interrupt time is 16ms), > seems wrongish. > > IMO, it would be nice if we ran no more than one task (either DoWork, or > DoDelayedWork) when we got a time-slice via these tickler messages. Your code > seems to do pairs of calls (potentially)... and it would be nicer if it could > (fairly? starving neither task queue?) do only one task per slice... but perhaps > two tasks is not "the end of the world" either. I think the code is pretty close to what MessagePumpForUI::DoRunLoop tries to do - one native windows message, one regular task, and then one delayed task - though of course other windows messages will sneak through the outer message loop, and there could be an extra delayed task once per 10ms. I don't know exactly what the fairest balance is, but I think we're pretty resilient to minor unfairness and this seems like the simplest solution.
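For reference, the loop structure being described looks roughly like this (simplified sketch of MessagePumpForUI::DoRunLoop, not verbatim): one native Windows message, one regular task, one delayed task, then idle work, per iteration.

  void MessagePumpForUI::DoRunLoop() {
    for (;;) {
      bool more_work_is_plausible = ProcessNextWindowsMessage();
      if (state_->should_quit)
        break;

      more_work_is_plausible |= state_->delegate->DoWork();
      if (state_->should_quit)
        break;

      more_work_is_plausible |=
          state_->delegate->DoDelayedWork(&delayed_work_time_);
      if (state_->should_quit)
        break;

      if (more_work_is_plausible)
        continue;

      more_work_is_plausible = state_->delegate->DoIdleWork();
      if (state_->should_quit)
        break;

      if (more_work_is_plausible)
        continue;

      WaitForWork();  // Sleep until there is something to do.
    }
  }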
we'll find you tomorrow, see if we can do a high bandwidth data exchange via visible spectrum photons.
Ok, we tried mightily to give John a better thing but all the things we came up with were uglier. I had not realized that there is an effective ~100 tasks per second maximum that we can't get past. Both that and the fact that we can queue 2K messages because of it feel very broken. I am willing to give this a try; the worst that can happen is a funky canary build (but the branch was already cut) and some perf issues. If we come up with a better solution we'll do that instead, but I want to see what this does in practice, so lgtm.
https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag...
File base/message_loop/message_pump_win.cc (right):

https://codereview.chromium.org/918473002/diff/20001/base/message_loop/messag...
base/message_loop/message_pump_win.cc:343: // A bit gratuitous to set delayed_work_time_ again, but oh well.

On 2015/05/13 20:01:21, jar (doing other things) wrote:
> The CL is plausible.... but I'm not a fan of replicated code (line 327 and 343).
> I'm not sure of how or why to refactor... but certainly copying a confusing
> comment is questionable.
> Can you clarify the meaning of the comment?

Ok, refactored out the timer rescheduling logic to a separate function so there's less duplication.
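A sketch of what that refactor plausibly looks like. The RescheduleTimer name and the details below are assumptions based on the description, not the literal patch set 3 diff:

  // Assumed shape of the extracted helper: from delayed_work_time_, decide
  // whether to arm the coarse WM_TIMER or to post kMsgHaveWork immediately.
  void MessagePumpForUI::RescheduleTimer() {
    if (delayed_work_time_.is_null())
      return;

    int delay_msec = GetCurrentDelay();
    DCHECK_GE(delay_msec, 0);
    if (delay_msec == 0) {
      // The next delayed task is already due: post kMsgHaveWork rather than
      // waiting for a timer that can't fire more often than every ~10 ms.
      ScheduleWork();
    } else {
      if (delay_msec < USER_TIMER_MINIMUM)
        delay_msec = USER_TIMER_MINIMUM;
      // Setting the timer again with the same id simply updates it.
      SetTimer(message_hwnd_, reinterpret_cast<UINT_PTR>(this), delay_msec,
               NULL);
    }
  }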
The CQ bit was checked by jbauman@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from cpu@chromium.org Link to the patchset: https://codereview.chromium.org/918473002/#ps40001 (title: " ")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/918473002/40001
Message was sent while issue was closed.
Committed patchset #3 (id:40001)
Message was sent while issue was closed.
Patchset 3 (id:??) landed as https://crrev.com/9316d3b983530169e6c50d61fa87eeebecd8dcfd
Cr-Commit-Position: refs/heads/master@{#330816}