|
|
Created:
5 years, 9 months ago by alex clarke (OOO till 29th) Modified:
5 years, 9 months ago CC:
chromium-reviews, mkwst+moarreviews-renderer_chromium.org, darin-cc_chromium.org, jam, scheduler-bugs_chromium.org, mlamouri+watch-content_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionAdds a couple of simple micro benchmarks for the TaskQueueManager
The work loads are very synthetic, but this should give us some idea of
scalability.
BUG=
Committed: https://crrev.com/f02cff06da47c87ad27982ed569b70ae850591ba
Cr-Commit-Position: refs/heads/master@{#319436}
Patch Set 1 #Patch Set 2 : Remove header #
Total comments: 16
Patch Set 3 : Responding to feedback #
Total comments: 9
Patch Set 4 : Made it run all the tasks #Patch Set 5 : Pulled some initalization out of Benchmark and into a helper #Patch Set 6 : Makes the number of tasks in flight much more variable #
Total comments: 7
Patch Set 7 : Finishing touches? #
Messages
Total messages: 17 (4 generated)
skyostil@chromium.org changed reviewers: + skyostil@chromium.org
https://codereview.chromium.org/971393002/diff/20001/content/content_tests.gypi File content/content_tests.gypi (right): https://codereview.chromium.org/971393002/diff/20001/content/content_tests.gy... content/content_tests.gypi:672: '../testing/perf/perf_test.cc', I think we need to list the perf_test target as a dependency instead. See src/cc/cc_tests.gyp. https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... File content/renderer/scheduler/task_queue_manager_perftest.cc (right): https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:1: Nit: extra blank line. https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:8: #include <functional> I'm not sure we can use this yet. Lambdas by themselves are fine. https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:42: return found_one; nit: This is guaranteed to be true because the selector won't be called unless there is work to be done. https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:91: base::TimeDelta::FromMilliseconds(sequence_number_ % 10)); I think you said this takes less than a millisecond to run, so most of these delayed tasks wouldn't get a chance to fire, right? Should we make the delays much smaller? Also, this is pretty heavily biased towards delayed tasks (which can be fine since the benchmarks mention delayed tasks specifically). I wonder if we should instead do one run of top_25/key_mobile_sites and plug in the rough ratio of delayed vs. non-delayed here? https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:94: runner->PostDelayedTask(FROM_HERE, I'm worried this is a little too atypical workload since the number of posted tasks is exponential. How deep are the task queues that you're seeing? https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:115: Benchmark("post 10000 delayed tasks", [this]() { Could you pass in the number of task queues as a modifier to make the results unique?
PTAL https://codereview.chromium.org/971393002/diff/20001/content/content_tests.gypi File content/content_tests.gypi (right): https://codereview.chromium.org/971393002/diff/20001/content/content_tests.gy... content/content_tests.gypi:672: '../testing/perf/perf_test.cc', On 2015/03/03 18:09:18, Sami wrote: > I think we need to list the perf_test target as a dependency instead. See > src/cc/cc_tests.gyp. Done. https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... File content/renderer/scheduler/task_queue_manager_perftest.cc (right): https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:1: On 2015/03/03 18:09:18, Sami wrote: > Nit: extra blank line. Done. https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:8: #include <functional> On 2015/03/03 18:09:18, Sami wrote: > I'm not sure we can use this yet. Lambdas by themselves are fine. Done. https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:42: return found_one; On 2015/03/03 18:09:18, Sami wrote: > nit: This is guaranteed to be true because the selector won't be called unless > there is work to be done. Sure. I put a CHECK in there to hopefully make that more obvious. https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:91: base::TimeDelta::FromMilliseconds(sequence_number_ % 10)); On 2015/03/03 18:09:18, Sami wrote: > I think you said this takes less than a millisecond to run, so most of these > delayed tasks wouldn't get a chance to fire, right? Should we make the delays > much smaller? No it takes much longer to complete than that. Putting a printf in TestTask shows it posts 10000x before exiting. > > Also, this is pretty heavily biased towards delayed tasks (which can be fine > since the benchmarks mention delayed tasks specifically). I wonder if we should > instead do one run of top_25/key_mobile_sites and plug in the rough ratio of > delayed vs. non-delayed here? https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:94: runner->PostDelayedTask(FROM_HERE, On 2015/03/03 18:09:18, Sami wrote: > I'm worried this is a little too atypical workload since the number of posted > tasks is exponential. How deep are the task queues that you're seeing? 50 - 200 delayed tasks are common on heavy pages like the verge. Typically between 2-30 fire at the same time. I've adjusted the code to limit the number of tasks in flight. https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:115: Benchmark("post 10000 delayed tasks", [this]() { On 2015/03/03 18:09:18, Sami wrote: > Could you pass in the number of task queues as a modifier to make the results > unique? Done.
https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... File content/renderer/scheduler/task_queue_manager_perftest.cc (right): https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:91: base::TimeDelta::FromMilliseconds(sequence_number_ % 10)); On 2015/03/04 11:38:06, alexclarke1 wrote: > On 2015/03/03 18:09:18, Sami wrote: > > I think you said this takes less than a millisecond to run, so most of these > > delayed tasks wouldn't get a chance to fire, right? Should we make the delays > > much smaller? > > No it takes much longer to complete than that. Putting a printf in TestTask > shows it posts 10000x before exiting. We're counting tasks posted and not tasks run though. How many tasks do we end up running before the test ends? > > > > > > > Also, this is pretty heavily biased towards delayed tasks (which can be fine > > since the benchmarks mention delayed tasks specifically). I wonder if we > should > > instead do one run of top_25/key_mobile_sites and plug in the rough ratio of > > delayed vs. non-delayed here? > https://codereview.chromium.org/971393002/diff/40001/content/renderer/schedul... File content/renderer/scheduler/task_queue_manager_perftest.cc (right): https://codereview.chromium.org/971393002/diff/40001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:64: void TestTask() { nit: TestDelayedTask() since that's what this is mostly posting. https://codereview.chromium.org/971393002/diff/40001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:66: unsigned int max_tasks_to_post = num_tasks_to_run_ % 2 ? 1 : 10; How deep queues do you see in DoWork() with this and how does that compare to real sites? It feels like the the fan-out factor is still pretty large. https://codereview.chromium.org/971393002/diff/40001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:93: void Benchmark(const std::string& trace, unsigned int num_tasks_to_run) { I guess you could make this a template if you don't want to hardcode the test function here.
PTAL https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... File content/renderer/scheduler/task_queue_manager_perftest.cc (right): https://codereview.chromium.org/971393002/diff/20001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:91: base::TimeDelta::FromMilliseconds(sequence_number_ % 10)); On 2015/03/04 13:00:08, Sami wrote: > On 2015/03/04 11:38:06, alexclarke1 wrote: > > On 2015/03/03 18:09:18, Sami wrote: > > > I think you said this takes less than a millisecond to run, so most of these > > > delayed tasks wouldn't get a chance to fire, right? Should we make the > delays > > > much smaller? > > > > No it takes much longer to complete than that. Putting a printf in TestTask > > shows it posts 10000x before exiting. > > We're counting tasks posted and not tasks run though. How many tasks do we end > up running before the test ends? > > > > > > > > > > > > > Also, this is pretty heavily biased towards delayed tasks (which can be fine > > > since the benchmarks mention delayed tasks specifically). I wonder if we > > should > > > instead do one run of top_25/key_mobile_sites and plug in the rough ratio of > > > delayed vs. non-delayed here? > > > I changed it so that it runs all. I don't see much difference in the results. https://codereview.chromium.org/971393002/diff/40001/content/renderer/schedul... File content/renderer/scheduler/task_queue_manager_perftest.cc (right): https://codereview.chromium.org/971393002/diff/40001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:64: void TestTask() { On 2015/03/04 13:00:09, Sami wrote: > nit: TestDelayedTask() since that's what this is mostly posting. Done. https://codereview.chromium.org/971393002/diff/40001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:66: unsigned int max_tasks_to_post = num_tasks_to_run_ % 2 ? 1 : 10; On 2015/03/04 13:00:09, Sami wrote: > How deep queues do you see in DoWork() with this and how does that compare to > real sites? It feels like the the fan-out factor is still pretty large. The number of delayed tasks tasks is capped to max_tasks_in_flight_. I'm only ever seeing 1 task on the queue in TaskQueueManager::ProcessTaskFromWorkQueue. https://codereview.chromium.org/971393002/diff/40001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:93: void Benchmark(const std::string& trace, unsigned int num_tasks_to_run) { On 2015/03/04 13:00:09, Sami wrote: > I guess you could make this a template if you don't want to hardcode the test > function here. Good point, but I think base::Closure is more readable than templating it.
https://codereview.chromium.org/971393002/diff/40001/content/renderer/schedul... File content/renderer/scheduler/task_queue_manager_perftest.cc (right): https://codereview.chromium.org/971393002/diff/40001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:66: unsigned int max_tasks_to_post = num_tasks_to_run_ % 2 ? 1 : 10; On 2015/03/04 16:34:20, alexclarke1 wrote: > On 2015/03/04 13:00:09, Sami wrote: > > How deep queues do you see in DoWork() with this and how does that compare to > > real sites? It feels like the the fan-out factor is still pretty large. > > The number of delayed tasks tasks is capped to max_tasks_in_flight_. > > I'm only ever seeing 1 task on the queue in > TaskQueueManager::ProcessTaskFromWorkQueue. I more meant how many tasks there are pending in each of the incoming and work queues. I think if we post most of the tasks up front instead of getting them as a steady trickle, then we get away with just a few reloads which may not be realistic. https://codereview.chromium.org/971393002/diff/40001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:93: void Benchmark(const std::string& trace, unsigned int num_tasks_to_run) { On 2015/03/04 16:34:20, alexclarke1 wrote: > On 2015/03/04 13:00:09, Sami wrote: > > I guess you could make this a template if you don't want to hardcode the test > > function here. > > Good point, but I think base::Closure is more readable than templating it. Works for me.
https://codereview.chromium.org/971393002/diff/40001/content/renderer/schedul... File content/renderer/scheduler/task_queue_manager_perftest.cc (right): https://codereview.chromium.org/971393002/diff/40001/content/renderer/schedul... content/renderer/scheduler/task_queue_manager_perftest.cc:66: unsigned int max_tasks_to_post = num_tasks_to_run_ % 2 ? 1 : 10; On 2015/03/04 18:33:29, Sami wrote: > On 2015/03/04 16:34:20, alexclarke1 wrote: > > On 2015/03/04 13:00:09, Sami wrote: > > > How deep queues do you see in DoWork() with this and how does that compare > to > > > real sites? It feels like the the fan-out factor is still pretty large. > > > > The number of delayed tasks tasks is capped to max_tasks_in_flight_. > > > > I'm only ever seeing 1 task on the queue in > > TaskQueueManager::ProcessTaskFromWorkQueue. > > I more meant how many tasks there are pending in each of the incoming and work > queues. I think if we post most of the tasks up front instead of getting them as > a steady trickle, then we get away with just a few reloads which may not be > realistic. There is a maximum of 200 tasks in flight. It was occasionally dipping down to 1 but was mostly at two hundred. The change I just added makes the number of tasks in flight much more variable.
Cool, thanks for improving this. Let's go with this these numbers for now and tweak them later if we need to. lgtm with some nits. https://codereview.chromium.org/971393002/diff/100001/content/content_tests.gypi File content/content_tests.gypi (right): https://codereview.chromium.org/971393002/diff/100001/content/content_tests.g... content/content_tests.gypi:1246: 'renderer/scheduler/task_queue_manager_perftest.cc', nit: Please keep these sorted. https://codereview.chromium.org/971393002/diff/100001/content/renderer/schedu... File content/renderer/scheduler/task_queue_manager_perftest.cc (right): https://codereview.chromium.org/971393002/diff/100001/content/renderer/schedu... content/renderer/scheduler/task_queue_manager_perftest.cc:125: unsigned int max_tasks_in_flight_; Could you initialize these to zeros in the constructor?
rmcilroy@chromium.org changed reviewers: + rmcilroy@chromium.org
https://codereview.chromium.org/971393002/diff/100001/content/renderer/schedu... File content/renderer/scheduler/task_queue_manager_perftest.cc (right): https://codereview.chromium.org/971393002/diff/100001/content/renderer/schedu... content/renderer/scheduler/task_queue_manager_perftest.cc:89: num_tasks_to_post_ % 2 ? 1 : (10 + num_tasks_to_post_ % 10); Drive by comment (sorry if this has been covered already) - it looks like all tasks end up with a delay of at least 1ms - are we sure that the benchmark is measuring overhead of the TQM or is the runtime dominated by the task delays? Also, it might be nice to gave a test which has a realistic mix of delayed and non-delayed tasks, WDYT?
https://codereview.chromium.org/971393002/diff/100001/content/content_tests.gypi File content/content_tests.gypi (right): https://codereview.chromium.org/971393002/diff/100001/content/content_tests.g... content/content_tests.gypi:1246: 'renderer/scheduler/task_queue_manager_perftest.cc', On 2015/03/05 19:26:10, Sami wrote: > nit: Please keep these sorted. Done. https://codereview.chromium.org/971393002/diff/100001/content/renderer/schedu... File content/renderer/scheduler/task_queue_manager_perftest.cc (right): https://codereview.chromium.org/971393002/diff/100001/content/renderer/schedu... content/renderer/scheduler/task_queue_manager_perftest.cc:89: num_tasks_to_post_ % 2 ? 1 : (10 + num_tasks_to_post_ % 10); On 2015/03/06 09:52:30, rmcilroy wrote: > Drive by comment (sorry if this has been covered already) - it looks like all > tasks end up with a delay of at least 1ms - are we sure that the benchmark is > measuring overhead of the TQM or is the runtime dominated by the task delays? Yes it does appear to be strongly influenced by the TQM implementation. Times for RunTenThousandDelayedTasks_OneQueue on my z620 in release: This patch: ~17657ms This patch plus my proposed delayed task implementation: ~10314 ms This patch plus your simplified delayed task suggestion: ~20862 ms > Also, it might be nice to gave a test which has a realistic mix of delayed and > non-delayed tasks, WDYT? Yes I agree, but maybe in a follow up patch. I've added a TODO. https://codereview.chromium.org/971393002/diff/100001/content/renderer/schedu... content/renderer/scheduler/task_queue_manager_perftest.cc:125: unsigned int max_tasks_in_flight_; On 2015/03/05 19:26:10, Sami wrote: > Could you initialize these to zeros in the constructor? Done.
The CQ bit was checked by alexclarke@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from skyostil@chromium.org Link to the patchset: https://codereview.chromium.org/971393002/#ps120001 (title: "Finishing touches?")
https://codereview.chromium.org/971393002/diff/100001/content/renderer/schedu... File content/renderer/scheduler/task_queue_manager_perftest.cc (right): https://codereview.chromium.org/971393002/diff/100001/content/renderer/schedu... content/renderer/scheduler/task_queue_manager_perftest.cc:89: num_tasks_to_post_ % 2 ? 1 : (10 + num_tasks_to_post_ % 10); On 2015/03/06 11:07:30, alexclarke1 wrote: > On 2015/03/06 09:52:30, rmcilroy wrote: > > Drive by comment (sorry if this has been covered already) - it looks like all > > tasks end up with a delay of at least 1ms - are we sure that the benchmark is > > measuring overhead of the TQM or is the runtime dominated by the task delays? > > Yes it does appear to be strongly influenced by the TQM implementation. Times > for RunTenThousandDelayedTasks_OneQueue on my z620 in release: > > This patch: ~17657ms > This patch plus my proposed delayed task implementation: ~10314 ms > This patch plus your simplified delayed task suggestion: ~20862 ms > > > > Also, it might be nice to gave a test which has a realistic mix of delayed and > > non-delayed tasks, WDYT? > > Yes I agree, but maybe in a follow up patch. I've added a TODO. Great, sounds good - thanks!
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/971393002/120001
Message was sent while issue was closed.
Committed patchset #7 (id:120001)
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/f02cff06da47c87ad27982ed569b70ae850591ba Cr-Commit-Position: refs/heads/master@{#319436} |