Created: 4 years, 9 months ago by prashant.n
Modified: 4 years, 8 months ago
CC: blink-worker-reviews_chromium.org, chromium-reviews, darin-cc_chromium.org, jam, kinuko+watch, mkwst+moarreviews-renderer_chromium.org, mlamouri+watch-content_chromium.org
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public.
Description

content: Implement dynamic priorities for raster threads.
With the current implementation for raster threads, suppose a background
task is being run by a background priority thread; if this task is
rescheduled as a foreground task, executing it on a raster thread with
the same low priority delays its completion.
This patch implements changing the priority of a raster thread while a
task is being executed by that thread, so that task execution can be
sped up or slowed down. The raster thread's priority is changed if an
already running task is rescheduled with a new category. This patch
handles speeding up a background task that is rescheduled as a
foreground task; the implementation is supported only on Android for now.
This patch implements
- N static priority threads having normal priority,
- 1 dynamic priority thread having background priority.
BUG=
CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel
Patch Set 1 #
Patch Set 2 : Test mismatch. #
Patch Set 3 : Test delays. #
Patch Set 4 : Don't check yet. #
Patch Set 5 : #
Patch Set 6 : test this. #
Patch Set 7 : android 1F + 1B #
Patch Set 8 : preparing for checkin. #
Total comments: 2
Patch Set 9 : fixed build error. #
Patch Set 10 : modified RWP::start function. #
Patch Set 11 : nits. #
Patch Set 12 : android only. #
Patch Set 13 : stop. #
Patch Set 14 : analyse playback #
Patch Set 15 : Test reschedule task. #
Patch Set 16 : Traces corrected. #
Depends on Patchset:

Messages
Total messages: 70 (8 generated)
Description was changed from

==========
[WIP] content: Dynamic priorities for raster threads.

Implement dynamic priorities for raster threads depending on task category. Having only one background thread would not be able to use remaining CPU cores when available.

This patch implements
- 1 normal priority thread not changing priority dedicated for foreground tasks.
- N background priority threads which change the priority depending on task being executed.

BUG=
==========

to

==========
[WIP] content: Dynamic priorities for raster threads.

Implement dynamic priorities for raster threads depending on task category. Having only one background thread would not be able to use remaining CPU cores when available.

This patch implements
- 1 normal priority thread not changing priority dedicated for foreground tasks.
- N background priority threads which change the priority depending on task being executed.

BUG=
==========
prashant.n@samsung.com changed reviewers: + ericrk@chromium.org, reveman@chromium.org, vmpstr@chromium.org
WIP patch, should we have something like this?
The preferred sandbox used by chrome doesn't allow us to dynamically change the priority of threads. We can only lower it. Android currently doesn't have this sandbox restriction but I think that's unfortunate and it will hopefully change in the future. I'm not convinced that this is an improvement over the existing system with one low priority background thread. More importantly, I don't think we should rely on the ability to dynamically change priority of threads as that's not something that should be allowed in the sandbox.
On 2016/02/26 19:16:41, reveman wrote:
> The preferred sandbox used by chrome doesn't allow us to dynamically change the
> priority of threads. We can only lower it. Android currently doesn't have this
> sandbox restriction but I think that's unfortunate and it will hopefully change
> in the future.
>
> I'm not convinced that this is an improvement over the existing system with one
> low priority background thread. More importantly, I don't think we should rely
> on the ability to dynamically change priority of threads as that's not something
> that should be allowed in the sandbox.

Even if we could change priorities, I have some concerns:

- 1 foreground thread and a bunch of low-pri background threads is not ideal - if there's enough foreground work, we want to scale to N foreground threads ASAP. If we've queued up N background tasks, then we'd have to wait for these to complete before we could go beyond 1 foreground task.
- If a background task is blocking a foreground task that could run on the same thread, would it need to run with higher priority to get out of the way of the foreground task? Otherwise it seems like it might take even longer to get to the desired N foreground threads.
reveman@,

1. Here I'm trying to change the priority of the current thread from the thread itself. If changing the priority of a given thread from low to high or high to low is not possible, or not advisable due to sandbox restrictions, then we stop pursuing dynamic priorities for threads. But I guess we can safely change the priority of a given thread from itself. (I may be wrong.) This is the first patch, which changes the thread priority before starting execution of a task.

2. My further idea was to change the priority while a task is being executed. E.g., suppose a task is being executed by a background thread and, before execution finishes, the task is needed at high priority. In such a case, increasing the priority of the runner thread would help finish the task as quickly as possible.

ericrk@,

For every thread the categories are checked in the following sequence - only if there is no task in the given category does it move on to the next category: NONCONCURRENT_FOREGROUND, FOREGROUND, BACKGROUND.

I agree with your point: when N threads are executing background tasks and executing those tasks takes a long time, we are left with only 1 foreground task thread as per this patch. Problem 1 - cancelling the tasks would waste the effort already invested in executing the background tasks. Problem 2 - foreground tasks would have to wait for background tasks, as you explained.

Just a few thoughts: in such a case, creating a few more threads on the fly would help, and once the tasks are executed, the thread count could be brought back within a threshold limit. We could have a maximum number of threads equal to the number of cores, and the minimum could be N, as per the current logic.
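The category ordering described above (a worker drains NONCONCURRENT_FOREGROUND first, then FOREGROUND, then BACKGROUND, moving on only when the current category is empty) can be sketched as a fixed-order queue scan. This is an illustrative sketch only; the enum values and queue container here are hypothetical stand-ins, not the actual cc::TaskGraphRunner code.

```cpp
#include <array>
#include <cassert>
#include <deque>

// Category order a worker scans: it moves to the next category only when
// the current one has no tasks. Names mirror the ones in this thread.
enum TaskCategory {
  TASK_CATEGORY_NONCONCURRENT_FOREGROUND = 0,
  TASK_CATEGORY_FOREGROUND = 1,
  TASK_CATEGORY_BACKGROUND = 2,
  kNumTaskCategories = 3,
};

struct TaskQueues {
  // int stands in for a Task pointer/handle in this sketch.
  std::array<std::deque<int>, kNumTaskCategories> queues;

  // Pops the next task, preferring earlier (higher-priority) categories.
  // Returns false when no category has work.
  bool PopNextTask(int* task_out, TaskCategory* category_out) {
    for (int c = 0; c < kNumTaskCategories; ++c) {
      if (!queues[c].empty()) {
        *task_out = queues[c].front();
        queues[c].pop_front();
        *category_out = static_cast<TaskCategory>(c);
        return true;
      }
    }
    return false;
  }
};
```

With this ordering, a background task already queued is only picked up once every foreground queue is empty, which is why a backlog of background work can delay newly foregrounded tiles.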
Changing the priority of a thread from within the thread before a task is executed is not a good approach, and we cannot set thread priority from outside the thread per the sandbox restriction and the current implementation. Should we create threads on the fly and delete a thread after its task is done, when the next task is not in that category and the thread count limit is exceeded?
On 2016/03/01 at 17:03:53, prashant.n wrote: > Changing priority of thread within thread before task is getting executed is not a good approach and we cannot set thread priority from outside thread as per sandbox restriction or current implementation. > > Should we have thread creation on the fly and delete the thread after task is done and next task is not in that category and num threads limit is exceeded? Sorry, I'm failing to see the motivation for changing the code in the first place. Is there some discussion I've missed?
> Sorry, I'm failing to see the motivation for changing the code in the first
> place. Is there some discussion I've missed?

I've been referring to crbug.com/504515. My motivation to change the code was:

1. With 1 background thread we could not use CPU cores efficiently when the remaining cores are idle and background tasks are being run. For this we should have N foreground task threads (normal priority) and M background threads (background priority). This would be better than the existing implementation. Suppose N = 4 and M = 4 and the number of cores is 8; we will have many raster threads, but this will ensure efficient use of cores.

2. The above solution still leaves one problem. If there is a task t1 being executed by a background thread and, after the user has scrolled, t1 is needed at high priority, then without increasing the priority of the runner thread it cannot be finished quickly. So increasing the priority of the thread from outside the thread is quite needed.
On 2016/03/02 at 06:26:04, prashant.n wrote:
> > Sorry, I'm failing to see the motivation for changing the code in the first
> > place. Is there some discussion I've missed?
>
> I've been referring to crbug.com/504515. My motivation to change code was -
>
> 1. With 1 background thread we could not use CPU cores efficiently when remaining cores of CPU are idle and background tasks are being run.

That's intentional. We don't want to waste too much power on background tasks that are not necessarily needed.

> For this we should have N foreground task threads (normal priority) and M background threads (background priority). So this will be better than existing implementation. Suppose N = 4 and M = 4 and no. of cores is 8, we will have many raster threads, but this will ensure efficient use of cores.

I'm not sure we want efficient use of cores in these situations. For foreground work maximizing usage of cores makes sense but for background work I think we'd rather minimize power usage.

> 2. The above solution still leaves one problem. If there is a task t1 being executed by background thread and after user has scrolled, t1 is needed on high priority, then without increasing priority of runner thread, it cannot be finished quickly. So increasing priority of the thread from outside of thread is quite needed.

Yes, this problem does exist in theory but I think we should first collect some data showing that this is a problem in practice before we add more complexity to the raster-worker-pool implementation to solve it.
On 2016/03/02 15:24:22, reveman wrote:
> > 1. With 1 background thread we could not use CPU cores efficiently when
> > remaining cores of CPU are idle and background tasks are being run.
>
> That's intentional. We don't want to waste too much power on background tasks
> that are not necessarily needed.

Ok, I understand now. The tiles in the NOW bin are the only ones categorized as foreground tasks; we may be interested in SOON bin tiles too, perhaps with a runner thread priority in between normal and background.

> I'm not sure we want efficient use of cores in these situations. For foreground
> work maximizing usage of cores makes sense but for background work I think we'd
> rather minimize power usage.
>
> Yes, this problem does exist in theory but I think we should first collect some
> data showing that this is a problem in practice before we add more complexity to
> the raster-worker-pool implementation to solve it.

I've been working on it; I'll try to get valid data soon.
Description was changed from

==========
[WIP] content: Dynamic priorities for raster threads.

Implement dynamic priorities for raster threads depending on task category. Having only one background thread would not be able to use remaining CPU cores when available.

This patch implements
- 1 normal priority thread not changing priority dedicated for foreground tasks.
- N background priority threads which change the priority depending on task being executed.

BUG=
==========

to

==========
[WIP] content: Dynamic priorities for raster threads.

Implement dynamic priorities for raster threads depending on task category. Having only one background thread would not be able to use remaining CPU cores when available.

This patch implements
- 1 normal priority thread not changing priority dedicated for foreground tasks.
- N background priority threads which change the priority depending on task being executed.

BUG=
CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel
==========
reveman@, patch set 2 gives data showing the problem, and the occurrence is frequent, even on simple pages like google.com. If this is justified, can I file a bug to work on this?
Check with vmpstr@ and ericrk@ who are making changes to raster work.
On 2016/03/04 20:10:34, reveman wrote:
> Check with vmpstr@ and ericrk@ who are making changes to raster work.

vmpstr@ and ericrk@, I'm interested in working on this.
On 2016/03/05 03:06:46, prashant.n wrote:
> On 2016/03/04 20:10:34, reveman wrote:
> > Check with vmpstr@ and ericrk@ who are making changes to raster work.
>
> vmpstr@ and ericrk@, I'm interested in working on this.

When you say that patch set 2 gives the data showing that the problem/occurrence is frequent, can you explain a bit more - I see the log statements you've inserted, but I'd like a bit more detail.

It seems that you are logging when low priority background tasks are not completed by the time they get bumped to high priority, indicating that we have a backlog of background work. While running more background threads would potentially alleviate this, it does so at the cost of blocking future foreground work that may come in, especially on pages with a lot of invalidations. This is an issue we've run into in the past when trying to run more threads - if we end up running 4 background threads and new foreground work comes in (due to an invalidation), these 4 background threads may starve the foreground work (and may not even be needed in the case of an invalidation).

I guess what I'm saying is that I'm not currently too concerned if we see that some background work is not always finished by the time it is bumped to foreground, unless we are also seeing drops in some metric we care about (smoothness, framerate, etc...) - we are intentionally underutilizing cores for background work to ensure foreground work does not get blocked...

Do you have specific test sites where we see a drop in smoothness or frame-rate due to under-utilization of background threads? I took a quick look at telemetry and didn't see any significant changes when this CL landed, but we could have definitely missed something.
On 2016/03/07 18:51:56, ericrk wrote:
> When you say that patchset2 gives the data showing that the problem/occurrence
> is frequent, can you explain a bit more - I see the log statements you've
> inserted, but I'd like a bit more detail.

Yes. The logs appear when running tasks get rescheduled with a different priority. Most of the time we see low priority tasks being rescheduled at high priority.

> It seems that you are logging when low priority background tasks are not
> completed by the time they get bumped to high-priority, indicating that we
> have a backlog of background work. While running more background threads would
> potentially alleviate this, it does so at the cost of blocking future foreground
> work that may come in, especially on pages with a lot of invalidations. This is
> an issue we've run into in the past when trying to run more threads - if we end
> up running 4 background threads and new foreground work comes in (due to an
> invalidation), these 4 background threads may starve the foreground work (and
> may not even be needed in the case of an invalidation).

Please ignore the code implemented in patch set 1; it does not implement what I intend.

> I guess what I'm saying is that I'm not currently too concerned if we see that
> some background work is not always finished by the time it is bumped to
> foreground, unless we are also seeing drops in some metric we care about
> (smoothness, framerate, etc...) - we are intentionally underutilizing cores for
> background work to ensure foreground work does not get blocked...

I can see that some background raster tasks take a lot of time, up to 35 ms. I'll check how much time a running task whose priority is changed takes, which will clarify the intent.

> Do you have specific test sites where we see a drop in smoothness or frame-rate
> due to under-utilization of background threads? I took a quick look at telemetry
> and didn't see any significant changes when this CL landed, but we could have
> definitely missed something.

I've not checked any specific site, but my previous analysis of fast pinch zoom led me to this, and google.com also shows the problem. Maybe I'll check the actual times by which raster tasks are delayed, which would give us a better picture.
1. Data for two sites (contents zoomed and flung from top to bottom). It is observed that background priority threads take longer to finish tasks (data shown only for those tasks which are rescheduled with a different priority), which could have been finished earlier if the threads had normal priority. In the logs below, entries ending in "DELAYED TASK ****" are our point of interest.

2. For log entries ending with "WASTED" we need to devise some other solution - something like cancellation.

google.com

PRAS::RasterTask [0x81cbeab0], Thread Priority = background, Total Execution Time = 11.7 ms, Old Cat = background, old_cat_time = 6.38 ms, New Cat = foreground, new_cat_time = 5.34 ms DELAYED TASK ****
PRAS::RasterTask [0x812ebda8], Thread Priority = normal, Total Execution Time = 9.03 ms, Old Cat = foreground, old_cat_time = 7.9 ms, New Cat = background, new_cat_time = 1.13 ms WASTED
PRAS::RasterTask [0x814fb3f8], Thread Priority = normal, Total Execution Time = 7.6 ms, Old Cat = foreground, old_cat_time = 1.19 ms, New Cat = background, new_cat_time = 6.41 ms WASTED
PRAS::RasterTask [0x8118f148], Thread Priority = background, Total Execution Time = 5.55 ms, Old Cat = background, old_cat_time = 1.13 ms, New Cat = foreground, new_cat_time = 4.42 ms DELAYED TASK ****
PRAS::RasterTask [0x829220c0], Thread Priority = background, Total Execution Time = 3.14 ms, Old Cat = background, old_cat_time = 1.74 ms, New Cat = foreground, new_cat_time = 1.4 ms DELAYED TASK ****
PRAS::RasterTask [0x812c16e0], Thread Priority = normal, Total Execution Time = 8.18 ms, Old Cat = foreground, old_cat_time = 7.02 ms, New Cat = background, new_cat_time = 1.16 ms WASTED
PRAS::RasterTask [0x826cc900], Thread Priority = background, Total Execution Time = 28.1 ms, Old Cat = background, old_cat_time = 28 ms, New Cat = foreground, new_cat_time = 0.122 ms DELAYED TASK ****
PRAS::RasterTask [0x82608410], Thread Priority = background, Total Execution Time = 15.4 ms, Old Cat = background, old_cat_time = 3.05 ms, New Cat = foreground, new_cat_time = 12.4 ms DELAYED TASK ****
PRAS::RasterTask [0x826b8268], Thread Priority = background, Total Execution Time = 6.32 ms, Old Cat = background, old_cat_time = 5.31 ms, New Cat = foreground, new_cat_time = 1.01 ms DELAYED TASK ****
PRAS::RasterTask [0x81186ff0], Thread Priority = background, Total Execution Time = 20.9 ms, Old Cat = background, old_cat_time = 10.8 ms, New Cat = foreground, new_cat_time = 10.1 ms DELAYED TASK ****
PRAS::RasterTask [0x811cce80], Thread Priority = background, Total Execution Time = 14.6 ms, Old Cat = background, old_cat_time = 7.11 ms, New Cat = foreground, new_cat_time = 7.45 ms DELAYED TASK ****

theverge.com

PRAS::RasterTask [0x820eea68], Thread Priority = background, Total Execution Time = 16.3 ms, Old Cat = background, old_cat_time = 15.3 ms, New Cat = foreground, new_cat_time = 0.977 ms DELAYED TASK ****
PRAS::RasterTask [0x82091e50], Thread Priority = background, Total Execution Time = 2.69 ms, Old Cat = background, old_cat_time = 1.86 ms, New Cat = foreground, new_cat_time = 0.824 ms DELAYED TASK ****
PRAS::RasterTask [0x810f47e8], Thread Priority = background, Total Execution Time = 663 ms, Old Cat = background, old_cat_time = 187 ms, New Cat = foreground, new_cat_time = 477 ms DELAYED TASK ****
PRAS::RasterTask [0x7d845d50], Thread Priority = background, Total Execution Time = 7.78 ms, Old Cat = background, old_cat_time = 0.336 ms, New Cat = foreground, new_cat_time = 7.45 ms DELAYED TASK ****
PRAS::RasterTask [0x80d3cf20], Thread Priority = background, Total Execution Time = 8.82 ms, Old Cat = background, old_cat_time = 7.72 ms, New Cat = foreground, new_cat_time = 1.1 ms DELAYED TASK ****
PRAS::RasterTask [0x829a0dd8], Thread Priority = background, Total Execution Time = 4.55 ms, Old Cat = background, old_cat_time = 2.5 ms, New Cat = foreground, new_cat_time = 2.04 ms DELAYED TASK ****
PRAS::RasterTask [0x81132030], Thread Priority = background, Total Execution Time = 3.14 ms, Old Cat = background, old_cat_time = 0.854 ms, New Cat = foreground, new_cat_time = 2.29 ms DELAYED TASK ****
PRAS::RasterTask [0x82089f90], Thread Priority = background, Total Execution Time = 27.4 ms, Old Cat = background, old_cat_time = 24.2 ms, New Cat = foreground, new_cat_time = 3.23 ms DELAYED TASK ****
PRAS::RasterTask [0x81113240], Thread Priority = background, Total Execution Time = 5.04 ms, Old Cat = background, old_cat_time = 0.214 ms, New Cat = foreground, new_cat_time = 4.82 ms DELAYED TASK ****
PRAS::RasterTask [0x810e2738], Thread Priority = background, Total Execution Time = 2.78 ms, Old Cat = background, old_cat_time = 1.1 ms, New Cat = foreground, new_cat_time = 1.68 ms DELAYED TASK ****
PRAS::RasterTask [0x811dedb8], Thread Priority = background, Total Execution Time = 19.3 ms, Old Cat = background, old_cat_time = 1.44 ms, New Cat = foreground, new_cat_time = 17.9 ms DELAYED TASK ****
PRAS::RasterTask [0x7d85d910], Thread Priority = background, Total Execution Time = 5.31 ms, Old Cat = background, old_cat_time = 0.244 ms, New Cat = foreground, new_cat_time = 5.07 ms DELAYED TASK ****
PRAS::RasterTask [0x81180da0], Thread Priority = background, Total Execution Time = 1.68 ms, Old Cat = background, old_cat_time = 1.19 ms, New Cat = foreground, new_cat_time = 0.488 ms DELAYED TASK ****
PRAS::RasterTask [0x7d846438], Thread Priority = background, Total Execution Time = 10.7 ms, Old Cat = background, old_cat_time = 8.73 ms, New Cat = foreground, new_cat_time = 2.01 ms DELAYED TASK ****
As per https://code.google.com/p/chromium/codesearch#chromium/src/sandbox/linux/secc... we can change the priority of a thread from the current process. Correct me if I understood wrong.
On 2016/03/08 at 15:22:21, prashant.n wrote: > As per https://code.google.com/p/chromium/codesearch#chromium/src/sandbox/linux/secc... > > we can change priority of thread from current process. Correct me if I understood wrong. You can change priority from the current thread but I thought it was restricted to only decreasing the priority. I could be wrong. Please try increasing the priority of a thread on Linux to verify that it actually works. Btw, I'm not sure how to interpret the data above. Sure, some background tasks become foreground tasks while running and other foreground tasks might incorrectly have higher priority relative to this task in that case but I'm not sure how much of a difference this actually has on the user experience. Some data showing that our telemetry smoothness tests improve if we adjust thread priority would be useful.
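To make the experiment reveman suggests concrete: on Linux and Android the nice value is per-thread, and `setpriority(PRIO_PROCESS, 0, ...)` addresses the calling thread. The sketch below is a standalone probe, not the CL's code; lowering priority (raising the nice value) is generally allowed, while raising it back in an unprivileged or sandboxed process typically fails with EPERM/EACCES unless RLIMIT_NICE permits it.

```cpp
#include <sys/resource.h>
#include <cerrno>

// Changes the calling thread's nice value. On Linux, PRIO_PROCESS with
// who == 0 names the calling thread, since the kernel nice value is
// per-thread. Returns true on success.
bool SetCurrentThreadNice(int nice_value) {
  errno = 0;
  return setpriority(PRIO_PROCESS, 0, nice_value) == 0;
}

// Reads the calling thread's nice value. getpriority() can legitimately
// return -1, so errno must be cleared first to detect errors.
int GetCurrentThreadNice() {
  errno = 0;
  return getpriority(PRIO_PROCESS, 0);
}
```

Running `SetCurrentThreadNice(10)` followed by `SetCurrentThreadNice(0)` in a plain unprivileged process is a quick way to verify whether raising priority actually works, which is the question being discussed here.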
> Btw, I'm not sure how to interpret the data above.

Thread Priority = priority of the thread, i.e. normal or background (this does not change in the current code)
Total Execution Time = total time needed to finish the task
Old Cat = task category of the running task
old_cat_time = time elapsed until the same task is rescheduled
New Cat = task category when it is rescheduled
new_cat_time = time elapsed from when the task is rescheduled until it finishes on the existing runner thread

On theverge.com, new_cat_time sometimes comes to ~477 ms. If we change the runner thread's priority, it would be <= what we currently see.

Yes, smoothness data would prove the intent. I'll share it tomorrow.
When I re-read the comment at

https://code.google.com/p/chromium/codesearch#chromium/src/sandbox/linux/secc...

it would give an EPERM error if |who| is not 0. But when we want to change the priority of a given thread, we need to pass the thread id. Let me check further.
On 2016/03/09 04:42:02, prashant.n wrote:
> When I re-read the comment at
>
> https://code.google.com/p/chromium/codesearch#chromium/src/sandbox/linux/secc...
>
> it would give EPERM error if |who| is not 0.
>
> But when we want to change the priority for the given thread, we need to pass
> thread id. Let me check further.

The above comment may relate only to the main thread, so we don't need to worry. I've modified the code for Android. I have not yet verified whether the priority actually changes, but per the requests made it looks like it is changing. See the logs below.

Current = current priority
Default = default priority
Low = low priority, used to slow down
High = high priority, used to speed up

PRAS::RasterWorkerPool::Start NumRasterThreads = 2
PRAS:: [0x7cc428f8] Current = normal, { Default = background, Low = background, High = normal}. Runner speeded up ^^^^^^^^^
PRAS:: [0x7cc428f8] Current = background, { Default = background, Low = background, High = normal}. Restored to default.
PRAS:: [0x7cc420a8] Current = background, { Default = normal, Low = background, High = normal}. Runner slowed down.
PRAS:: [0x7cc3e930] Current = background, { Default = normal, Low = background, High = normal}. Runner slowed down.
PRAS:: [0x7cc413e8] Current = background, { Default = normal, Low = background, High = normal}. Runner slowed down.
PRAS:: [0x7cc413e8] Current = normal, { Default = normal, Low = background, High = normal}. Restored to default.
PRAS:: [0x7cc420a8] Current = normal, { Default = normal, Low = background, High = normal}. Restored to default.
PRAS:: [0x7cc3e930] Current = normal, { Default = normal, Low = background, High = normal}. Restored to default.
PRAS:: [0x7cc428f8] Current = normal, { Default = background, Low = background, High = normal}. Runner speeded up ^^^^^^^^^
PRAS:: [0x7cc428f8] Current = background, { Default = background, Low = background, High = normal}. Restored to default.
PRAS:: [0x7cc420a8] Current = background, { Default = normal, Low = background, High = normal}. Runner slowed down.
PRAS:: [0x7cc420a8] Current = normal, { Default = normal, Low = background, High = normal}. Restored to default.
PRAS:: [0x7cc413e8] Current = background, { Default = normal, Low = background, High = normal}. Runner slowed down.
PRAS:: [0x7cc3e930] Current = background, { Default = normal, Low = background, High = normal}. Runner slowed down.
PRAS:: [0x7cc413e8] Current = normal, { Default = normal, Low = background, High = normal}. Restored to default.
PRAS:: [0x7cc413e8] Current = background, { Default = normal, Low = background, High = normal}. Runner slowed down.
PRAS:: [0x7cc420a8] Current = background, { Default = normal, Low = background, High = normal}. Runner slowed down.
PRAS:: [0x7cc413e8] Current = normal, { Default = normal, Low = background, High = normal}. Restored to default.
PRAS:: [0x7cc3e930] Current = normal, { Default = normal, Low = background, High = normal}. Restored to default.
PRAS:: [0x7cc420a8] Current = normal, { Default = normal, Low = background, High = normal}. Restored to default.

However, fling felt a little better on a Galaxy S4.
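The { Default, Low, High } bookkeeping in the log above boils down to picking a target priority from the running task's category and restoring the default afterwards. A minimal sketch of that mapping follows; the struct and names are hypothetical illustrations (the real patch would use the platform thread-priority API on Android), with priorities represented here as nice values where lower means higher priority.

```cpp
#include <cassert>

// Hypothetical mirror of the per-thread {Default, Low, High} triple from
// the log: the worker runs at |high_nice| while executing a foreground
// task, at |low_nice| for a background task, and returns to
// |default_nice| between tasks.
struct ThreadPriorityPlan {
  int default_nice;
  int low_nice;
  int high_nice;
};

enum class TaskCategory { kForeground, kBackground };

// Target nice value while a task of |category| runs on this thread.
int NiceWhileRunning(const ThreadPriorityPlan& plan, TaskCategory category) {
  return category == TaskCategory::kForeground ? plan.high_nice
                                               : plan.low_nice;
}
```

A dynamic background thread would have Default = Low = a background nice value and High = 0: it is sped up only when its running task is rescheduled as foreground, matching the "Runner speeded up" / "Restored to default" lines in the log.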
https://codereview.chromium.org/1784623005/ is the perf tryjob on Android. I tested smoothness.top_25_smooth. I don't know exactly what all the fields mean, but more or less the patch looks good. Maybe you can check the perf results. For easy reference, the links are given below.

P.N. In my patch I've disabled GPU rasterization for testing background threads.

android_s5 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03...
android_nexus9 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03...
android_nexus6 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03...
android_nexus5 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03...
android_nexus9 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03...
android_nexus7 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03...
Description was changed from

==========
[WIP] content: Dynamic priorities for raster threads.

Implement dynamic priorities for raster threads depending on task category. Having only one background thread would not be able to use remaining CPU cores when available.

This patch implements
- 1 normal priority thread not changing priority dedicated for foreground tasks.
- N background priority threads which change the priority depending on task being executed.

BUG=
CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel
==========

to

==========
Test dynamic priorities for raster threads.

Implement dynamic priorities for raster threads depending on task category. Having only one background thread would not be able to use remaining CPU cores when available.

This patch implements
- 1 normal priority thread not changing priority dedicated for foreground tasks.
- N background priority threads which change the priority depending on task being executed.

BUG=
CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel
==========
On 2016/03/10 at 09:29:00, prashant.n wrote: > https://codereview.chromium.org/1784623005/ is the perf tryjob on android. I tested smoothness.top_25_smooth. I don't know exactly what all fields mean, but more or less patch looks good. May be you can check the perf results. For easy reference I'm giving below the links. > > P.N. In my patch I've disabled gpu rasterization for testing background threads. > > android_s5 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03... > android_nexus9 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03... > android_nexus6 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03... > android_nexus5 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03... > android_nexus9 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03... > android_nexus7 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03... You're changing the number of threads used for raster as part of that patch so I don't see how we can use these results.
> You're changing the number of threads used for raster as part of that patch so I
> don't see how we can use these results.

Hmm, good catch. I will change it to the default number of threads (so 1 foreground and 1 dynamic background), but to use background threads I need to disable GPU rasterization. Any idea of the GPU vs. CPU rasterization benefit?

On Android, I verified the priority change takes effect.
On 2016/03/10 18:07:28, prashant.n wrote: > > You're changing the number of threads used for raster as part of that patch so > I > > don't see how we can use these results. > > Hmm. Good catch. I will change it to default num threads (so 1 foreground and 1 > dynamic background), but to use background threads, I need to disable gpu > rasterization. Any idea on gpu vs. cpu rasterization benefit. > > On Android, I verified priority changed takes effect. GPU rasterization can give a fair amount of benefit over CPU, depending on the content - in tests I've done (on desktop), GPU rasterization can give similar or better performance than multi-thread SW raster, with lower overall power usage. That said, there is still some content for which GPU raster is problematic. I'm currently doing work that will allow GPU raster to effectively use background threads for image decode, which should make your investigation valid for both cases. That said, as long as you disable GPU for both runs you're comparing, the comparison should be valid.
> I'm currently doing work that will allow GPU raster to effectively use
> background threads for image decode, which should make your investigation valid
> for both cases.

I was thinking of creating low-priority raster tasks as background, non-concurrent tasks when GPU rasterization is enabled, so that even when using the GPU we don't waste many CPU/GPU cycles on background tasks.
Results for 1 foreground thread and 1 dynamic thread at https://codereview.chromium.org/1787453003/ For easy reference - android_s5 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03... android_one http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03... android_nexus6 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03... android_nexus5 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03... android_nexus7 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-03...
Have a look at the patch; it's not final though.
https://codereview.chromium.org/1739993004/diff/140001/cc/raster/task_graph_r... File cc/raster/task_graph_runner.cc (right): https://codereview.chromium.org/1739993004/diff/140001/cc/raster/task_graph_r... cc/raster/task_graph_runner.cc:8: #include <iomanip> Remove. https://codereview.chromium.org/1739993004/diff/140001/cc/raster/task_graph_r... cc/raster/task_graph_runner.cc:23: void Task::AttachWorker(TaskWorker* worker) { AttachTaskWorker/DetachTaskWorker
Description was changed from ========== Test dynamic priorities for raster threads. Implement dynamic priorities for raster threads depending on task category. Having only one background thread would not be able to use remaining CPU cores when available. This patch implements - 1 normal priority thread not changing priority dedicated for foreground tasks. - N background priority threads which change the priority depending on task being executed. BUG= CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel ========== to ========== content: Implement dynamic priorities for raster threads. With current implementation for raster threads, suppose background task is being run by background priority thread and if this task is rescheduled as foreground task, then executing the task by raster thread with same priority delays the completion of task. This patch implements changing the priority of raster thread while task is being executed by that thread, so that task execution can be speeded up or slowed down. The raster thread's priority is changed if already running task is rescheduled with new category. This patch handles speeding up of background task, which is rescheduled as foreground task and implementation is supported only on Android platform as of now. This patch implements - N static priority threads having normal priority, - 1 dynamic priority thread having background priority. BUG= CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel ==========
prashant.n@samsung.com changed reviewers: + jam@chromium.org
PTAL. I'll implement Windows/Linux/FreeBSD support and slowing down of tasks in subsequent patches.
IMO the current version of this patch is much too complicated for the benefit it's hoping to provide. This needs to be reduced to a small implementation detail of the RasterWorkerPool class before we can consider it. The PlatformThread API changes should be in a separate patch.
On 2016/03/15 01:49:30, reveman wrote:
> IMO the current version of this patch is much too complicated for the benefit
> it's hoping to provide. This needs to be reduced to a small implementation
> detail of the RasterWorkerPool class before we can consider it. The
> PlatformThread API changes should be in a separate patch.

Actually, the decision to skip an already-running task is taken in task_graph_work_queue.cc, and the work queue contains only tasks (it is unaware of threads), so a task needs to carry information about which thread is executing it (abstracted in this patch by TaskWorker). With the current implementation, extending this to tasks other than raster tasks would be easier. I'll check whether we can make the changes more minimal. I'll create separate patches for the PlatformThread changes and for this.
While writing a unit test and verifying it on Linux, I found that on Linux we cannot change priority because we don't have sufficient privileges.

However, on Android it works well.
On 2016/03/15 at 13:46:37, prashant.n wrote:
> While writing unit test and verifying it on linux, I got that on linux we cannot change priority due to not having sufficient previledges.
>
> However on Android it works well.

That's what I was suspecting. I don't think this is worthwhile unless it works across all platforms. I'd rather see work being done to reduce the size of tasks.
> That's what I was suspecting. I don't think this is worthwhile unless it works across
> all platforms.

But at least on Android (and Windows, which I'll check soon) it works well and improves perf.

> I'd rather see work being done to reduce the size of tasks.

Yes. For RequestSlowdown I was thinking of working on this. After this patch I'm going to work on that.
As per https://code.google.com/p/chromium/codesearch#chromium/src/base/threading/pla..., it would work on Windows. So there are two platforms that can support this: Android and Windows. Do let me know if it is okay to continue with this. Anyways I'm focusing more on reducing tasks and maintaining states of it.
> I'm focusing more on reducing tasks and maintaining states of it. task size*
On 2016/03/16 at 05:29:14, prashant.n wrote: > > I'm focusing more on reducing tasks and maintaining states of it. > task size* I would prefer if we didn't use different logic for different platforms unless absolutely necessary. Let's focus on reducing the size of tasks and we can revisit these type of dynamic thread priority adjustments later if necessary. Thanks for being thorough and checking what platforms this is and is not supported on.
On 2016/03/16 14:59:15, reveman wrote:
> On 2016/03/16 at 05:29:14, prashant.n wrote:
> > > I'm focusing more on reducing tasks and maintaining states of it.
> > task size*
>
> I would prefer if we didn't use different logic for different platforms unless
> absolutely necessary. Let's focus on reducing the size of tasks and we can
> revisit these type of dynamic thread priority adjustments later if necessary.
> Thanks for being thorough and checking what platforms this is and is not
> supported on.

Yes. The logic I was thinking of for slowing down can also be used for speeding up, which will keep the changes in non-platform-layer code.

I was thinking of splitting raster buffer playback into small atomic work items and storing the needed data in the raster task. These work items can be executed one at a time (we can progressively make sure that work items are small enough that running one consumes few CPU/GPU cycles), and before executing the next work item we check whether to continue/cancel/reschedule the task. If it is to be rescheduled, the same raster task can be picked up by a thread of the appropriate priority. Here we keep thread priorities predefined; we only need to implement the raster task's work items being executed by different threads, sequentially.

E.g. if there are 5 work items for a given raster task, suppose 2 work items have been executed by the current thread, the 3rd is being executed, and the raster task's category changes at the same time; then the current thread continues to work only on the 3rd item, and items 4 and 5 are scheduled to run on a thread of the needed priority.

If this looks okay, I'll create a prototype for it.
prashant.n@samsung.com changed reviewers: - jam@chromium.org
On 2016/03/16 at 17:18:47, prashant.n wrote: > On 2016/03/16 14:59:15, reveman wrote: > > On 2016/03/16 at 05:29:14, prashant.n wrote: > > > > I'm focusing more on reducing tasks and maintaining states of it. > > > task size* > > > > I would prefer if we didn't use different logic for different platforms unless > > absolutely necessary. Let's focus on reducing the size of tasks and we can > > revisit these type of dynamic thread priority adjustments later if necessary. > > Thanks for being thorough and checking what platforms this is and is not > > supported on. > > Yes. The logic I was thinking for slow down can also be used for speed up, which will make changes in non-platform layer code. > > I was thinking of splitting raster buffer playback into small atomic task/work items and storing needed data in raster task. These work items can be executed one at time (we can progressively make sure that work items are small so that running it would consume few cpu/gpu cycles.) and before executing next work item check whether to continue/cancel/reschedule the task. If to be rescheduled, the same raster task can be worked upon by another appropriate priority thread. Here we'll keep thread priorities predefined. Only we need to implement raster task's work items to be executed by different threads, sequentially. > > e.g. If there are 5 work items for the given raster task, and suppose 2 work items are executed by current thread and 3rd is being executed and suppose raster task's category is changed at the same time, then current thread can continue to work on only on 3rd item, 4 and 5 would be scheduled to be run by needed priority thread.) > > If this looks okay, I'll create a prototype for it. Raster tasks are not typically that large. I think it would be much better to focus on image decode tasks as they can be really large. Note that to split tile raster into multiple tasks we need to first refactor or remove the TileTaskRunner interface.
On 2016/03/16 18:20:51, reveman wrote: > On 2016/03/16 at 17:18:47, prashant.n wrote: > > On 2016/03/16 14:59:15, reveman wrote: > > > On 2016/03/16 at 05:29:14, prashant.n wrote: > > > > > I'm focusing more on reducing tasks and maintaining states of it. > > > > task size* > > > > > > I would prefer if we didn't use different logic for different platforms > unless > > > absolutely necessary. Let's focus on reducing the size of tasks and we can > > > revisit these type of dynamic thread priority adjustments later if > necessary. > > > Thanks for being thorough and checking what platforms this is and is not > > > supported on. > > > > Yes. The logic I was thinking for slow down can also be used for speed up, > which will make changes in non-platform layer code. > > > > I was thinking of splitting raster buffer playback into small atomic task/work > items and storing needed data in raster task. These work items can be executed > one at time (we can progressively make sure that work items are small so that > running it would consume few cpu/gpu cycles.) and before executing next work > item check whether to continue/cancel/reschedule the task. If to be rescheduled, > the same raster task can be worked upon by another appropriate priority thread. > Here we'll keep thread priorities predefined. Only we need to implement raster > task's work items to be executed by different threads, sequentially. > > > > e.g. If there are 5 work items for the given raster task, and suppose 2 work > items are executed by current thread and 3rd is being executed and suppose > raster task's category is changed at the same time, then current thread can > continue to work on only on 3rd item, 4 and 5 would be scheduled to be run by > needed priority thread.) > > > > If this looks okay, I'll create a prototype for it. > > Raster tasks are not typically that large. I think it would be much better to > focus on image decode tasks as they can be really large. 
> > Note that to split tile raster into multiple tasks we need to first refactor or > remove the TileTaskRunner interface. FYI, both vmpstr@ and myself are doing significant work to pull image decodes and scaling out of both SW and GPU raster tasks. It might make sense to hold off on trying to split up image decodes until after this work is completed. In software, splitting up image decodes/scaling may not be too tricky (at least scaling seems like it could be tiled). On the GPU side of things, splitting image decodes is a bit trickier. GPU image decode and upload is pretty much a black box to CC. This is by design, we don't want CC to worry about whether we are decoding to yuv, compressed textures, generating mips, etc.. Allowing this to be split up would involve pushing changes to the Skia API as well. Happy to hear thoughts you have in this area.
On 2016/03/16 18:48:08, ericrk wrote: > On 2016/03/16 18:20:51, reveman wrote: > > On 2016/03/16 at 17:18:47, prashant.n wrote: > > > On 2016/03/16 14:59:15, reveman wrote: > > > > On 2016/03/16 at 05:29:14, prashant.n wrote: > > > > > > I'm focusing more on reducing tasks and maintaining states of it. > > > > > task size* > > > > > > > > I would prefer if we didn't use different logic for different platforms > > unless > > > > absolutely necessary. Let's focus on reducing the size of tasks and we can > > > > revisit these type of dynamic thread priority adjustments later if > > necessary. > > > > Thanks for being thorough and checking what platforms this is and is not > > > > supported on. > > > > > > Yes. The logic I was thinking for slow down can also be used for speed up, > > which will make changes in non-platform layer code. > > > > > > I was thinking of splitting raster buffer playback into small atomic > task/work > > items and storing needed data in raster task. These work items can be > executed > > one at time (we can progressively make sure that work items are small so that > > running it would consume few cpu/gpu cycles.) and before executing next work > > item check whether to continue/cancel/reschedule the task. If to be > rescheduled, > > the same raster task can be worked upon by another appropriate priority > thread. > > Here we'll keep thread priorities predefined. Only we need to implement raster > > task's work items to be executed by different threads, sequentially. > > > > > > e.g. If there are 5 work items for the given raster task, and suppose 2 work > > items are executed by current thread and 3rd is being executed and suppose > > raster task's category is changed at the same time, then current thread can > > continue to work on only on 3rd item, 4 and 5 would be scheduled to be run by > > needed priority thread.) > > > > > > If this looks okay, I'll create a prototype for it. > > > > Raster tasks are not typically that large. 
I think it would be much better to > > focus on image decode tasks as they can be really large. > > > > Note that to split tile raster into multiple tasks we need to first refactor > or > > remove the TileTaskRunner interface. > > FYI, both vmpstr@ and myself are doing significant work to pull image decodes > and scaling out of both SW and GPU raster tasks. It might make sense to hold off > on trying to split up image decodes until after this work is completed. > > In software, splitting up image decodes/scaling may not be too tricky (at least > scaling seems like it could be tiled). On the GPU side of things, splitting > image decodes is a bit trickier. GPU image decode and upload is pretty much a > black box to CC. This is by design, we don't want CC to worry about whether we > are decoding to yuv, compressed textures, generating mips, etc.. Allowing this > to be split up would involve pushing changes to the Skia API as well. > > Happy to hear thoughts you have in this area. +1 I'd prefer that we hold off on splitting up decoding work until the work we've been doing had a chance to become a bit more stable.
On 2016/03/16 18:52:06, vmpstr wrote: > On 2016/03/16 18:48:08, ericrk wrote: > > On 2016/03/16 18:20:51, reveman wrote: > > > On 2016/03/16 at 17:18:47, prashant.n wrote: > > > > On 2016/03/16 14:59:15, reveman wrote: > > > > > On 2016/03/16 at 05:29:14, prashant.n wrote: > > > > > > > I'm focusing more on reducing tasks and maintaining states of it. > > > > > > task size* > > > > > > > > > > I would prefer if we didn't use different logic for different platforms > > > unless > > > > > absolutely necessary. Let's focus on reducing the size of tasks and we > can > > > > > revisit these type of dynamic thread priority adjustments later if > > > necessary. > > > > > Thanks for being thorough and checking what platforms this is and is not > > > > > supported on. > > > > > > > > Yes. The logic I was thinking for slow down can also be used for speed up, > > > which will make changes in non-platform layer code. > > > > > > > > I was thinking of splitting raster buffer playback into small atomic > > task/work > > > items and storing needed data in raster task. These work items can be > > executed > > > one at time (we can progressively make sure that work items are small so > that > > > running it would consume few cpu/gpu cycles.) and before executing next work > > > item check whether to continue/cancel/reschedule the task. If to be > > rescheduled, > > > the same raster task can be worked upon by another appropriate priority > > thread. > > > Here we'll keep thread priorities predefined. Only we need to implement > raster > > > task's work items to be executed by different threads, sequentially. > > > > > > > > e.g. If there are 5 work items for the given raster task, and suppose 2 > work > > > items are executed by current thread and 3rd is being executed and suppose > > > raster task's category is changed at the same time, then current thread can > > > continue to work on only on 3rd item, 4 and 5 would be scheduled to be run > by > > > needed priority thread.) 
> > > > > > > > If this looks okay, I'll create a prototype for it. > > > > > > Raster tasks are not typically that large. I think it would be much better > to > > > focus on image decode tasks as they can be really large. > > > > > > Note that to split tile raster into multiple tasks we need to first refactor > > or > > > remove the TileTaskRunner interface. > > > > FYI, both vmpstr@ and myself are doing significant work to pull image decodes > > and scaling out of both SW and GPU raster tasks. It might make sense to hold > off > > on trying to split up image decodes until after this work is completed. > > > > In software, splitting up image decodes/scaling may not be too tricky (at > least > > scaling seems like it could be tiled). On the GPU side of things, splitting > > image decodes is a bit trickier. GPU image decode and upload is pretty much a > > black box to CC. This is by design, we don't want CC to worry about whether we > > are decoding to yuv, compressed textures, generating mips, etc.. Allowing this > > to be split up would involve pushing changes to the Skia API as well. > > > > Happy to hear thoughts you have in this area. > > +1 I'd prefer that we hold off on splitting up decoding work until the work > we've been doing had a chance to become a bit more stable. I was thinking to split up raster tasks first as I saw they consume much time in heavy sites like 55bbs.com, theverge.com, etc (many times > 16.66 ms). May be this work can go in parallel with what you have been working on for image decode tasks.
On 2016/03/17 02:17:05, prashant.n wrote: > On 2016/03/16 18:52:06, vmpstr wrote: > > On 2016/03/16 18:48:08, ericrk wrote: > > > On 2016/03/16 18:20:51, reveman wrote: > > > > On 2016/03/16 at 17:18:47, prashant.n wrote: > > > > > On 2016/03/16 14:59:15, reveman wrote: > > > > > > On 2016/03/16 at 05:29:14, prashant.n wrote: > > > > > > > > I'm focusing more on reducing tasks and maintaining states of it. > > > > > > > task size* > > > > > > > > > > > > I would prefer if we didn't use different logic for different > platforms > > > > unless > > > > > > absolutely necessary. Let's focus on reducing the size of tasks and we > > can > > > > > > revisit these type of dynamic thread priority adjustments later if > > > > necessary. > > > > > > Thanks for being thorough and checking what platforms this is and is > not > > > > > > supported on. > > > > > > > > > > Yes. The logic I was thinking for slow down can also be used for speed > up, > > > > which will make changes in non-platform layer code. > > > > > > > > > > I was thinking of splitting raster buffer playback into small atomic > > > task/work > > > > items and storing needed data in raster task. These work items can be > > > executed > > > > one at time (we can progressively make sure that work items are small so > > that > > > > running it would consume few cpu/gpu cycles.) and before executing next > work > > > > item check whether to continue/cancel/reschedule the task. If to be > > > rescheduled, > > > > the same raster task can be worked upon by another appropriate priority > > > thread. > > > > Here we'll keep thread priorities predefined. Only we need to implement > > raster > > > > task's work items to be executed by different threads, sequentially. > > > > > > > > > > e.g. 
If there are 5 work items for the given raster task, and suppose 2 > > work > > > > items are executed by current thread and 3rd is being executed and suppose > > > > raster task's category is changed at the same time, then current thread > can > > > > continue to work on only on 3rd item, 4 and 5 would be scheduled to be run > > by > > > > needed priority thread.) > > > > > > > > > > If this looks okay, I'll create a prototype for it. > > > > > > > > Raster tasks are not typically that large. I think it would be much better > > to > > > > focus on image decode tasks as they can be really large. > > > > > > > > Note that to split tile raster into multiple tasks we need to first > refactor > > > or > > > > remove the TileTaskRunner interface. > > > > > > FYI, both vmpstr@ and myself are doing significant work to pull image > decodes > > > and scaling out of both SW and GPU raster tasks. It might make sense to hold > > off > > > on trying to split up image decodes until after this work is completed. > > > > > > In software, splitting up image decodes/scaling may not be too tricky (at > > least > > > scaling seems like it could be tiled). On the GPU side of things, splitting > > > image decodes is a bit trickier. GPU image decode and upload is pretty much > a > > > black box to CC. This is by design, we don't want CC to worry about whether > we > > > are decoding to yuv, compressed textures, generating mips, etc.. Allowing > this > > > to be split up would involve pushing changes to the Skia API as well. > > > > > > Happy to hear thoughts you have in this area. > > > > +1 I'd prefer that we hold off on splitting up decoding work until the work > > we've been doing had a chance to become a bit more stable. > > I was thinking to split up raster tasks first as I saw they consume much time in > heavy sites like http://55bbs.com, http://theverge.com, etc (many times > 16.66 ms). 
May be > this work can go in parallel with what you have been working on for image decode > tasks. There definitely may be sites with slow raster tasks, but note that depending on the settings (medium filter quality, gpu raster, etc...), we may be decoding / scaling images within raster tasks, making them look longer than expected. We should make sure this isn't the case before doing too much work here. Try tracing these sites with the "blink" and "cc.debug" categories - if there are image decodes happening during raster, you should see them with those categories enabled. Thanks!
> There definitely may be sites with slow raster tasks, but note that depending on
> the settings (medium filter quality, gpu raster, etc...), we may be decoding /
> scaling images within raster tasks, making them look longer than expected. We
> should make sure this isn't the case before doing too much work here. Try
> tracing these sites with the "blink" and "cc.debug" categories - if there are
> image decodes happening during raster, you should see them with those categories
> enabled. Thanks!

I checked with tracing that on average one raster task takes 0.5 ms to run (on Linux, i7, 12 GB RAM), and on slower devices like the Samsung Galaxy S4 it consumes even more.

My point for raster tasks is: if we save on average 0.1 ms from one running task, we could save at most 0.4 ms with 4 raster threads while tasks are recategorized as background tasks, which might give us one needed raster task run and could reduce white patches when not all "now" tiles are rasterized within the frame time.

I'll analyse more soon.
On 2016/03/19 02:57:47, prashant.n wrote: > > There definitely may be sites with slow raster tasks, but note that depending > on > > the settings (medium filter quality, gpu raster, etc...), we may be decoding / > > scaling images within raster tasks, making them look longer than expected. We > > should make sure this isn't the case before doing too much work here. Try > > tracing these sites with the "blink" and "cc.debug" categories - if there are > > image decodes happening during raster, you should see them with those > categories > > enabled. Thanks! > > I had checked with tracing that on an average 1 raster task takes 0.5 ms to run > (on platform linux i7, 12 GB ram) and on slower devices like samsung galaxy s4, > it even consumes more. > > My point for raster tasks is - if we get on an average 0.1 ms from 1 running > task, we could get at max 0.4 ms with 4 raster threads while tasks are > recategorized as background tasks, which might give us 1 needed raster task run, > which could reduce white patches when not all "now" tiles are rasterized within > frame time. > > I'll analyse more soon. I'm not sure how you're proposing to split raster tasks into smaller tasks (if that's what you're proposing)? We can have smaller raster tasks by making smaller tiles. However, we have found that there is a fairly expensive overhead associated with every new raster task. In fact, it is usually faster to have larger tiles. If I'm misunderstanding things, please let me know.
On 2016/03/21 at 19:55:17, vmpstr wrote: > On 2016/03/19 02:57:47, prashant.n wrote: > > > There definitely may be sites with slow raster tasks, but note that depending > > on > > > the settings (medium filter quality, gpu raster, etc...), we may be decoding / > > > scaling images within raster tasks, making them look longer than expected. We > > > should make sure this isn't the case before doing too much work here. Try > > > tracing these sites with the "blink" and "cc.debug" categories - if there are > > > image decodes happening during raster, you should see them with those > > categories > > > enabled. Thanks! > > > > I had checked with tracing that on an average 1 raster task takes 0.5 ms to run > > (on platform linux i7, 12 GB ram) and on slower devices like samsung galaxy s4, > > it even consumes more. > > > > My point for raster tasks is - if we get on an average 0.1 ms from 1 running > > task, we could get at max 0.4 ms with 4 raster threads while tasks are > > recategorized as background tasks, which might give us 1 needed raster task run, > > which could reduce white patches when not all "now" tiles are rasterized within > > frame time. > > > > I'll analyse more soon. > > I'm not sure how you're proposing to split raster tasks into smaller tasks (if that's what you're proposing)? We can have smaller raster tasks by making smaller tiles. However, we have found that there is a fairly expensive overhead associated with every new raster task. In fact, it is usually faster to have larger tiles. If I'm misunderstanding things, please let me know. Yes, tasks would have to be split in a way that doesn't add significant overhead. X number of scanlines per task instead of smaller tiles is probably better but I'm not sure that's good enough. I'm guessing we'd have to make some changes to skia to make this work well.
On 2016/03/21 20:50:32, reveman wrote: > On 2016/03/21 at 19:55:17, vmpstr wrote: > > On 2016/03/19 02:57:47, prashant.n wrote: > > > > There definitely may be sites with slow raster tasks, but note that > depending > > > on > > > > the settings (medium filter quality, gpu raster, etc...), we may be > decoding / > > > > scaling images within raster tasks, making them look longer than expected. > We > > > > should make sure this isn't the case before doing too much work here. Try > > > > tracing these sites with the "blink" and "cc.debug" categories - if there > are > > > > image decodes happening during raster, you should see them with those > > > categories > > > > enabled. Thanks! > > > > > > I had checked with tracing that on an average 1 raster task takes 0.5 ms to > run > > > (on platform linux i7, 12 GB ram) and on slower devices like samsung galaxy > s4, > > > it even consumes more. > > > > > > My point for raster tasks is - if we get on an average 0.1 ms from 1 running > > > task, we could get at max 0.4 ms with 4 raster threads while tasks are > > > recategorized as background tasks, which might give us 1 needed raster task > run, > > > which could reduce white patches when not all "now" tiles are rasterized > within > > > frame time. > > > > > > I'll analyse more soon. > > > > I'm not sure how you're proposing to split raster tasks into smaller tasks (if > that's what you're proposing)? We can have smaller raster tasks by making > smaller tiles. However, we have found that there is a fairly expensive overhead > associated with every new raster task. In fact, it is usually faster to have > larger tiles. If I'm misunderstanding things, please let me know. > > Yes, tasks would have to be split in a way that doesn't add significant > overhead. X number of scanlines per task instead of smaller tiles is probably > better but I'm not sure that's good enough. I'm guessing we'd have to make some > changes to skia to make this work well. Yes. 
we need to make Skia changes. Creating smaller tiles is not going to give the ultimate solution when the number of layers grows larger. I've shared my approach in a Google doc. Kindly have a look at it.
> ultimate solution when the number of layers grows to larger or recorded operations are more...
From the attached traces (scrolled theverge.com from top to bottom), it can be seen that a few Skia playback operations take a long time (up to 5 ms). If we split these as per the proposed approach, we can easily speed up or slow down work on tasks. The attached skia_plaback_traces.png image shows a playback operation that took a long time (> 5 ms), while playing back an individual recorded operation takes at most ~1 ms. I'm still analysing. Maybe we can first focus on a split-up delimited by recorded-op playback, and later concentrate on making the recorded-op playback itself smaller/split. (To implement this we need to split up the SkPicture::playback API and many ::Raster functions in cc/. The implementation may depend on ThreadTicks, which is not supported on iOS.)
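A rough sketch of the split-up playback loop described above: play back recorded ops one at a time and check after each op whether a per-slice time budget is exhausted, returning where to resume. This is an assumption about the shape of the change, not the actual SkPicture::playback API, and std::chrono::steady_clock stands in for base::ThreadTicks:

```cpp
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical recorded op; in Skia this would be a draw command from
// the recording, not a std::function.
using Op = std::function<void()>;

// Plays back ops starting at |start| until the time |budget| for this
// slice is used up. Returns the index to resume from on the next slice
// (ops.size() means playback finished). Between slices the scheduler
// could re-prioritize, reschedule, or cancel the remaining work.
size_t PlaybackSlice(const std::vector<Op>& ops,
                     size_t start,
                     std::chrono::microseconds budget) {
  auto t0 = std::chrono::steady_clock::now();
  for (size_t i = start; i < ops.size(); ++i) {
    ops[i]();  // play back one recorded op
    if (std::chrono::steady_clock::now() - t0 >= budget)
      return i + 1;  // budget spent; resume here next time
  }
  return ops.size();  // all ops played back
}
```

The cost of the per-op clock read is part of the overhead being weighed in this thread; checking only every N ops would amortize it.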
On 2016/03/23 at 13:21:50, prashant.n wrote: > From the attached traces (scrolled theverge.com from top to bottom), it can be found that few skia playback operations take longer time (upto 5 ms). If we split this as per proposed approach, we can easily speed up or slow down working on tasks. The attached skia_plaback_traces.png image shows playback operation which took longer time (> 5 ms), but playing back individual recorded operation takes max upto ~ 1ms. I'm still analysing. May be we can focus on split up first delimited by recorded op playback and later concentrate on making recorded op playback smaller/splitted. (To implement this we need to split up the skpicture::playback api and many ::Raster functions in cc/. The implementation may depend on Threadticks, which is not supported on iOS.) I'm skeptical. Sounds like a lot of work and a lot of added complexity for a relatively small gain.
> I'm skeptical. Sounds like a lot of work and a lot of added complexity for a > relatively small gain. I believe this approach would help more: slowing down or cancelling unneeded tasks would bring a lot of benefit along with speeding up tasks, which the codebase does not currently handle. With this approach concurrency would increase. And yes, there's a lot of work. But I think unless we solve these kinds of problems, heavy pages will not have better visible scrolling (fewer white patches).
On 2016/03/23 18:46:03, prashant.n wrote: > > I'm skeptical. Sounds like a lot of work and a lot of added complexity for a > > relatively small gain. > > > I believe this approach would help more as slowing down or cancelling not needed > tasks would make benefit a lot along with speeding up the tasks, which is not > currently handled in codebase. With this approach concurrency would increase. > And yes there's lot of work. But I think without we solve these kinda proglems, > heavy pages would not have better visible scrolling (less white patches). A few questions: - Can you attach the trace of theverge.com? I'd be curious to look at it. - What platform / Version of Chrome are you tracing with? Is this with Gpu raster on or off?
> A few questions: > - Can you attach the trace of theverge.com? I'd be curious to look at it. > - What platform / Version of Chrome are you tracing with? Is this with Gpu > raster on or off? I had attached traces from theverge.com only, but my codebase was a little old. I'll attach traces on ToT soon.
> I'd attached traces from http://theverge.com only, https://codereview.chromium.org/1739993004/patch/260001/270002 are the traces I added, but I took only SkBigPicture playback split-up traces for verification purposes. The platform I used is Linux (Ubuntu 14.04), i7 (4 cores, 8 HT), 16 GB RAM. Here the worst-case time for a raster task is ~5 ms, while on a Galaxy S4 I've seen it > 500 ms a few times. I'll be taking detailed traces on Monday. I'm thinking of adding a Skia patch for split-up playback. Do let me know your opinion on that. Once that is verified we can decide whether to implement further or not. Can you please point me to how to take Skia traces? (I've hard-coded SkEventTracer::SetInstance(new skia::SkChromiumEventTracer()); to take the traces.)
Skia changes used for testing are uploaded at https://codereview.chromium.org/1828233003/
> A few questions: > - Can you attach the trace of theverge.com? I'd be curious to look at it. > - What platform / Version of Chrome are you tracing with? Is this with Gpu > raster on or off? Traces (cc, cc.debug) for theverge.com (scrolled top to bottom) are attached in crbug/598145. galaxy_s4_theverge.com_trace.json - Android (GPU raster off); trace_desktop_theverge.com_trace.json.gz - Desktop - Linux Ubuntu 14.04, Intel i7, 16 GB RAM (GPU raster off). The following 3 trace events were added to check whether a task is rescheduled (patch set 15): Task::Rescheduled::Speedup, Task::Rescheduled::Slowdown, Task::Rescheduled::Same. From the traces it can be seen that there are many speedup/slowdown reschedules.
There is a mistake in patch set 15; I'll take the data once again.
Patchset #16 (id:300001) has been deleted
The trace events added are Task::Rescheduled::Speedup, Task::Rescheduled::Slowdown, Task::Rescheduled::Same, Task::Scheduled::New, and Task::Rescheduled::Cancelled. 1. With patch set 16, over multiple runs on theverge.com, I observed that the count of tasks whose rescheduled category changed (speedup/slowdown) or which are no longer needed (cancelled) is in the range of 0.5% to 4% of the total tasks scheduled. That number looks too small to justify the changes proposed in the approach. Also, with progressive rastering, there will be some cost added by checking the lock every threshold interval. So as pointed out by reveman@, the change would be too complicated for a small gain. But it would improve concurrency, and on low-end devices rescheduling/cancelling tasks would help. 2. Benchmark runs with smoothness tests showed better results with patch set 7. Given 1 and 2, I'm a little confused. What is your opinion on continuing this task?
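For clarity on how the reschedule counts above could be bucketed, here is a minimal sketch that classifies a reschedule by comparing the old and new task categories. The enum and function names are illustrative, not cc/ code:

```cpp
#include <string>

// Hypothetical task categories; cc/ has a richer TaskCategory enum.
enum class Category { kForeground, kBackground };

// Maps an (old, new) category pair to the trace event names used in
// patch set 16. Background -> foreground means the task should be
// sped up; foreground -> background means it can be slowed down.
std::string ClassifyReschedule(Category old_cat, Category new_cat) {
  if (old_cat == new_cat)
    return "Task::Rescheduled::Same";
  if (old_cat == Category::kBackground && new_cat == Category::kForeground)
    return "Task::Rescheduled::Speedup";
  return "Task::Rescheduled::Slowdown";
}
```

Counting each bucket over a run gives the 0.5% to 4% figure quoted above.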
Please ignore the traces at crbug/598145.
I checked touch pinch-zoom and touch scrolling cases with the traces in patch set 16 on a Galaxy S4 device and observed the following: 1. Flinging shows speedups and slowdowns for 0.2% to 2% of total tasks. 2. Pinch zoom shows cancels for 1% to 4.5% of total tasks. So the task split-up would help in these gestures.