Chromium Code Reviews

Created: 4 years, 3 months ago by vmpstr
Modified: 4 years ago
CC: alex clarke (OOO till 29th), blink-reviews, chromium-reviews, gavinp+loader_chromium.org, Nate Chapin, loading-reviews+fetch_chromium.org, tyoshino+watch_chromium.org
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public.
Description

Blink: Throttle progressively loaded images.
This patch ensures that when the ImageResource is being
loaded, we don't flush every time we receive new data.
Instead, we introduce a flush delay so that we wait a
specified amount of time between flushes (1 sec by default).
R=chrishtr@chromium.org, enne@chromium.org, pdr@chromium.org
Committed: https://crrev.com/5715aa51ef8794ca6f671d55f5845281c44b148f
Cr-Commit-Position: refs/heads/master@{#432331}
Patch Set 1
Patch Set 2 : wip: update
Total comments: 6
Patch Set 3 : throttle-images: update
Patch Set 4 : throttle-images: update
Total comments: 2
Patch Set 5 : throttle-images: update
Total comments: 8
Patch Set 6 : throttle-images: update
Patch Set 7 : throttle-images: timer approach
Patch Set 8 : throttle-images: update
Patch Set 9 : throttle-images: update
Patch Set 10 : throttle-images: layout test fixes
Patch Set 11 : throttle-images: update
Patch Set 12 : throttle-images: update
Patch Set 13 : throttle-images: update
Patch Set 14 : throttle-images: update
Patch Set 15 : land: rebase

Messages

Total messages: 87 (47 generated)
Hey, this is roughly what I'm proposing for throttling how often we process images that are loading slowly. It seems to work pretty well on one example I have: instead of doing a ~50ms decode immediately followed by an ~80ms decode, we just do one 80ms decode slightly later. It also produces only one commit when the image loads instead of two. I'm having a bit of trouble finding a case that takes many seconds to load, though. Anyway, let me know if this seems like a reasonable place to introduce this throttling, or whether I should be looking elsewhere.
Is the goal to throttle all image loads or just progressive JPEGs? For example, should this affect a slow-loading, rapidly-animating GIF?

Do we keep around the decoded data from the previous progressive decode? For example, if the page's layout is wriggling around and causing paints while we throttle the load, will this throttling help?

We have a pretty advanced network scheduler, and I bet we could just plug this type of logic in there. +cc skyostil and alexclarke, who are our scheduling experts and might have thoughts.
On 2016/09/23 01:24:30, pdr. wrote:
> Is the goal to throttle all image loads or just progressive JPEGs? For example, should this affect a slow-loading, rapidly-animating GIF?
>
> Do we keep around the decoded data from the previous progressive decode? For example, if the page's layout is wriggling around and causing paints while we throttle the load, will this throttling help?
>
> We have a pretty advanced network scheduler, and I bet we could just plug this type of logic in there. +cc skyostil and alexclarke, who are our scheduling experts and might have thoughts.

I wonder if there's a way to restructure this so that updateImage is done during the next document lifecycle update? That way we could simply request a new frame in appendData, and the Blink scheduler could figure out the appropriate frame rate for lifecycle updates based on what is happening globally.
I poked around the blink code a bit more, and here are my thoughts. We certainly don't want to throttle GIFs in this way, and I think there's an easy check I can add here for "maybeAnimated" that would fall back to the current behavior. Also, instead of appending the data to the image and throttling the update (which does help on pages that don't relayout for other reasons), we can append the data into a separate container on the image. When we then do the flush, we can append that data into the actual image data and do an updateImage. This does cause an extra copy, though, so I'm not sure that's a good approach.

The problem with throttling this at the network scheduler level is that, IMO, the network layer should be delivering data as fast as possible, and it's up to the client to decide what to do with it.

The problem with doing this later in the pipeline is that it's basically equivalent except less direct: the LayoutImage code reacts to the updateImage callback by invalidating. That invalidation can happen for any number of reasons, and throttling it seems risky.

WDYT?
On 2016/09/26 at 23:19:03, vmpstr wrote:
> I poked around the blink code a bit more, and here are my thoughts. We certainly don't want to throttle GIFs in this way, and I think there's an easy check I can add here for "maybeAnimated" that would fall back to the current behavior. Also, instead of appending the data to the image and throttling the update (which does help on pages that don't relayout for other reasons), we can append the data into a separate container on the image. When we then do the flush, we can append that data into the actual image data and do an updateImage. This does cause an extra copy, though, so I'm not sure that's a good approach.
>
> The problem with throttling this at the network scheduler level is that, IMO, the network layer should be delivering data as fast as possible, and it's up to the client to decide what to do with it.
>
> The problem with doing this later in the pipeline is that it's basically equivalent except less direct: the LayoutImage code reacts to the updateImage callback by invalidating. That invalidation can happen for any number of reasons, and throttling it seems risky.
>
> WDYT?

The scheduler is more general than just network requests, but I agree that it's a big hammer for a low-level compositing issue.

For the image gallery use case, I think your cache+flush approach in Image itself sounds reasonable.

Some misc questions I had when thinking about this:
* If the compositor is asked to re-raster an unchanged image a second time, what decoded data is retained? Do we keep a copy of the decoded image by itself in discardable memory, only tiles, nothing?
* What does a graph of image load percentage vs. decode time (ms) look like?
* How are image decodes different from really-expensive-to-raster tiles?
* Do we have any UMA data that could help the decisions here?
On 2016/09/27 05:18:56, pdr. wrote:
> ...
> Some misc questions I had when thinking about this:
> * If the compositor is asked to re-raster an unchanged image a second time, what decoded data is retained? Do we keep a copy of the decoded image by itself in discardable memory, only tiles, nothing?
> * What does a graph of image load percentage vs. decode time (ms) look like?
> * How are image decodes different from really-expensive-to-raster tiles?
> * Do we have any UMA data that could help the decisions here?

In the absence of UMA for this (for reasons we discussed offline), here is an example with a simulated "regular 3G" connection.

ToT: https://drive.google.com/open?id=0B5cKk7MxQ2LsOG4wWmFxcHZfYXM
ToT + this patch: https://drive.google.com/open?id=0B5cKk7MxQ2LsV2RSM1lhY3NMT28

Here we significantly reduce the amount of CPU time used within the same span of real time.

Do you think we should go ahead and land a patch similar to this (minus the animated GIF case)?
On 2016/09/28 at 23:29:42, vmpstr wrote:
> In the absence of UMA for this (for reasons we discussed offline), here is an example with a simulated "regular 3G" connection.
>
> ToT: https://drive.google.com/open?id=0B5cKk7MxQ2LsOG4wWmFxcHZfYXM
> ToT + this patch: https://drive.google.com/open?id=0B5cKk7MxQ2LsV2RSM1lhY3NMT28
>
> Here we significantly reduce the amount of CPU time used within the same span of real time.
>
> Do you think we should go ahead and land a patch similar to this (minus the animated GIF case)?

(fun bug while looking at your videos: https://crbug.com/575245#c60)

I'd be okay with implementing this today behind a flag. I think we should add a UMA measurement, or determine existing metrics that will be affected, before turning the flag on, so we can prove this is working.

Are you still planning on going with the approach that throttles appendData instead of updateImage?
On 2016/09/29 02:44:22, pdr. wrote:
> ...
> (fun bug while looking at your videos: https://crbug.com/575245#c60)

O_O

> I'd be okay with implementing this today behind a flag. I think we should add a UMA measurement, or determine existing metrics that will be affected, before turning the flag on, so we can prove this is working.
>
> Are you still planning on going with the approach that throttles appendData instead of updateImage?

We can implement this with an extra-copy approach. The append-to-buffer-but-don't-update-size approach is going to be extra complicated, since this isn't a trivial buffer that we're appending to.
https://codereview.chromium.org/2361263003/diff/20001/third_party/WebKit/Sour...
File third_party/WebKit/Source/core/fetch/ImageResource.cpp (right):

https://codereview.chromium.org/2361263003/diff/20001/third_party/WebKit/Sour...
third_party/WebKit/Source/core/fetch/ImageResource.cpp:48: static double flushDelay = 1.;
flushDelayInSeconds? Mumble mumble, wtb base::TimeDelta.

https://codereview.chromium.org/2361263003/diff/20001/third_party/WebKit/Sour...
third_party/WebKit/Source/core/fetch/ImageResource.cpp:258: updateImage(false);
Can you leave a big ol' comment here saying this throttles invalidations, but if outside invalidations occur a new image will still be used? And that this could be fixed with an extra copy, but it's expected this shouldn't happen in practice. I think that'd be super useful for future code archeologists.

I *do* think this approach could potentially cause extra decodes. It could also cause a different level of decoding to be used across multiple tiles (but I think that wouldn't look too weird, as it would eventually fix itself). It'd just be a hole.

I think this is 100% OK as a first pass, and I don't think it'll cause problems. I'd rather land something simple and make it more complicated later than make it more complicated to start with.

https://codereview.chromium.org/2361263003/diff/20001/third_party/WebKit/Sour...
File third_party/WebKit/Source/core/fetch/ImageResource.h (right):

https://codereview.chromium.org/2361263003/diff/20001/third_party/WebKit/Sour...
third_party/WebKit/Source/core/fetch/ImageResource.h:185: double m_lastFlushTime;
Just initialize here?
Description was changed from

==========
wip: throttle image loading

R=chrishtr@chromium.org, enne@chromium.org, pdr@chromium.org
==========

to

==========
Blink: Throttle progressively loaded images.

This patch ensures that when the ImageResource is being loaded, we don't flush every time we receive new data. Instead, we introduce a flush delay so that we wait a specified amount of time between flushes (1 sec by default).

R=chrishtr@chromium.org, enne@chromium.org, pdr@chromium.org
==========
I think this is ready for a formal review. PTAL.

https://codereview.chromium.org/2361263003/diff/20001/third_party/WebKit/Sour...
File third_party/WebKit/Source/core/fetch/ImageResource.cpp (right):

https://codereview.chromium.org/2361263003/diff/20001/third_party/WebKit/Sour...
third_party/WebKit/Source/core/fetch/ImageResource.cpp:48: static double flushDelay = 1.;
On 2016/09/29 22:49:34, enne wrote:
> flushDelayInSeconds? Mumble mumble, wtb base::TimeDelta.

Done.

https://codereview.chromium.org/2361263003/diff/20001/third_party/WebKit/Sour...
third_party/WebKit/Source/core/fetch/ImageResource.cpp:258: updateImage(false);
On 2016/09/29 22:49:34, enne wrote:
> Can you leave a big ol' comment here saying this throttles invalidations, but if outside invalidations occur a new image will still be used?
> ...

Actually, looking closer at the code, I think even outside invalidations wouldn't see the new data. There's m_image, which is what we actually grab, and data(), which lives on the Resource base class. updateImage, besides informing the observers, also puts data() into m_image. So if we don't call it, outside observers will keep seeing the old image.

https://codereview.chromium.org/2361263003/diff/20001/third_party/WebKit/Sour...
File third_party/WebKit/Source/core/fetch/ImageResource.h (right):

https://codereview.chromium.org/2361263003/diff/20001/third_party/WebKit/Sour...
third_party/WebKit/Source/core/fetch/ImageResource.h:185: double m_lastFlushTime;
On 2016/09/29 22:49:34, enne wrote:
> Just initialize here?

Done.
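For readers following along, the mechanism under review at this point can be modeled in a few lines. This is a standalone sketch, not the patch itself (the real change lives in ImageResource::appendData, and later patch sets switch to a timer-based approach); kFlushDelaySeconds and m_lastFlushTime echo names from the review comments above, and everything else is illustrative:

```cpp
#include <cstdio>

// Standalone model of the flush throttle discussed in this review. Names
// follow the patch where the thread mentions them; the rest is illustrative.
constexpr double kFlushDelaySeconds = 1.0;

class ImageFlushThrottler {
 public:
  // Called when a new chunk of encoded data arrives. Returns true when the
  // caller should run updateImage() (publish data() into m_image and notify
  // observers); when false, observers keep painting the previously published
  // image, so a skipped flush delays invalidation but never shows bad data.
  bool onDataReceived(double nowSeconds) {
    if (nowSeconds - m_lastFlushTime < kFlushDelaySeconds)
      return false;
    m_lastFlushTime = nowSeconds;
    return true;
  }

 private:
  double m_lastFlushTime = 0;  // in-class initialization, as enne asked above
};

int main() {
  ImageFlushThrottler throttler;
  // Chunks arriving every 200ms: only the one-second boundaries flush.
  for (int i = 1; i <= 10; ++i) {
    double now = 0.2 * i;
    printf("t=%.1fs -> %s\n", now,
           throttler.onDataReceived(now) ? "flush" : "skip");
  }
}
```

Per vmpstr's later replies in this thread, the real patch also forces a flush when loading finishes, so the throttle never delays the final, fully loaded image.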
lgtm
The CQ bit was checked by vmpstr@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
@pdr, could you also take a look? Thanks.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
Overall this looks good. Can you add a test in "LayoutTests/http/tests/images" similar to png-progressive-load.html, but where we track either repaints or invalidations? I think this can be as simple as changing the setTimeout in runTest to 500ms and tracking invalidations like in "paint/invalidation/scroll-absolute-layer-with-reflection.html".

https://codereview.chromium.org/2361263003/diff/60001/third_party/WebKit/Sour...
File third_party/WebKit/Source/core/fetch/ImageResource.cpp (right):

https://codereview.chromium.org/2361263003/diff/60001/third_party/WebKit/Sour...
third_party/WebKit/Source/core/fetch/ImageResource.cpp:51: static double flushDelaySeconds = 1.;
Should we use constexpr and a k prefix here? E.g., "constexpr double kFlushDelaySeconds = 1.;"?

https://codereview.chromium.org/2361263003/diff/60001/third_party/WebKit/Sour...
third_party/WebKit/Source/core/fetch/ImageResource.cpp:257: // For other images, only update at |flushDelaySeconds| intervals.
Can you expand on this comment a bit? Similar to your comment above flushDelaySeconds, I'd like to make it clear that we're throttling the updateImage call to reduce invalidations.
Layout test with a timeout sounds unfortunate. If you want to test correctness here, is there a way to unit test this cleanly?
On 2016/10/05 16:54:08, enne wrote:
> Layout test with a timeout sounds unfortunate. If you want to test correctness here, is there a way to unit test this cleanly?

Added a unit test (which caught an unfortunate refactoring bug I introduced between patch sets). PTAL.
The CQ bit was checked by vmpstr@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
LGTM with a few minor comments.

https://codereview.chromium.org/2361263003/diff/80001/third_party/WebKit/Sour...
File third_party/WebKit/Source/core/fetch/ImageResourceTest.cpp (right):

https://codereview.chromium.org/2361263003/diff/80001/third_party/WebKit/Sour...
third_party/WebKit/Source/core/fetch/ImageResourceTest.cpp:690: cachedImage->finish();
Why is this needed?

https://codereview.chromium.org/2361263003/diff/80001/third_party/WebKit/Sour...
third_party/WebKit/Source/core/fetch/ImageResourceTest.cpp:709: request.setLoFiState(WebURLRequest::LoFiOn);
Is this needed?

https://codereview.chromium.org/2361263003/diff/80001/third_party/WebKit/Sour...
third_party/WebKit/Source/core/fetch/ImageResourceTest.cpp:726: size_t meaningfulImageSize = 280;
Can you add a comment about where this number came from?

https://codereview.chromium.org/2361263003/diff/80001/third_party/WebKit/Sour...
third_party/WebKit/Source/core/fetch/ImageResourceTest.cpp:761: MockTime::time += 0.2001;
If this ends up being flaky, we could just increment time by 0.1 within this loop, and then increment it by a full second after the } (still inside the flushCount loop).
https://codereview.chromium.org/2361263003/diff/80001/third_party/WebKit/Sour...
File third_party/WebKit/Source/core/fetch/ImageResourceTest.cpp (right):

https://codereview.chromium.org/2361263003/diff/80001/third_party/WebKit/Sour...
third_party/WebKit/Source/core/fetch/ImageResourceTest.cpp:690: cachedImage->finish();
On 2016/10/05 23:24:25, pdr. wrote:
> Why is this needed?

Without this, we don't try to decode the image, so we get a status of Resource::Pending instead of Resource::DecodeError.

https://codereview.chromium.org/2361263003/diff/80001/third_party/WebKit/Sour...
third_party/WebKit/Source/core/fetch/ImageResourceTest.cpp:709: request.setLoFiState(WebURLRequest::LoFiOn);
On 2016/10/05 23:24:26, pdr. wrote:
> Is this needed?

Nope :)

https://codereview.chromium.org/2361263003/diff/80001/third_party/WebKit/Sour...
third_party/WebKit/Source/core/fetch/ImageResourceTest.cpp:726: size_t meaningfulImageSize = 280;
On 2016/10/05 23:24:26, pdr. wrote:
> Can you add a comment about where this number came from?

Done.

https://codereview.chromium.org/2361263003/diff/80001/third_party/WebKit/Sour...
third_party/WebKit/Source/core/fetch/ImageResourceTest.cpp:761: MockTime::time += 0.2001;
On 2016/10/05 23:24:26, pdr. wrote:
> If this ends up being flaky, we could just increment time by 0.1 within this loop, and then increment it by a full second after the } (still inside the flushCount loop).

I don't think it's flaky; it's just that if I start with 10 and add 0.2 five times, floating-point rounding means the result still evaluates as < 11. But yeah, there are other things we could try.
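The rounding issue vmpstr describes is easy to reproduce in isolation. Here is a standalone snippet (not part of the patch) showing why the test advances its mock clock by 0.2001 rather than 0.2:

```cpp
#include <cstdio>

// Five increments of 0.2 starting from 10.0 land just below 11.0 in IEEE-754
// doubles, so advancing a mock clock by exactly 0.2 per step can leave the
// test stuck inside the one-second throttle window. Hence the 0.2001 nudge.
int main() {
  double t = 10.0;
  for (int i = 0; i < 5; ++i)
    t += 0.2;
  printf("%.17f\n", t);  // prints 10.99999999999999822
  printf("%s\n", t < 11.0 ? "still inside the window" : "window crossed");
}
```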
The CQ bit was checked by vmpstr@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Thanks! LGTM still
The CQ bit was unchecked by vmpstr@chromium.org
The CQ bit was checked by vmpstr@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from enne@chromium.org
Link to the patchset: https://codereview.chromium.org/2361263003/#ps100001 (title: "throttle-images: update")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_asan_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by vmpstr@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
The CQ bit was checked by vmpstr@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_asan_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by vmpstr@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
The CQ bit was checked by vmpstr@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
Sorry for not replying to this earlier...

* Could you describe the motivation for this change and why you think it won't hurt image decoding performance?
* 1s seems like a huge delay for updating image decoding... Any reason it was picked over a lower value?
* This change makes updateImage calls async even on the first buffer. Can we avoid that?
* This change means that racy network conditions may trigger significant slowdowns in image decoding. While it may be OK to do so, there's a clear tradeoff here. Could you describe the reasoning?
On 2016/10/10 12:10:47, Yoav Weiss wrote:
> Sorry for not replying to this earlier...
>
> * Could you describe the motivation for this change and why you think it won't hurt image decoding performance?
> * 1s seems like a huge delay for updating image decoding... Any reason it was picked over a lower value?
> * This change makes updateImage calls async even on the first buffer. Can we avoid that?
> * This change means that racy network conditions may trigger significant slowdowns in image decoding. While it may be OK to do so, there's a clear tradeoff here. Could you describe the reasoning?

The reasoning here is to prevent the continuous decodes that happen as a result of frequently updating the image. This saves quite a bit of CPU usage. Note that end-to-end image load time is still the same, since when the image is done loading we force a flush at the same time as before. In other words, if an image takes 100ms to load, then we'll flush it at 100ms and display it then.

The main issue is that a partially loaded image is treated as a different image further down the pipeline (in cc), which means that each flush causes a new image decode, which can be slow. In the example I have in comment #6, you can see that ToT does 13 decodes of ~39ms each, whereas with the patch we do 4 decodes of ~41ms each.

I'm not sure what reason there would be to do the flush on the first packet; could you explain the reasoning?
The CQ bit was checked by vmpstr@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
vmpstr, your changes lgtm. It makes me sad to have a timer running, but I'm not sure how else to do this. It's not like ImageResource is well-positioned to hear when raf happens.
Another look has been taken here, since word was received of a new timer. This great patch, henceforth known as Patch The Fast, shalt now land quickly, save a review from Yoav The Great. LGTM, Godspeed!
Yoav, I'm going to wait for your approval as well, so please take a look.
On 2016/10/10 16:04:43, vmpstr wrote:
> The reasoning here is to prevent the continuous decodes that happen as a result of frequently updating the image. This saves quite a bit of CPU usage. Note that end-to-end image load time is still the same, since when the image is done loading we force a flush at the same time as before. In other words, if an image takes 100ms to load, then we'll flush it at 100ms and display it then.

OK. That's good.

> The main issue is that a partially loaded image is treated as a different image further down the pipeline (in cc), which means that each flush causes a new image decode, which can be slow. In the example I have in comment #6, you can see that ToT does 13 decodes of ~39ms each, whereas with the patch we do 4 decodes of ~41ms each.

I don't dispute that 13 separate decodes is a Bad Thing™ (even if one could argue that this should be fixed further down the image decoding pipeline).

What scares me is:
* The image's dimensions are critical in many cases for layout to take place. Delaying the time at which the renderer knows the image dimensions could result in re-layouts and visible slowness. To make things worse, JPEGs with EXIF data can have their dimensions buried at an arbitrary byte offset from the file's start, so there's a good chance that this patch delays the point at which we're aware of image dimensions.
* Progressive JPEGs can reach quality approaching the final image quality well before the entire file is received. Delaying that decoding can result in SpeedIndex score degradation (and real-life visual impact on users and their perception of the site's speed).

I'm not saying we shouldn't land this, but I'd love to see:
* Mechanisms that make sure image dimensions are always flushed ASAP.
* Mechanisms that make sure progressive JPEGs' layers are flushed within reason (not necessarily a flush per layer).
* Experimental/synthetic data that shows this doesn't degrade real-life performance.
* A 1-second flushing delay seems huge, but maybe the above mechanisms are unnecessary if the delay is smaller.

If we're certain that it won't degrade performance (or willing to revert if we see that it does), I'm fine with landing this and adding those mechanisms later as necessary.

> I'm not sure what reason there would be to do the flush on the first packet; could you explain the reasoning?

Image dimensions, for everything that's not EXIF.
On 2016/10/12 06:59:06, Yoav Weiss wrote:
> I'm not saying we shouldn't land this, but I'd love to see:
> * Mechanisms that make sure image dimensions are always flushed ASAP.
> * Mechanisms that make sure progressive JPEGs' layers are flushed within reason (not necessarily a flush per layer).
> * Experimental/synthetic data that shows this doesn't degrade real-life performance.
> ...

I'd MUCH rather see us move this logic upstream into the image processing and decoding pipeline. One second is quite a long time, and things are going to feel significantly slower even if the end state is the same. It's kind of like how we used to optimize for page load time but have moved towards getting as much on the screen as soon as possible. Progressive JPEGs are one case, but so are animated GIFs (and presumably WebP) and baseline images with a bunch of bloated metadata included in the actual file.
I'd much rather see the image decoding pipeline be smart about the framing of the inbound images and do something more like:
- Process dimensions as soon as the data is available.
- Decode "immediately" when a frame/scan boundary is passed.
- Allow for a slightly longer delay in the middle of data frames/chunks.

"Immediately" could mean scheduling the decode to happen on vsync intervals, so rapidly arriving scans don't get decoded more often than they can be displayed, and "slightly longer" would hopefully be measured in units smaller than seconds (perhaps intelligently based on the hardware it is running on). Granted, this may not work well when we are using libraries to do the image decoding and aren't processing the data frames ourselves, but adding the logic to do that for the few supported image types seems worth the effort.
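A hypothetical sketch of the three-tier policy described above. None of these types or hooks exist in Blink; they only make the proposal concrete:

```cpp
#include <cstdio>

// Hypothetical sketch of the proposed decode scheduling; purely illustrative.
struct ChunkInfo {
  bool mayContainDimensions;  // e.g. a JPEG SOF marker arrived in this chunk
  bool endsOnScanBoundary;    // e.g. a progressive scan just completed
};

enum class FlushUrgency {
  kImmediate,  // dimensions drive layout: flush right away
  kNextVSync,  // a full scan is displayable: align with display refresh
  kDelayed,    // mid-scan data: wait, but for well under a second
};

FlushUrgency classifyChunk(const ChunkInfo& chunk, bool sizeKnown) {
  if (!sizeKnown && chunk.mayContainDimensions)
    return FlushUrgency::kImmediate;
  if (chunk.endsOnScanBoundary)
    return FlushUrgency::kNextVSync;
  return FlushUrgency::kDelayed;
}

int main() {
  ChunkInfo header = {true, false};
  ChunkInfo scanEnd = {false, true};
  ChunkInfo midScan = {false, false};
  printf("%d %d %d\n",
         (int)classifyChunk(header, /*sizeKnown=*/false),  // 0: immediate
         (int)classifyChunk(scanEnd, true),                // 1: next vsync
         (int)classifyChunk(midScan, true));               // 2: delayed
}
```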
The CQ bit was checked by vmpstr@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by vmpstr@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
I've updated the patch to update the image immediately until Image::SizeAvailable is set. So, in effect, this gets the size as soon as it's available, notifies observers, and then throttles at one second per update until finish.

Also, images with maybeAnimated set to true aren't throttled, which should handle the GIF animation cases.

Do you think this is sufficient to address your concerns?
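A standalone model of the gating just described, assuming the patch's one-second delay. The maybeAnimated and size-available checks are the concepts named in this thread; the predicate and struct themselves are illustrative, not the landed Blink code:

```cpp
#include <cstdio>

// Illustrative model combining the rules discussed in this thread.
constexpr double kFlushDelaySeconds = 1.0;

struct ImageState {
  bool maybeAnimated = false;  // e.g. GIF: never throttle
  bool sizeAvailable = false;  // dimensions parsed yet?
  double lastFlushTime = 0;
};

bool shouldFlushNow(const ImageState& s, double now) {
  if (s.maybeAnimated)
    return true;  // keep the per-chunk behavior for animations
  if (!s.sizeAvailable)
    return true;  // flush until dimensions are known: layout needs them ASAP
  return now - s.lastFlushTime >= kFlushDelaySeconds;
}

int main() {
  ImageState jpeg;  // a static, progressively loading JPEG
  printf("%d\n", shouldFlushNow(jpeg, 0.1));  // 1: size not known yet
  jpeg.sizeAvailable = true;
  jpeg.lastFlushTime = 0.1;
  printf("%d\n", shouldFlushNow(jpeg, 0.5));  // 0: throttled
  printf("%d\n", shouldFlushNow(jpeg, 1.2));  // 1: window elapsed
}
```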
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by vmpstr@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2016/10/13 21:21:44, vmpstr wrote:
> I've updated the patch to update the image immediately until Image::SizeAvailable is set. So, in effect, this gets the size as soon as it's available, notifies observers, and then throttles at one second per update until finish.
>
> Also, images with maybeAnimated set to true aren't throttled, which should handle the GIF animation cases.
>
> Do you think this is sufficient to address your concerns?

I'm still rather uncomfortable about progressive JPEGs in general and about limiting decodes to once per second, but I don't really have hard data one way or the other, and I don't know how we'd go about measuring it either. Ultimately I'd still like to see us be aware of scans in progressive JPEGs and decode each time a full scan is complete, since that work needs to be done anyway (internally to JPEG).

There is a lot of data showing that every millisecond of user-perceived performance matters (even if users don't realize they are seeing a difference), so the 1-second tradeoff is a bit worrisome. If it buys us faster overall loads because we can spend those cycles doing other work, it could be worth it. If the trade-off is more for battery or general "lower CPU", then maybe not so much.

Do we at least limit decodes to frame boundaries currently, or are we decoding on literally every data chunk?
> There is a lot of data showing that every millisecond of user-perceived performance matters (even if users don't realize they are seeing a difference), so the 1-second tradeoff is a bit worrisome. If it buys us faster overall loads because we can spend those cycles doing other work, it could be worth it. If the trade-off is more for battery or general "lower CPU", then maybe not so much.

I don't agree with these concerns. This patch only applies when an image takes longer than a second to load. On most pages on good networks, this will have no effect whatsoever. However, on slow networks or with really large images, it makes a big difference to the performance of the page.

To avoid caching problems, every time an image is painted with more information than the last time, the image gets a new id. On the compositor thread, this means that we can't reuse the decoded or scaled version of the image and have to go through all the work of re-decoding the slightly larger image data. This work increases the latency of page updates: if the user is clicking as the page loads, user-perceived performance gets demonstrably worse, because handling that click is now queued behind image decoding work, all for a tiny bit more of the image. I think that's the wrong tradeoff to make, and it's OK for long-loading images to update less frequently.

I think the layout concerns brought up above were valid. It's really important to lay out the page correctly as soon as possible using image data. However, I'm not convinced that constantly paying a large cost to make small updates to slow-loading images is the right thing to do, and this patch makes good tradeoffs in addressing that issue.
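enne's point about image ids can be illustrated with a toy cache. This is not cc's actual cache code; it is a minimal model of why each partial-load flush, which mints a new image id, forces a full re-decode:

```cpp
#include <cstdint>
#include <cstdio>
#include <map>

// Toy stand-in for a decoded-image cache keyed by the image's unique id.
struct DecodedImage { int decodeCount; };

std::map<uint32_t, DecodedImage> gDecodeCache;
int gDecodesPerformed = 0;

const DecodedImage& decodeIfNeeded(uint32_t imageUniqueId) {
  auto it = gDecodeCache.find(imageUniqueId);
  if (it != gDecodeCache.end())
    return it->second;  // unchanged image: reuse the previous decode
  ++gDecodesPerformed;  // expensive: full decode of all bytes received so far
  return gDecodeCache.emplace(imageUniqueId, DecodedImage{gDecodesPerformed})
      .first->second;
}

int main() {
  // While loading, every flush produces a new unique id, so every raster
  // misses the cache and re-decodes.
  for (uint32_t id = 1; id <= 13; ++id)
    decodeIfNeeded(id);
  printf("decodes while loading: %d\n", gDecodesPerformed);  // 13

  // Once the image stops changing, its id is stable and rasters hit.
  decodeIfNeeded(13);
  decodeIfNeeded(13);
  printf("decodes after load:    %d\n", gDecodesPerformed);  // still 13
}
```

Fewer flushes mean fewer distinct ids, which is exactly the saving the trace snapshots in the next message show.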
On 2016/10/17 17:22:37, enne wrote:
> I don't agree with these concerns. This patch only applies when an image takes longer than a second to load. On most pages on good networks, this will have no effect whatsoever. However, on slow networks or with really large images, it makes a big difference to the performance of the page.
> ...

Here's an example where this patch helps smoothness.

Before: https://drive.google.com/open?id=0B5cKk7MxQ2LsQ0ZoRFVScXBhUUk
After: https://drive.google.com/open?id=0B5cKk7MxQ2Lsck5QaENTN1c5VDQ

(Note that for whatever reason the playback is sped up on my machine. If that's the case for you, you can adjust the playback speed by clicking the gear icon in the player.)

We also spend less time decoding all of the partial loads, although admittedly it seems that the decoder is caching some of the work, since the decodes that happen with the patch are noticeably larger than without. That doesn't seem to be the case for all image types, though.

Trace snapshot before: https://drive.google.com/open?id=0B5cKk7MxQ2LsN3JVSEkzaFNHc3c
Trace snapshot after: https://drive.google.com/open?id=0B5cKk7MxQ2LsMVU3OGhLRXp6blk
ping. Do you folks think this is good to land?
On 2016/10/25 18:03:52, vmpstr wrote: > ping. Do you folks think this is good to land? Yoav, Pat, do you think you have time to take a look at this again?
On 2016/11/08 20:16:49, vmpstr wrote: > On 2016/10/25 18:03:52, vmpstr wrote: > > ping. Do you folks think this is good to land? > > Yoav, Pat, do you think you have time to take a look at this again? I'm going to land this on Monday, if there are no objections.
lgtm
The CQ bit was checked by vmpstr@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from pdr@chromium.org, enne@chromium.org, chrishtr@chromium.org
Link to the patchset: https://codereview.chromium.org/2361263003/#ps280001 (title: "land: rebase")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from

==========
Blink: Throttle progressively loaded images.

This patch ensures that when the ImageResource is being loaded, we don't flush every time we receive new data. Instead, we introduce a flush delay so that we wait a specified amount of time between flushes (1 sec by default).

R=chrishtr@chromium.org, enne@chromium.org, pdr@chromium.org
==========

to

==========
Blink: Throttle progressively loaded images.

This patch ensures that when the ImageResource is being loaded, we don't flush every time we receive new data. Instead, we introduce a flush delay so that we wait a specified amount of time between flushes (1 sec by default).

R=chrishtr@chromium.org, enne@chromium.org, pdr@chromium.org
==========
Message was sent while issue was closed.
Committed patchset #15 (id:280001)
Message was sent while issue was closed.
Description was changed from

==========
Blink: Throttle progressively loaded images.

This patch ensures that when the ImageResource is being loaded, we don't flush every time we receive new data. Instead, we introduce a flush delay so that we wait a specified amount of time between flushes (1 sec by default).

R=chrishtr@chromium.org, enne@chromium.org, pdr@chromium.org
==========

to

==========
Blink: Throttle progressively loaded images.

This patch ensures that when the ImageResource is being loaded, we don't flush every time we receive new data. Instead, we introduce a flush delay so that we wait a specified amount of time between flushes (1 sec by default).

R=chrishtr@chromium.org, enne@chromium.org, pdr@chromium.org

Committed: https://crrev.com/5715aa51ef8794ca6f671d55f5845281c44b148f
Cr-Commit-Position: refs/heads/master@{#432331}
==========
Message was sent while issue was closed.
Patchset 15 (id:??) landed as https://crrev.com/5715aa51ef8794ca6f671d55f5845281c44b148f
Cr-Commit-Position: refs/heads/master@{#432331}
Message was sent while issue was closed.
Too late, but I still think this is a bad idea. Can we at least file a bug with the decoding team and add a TODO here indicating that this is a temporary solution to a decoding issue, and that once the decoding issue is resolved we can rip out this mechanism?
Message was sent while issue was closed.
On 2016/12/14 at 10:18:21, yoav wrote:
> Too late, but I still think this is a bad idea. Can we at least file a bug with the decoding team and add a TODO here indicating that this is a temporary solution to a decoding issue, and that once the decoding issue is resolved we can rip out this mechanism?

I think this solution is reasonable long-term, so there's a mismatch in our views here. I'd like to better understand your concerns. Would you be willing to file a bug with a concrete example/use case we can discuss?
