|
|
Created:
4 years, 9 months ago by dalmirsilva Modified:
4 years, 8 months ago Base URL:
https://github.com/chromium/dom-distiller.git@master Target Ref:
refs/heads/master Visibility:
Public. |
DescriptionRetain image sizes
Small images lost their dimension attributes and got bigger than they
were before the distillation.
Our solution is to retain width and height attributes from the original
image.
Contributors: marcelorcorrea@hp.com, dalmirsilva@hp.com
BUG=591143
R=mdjones@chromium.org, wychen@chromium.org
Committed: 84cfcd26c60ab6b1547dd4e08f35d236f566e5d1
Patch Set 1 #Patch Set 2 : Moving preserved attributes to the whitelist. #
Total comments: 13
Patch Set 3 : wychen's comments addressed #Patch Set 4 : comments addressed #Patch Set 5 : wychen's comment addressed #
Total comments: 13
Patch Set 6 : comments addressed #
Messages
Total messages: 20 (3 generated)
wychen@chromium.org changed reviewers: - kuan@chromium.org
Thanks for the patch! I'll take a look. BTW, it's recommended to send a message after adding the reviewers, or reviewers don't get notified.
In this particular case, I think tweaking the @media print CSS would be enough. Maybe max-height: 80vh? Saving the width and height might be needed for inline images, especially when their natural size is much larger than the display size. http://crbug.com/568408
Yes, limiting the max height would work for that specific example. But, there are several others examples in which the problem will remain, like these ones: http://366daysofpinterest.com/2012/04/03/day-63-diy-corona-glasses/ http://www.landbetweenthelakes.us/seendo/camping/hillman-ferry-campground/ http://www.nytimes.com/2016/02/28/magazine/managed-by-qs-good-jobs-gamble.htm... Although this CSS tweak is cheaper than retain width and height it is not a generic fix. On the other hand, retaining width and height only for inline images will force us to intruduce this code anyway and also an extra test to check whether the image is inline by computing the style for each image, which is expensive. The CL's approach works for every case.
On 2016/03/09 19:55:50, marcelorcorrea wrote: > Yes, limiting the max height would work for that specific example. > But, there are several others examples in which the problem > will remain, like these ones: > http://366daysofpinterest.com/2012/04/03/day-63-diy-corona-glasses/ > http://www.landbetweenthelakes.us/seendo/camping/hillman-ferry-campground/ > http://www.nytimes.com/2016/02/28/magazine/managed-by-qs-good-jobs-gamble.htm... > > Although this CSS tweak is cheaper than retain width and height it > is not a generic fix. OK. I'm convinced. Keeping the dimension sounds better than guessing their intention. > On the other hand, retaining width and height only for inline images will force > us > to intruduce this code anyway and also an extra test to check whether > the image is inline by computing the style for each image, which is > expensive. We only need to compute the style for the images we extracted. This is much cheaper than computing the style when traversing the DOM tree. > > The CL's approach works for every case.
https://codereview.chromium.org/1754213004/diff/20001/java/org/chromium/disti... File java/org/chromium/distiller/extractors/embeds/ImageExtractor.java (right): https://codereview.chromium.org/1754213004/diff/20001/java/org/chromium/disti... java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:37: int height = e.getOffsetHeight(); offsetWidth/Height would be affected by padding, but I guess this might be a reasonable default if we can handle tags other than img. Might worth noting this in the comment. https://codereview.chromium.org/1754213004/diff/20001/java/org/chromium/disti... java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:43: width = imageElement.getWidth(); If only width or height is specified, the other one would be derived so that the aspect ratio remains the same. We need to honor that. https://codereview.chromium.org/1754213004/diff/20001/java/org/chromium/disti... java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:43: width = imageElement.getWidth(); What if the dimension specified in CSS is different? Have you considered using computed style instead of width/height attributes? I can see the argument on both sides. Some sites load images only when it's close to the view port, and I'm not sure how well this works for those late-loading images. https://codereview.chromium.org/1754213004/diff/20001/javatests/org/chromium/... File javatests/org/chromium/distiller/ContentExtractorTest.java (right): https://codereview.chromium.org/1754213004/diff/20001/javatests/org/chromium/... javatests/org/chromium/distiller/ContentExtractorTest.java:108: "width=\"0\" height=\"0\">" + If the dimension is not specified in the source, we might want to keep it that way. https://codereview.chromium.org/1754213004/diff/20001/javatests/org/chromium/... javatests/org/chromium/distiller/ContentExtractorTest.java:211: "<h1 style=\"font-weight: folder\">" + The padding text can be simplified. We don't need the style here. It is to test style removal. You can refer to testImage(). https://codereview.chromium.org/1754213004/diff/20001/javatests/org/chromium/... javatests/org/chromium/distiller/ContentExtractorTest.java:228: "width=\"200\" height=\"300\">"; Maybe another img with only width, and one with neither width and height? https://codereview.chromium.org/1754213004/diff/20001/javatests/org/chromium/... File javatests/org/chromium/distiller/EmbedExtractorTest.java (right): https://codereview.chromium.org/1754213004/diff/20001/javatests/org/chromium/... javatests/org/chromium/distiller/EmbedExtractorTest.java:232: assertEquals(0, result.getHeight()); We need to keep aspect ratio.
Addressing wychen's comments. https://codereview.chromium.org/1754213004/diff/20001/java/org/chromium/disti... File java/org/chromium/distiller/extractors/embeds/ImageExtractor.java (right): https://codereview.chromium.org/1754213004/diff/20001/java/org/chromium/disti... java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:37: int height = e.getOffsetHeight(); On 2016/03/11 08:25:35, wychen wrote: > offsetWidth/Height would be affected by padding, but I guess this might be a > reasonable default if we can handle tags other than img. > Might worth noting this in the comment. Done. https://codereview.chromium.org/1754213004/diff/20001/java/org/chromium/disti... java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:43: width = imageElement.getWidth(); On 2016/03/11 08:25:35, wychen wrote: > What if the dimension specified in CSS is different? Have you considered using > computed style instead of width/height attributes? > > I can see the argument on both sides. Some sites load images only when it's > close to the view port, and I'm not sure how well this works for those > late-loading images. We believe GWT uses getWidth/Height as native methods. And by its definition: "This property can also return the width of an image that has been styled with CSS." See more here: http://www.w3schools.com/jsref/prop_img_width.asp. Therefore, both properties should preserve the image aspect ratio and are, already, computed by the browser. https://codereview.chromium.org/1754213004/diff/20001/javatests/org/chromium/... File javatests/org/chromium/distiller/ContentExtractorTest.java (right): https://codereview.chromium.org/1754213004/diff/20001/javatests/org/chromium/... javatests/org/chromium/distiller/ContentExtractorTest.java:211: "<h1 style=\"font-weight: folder\">" + On 2016/03/11 08:25:35, wychen wrote: > The padding text can be simplified. We don't need the style here. It is to test > style removal. You can refer to testImage(). Done. https://codereview.chromium.org/1754213004/diff/20001/javatests/org/chromium/... javatests/org/chromium/distiller/ContentExtractorTest.java:228: "width=\"200\" height=\"300\">"; On 2016/03/11 08:25:35, wychen wrote: > Maybe another img with only width, and one with neither width and height? Done. https://codereview.chromium.org/1754213004/diff/20001/javatests/org/chromium/... File javatests/org/chromium/distiller/EmbedExtractorTest.java (right): https://codereview.chromium.org/1754213004/diff/20001/javatests/org/chromium/... javatests/org/chromium/distiller/EmbedExtractorTest.java:232: assertEquals(0, result.getHeight()); On 2016/03/11 08:25:35, wychen wrote: > We need to keep aspect ratio. Aspect ratio will be kept. However, we could not reproduce it in Unit Test because it doesn't behave as the real DOM. In order to avoid confusion maybe we could just remove this assert. Do you have any suggestions on this?
https://codereview.chromium.org/1754213004/diff/20001/javatests/org/chromium/... File javatests/org/chromium/distiller/EmbedExtractorTest.java (right): https://codereview.chromium.org/1754213004/diff/20001/javatests/org/chromium/... javatests/org/chromium/distiller/EmbedExtractorTest.java:232: assertEquals(0, result.getHeight()); On 2016/03/14 18:28:06, dalmirsilva wrote: > On 2016/03/11 08:25:35, wychen wrote: > > We need to keep aspect ratio. > > Aspect ratio will be kept. However, we could not reproduce it in Unit Test > because it doesn't behave as the real DOM. > In order to avoid confusion maybe we could just remove this assert. > Do you have any suggestions on this? If it is not possible to test here, we can add browsertests in chrome. You can find some examples here: https://code.google.com/p/chromium/codesearch#search/&q=dom%20distiller%20bro... We'd like to at least cover cases like no dimension specified, (both/only one) of {width, height} attributes, specified by CSS in px, %, other units, and probably some more complicated cases.
On 2016/03/14 19:55:14, wychen wrote: > https://codereview.chromium.org/1754213004/diff/20001/javatests/org/chromium/... > File javatests/org/chromium/distiller/EmbedExtractorTest.java (right): > > https://codereview.chromium.org/1754213004/diff/20001/javatests/org/chromium/... > javatests/org/chromium/distiller/EmbedExtractorTest.java:232: assertEquals(0, > result.getHeight()); > On 2016/03/14 18:28:06, dalmirsilva wrote: > > On 2016/03/11 08:25:35, wychen wrote: > > > We need to keep aspect ratio. > > > > Aspect ratio will be kept. However, we could not reproduce it in Unit Test > > because it doesn't behave as the real DOM. > > In order to avoid confusion maybe we could just remove this assert. > > Do you have any suggestions on this? > > If it is not possible to test here, we can add browsertests in chrome. You can > find some examples here: > https://code.google.com/p/chromium/codesearch#search/&q=dom%20distiller%20bro... > > We'd like to at least cover cases like no dimension specified, (both/only one) > of {width, height} attributes, specified by CSS in px, %, other units, and > probably some more complicated cases. I agree, I'd like to see a test in chrome for this. Otherwise this change looks good.
First, it is possible that we have made some wrong assumptions: 1. We said that this unit test was running on some sort of GWT sandbox, not manipulating the real DOM. This is a wrong hypothesis. We did a investigation to know how these tests are run and it turns out they actually run inside a real chrome. There is this python script which launches the chrome browser (using selenium), issues a get request to out/package/test/data/war/test.html, this HTML includes domdistillerjstest.nocache.js file that have all tests inside, then calls the entry point defined by this JS, running the tests. 2. components_browsertests on Chrome doesn't seems to be a solution, it just includes the Dom Distiller tests in the binary, then run the tests in a very similar way we do on Dom Distiller itself (https://goo.gl/y0OI0m). Maybe we are missing something here, but it looks there is nothing special there. So, why did we think we were not dealing with the real DOM? Because when we used a existing image to be injected into the document's body and made a few assertions about its attributes like getting width/height and we got zeros. Now we know why, we got zeros because it is an async operation. When some image is inserted into the DOM, the browser performs the necessary I/O asynchronously and the test fails because we are not waiting the browser to finish the operation. This is a big problem, there isn't any infrastructure for async test on Dom Distiller (at least we didn't find it). However, if we pre-cache the images (or any resource) before running the test, the image is ready at the moment we use it, passing the tests. To preload resources, one possible way is to put the HTML tags that include them inside test.html. But, if we use a local image in our tests, this image should be sent to the components_browsertests binary in order to have the tests passing there. It is not a elegant solution, but we are running out of alternatives here. Does all of this make sense? Do you guys have some more specific suggestion on that? PS: using a real local image, we confirmed that the image's aspect ratio is being kept.
I didn't suggest running in DomDistillerJsTest.RunJsTests. There are many other browsertests, like this one: https://code.google.com/p/chromium/codesearch#chromium/src/components/dom_dis... The key difference is that in browsertest, it is possible to wait for these IO to finish, and then do the assertions. If it is possible to accomplish all the tests we want in dom distiller unit test, it would be best. Have you tried base64 encoded data URI as images? This might be a possible approach.
What a good idea. It worked!
https://codereview.chromium.org/1754213004/diff/80001/java/org/chromium/disti... File java/org/chromium/distiller/extractors/embeds/ImageExtractor.java (right): https://codereview.chromium.org/1754213004/diff/80001/java/org/chromium/disti... java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:42: // the real image dimension. "Real" is kind of ambiguous here. Maybe "displayed"? https://codereview.chromium.org/1754213004/diff/80001/java/org/chromium/disti... java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:47: width = imageElement.getWidth(); Do we always know the size at this point? Late loaded images might not have dimensions yet. Some sites only load images when it's close to the viewport. We might want to handle 0 somewhere, just in case. https://codereview.chromium.org/1754213004/diff/80001/javatests/org/chromium/... File javatests/org/chromium/distiller/ContentExtractorTest.java (right): https://codereview.chromium.org/1754213004/diff/80001/javatests/org/chromium/... javatests/org/chromium/distiller/ContentExtractorTest.java:108: "width=\"0\" height=\"0\">" + I still don't feel 0 is right. If we know nothing about the dimension, maybe just show it in natural size. https://codereview.chromium.org/1754213004/diff/80001/javatests/org/chromium/... javatests/org/chromium/distiller/ContentExtractorTest.java:214: "<p style=\"\">" + Why specifying style? https://codereview.chromium.org/1754213004/diff/80001/javatests/org/chromium/... javatests/org/chromium/distiller/ContentExtractorTest.java:233: "width=\"200\" height=\"0\">" + 0 handling. https://codereview.chromium.org/1754213004/diff/80001/javatests/org/chromium/... File javatests/org/chromium/distiller/EmbedExtractorTest.java (right): https://codereview.chromium.org/1754213004/diff/80001/javatests/org/chromium/... javatests/org/chromium/distiller/EmbedExtractorTest.java:25: final String IMAGE_BASE64 = "data:image/png;base64,iVBORw0KGgo" + Add a comment about what this looks like, especially the natural dimension. https://codereview.chromium.org/1754213004/diff/80001/javatests/org/chromium/... javatests/org/chromium/distiller/EmbedExtractorTest.java:212: public void testImageExtractorWithWidthHeightAttributes() { Add a few more tests to handle 0.
comments addressed https://codereview.chromium.org/1754213004/diff/80001/java/org/chromium/disti... File java/org/chromium/distiller/extractors/embeds/ImageExtractor.java (right): https://codereview.chromium.org/1754213004/diff/80001/java/org/chromium/disti... java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:42: // the real image dimension. On 2016/03/18 08:20:25, wychen wrote: > "Real" is kind of ambiguous here. Maybe "displayed"? Done. https://codereview.chromium.org/1754213004/diff/80001/java/org/chromium/disti... java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:47: width = imageElement.getWidth(); On 2016/03/18 08:20:25, wychen wrote: > Do we always know the size at this point? Late loaded images might not have > dimensions yet. Some sites only load images when it's close to the viewport. We > might want to handle 0 somewhere, just in case. Done. https://codereview.chromium.org/1754213004/diff/80001/javatests/org/chromium/... File javatests/org/chromium/distiller/ContentExtractorTest.java (right): https://codereview.chromium.org/1754213004/diff/80001/javatests/org/chromium/... javatests/org/chromium/distiller/ContentExtractorTest.java:108: "width=\"0\" height=\"0\">" + On 2016/03/18 08:20:25, wychen wrote: > I still don't feel 0 is right. If we know nothing about the dimension, maybe > just show it in natural size. Done. https://codereview.chromium.org/1754213004/diff/80001/javatests/org/chromium/... javatests/org/chromium/distiller/ContentExtractorTest.java:214: "<p style=\"\">" + On 2016/03/18 08:20:25, wychen wrote: > Why specifying style? Done. https://codereview.chromium.org/1754213004/diff/80001/javatests/org/chromium/... javatests/org/chromium/distiller/ContentExtractorTest.java:233: "width=\"200\" height=\"0\">" + On 2016/03/18 08:20:25, wychen wrote: > 0 handling. Done. https://codereview.chromium.org/1754213004/diff/80001/javatests/org/chromium/... File javatests/org/chromium/distiller/EmbedExtractorTest.java (right): https://codereview.chromium.org/1754213004/diff/80001/javatests/org/chromium/... javatests/org/chromium/distiller/EmbedExtractorTest.java:212: public void testImageExtractorWithWidthHeightAttributes() { On 2016/03/18 08:20:25, wychen wrote: > Add a few more tests to handle 0. Done.
lgtm
lgtm
On 2016/03/16 19:45:54, dalmirdasilva wrote: > What a good idea. It worked! Glad it worked, and we don't have to touch Chromium just to test this CL. By the way, in the hind sight, the reason you had misunderstanding of the unit test partly came from the lack of documentation. The core team knew this for granted, and it became challenge for us to see what's not trivial for external contributors to know. If someone with a new perspective like you guys could points these out in the docs, it would be enormously helpful for other external contributors to have better understanding of this project. Please feel free to modify README.md and amend ambiguous or missing parts if you want.
Description was changed from ========== Fix for big images after distillation Small images lost their dimension attributes and got bigger than they were before the distillation. Our solution is to retain width and height attributes from the original image. Contributors: marcelorcorrea@hp.com, dalmirsilva@hp.com BUG=591143 R=wychen@chromium.org, kuan@chromium.org, mdjones@chromium.org ========== to ========== Retain image sizes Small images lost their dimension attributes and got bigger than they were before the distillation. Our solution is to retain width and height attributes from the original image. Contributors: marcelorcorrea@hp.com, dalmirsilva@hp.com BUG=591143 ==========
Description was changed from ========== Retain image sizes Small images lost their dimension attributes and got bigger than they were before the distillation. Our solution is to retain width and height attributes from the original image. Contributors: marcelorcorrea@hp.com, dalmirsilva@hp.com BUG=591143 ========== to ========== Retain image sizes Small images lost their dimension attributes and got bigger than they were before the distillation. Our solution is to retain width and height attributes from the original image. Contributors: marcelorcorrea@hp.com, dalmirsilva@hp.com BUG=591143 R=mdjones@chromium.org, wychen@chromium.org Committed: 84cfcd26c60ab6b1547dd4e08f35d236f566e5d1 ==========
Message was sent while issue was closed.
Committed patchset #6 (id:100001) manually as 84cfcd26c60ab6b1547dd4e08f35d236f566e5d1 (presubmit successful). |