Created: 6 years, 9 months ago by mathiasb
Modified: 6 years, 6 months ago
CC: blink-reviews, sof, eae+blinkwatch, dglazkov+blink, adamk+blink_chromium.org, Inactive, rwlbuis, hsivonen_iki.fi
Base URL: https://chromium.googlesource.com/chromium/blink@master
Visibility: Public

Description: Add use counter for UTF-16 as a Web-exposed encoding
R=jsbell@chromium.org
BUG=
Patch Set 1
Messages
Total messages: 16 (0 generated)
Seems fine to me, but I'm not an OWNER. I assume you expect the difference between little- and big-endian UTF-16 to be small enough that they don't need to be measured separately? Don't forget to update tools/metrics/histograms/histograms.xml in Chromium with the new value too :-).
See https://codereview.chromium.org/182733006 for the corresponding histograms update.
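(Context: a Blink use counter of this kind is usually a new Feature enum value plus a one-line count at the point where the feature is observed. A minimal sketch follows; the value name UTF16TextEncoding is illustrative and may differ from what this patch actually uses.)

    // Source/core/frame/UseCounter.h (sketch; the enum value name is hypothetical)
    enum Feature {
        // ... existing values ...
        UTF16TextEncoding,  // new value, appended before the sentinel
        NumberOfFeatures,   // must remain last; values are persisted to UMA
    };

    // At the point where a document's decoder settles on UTF-16:
    if (encoding == UTF16LittleEndianEncoding() || encoding == UTF16BigEndianEncoding())
        UseCounter::count(document, UseCounter::UTF16TextEncoding);

The Chromium-side histograms.xml change in the linked CL then adds the matching label to the FeatureObserver enum so the new value shows up in UMA.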
I'm not an OWNER. Do we also want to hook up the counter in XMLHttpRequest::didReceiveData? Can any other resource type appear as UTF-16 (e.g. CSS, script, ...)?
Measuring TextResourceDecoder::setEncoding() might be a good idea as well. I don't know when that's used, but presumably at least when you XHR a UTF-16 text file. To be sure that we're measuring the right thing, it's probably helpful to think about what changes would be required to drop UTF-16 support, and to measure at those points. Possibly adding UTF-16 to textEncodingNameBlacklist in TextEncodingRegistry.cpp would be the way to do it, but that's fairly low-level and I don't know all the call sites that could behave differently with that change.
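(To make the two candidate measurement/removal points above concrete, a rough sketch. The textEncodingNameBlacklist array and its UTF-7 entry exist today in TextEncodingRegistry.cpp; the UTF-16 additions are hypothetical.)

    // Source/wtf/text/TextEncodingRegistry.cpp -- what eventual removal might
    // look like, reusing the existing blacklist (currently holding UTF-7):
    static const char* const textEncodingNameBlacklist[] = {
        "UTF-7",
        "UTF-16",    // hypothetical additions; every registered alias
        "UTF-16LE",  // of UTF-16 would need auditing as well
        "UTF-16BE",
    };

    // Measuring in TextResourceDecoder::setEncoding() would look like the
    // count() call sketched earlier, but note that TextResourceDecoder does
    // not hold a Document, so the call would likely live in a caller that does.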
Measuring UTF-32 at the same time would be nice.
I'm confused as to what you're trying to measure. We have data from a 100M-site web crawl last fall which showed UTF-16 content accounting for 0.005% of the crawl; UTF-8 was 69%. I presume that Henri was more interested in things like codePointAt() or other APIs which expose string guts as UTF-16 code units?
https://groups.google.com/a/chromium.org/forum/#!searchin/blink-dev/visual$20... discussed encoding frequency (visual Hebrew is a much more interesting encoding to remove from Blink, IMO). Here is the data from a 100M-site crawl last fall: https://docs.google.com/a/chromium.org/spreadsheet/ccc?key=0AidRaO7Awc-DdG1IS...
On 2014/03/04 17:46:29, eseidel wrote: > I presume that Henri was more interested in things like codePointAt() or other > APIs which expose string guts as UTF-16 code units? I don't think that's what he meant; the way strings are exposed in JavaScript definitely cannot be changed without breaking the web. This is about documents that end up getting decoded as UTF-16.
Having another supported encoding is basically 0 cost to Blink, with the exception of visual Hebrew (which is both an encoding and a bidi-algorithm-disabler). I wouldn't bother removing UTF-16 from the web, despite its ~0% usage.
On Tue, Mar 4, 2014 at 8:52 PM, <eseidel@chromium.org> wrote: > Having another supported encoding is basically 0 cost to Blink I can't speak for Blink, but supporting ASCII-incompatible encodings in a way that doesn't expose XSS vulnerabilities is most definitely not zero-cost. If UTF-16 looks zero-cost to you, maybe it's not as secure in Blink as in Gecko. I encourage you to also add a use counter for hz-gb-2312. -- Henri Sivonen hsivonen@hsivonen.fi https://hsivonen.fi/
I now better understand what you were referring to. Yes: UTF-16, and any encoding that represents <script>, etc. with non-ASCII bytes, makes servers and the network work extra hard. Once these strings get into the engine, we have little extra work. I'd be supportive of UseCounting, and of removing support as a network encoding for any encoding that doesn't represent the bytes "<script>" the same way ASCII does.
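(To make the "<script>" point concrete, a standalone C++ demonstration, not Blink code: it dumps the bytes a naive server-side filter would have to match. The UTF-16LE line assumes a little-endian host.)

    #include <cstdio>
    #include <cstring>

    int main() {
        const char ascii[] = "<script>";
        const char16_t utf16[] = u"<script>";  // 8 UTF-16 code units

        printf("ASCII:    ");
        for (size_t i = 0; i < strlen(ascii); ++i)
            printf("%02X ", (unsigned char)ascii[i]);

        printf("\nUTF-16LE: ");
        const unsigned char* bytes = (const unsigned char*)utf16;
        for (size_t i = 0; i < 8 * sizeof(char16_t); ++i)  // skip trailing NUL
            printf("%02X ", bytes[i]);
        printf("\n");
        // Output:
        //   ASCII:    3C 73 63 72 69 70 74 3E
        //   UTF-16LE: 3C 00 73 00 63 00 72 00 69 00 70 00 74 00 3E 00
        // A filter scanning for the ASCII bytes of "<script>" never matches
        // the UTF-16 form, which is why ASCII-incompatible encodings need care.
    }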
Is there still desire to pursue this (given that we have other ways to gather stats about encoding usage), or can we close this? (Just tidying my "Incoming Reviews" list.)
It looks like this bug is lacking an owner. CCing SYD folks, as they've historically been very gung-ho about UseCounting (and have thus produced some awesome stats!), but without an owner we'll need to close this.
On 2014/05/29 00:42:15, eseidel wrote: > It looks like this bug is lacking an owner. CCing SYD folks, as they've > historically been very gung-ho about UseCounting (and have thus produced some > awesome stats!), but without an owner we'll need to close this. I'm all for UseCounting things, but I'm not completely up on what would need to be done here. If it's what Eric suggested ("removing support as a network encoding for any encoding that doesn't represent the bytes "<script>" the same way ASCII does"), I don't understand why this would need a UseCounter. Can't we just figure this out offline? Or is it that we already know what the encodings are, and we just want to know how much of the web we'd break by removing them?
We have encoding data (see comment 8 in this CL), but we don't have Blink usage data for those encodings. You're right that I'm not sure we need it to justify removing support for decoding deprecated formats.