Description
UTF-16 Decoder: Convert unpaired surrogates to replacement characters
The decoder blithely passed any old 16-bit code unit through, in
violation of the Encoding standard. Surrogate pairs should go through
unscathed:
[ ... 0xD800 0xDC00 ... ] => [ ... U+D800 U+DC00 ... ]
But cases like these should result in replacement characters:
[ ... 0xD800 ... ] => [ ... U+FFFD ... ]
[ ... 0xDC00 ... ] => [ ... U+FFFD ... ]
[ ... 0xDC00 0xD800 ... ] => [ ... U+FFFD U+FFFD ... ]
This aligns Chrome's behavior with Firefox and Edge.
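The mid-stream cases above can be sketched as follows. This is a minimal illustration in Python, not the actual Blink C++ code; the names `replace_unpaired_surrogates`, `is_lead`, `is_trail` and `pending_lead` are hypothetical. It assumes a lead surrogate at the end of the input is held back as pending state rather than emitted, and it does not model end-of-stream flushing.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def is_lead(unit):
    """True for a lead (high) surrogate code unit."""
    return 0xD800 <= unit <= 0xDBFF


def is_trail(unit):
    """True for a trail (low) surrogate code unit."""
    return 0xDC00 <= unit <= 0xDFFF


def replace_unpaired_surrogates(units, pending_lead=None):
    """Pass well-formed pairs through; replace unpaired surrogates.

    Returns (output_units, pending_lead). A lead surrogate that ends the
    input is not emitted; it is returned so the caller can feed it into
    the next chunk (assumption: flush handling is left to the caller).
    """
    out = []
    for unit in units:
        if pending_lead is not None:
            lead, pending_lead = pending_lead, None
            if is_trail(unit):
                # Valid pair: both code units go through unchanged.
                out.extend((lead, unit))
                continue
            # The lead was not followed by a trail: replace it, then
            # fall through to classify the current unit on its own.
            out.append(REPLACEMENT)
        if is_lead(unit):
            pending_lead = unit
        elif is_trail(unit):
            # Trail surrogate with no preceding lead.
            out.append(REPLACEMENT)
        else:
            out.append(unit)
    return out, pending_lead
```

For example, `replace_unpaired_surrogates([0xDC00, 0xD800, 0x0042])` yields two replacement characters followed by `0x0042`, with no pending lead.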
Note that flushing at the end of a stream remains a special case.
Streams terminating in the above sequences will not get replacements
emitted (current behavior). In addition, a lead surrogate appearing at
the end of a stream will no longer be emitted, matching other browsers.
BUG=368904
R=jshin@chromium.org,foolip@chromium.org
Committed: https://crrev.com/9158f6d5f23e54cbce3748539c68cbfdce218bd4
Cr-Commit-Position: refs/heads/master@{#422929}
Patch Set 1
Total comments: 4
Patch Set 2 : Hoist DCHECK out of block
Patch Set 3 : Rebase, switch test to testharness
Messages
Total messages: 17 (9 generated)