Stop doing unnecessary UTF-8 to UTF-16 conversions in JSONWriter.
The JSONReader only accepts UTF-8 input strings and converts \uXXXX sequences
back into UTF-8. However, the JSONWriter converts all non-ASCII characters to
UTF-16 \uXXXX escape sequences. This round trip through UTF-16 is suboptimal, as
noted in a TODO from r54359.
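As an illustration of the writer-side half of that round trip, here is a minimal sketch of escaping every non-ASCII character in UTF-8 input as UTF-16 \uXXXX escapes. The function names are hypothetical and this is not the JSONWriter source. It assumes valid UTF-8 input, and characters outside the Basic Multilingual Plane become surrogate pairs.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Appends one UTF-16 code unit as a \uXXXX escape.
void AppendUnitEscape(uint16_t unit, std::string* out) {
  char buf[7];  // "\u" + 4 hex digits + NUL
  std::snprintf(buf, sizeof(buf), "\\u%04X", static_cast<unsigned>(unit));
  out->append(buf);
}

// Hypothetical sketch of the old writer behavior: ASCII bytes 0x20-0x7F are
// copied (with '"' and '\\' escaped), and every other character is written as
// one or two \uXXXX UTF-16 escapes. Input is assumed to be valid UTF-8.
std::string EscapeNonAsciiAsUtf16(const std::string& utf8) {
  std::string out;
  size_t i = 0;
  while (i < utf8.size()) {
    unsigned char c = static_cast<unsigned char>(utf8[i]);
    if (c < 0x80) {
      if (c == '"' || c == '\\') {
        out += '\\';
        out += static_cast<char>(c);
      } else if (c < 0x20) {
        AppendUnitEscape(c, &out);
      } else {
        out += static_cast<char>(c);
      }
      ++i;
      continue;
    }
    // Decode one multi-byte UTF-8 sequence (the lead byte gives the length).
    int len = (c >= 0xF0) ? 4 : (c >= 0xE0) ? 3 : 2;
    if (i + len > utf8.size()) break;  // truncated input: stop
    uint32_t cp = c & (0x7F >> len);
    for (int k = 1; k < len; ++k)
      cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
    i += len;
    if (cp >= 0x10000) {
      // Outside the BMP: split into a UTF-16 surrogate pair.
      cp -= 0x10000;
      AppendUnitEscape(static_cast<uint16_t>(0xD800 + (cp >> 10)), &out);
      AppendUnitEscape(static_cast<uint16_t>(0xDC00 + (cp & 0x3FF)), &out);
    } else {
      AppendUnitEscape(static_cast<uint16_t>(cp), &out);
    }
  }
  return out;
}
```

With this scheme "é" (bytes C3 A9) is written as \u00E9 and U+1F600 as \uD83D\uDE00, which the reader must then convert back into the same UTF-8 bytes it started from.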
One reason for this may be that JsonDoubleQuote(), used by JSONWriter, does not
handle UTF-8 bytes correctly: it interprets each byte as a code point and writes
it out as a \u00XX sequence. If this output were read back through an
RFC-compliant JSON parser, the result would be an invalid encoding error.
JsonDoubleQuote() does handle UTF-16 correctly, though.
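The byte-as-code-point behavior can be sketched as follows. This is a hypothetical reconstruction for illustration (the name QuoteBytesAsCodePoints is made up), not the actual JsonDoubleQuote() source: every byte at or above 0x80 is escaped as if it were a code point in U+0080..U+00FF.

```cpp
#include <cstdio>
#include <string>

// Hypothetical reconstruction of the bug: each byte of the input is treated
// as a code point, so non-ASCII UTF-8 bytes become \u00XX escapes.
std::string QuoteBytesAsCodePoints(const std::string& input) {
  std::string out = "\"";
  for (char ch : input) {
    unsigned char c = static_cast<unsigned char>(ch);
    if (c == '"' || c == '\\') {
      out += '\\';
      out += ch;
    } else if (c < 0x20 || c >= 0x80) {
      char buf[7];  // "\u" + 4 hex digits + NUL
      std::snprintf(buf, sizeof(buf), "\\u%04X", static_cast<unsigned>(c));
      out += buf;
    } else {
      out += ch;
    }
  }
  out += '"';
  return out;
}
```

For "é" (UTF-8 bytes C3 A9) this emits "\u00C3\u00A9", two escapes naming U+00C3 and U+00A9, whereas a correct encoding of the character is the single escape \u00E9. A parser reading the first form does not recover the original character.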
This rewrites the base/json/string_escape.h API and fixes the above UTF-8 issue
by dividing callers into three groups:
1. Those that pass valid UTF-8 to be escaped. Prior to this change, very few
callers used this variant. Those that did were likely passing ASCII; otherwise
the output would have been mangled by the issue above. Now, valid UTF-8 is
passed through to the output unescaped, and invalid UTF-8 sequences are
replaced with U+FFFD.
2. Those that pass valid UTF-16 to be escaped. This variant now validates that
its input is valid UTF-16 and then converts it to unescaped UTF-8 in the
output.
3. Those that pass arbitrary byte arrays as std::string and expect a
non-RFC-compliant encoding of the binary data using \uXXXX escapes. This
behavior now lives in the EscapeBytesAsInvalidJSONString() function. It is used
only by callers who want a "debug string" and do not expect to parse the output
as JSON, since it is not valid JSON.
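The first two groups can be sketched roughly as below. The function names and signatures are hypothetical and are not the string_escape.h API. Two policies are assumptions of this sketch: an invalid UTF-8 byte is replaced by one U+FFFD and decoding resumes at the next byte, and an unpaired UTF-16 surrogate is also replaced by U+FFFD. Only the characters JSON requires to be escaped (quote, backslash, control characters) are escaped; the byte-wise \u00XX escaping of the third group is omitted here.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Appends a code point to |out| as UTF-8.
void AppendUtf8(uint32_t cp, std::string* out) {
  if (cp < 0x80) {
    out->push_back(static_cast<char>(cp));
  } else if (cp < 0x800) {
    out->push_back(static_cast<char>(0xC0 | (cp >> 6)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else if (cp < 0x10000) {
    out->push_back(static_cast<char>(0xE0 | (cp >> 12)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else {
    out->push_back(static_cast<char>(0xF0 | (cp >> 18)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
}

// Escapes only what JSON requires; every other code point is copied as UTF-8.
void AppendJsonCodePoint(uint32_t cp, std::string* out) {
  if (cp == '"' || cp == '\\') {
    out->push_back('\\');
    out->push_back(static_cast<char>(cp));
  } else if (cp < 0x20) {
    char buf[7];  // "\u" + 4 hex digits + NUL
    std::snprintf(buf, sizeof(buf), "\\u%04X", static_cast<unsigned>(cp));
    out->append(buf);
  } else {
    AppendUtf8(cp, out);
  }
}

// Decodes one UTF-8 sequence at |*i|, rejecting bad lead/continuation bytes,
// overlong forms, surrogates and values above U+10FFFF. Returns 0xFFFD (and
// advances one byte) on error.
uint32_t NextUtf8(const std::string& s, size_t* i) {
  unsigned char c = static_cast<unsigned char>(s[*i]);
  int len;
  uint32_t cp, min_cp;
  if (c < 0x80) { ++*i; return c; }
  if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; min_cp = 0x80; }
  else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; min_cp = 0x800; }
  else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; min_cp = 0x10000; }
  else { ++*i; return 0xFFFD; }
  if (*i + len > s.size()) { ++*i; return 0xFFFD; }
  for (int k = 1; k < len; ++k) {
    unsigned char cc = static_cast<unsigned char>(s[*i + k]);
    if ((cc & 0xC0) != 0x80) { ++*i; return 0xFFFD; }
    cp = (cp << 6) | (cc & 0x3F);
  }
  if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
    ++*i;
    return 0xFFFD;
  }
  *i += len;
  return cp;
}

// Group 1 (hypothetical): valid UTF-8 passes through unescaped.
std::string EscapeUtf8Sketch(const std::string& in) {
  std::string out;
  size_t i = 0;
  while (i < in.size()) AppendJsonCodePoint(NextUtf8(in, &i), &out);
  return out;
}

// Group 2 (hypothetical): UTF-16 is validated and written out as UTF-8.
std::string EscapeUtf16Sketch(const std::u16string& in) {
  std::string out;
  for (size_t i = 0; i < in.size(); ++i) {
    uint32_t u = in[i];
    if (u >= 0xD800 && u <= 0xDBFF && i + 1 < in.size() &&
        in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
      u = 0x10000 + ((u - 0xD800) << 10) + (in[i + 1] - 0xDC00);
      ++i;  // consumed the low surrogate too
    } else if (u >= 0xD800 && u <= 0xDFFF) {
      u = 0xFFFD;  // unpaired surrogate
    }
    AppendJsonCodePoint(u, &out);
  }
  return out;
}
```

Under these assumptions "é" comes out as the same two UTF-8 bytes it went in as, and a stray 0xFF byte becomes EF BF BD (U+FFFD), so the output stays valid UTF-8 that an RFC-compliant parser can read back.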
Additionally, this removes the JSONWriter::OPTIONS_DO_NOT_ESCAPE flag, since
the writer can now handle UTF-8 appropriately.
BUG=15466
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=239800
Reverted: https://src.chromium.org/viewvc/chrome?view=rev&revision=240082
R=asanka@chromium.org, bauerb@chromium.org, mark@chromium.org, thakis@chromium.org, zea@chromium.org
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=240190
Patch Set 1
Patch Set 2 (42 comments)
Patch Set 3: Address review comments (6 comments)
Patch Set 4: Nits (5 comments)
Patch Set 5: Self-nit
Patch Set 6: NetUtilTest.GetDirectoryListingEntry
Patch Set 7: Fix ChromeOS page encodings
Total messages: 26 (0 generated)