base/string_util.h - Issue 147038: Pass through non-character codepoints in UTF-8,16,32 and Wide conversion func...

Unified Diff: base/string_util.h

Issue 147038: Pass through non-character codepoints in UTF-8,16,32 and Wide conversion func... (Closed) Base URL: svn://chrome-svn/chrome/trunk/src/

Patch Set: '' Created 11 years, 6 months ago

Index: base/string_util.h

===================================================================

--- base/string_util.h (revision 19007)

+++ base/string_util.h (working copy)

@@ -186,6 +186,13 @@

// do the best it can and put the result in the output buffer. The versions that

// return strings ignore this error and just return the best conversion

// possible.

+//

+// Note that only the structural validity is checked and non-character

+// codepoints and unassigned codepoints are regarded as valid.

+// TODO(jungshik): Consider replacing an invalid input sequence with

+// the Unicode replacement character or adding |replacement_char| parameter.

+// Currently, it's skipped in the output, which could be problematic in

+// some situations.

bool WideToUTF8(const wchar_t* src, size_t src_len, std::string* output);

std::string WideToUTF8(const std::wstring& wide);

bool UTF8ToWide(const char* src, size_t src_len, std::wstring* output);

@@ -250,6 +257,13 @@

// string be 8-bit or UTF8? It contains only characters that are < 256 (in the

// first case) or characters that use only 8-bits and whose 8-bit

// representation looks like a UTF-8 string (the second case).

+//

+// Note that IsStringUTF8 checks not only if the input is structurally

+// valid but also if it doesn't contain any non-character codepoint

+// (e.g. U+FFFE). It's done on purpose because all the existing callers want

+// to have the maximum 'discriminating' power from other encodings. If

+// there's a use case for just checking the structural validity, we have to

+// add a new function for that.

bool IsString8Bit(const std::wstring& str);

bool IsStringUTF8(const std::string& str);

bool IsStringWideUTF8(const std::wstring& str);
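
The two comments added in this patch draw a line between "structurally valid" UTF-8 (what the conversion functions accept) and "structurally valid and free of non-character codepoints" (what IsStringUTF8 requires). The following is a minimal, standalone sketch of that distinction, not the code in string_util_icu.cc. The names IsNonCharacter, DecodeOne, IsStructurallyValidUTF8 and IsUTF8WithoutNonCharacters are hypothetical and exist only for illustration. It assumes that "structural validity" means well-formed UTF-8 as defined by the Unicode Standard (no overlong forms, no surrogates, nothing above U+10FFFF).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: true for the 66 Unicode non-characters, i.e.
// U+FDD0..U+FDEF and every code point whose low 16 bits are FFFE or FFFF.
inline bool IsNonCharacter(uint32_t cp) {
  return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
}

// Hypothetical helper: decodes one UTF-8 sequence starting at s[*i] and
// advances *i past it. Returns false if the sequence is structurally invalid
// (bad lead or continuation byte, truncated, overlong, surrogate, > U+10FFFF).
inline bool DecodeOne(const std::string& s, size_t* i, uint32_t* cp) {
  const unsigned char b0 = static_cast<unsigned char>(s[*i]);
  size_t len = 0;
  uint32_t min = 0;  // Smallest code point allowed for this sequence length.
  if (b0 < 0x80) {
    *cp = b0;
    ++*i;
    return true;
  } else if ((b0 & 0xE0) == 0xC0) {
    len = 2; min = 0x80; *cp = b0 & 0x1F;
  } else if ((b0 & 0xF0) == 0xE0) {
    len = 3; min = 0x800; *cp = b0 & 0x0F;
  } else if ((b0 & 0xF8) == 0xF0) {
    len = 4; min = 0x10000; *cp = b0 & 0x07;
  } else {
    return false;  // Stray continuation byte or invalid lead byte.
  }
  if (s.size() - *i < len) return false;  // Truncated sequence.
  for (size_t k = 1; k < len; ++k) {
    const unsigned char b = static_cast<unsigned char>(s[*i + k]);
    if ((b & 0xC0) != 0x80) return false;
    *cp = (*cp << 6) | (b & 0x3F);
  }
  if (*cp < min || *cp > 0x10FFFF || (*cp >= 0xD800 && *cp <= 0xDFFF))
    return false;
  *i += len;
  return true;
}

// Structural check only: non-characters and unassigned code points pass.
bool IsStructurallyValidUTF8(const std::string& s) {
  size_t i = 0;
  uint32_t cp = 0;
  while (i < s.size()) {
    if (!DecodeOne(s, &i, &cp)) return false;
  }
  return true;
}

// Stricter check in the spirit of the IsStringUTF8 comment: structurally
// valid and containing no non-character code point such as U+FFFE.
bool IsUTF8WithoutNonCharacters(const std::string& s) {
  size_t i = 0;
  uint32_t cp = 0;
  while (i < s.size()) {
    if (!DecodeOne(s, &i, &cp) || IsNonCharacter(cp)) return false;
  }
  return true;
}
```

With this sketch, the bytes EF BF BE (U+FFFE) pass the structural check but fail the stricter one, while an overlong sequence such as C0 80 fails both. That is the extra "discriminating" power the IsStringUTF8 comment refers to.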
