|
|
Created:
9 years, 1 month ago by satorux1 Modified:
9 years, 1 month ago CC:
chromium-reviews, Paweł Hajdan Jr., brettw-cc_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionAdd FilePath::FromUTF8Unsafe() and FilePath::AsUTF8Unsafe().
The logic is moved from value_conversions.cc.
FilePath::FromUTF8Unsafe() should only be used when you are
sure that the input string is UTF-8. See the function comments
for why they have "Unsafe" in their names.
BUG=none
TEST=base_unittests
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=108246
Patch Set 1 #
Total comments: 5
Patch Set 2 : Add Unsafe to names and add more comments and tests #
Total comments: 6
Patch Set 3 : address comments #
Total comments: 2
Patch Set 4 : address comments #
Messages
Total messages: 20 (0 generated)
Hi Mark, Tony suggested you would be the right reviewer for this.
Read the header at the points I’ve marked and if you still think this change is a good idea, explain why. Better, read the entire header before coming to any conclusions. http://codereview.chromium.org/8402008/diff/1/base/file_path.h File base/file_path.h (right): http://codereview.chromium.org/8402008/diff/1/base/file_path.h#newcode13 base/file_path.h:13: // Encoding unspecified* UTF-16 * http://codereview.chromium.org/8402008/diff/1/base/file_path.h#newcode18 base/file_path.h:18: // * The encoding need not be specified on POSIX systems, although some * http://codereview.chromium.org/8402008/diff/1/base/file_path.h#newcode32 base/file_path.h:32: // wrap _wfopen_s, perhaps both by calling file_path.value().c_str(). This * http://codereview.chromium.org/8402008/diff/1/base/file_path.h#newcode266 base/file_path.h:266: // On Linux, although it can use any 8-bit encoding for paths, we assume that *
On 2011/10/26 23:29:13, Mark Mentovai wrote: > Read the header at the points I’ve marked and if you still think this change is > a good idea, explain why. I looked at points, but it wasn't obvious for me why this patch was wrong. Could you elaborate a bit? The newly added functions are named with "UTF8", so the clients should only use these functions when they know the input strings are UTF-8. The motivation is to eliminate #ifdef'ed code as described in the patch description. Even worse, the code looks subtly wrong, as it assumes the file system to be UTF-8 on POSIX. There are many places in our code base where #ifdef is used around FilePath creation. I think we should fix this. Per output from grep -C3 FilePath.*UTF8To **/*.cc. chrome/browser/download/download_prefs.cc-#if defined(OS_POSIX) chrome/browser/download/download_prefs.cc- FilePath path(extensions[i]); chrome/browser/download/download_prefs.cc-#elif defined(OS_WIN) chrome/browser/download/download_prefs.cc: FilePath path(UTF8ToWide(extensions[i])); chrome/browser/download/download_prefs.cc-#endif chrome/browser/parsers/metadata_parser_jpeg_factory.cc-#if defined(OS_WIN) chrome/browser/parsers/metadata_parser_jpeg_factory.cc: FilePath::StringType ext = UTF8ToWide(std::string(".jpg")); chrome/browser/parsers/metadata_parser_jpeg_factory.cc-#elif defined(OS_POSIX) chrome/browser/parsers/metadata_parser_jpeg_factory.cc- FilePath::StringType ext = ".jpg"; chrome/browser/ui/views/select_file_dialog.cc-#if defined(OS_WIN) chrome/browser/ui/views/select_file_dialog.cc: FilePath path(base::SysUTF8ToWide(path_string)); chrome/browser/ui/views/select_file_dialog.cc-#else chrome/browser/ui/views/select_file_dialog.cc- FilePath path( chrome/browser/ui/views/select_file_dialog.cc- base::SysWideToNativeMB(base::SysUTF8ToWide(path_string))); chrome/browser/web_applications/web_app.cc-#if defined(OS_WIN) chrome/browser/web_applications/web_app.cc: return FilePath(UTF8ToWide(app_name)); chrome/browser/web_applications/web_app.cc-#elif defined(OS_POSIX) chrome/browser/web_applications/web_app.cc- return FilePath(app_name); chrome/browser/web_applications/web_app.cc-#endif chrome/common/extensions/extension_file_util.cc-#if defined(OS_POSIX) chrome/common/extensions/extension_file_util.cc- FilePath(file_path); chrome/common/extensions/extension_file_util.cc-#elif defined(OS_WIN) chrome/common/extensions/extension_file_util.cc: FilePath(UTF8ToWide(file_path)); chrome/common/logging_chrome.cc-#if defined(OS_WIN) chrome/common/logging_chrome.cc: return FilePath(UTF8ToWide(filename)); chrome/common/logging_chrome.cc-#elif defined(OS_POSIX) chrome/common/logging_chrome.cc- return FilePath(filename); chrome/common/logging_chrome.cc-#endif content/browser/download/save_package.cc-#if defined(OS_POSIX) content/browser/download/save_package.cc- FilePath::StringType mime_type(contents_mime_type); content/browser/download/save_package.cc-#elif defined(OS_WIN) content/browser/download/save_package.cc: FilePath::StringType mime_type(UTF8ToWide(contents_mime_type)); content/browser/download/save_package.cc-#endif // OS_WIN media/filters/file_data_source.cc-#if defined(OS_WIN) media/filters/file_data_source.cc: FilePath file_path(UTF8ToWide(url)); media/filters/file_data_source.cc-#else media/filters/file_data_source.cc- FilePath file_path(url); media/filters/file_data_source.cc-#endif net/base/mime_util.cc-#if defined(OS_WIN) net/base/mime_util.cc: FilePath::StringType extension(UTF8ToWide(this_extensions[j])); net/base/mime_util.cc-#else net/base/mime_util.cc- FilePath::StringType extension(this_extensions[j]); net/base/mime_util.cc-#endif third_party/leveldatabase/env_chromium.cc-#if defined(OS_WIN) third_party/leveldatabase/env_chromium.cc: return FilePath(UTF8ToUTF16(file_path)); third_party/leveldatabase/env_chromium.cc-#else third_party/leveldatabase/env_chromium.cc- return FilePath(file_path); third_party/leveldatabase/env_chromium.cc-#endif webkit/fileapi/file_system_directory_database.cc- info->data_path = FilePath(data_path); webkit/fileapi/file_system_directory_database.cc- info->name = name; webkit/fileapi/file_system_directory_database.cc-#elif defined(OS_WIN) webkit/fileapi/file_system_directory_database.cc: info->data_path = FilePath(base::SysUTF8ToWide(data_path)); webkit/fileapi/file_system_directory_database.cc- info->name = base::SysUTF8ToWide(name); webkit/fileapi/file_system_operation.cc: *virtual_path = FilePath(UTF8ToWide(temp)).NormalizeWindowsPathSeparators(); webkit/fileapi/file_system_operation.cc-#else webkit/fileapi/file_system_operation.cc- *virtual_path = FilePath(path.path()); webkit/fileapi/file_system_operation.cc-#endif ebkit/fileapi/file_system_operation.cc: *virtual_path = FilePath(UTF8ToWide(temp)).NormalizeWindowsPathSeparators(); webkit/fileapi/file_system_operation.cc-#else webkit/fileapi/file_system_operation.cc- *virtual_path = FilePath(path.path()); webkit/fileapi/file_system_operation.cc-#endif ... there are many more under webkit/ > > Better, read the entire header before coming to any conclusions. > > http://codereview.chromium.org/8402008/diff/1/base/file_path.h > File base/file_path.h (right): > > http://codereview.chromium.org/8402008/diff/1/base/file_path.h#newcode13 > base/file_path.h:13: // Encoding unspecified* UTF-16 > * > > http://codereview.chromium.org/8402008/diff/1/base/file_path.h#newcode18 > base/file_path.h:18: // * The encoding need not be specified on POSIX systems, > although some > * > > http://codereview.chromium.org/8402008/diff/1/base/file_path.h#newcode32 > base/file_path.h:32: // wrap _wfopen_s, perhaps both by calling > file_path.value().c_str(). This > * > > http://codereview.chromium.org/8402008/diff/1/base/file_path.h#newcode266 > base/file_path.h:266: // On Linux, although it can use any 8-bit encoding for > paths, we assume that > *
We don’t know what encoding pathnames are in. We don’t know that it’s got anything to do with SysNativeMB and friends.
On 2011/10/26 23:51:48, Mark Mentovai wrote: > We don’t know what encoding pathnames are in. We don’t know that it’s > got anything to do with SysNativeMB and friends. I know that we don't exactly know what encoding is used on the file system on POSIX, but the comment says: >> but in practice, the locale's character set may be used. I thought SysNativeMB and friends do conversion between the locale's character set and unicode, but I may be wrong. I'm totally fine with a different approach. Do you have an idea about how to fix the large amount of #ifdef'ed code in our codebase?
FWIW, there are plenty of bugs in the existing codebase that assumes that filenames on Linux are utf8. A bunch of stuff won't work if you're on a different encoding. However, I've never seen a single bug report about this. I think all linux distros use utf8 and have been doing that for years. Adding this helper method and using SysNativeMB seems extra precautious and would be better than just letting developers pass utf8 encoded std::strings into the FilePath constructor which is what's been happening for the past few years.
On 2011/10/27 17:33:06, tony wrote: > FWIW, there are plenty of bugs in the existing codebase that assumes that > filenames on Linux are utf8. A bunch of stuff won't work if you're on a > different encoding. However, I've never seen a single bug report about this. > > I think all linux distros use utf8 and have been doing that for years. Adding > this helper method and using SysNativeMB seems extra precautious and would be > better than just letting developers pass utf8 encoded std::strings into the > FilePath constructor which is what's been happening for the past few years. I think tony made a good point. We'll eventually be able to say we only support UTF-8 file names on POSIX, but until then having SysNativeMB seems to be extra precautious. Mark, do you think this patch is a bad idea? I thought having the helper methods is better than letting developers write #ifdef-code and pass UTF-8 directly for POSIX. I didn't write detailed function comments, but we can put ones like we have at FromWStringHack(). I'll ask jshin@ for his opinion from the viewpoint of I18N.
Jungshik, could you take a look at the change as well as the code review thread, and share your thoughts on this?
On 2011/10/28 16:58:20, satorux1 wrote: > Jungshik, could you take a look at the change as well as the code review thread, > and share your thoughts on this? Sorry for the late reply (I was off). I mostly agree with what Satoru said in his comments. As Mark mentioned, what's used in filesystem on Linux is any body's guess and it's not necessarily the encoding assumed by SysNativeMB* (= nl_langinfo(CODESET)). However, I think what's done in the CL is reasonable. (it does not introduce any behavior change, does it? ) Given what Linux distribution have been doing, we may want to 'declare' that we won't support anything other than UTF-8 in file system (and locale) 'soon'. (on ChromeOS, we do because we have the control...).
http://codereview.chromium.org/8402008/diff/1/base/file_path_unittest.cc File base/file_path_unittest.cc (right): http://codereview.chromium.org/8402008/diff/1/base/file_path_unittest.cc#newc... base/file_path_unittest.cc:1054: EXPECT_EQ("foo.txt", path.AsUTF8()); Would you consider adding some non-trivial cases here?
If this is a direction we head in, can we give it an Unsafe or Hack name, like we did with FromWStringHack, to discourage use? It’d be OK to accommodate existing callers on a no-change-but-let’s-consolidate basis, but I do want to discourage ongoing new uses.
glib already assumes filename encoding is utf8 unless you set G_FILENAME_ENCODING or G_BROKEN_FILENAMES. Which is to say, even if we get all the uses correct without the utf8hack methods, it still won't work with gnome (e.g., in the file picker dialog). Alternately, we could also obey the G_FILENAME_ENCODING and G_BROKEN_FILENAMES environment variables and not have Hack in our function names. See http://developer.gnome.org/glib/2.30/glib-running.html .
Thank you all for the feedback. Per Mark's comment, I've changed the file names to FromUTF8Unsafe() and AsUTF8Unsafe() for now, with detailed comments about why they have Unsafe in their names. Please check the wording. I'm willing to revise per feedback. I've also added non-trivial test cases per Jungshik's comments. Now running try bots to see if these pass on all platforms. Re G_FILENAME_ENCODING and G_BROKEN_FILENAMES, I didn't feel like adding more hacks for the legacy non-UTF-8 file names. Thank you for the information anyway.
LGTM
Try results look good. Mark, please take another look. Thanks,
http://codereview.chromium.org/8402008/diff/11001/base/file_path.cc File base/file_path.cc (right): http://codereview.chromium.org/8402008/diff/11001/base/file_path.cc#newcode531 base/file_path.cc:531: return WideToUTF8(base::SysNativeMBToWide(value())); Per the comment I made in the header, avoid SysNativeMBToWide on Mac, and perhaps on Chrome OS. http://codereview.chromium.org/8402008/diff/11001/base/file_path.h File base/file_path.h (right): http://codereview.chromium.org/8402008/diff/11001/base/file_path.h#newcode300 base/file_path.h:300: // This function is *unsafe* as we cannot tell what encoding is used in Please avoid using “we” in comments. Line 307 and perhaps elsewhere too. http://codereview.chromium.org/8402008/diff/11001/base/file_path.h#newcode304 base/file_path.h:304: // POSIX per assumption that the current locale's encoding is used in SysNativeMBToWide is unnecessary (and in some cases wrong) on Mac OS X and Chrome OS which are defined to use UTF-8.
http://codereview.chromium.org/8402008/diff/11001/base/file_path.cc File base/file_path.cc (right): http://codereview.chromium.org/8402008/diff/11001/base/file_path.cc#newcode531 base/file_path.cc:531: return WideToUTF8(base::SysNativeMBToWide(value())); On 2011/11/01 21:30:46, Mark Mentovai wrote: > Per the comment I made in the header, avoid SysNativeMBToWide on Mac, and > perhaps on Chrome OS. Done. http://codereview.chromium.org/8402008/diff/11001/base/file_path.h File base/file_path.h (right): http://codereview.chromium.org/8402008/diff/11001/base/file_path.h#newcode300 base/file_path.h:300: // This function is *unsafe* as we cannot tell what encoding is used in On 2011/11/01 21:30:46, Mark Mentovai wrote: > Please avoid using “we” in comments. Line 307 and perhaps elsewhere too. Done. http://codereview.chromium.org/8402008/diff/11001/base/file_path.h#newcode304 base/file_path.h:304: // POSIX per assumption that the current locale's encoding is used in On 2011/11/01 21:30:46, Mark Mentovai wrote: > SysNativeMBToWide is unnecessary (and in some cases wrong) on Mac OS X and > Chrome OS which are defined to use UTF-8. Done.
http://codereview.chromium.org/8402008/diff/11003/base/file_path.h File base/file_path.h (right): http://codereview.chromium.org/8402008/diff/11003/base/file_path.h#newcode304 base/file_path.h:304: // SysNativeMBToWide() on POSIX systems where the file name encoding is Not on all POSIX systems.
http://codereview.chromium.org/8402008/diff/11003/base/file_path.h File base/file_path.h (right): http://codereview.chromium.org/8402008/diff/11003/base/file_path.h#newcode304 base/file_path.h:304: // SysNativeMBToWide() on POSIX systems where the file name encoding is On 2011/11/01 21:53:31, Mark Mentovai wrote: > Not on all POSIX systems. Done.
LGTM |