|
|
Chromium Code Reviews|
Created:
11 years, 5 months ago by Roland Modified:
9 years, 7 months ago Reviewers:
Avi (use Gerrit), Erik does not do reviews, jungshik at Google, Evan Stade, Mark Mentovai, brettw CC:
chromium-reviews_googlegroups.com, darin (slow to review), brettw, Paul Godavari, willchan no longer on Chromium, Ben Goodger (Google), Nico, brettw+cc_chromium.org, ben+cc_chromium.org, Paweł Hajdan Jr. Base URL:
http://src.chromium.org/svn/trunk/src/ Visibility:
Public. |
DescriptionFix for bug 10876 that resulted in some refactoring:
The bug originates from extensions being treated case sensitive on Windows and Mac OSX, where they shouldn't be.
Therefore I added generic static methods to FilePath to compare strings in the same way the file system does, and changed the relevant parts of the code to make use of them.
I tested the methods under Windows and Mac OS X. I also wrote a basic version for Linux/Posix that behaves the same way as the original code, so there should at least be no regression.
Also, while fixing this I found some confusion in the code about whether extensions are used with or without leading dot. For this reason I changed some functions that were taking an extension as parameter to instead take the whole file path. This makes calling these functions easier and the caller doesn't need to know whether the extension is supposed to be with or without dot.
In the same vein, I split DownloadManager::IsExecutable into IsExecutableFile, where one again passes in the whole file and doesn't have to worry about getting the extension right, and IsExecutableExtension, which corresponds to the original functionality. Ideally only the former method should be public, but that again would have required further code scrubbing that was (even more) outside of the original bug fix.
Finally, fixed a wrong comment in the file path tests.
BUG=10876
TEST=FilePathTest.MatchesExtension, .CompareIgnoreCase
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=30323
Patch Set 1 #Patch Set 2 : '' #
Total comments: 11
Patch Set 3 : '' #
Total comments: 8
Patch Set 4 : BUG=10876... #
Total comments: 2
Patch Set 5 : '' #
Total comments: 3
Patch Set 6 : '' #
Total comments: 10
Patch Set 7 : '' #Patch Set 8 : '' #
Total comments: 12
Patch Set 9 : '' #
Total comments: 1
Patch Set 10 : '' #
Messages
Total messages: 39 (0 generated)
"Simple" bug fix for bug #10876 that resulted in a bit of file extension handling refactoring. (updated patch set to pacify lint.) Cheers, Roland
I added markmentovai to the review and cc'd avi, both of whom will almost certainly have comments on much of this. I tend to think that the model of pulling out the extension and doing comparisons puts extra logic burden on the caller and instead the comparison logic should be in FilePath. http://codereview.chromium.org/149796/diff/1020/24 File base/file_path.h (right): http://codereview.chromium.org/149796/diff/1020/24#newcode174 Line 174: StringType Extension(bool include_dot = true) const; style guide doesn't allow default arguments. Create a second function. http://codereview.chromium.org/149796/diff/1020/24#newcode181 Line 181: StringType CanonicalizedExtension(bool include_dot = true) const; ditto http://codereview.chromium.org/149796/diff/1020/24#newcode205 Line 205: bool MatchesExtension(const StringType& extension) const; This function was added recently to handle the cases sensitivity comparison that you were trying to solve above. Isn't it enough? http://codereview.chromium.org/149796/diff/1020/28 File chrome/browser/download/download_manager.cc (right): http://codereview.chromium.org/149796/diff/1020/28#newcode1223 Line 1223: FilePath::StringType extension = path.CanonicalizedExtension(false); why canonicalize here? Shouldn't it do this in IsExecutableExtension? http://codereview.chromium.org/149796/diff/1020/28#newcode1236 Line 1236: FilePath::StringType extension = path.CanonicalizedExtension(false); same here http://codereview.chromium.org/149796/diff/1020/28#newcode1281 Line 1281: const FilePath::StringType& extension) const { it seems like this method could be rewritten better using MatchesExtension (or adding a static variant that compares two extensions) http://codereview.chromium.org/149796/diff/1020/26 File chrome/browser/icon_manager_win.cc (right): http://codereview.chromium.org/149796/diff/1020/26#newcode11 Line 11: if (extension != L".exe" && extension != L".dll" && extension != L".ico") use MatchesExtension instead http://codereview.chromium.org/149796/diff/1020/20 File net/base/filter.cc (right): http://codereview.chromium.org/149796/diff/1020/20#newcode112 Line 112: if (0 == extension.compare(FILE_PATH_LITERAL(".gz")) || these should use MatchesExtension instead. since this pattern of multiple compares seems common, perhaps we could add a version of MatchesExtension that takes a list of extensions.
I'm ambivalent. I really do like the ability to ask for the extension with no period (goodness knows that the first thing I always have to do is strip the period in every caller). I'm less enthused by the idea of "canonical extension". As was asked earlier, how doesn't "matches extension" work for you? http://codereview.chromium.org/149796/diff/1020/23 File base/file_path.cc (right): http://codereview.chromium.org/149796/diff/1020/23#newcode358 Line 358: DCHECK(extension.empty() || extension[0] == kExtensionSeparator); Really like this DCHECK. http://codereview.chromium.org/149796/diff/1020/23#newcode376 Line 376: current_extension.begin()); Er, I wrote this code specifically to work on all platforms; ifdefing it is really unnecessary. Just because the Mac can have a case-sensitive filesystem doesn't affect whether extensions are or are not matched case-insensitively. Those are orthogonal issues. (BTW, they are matched case-insensitively. Always.)
I haven't looked at the change yet but I have read the description. Doesn't MatchesExtension handle what you need?
http://codereview.chromium.org/149796/diff/1020/28 File chrome/browser/download/download_manager.cc (right): http://codereview.chromium.org/149796/diff/1020/28#newcode844 Line 844: ShouldOpenFileBasedOnExtension(download->full_path())) This code has changed since you touched it; the original code is buggy. The auto-open list has no leading periods; make sure you don't reintroduce the bug.
Thanks all for the detailed feedback! As for the main question regarding MatchesExtension: The issue described in bug #10876 stems from DownloadManager::ShouldOpenFileExtension() using the std::set find() method to look for the extension in auto_open_. That was the main thing I wanted to address with CanonicalizedExtension(). When making those changes I also found a few other issues that I corrected along the way (the mentioned leading dot issue with auto_open_, and ShouldOpenFileExtension treating the extension as a file path). Now, I could rewrite the code to use MatchesExtension(), or using a case-insensitive comparison functor for the auto_open_ set itself (and by analogy also the exe_types_ set). However I think that the contents of those sets should also be stored in a canonicalized way (hence CanonicalizedExtension()). As it seems that a public CanonicalizedExtension() method doesn't have too many fans here ;), I'd suggest rewriting those parts of the code to use a case-insensitive comparison functor, and also rewriting OpenFilesBasedOnExtension to make extensions lower-case before storing them. As for my other changes to method signatures (taking full paths instead of extensions): I still think they are worthwhile, as they eliminate the need to know whether an extension should have a dot or not, and reducing the caller's burden to extract the extension just for calling the method. Do you guys have any different opinions or suggestions?
On 2009/07/17 16:52:00, Avi wrote: > http://codereview.chromium.org/149796/diff/1020/23#newcode376 > Line 376: current_extension.begin()); > Er, I wrote this code specifically to work on all platforms; ifdefing it is > really unnecessary. Just because the Mac can have a case-sensitive filesystem > doesn't affect whether extensions are or are not matched case-insensitively. > Those are orthogonal issues. (BTW, they are matched case-insensitively. Always.) I was actually thinking mainly of Linux when I added the #ifdef, Mac was more of an afterthought (I realize the way I wrote it doesn't make that clear :p). I.e., do we treat Linux as case-insensitive as well?
One further suggestion: I wonder if it wouldn't be better to have a separate 'FileExtension' class that takes care of all the mentioned issues (case, dot/no dot) and wraps file extensions in the same way that FilePath does for whole paths. In case you're in favor of implementing such a class: should that be a sub-class of FilePath (less refactoring), a separate class, but dependent on FilePath (similar), or a wholly separate class (more refactoring: e.g., FilePath::StringType would need to become a common type).
Roland wrote: > One further suggestion: I wonder if it wouldn't be better to have a separate > 'FileExtension' class that takes care of all the mentioned issues (case, dot/no > dot) and wraps file extensions in the same way that FilePath does for whole > paths. I think that might be an excessive abstraction.
Removed the contentious parts of my patch and only fixed those parts of the code immediately pertinent to the issue. However, I believe that there are still issues regarding case handling when comparing file extensions. I added comments and unit test cases to document them. The latter are commented out, as they are failing.
http://codereview.chromium.org/149796/diff/3001/3007 File base/file_path.cc (right): http://codereview.chromium.org/149796/diff/3001/3007#newcode350 Line 350: // Also, the comparison should probably be case-sensitive for LINUX. Should it? Find a Linux expert to answer the question. http://codereview.chromium.org/149796/diff/3001/3009 File base/string_util.h (right): http://codereview.chromium.org/149796/diff/3001/3009#newcode546 Line 546: // FIXME: This doesn't work correct in all cases, esp. UTF-8 You can't use "FIXME". You're allowed "TODO()" with a name in there. Preferably yours. http://codereview.chromium.org/149796/diff/3001/3003 File chrome/browser/download/download_manager.cc (right): http://codereview.chromium.org/149796/diff/3001/3003#newcode1290 Line 1290: if (open && !executable) No. As commented above (on the old line 868 and removed in this file revision), the auto-open list is dot-less. You fail to remove the dot, and things won't match. (I made the same mistake and broke auto-opening, so that's why I see it.) http://codereview.chromium.org/149796/diff/3001/3003#newcode1318 Line 1318: if (auto_open_.find(extension) != auto_open_.end() || Same as above. The auto-open list is dotless. You can't change it (ever) as that's a saved user preference. http://codereview.chromium.org/149796/diff/3001/3003#newcode1379 Line 1379: ascii_extension.erase(0, 1); :\ Either the parameter always has a dot or it doesn't. I don't like code that has to guess.
http://codereview.chromium.org/149796/diff/3001/3007 File base/file_path.cc (right): http://codereview.chromium.org/149796/diff/3001/3007#newcode350 Line 350: // Also, the comparison should probably be case-sensitive for LINUX. On 2009/08/07 19:10:45, Avi wrote: > Should it? Find a Linux expert to answer the question. Files are case-sensitive of course but if you just want to ask "is this a jpeg?" then you probably don't care if its jpeg or JPEG. So I'd say the way we use it, it should be case-insensitive.
http://codereview.chromium.org/149796/diff/3001/3006 File base/file_path_unittest.cc (right): http://codereview.chromium.org/149796/diff/3001/3006#newcode727 Line 727: // FIXME: The following should pass, but currently don't in the as with Avi's other comment, use TODO(rolandsteiner). Also, you should have a bug # for this issue, and refer to it in the TODO. http://codereview.chromium.org/149796/diff/3001/3003 File chrome/browser/download/download_manager.cc (right): http://codereview.chromium.org/149796/diff/3001/3003#newcode1268 Line 1268: void DownloadManager::OpenFilesBasedOnExtension( This implementation looks identical to ShouldOpen... except for the end. Shouldn't it just call ShouldOpen... instead of reimplementing the same logic?
On 2009/08/07 19:10:45, Avi wrote: > http://codereview.chromium.org/149796/diff/3001/3003#newcode1290 > Line 1290: if (open && !executable) > No. As commented above (on the old line 868 and removed in this file revision), > the auto-open list is dot-less. You fail to remove the dot, and things won't > match. (I made the same mistake and broke auto-opening, so that's why I see it.) ARGH! You're right of course. Dunno how that happened, esp. after 1.) you explicitly cautioned against this before and 2.) I did notice and fix the original bug in my original patch! >_<; Will fix this (again) with my next patch set. > http://codereview.chromium.org/149796/diff/3001/3003#newcode1379 > Line 1379: ascii_extension.erase(0, 1); > :\ Either the parameter always has a dot or it doesn't. I don't like code that > has to guess. The question is, where does the extension come from? If it's from the Extension() method, then it will have a leading dot. If it comes from somewhere else, e.g., the auto open list, it may not. So this query is there to keep the method generic and easy to use for the caller. It's also not the only place where such a query happens (e.g., ReplaceExtension, amongst others). Cleaning this up may be worthwhile, but out of the scope of this bug, IMHO. On 2009/08/07 19:54:14, Evan Stade wrote: > http://codereview.chromium.org/149796/diff/3001/3007#newcode350 > Line 350: // Also, the comparison should probably be case-sensitive for LINUX. > On 2009/08/07 19:10:45, Avi wrote: > > Should it? Find a Linux expert to answer the question. > > Files are case-sensitive of course but if you just want to ask "is this a jpeg?" > then you probably don't care if its jpeg or JPEG. So I'd say the way we use it, > it should be case-insensitive. Sounds good (and makes things easier in any case), but I still wonder whether there can never be other use cases for this method where - on Linux - case would matter. That is, IMHO the function should make this explicit, e.g., by being named MatchesExtensionCaseInsensitive (?).
New revision: added static utility methods to FilePath that compare strings the same way as the file system does, and changed the appropriate methods to use them. Changed the auto_open_ set to use a comparator functor that is case insensitive based on those methods. Added unit tests for the new methods. Tested on Windows and Mac OSX, Linux uses fallback functions that should behave like the original code (so at least no regression should occur).
http://codereview.chromium.org/149796/diff/5001/5003 File base/file_path.cc (right): http://codereview.chromium.org/149796/diff/5001/5003#newcode437 Line 437: wchar_t c2 = (wchar_t)LOWORD(::CharUpperW((LPWSTR)MAKELONG(*i2, 0))); drive-in comment: I wonder what happens if you pass along eszett (small letter sharp s : U+00DF) to CharUpperW. The single character uppercase counterpart was not introduced until Unicode 5.1. Moreover, your code cannot deal with filenames containing non-BMP characters (> U+FFFF represented in two wchar_t's on Windows). Can you just use CharUpperBuffer or CharUpper after copying or LCMapString? Hmm, that will make this slower only to deal with extremely rare cases (deseret alphabet). It's also possible that Windows file system itself does not deal with them 'correctly'(?). Perhaps, you can just put TODO comment (to add a slower path when there's a non-BMP character in either name) for now as long as the way German eszett is dealt with matches what Windows does. BTW, the MSDN says that it does not do anything special under Turkish locale (which is good for us).
Well, I followed the article in http://msdn.microsoft.com/library/en-us/winui/winui/windowsuserinterface/reso... . It mentions CharUpper as well as CharUpperBuff and LCMapString as adequate functions to mimic the OS' behavior when it comes to comparing filenames, and since I didn't want to allocate and de-allocate 2 extra buffers just for comparison, I went with CharUpper.
A gentle review-wanted-ping... ^_- Cheers, Roland
On 2009/09/03 01:54:30, Roland wrote: > Well, I followed the article in > http://msdn.microsoft.com/library/en-us/winui/winui/windowsuserinterface/reso... Well, that does not guarantee that it does what the file system does. Can you try experimenting with a filename containing eszett or add it as a part of the test? > . > > It mentions CharUpper as well as CharUpperBuff and LCMapString as adequate > functions to mimic the OS' behavior when it comes to comparing filenames, and > since I didn't want to allocate and de-allocate 2 extra buffers just for > comparison, I went with CharUpper.
http://codereview.chromium.org/149796/diff/5001/5002 File base/file_path_unittest.cc (right): http://codereview.chromium.org/149796/diff/5001/5002#newcode811 Line 811: TEST_F(FilePathTest, CompareIgnoreCase) { I guess this and other compare tests should have tests for 'unnormalized' file names for Mac OS X? OS X file system internally uses NFD, but the name comparison should succeed regardless of the normalization status of file names.
On 2009/09/04 21:42:12, Jungshik Shin wrote: > Well, that does not guarantee that it does what the file system does. Can you > try experimenting with a filename containing eszett or add it as a part of the > test? I looked at the Eszett issue: AFAICT, vanilla Vista doesn't even have a glyph for upper case Eszett. Furthermore, according to the unicode standard, upper-casing lower case Eszett is "SS", *not* upper case Eszett. In any case, Explorer doesn't complain about 3 Files with the same name and "ss", <lower eszett> and <upper eszett> as Extension. The unittests I added for eszett reflect this. On 2009/09/04 21:46:40, Jungshik Shin wrote: > http://codereview.chromium.org/149796/diff/5001/5002 > File base/file_path_unittest.cc (right): > > http://codereview.chromium.org/149796/diff/5001/5002#newcode811 > Line 811: TEST_F(FilePathTest, CompareIgnoreCase) { > I guess this and other compare tests should have tests for 'unnormalized' file > names for Mac OS X? OS X file system internally uses NFD, but the name > comparison should succeed regardless of the normalization status of file names. Yes, they should, and boy was that not easy to implement! @_@ (the unicode/text encoding services had a bad tendency to just throw error codes on me, see also the code comments). Hopefully the way I implemented it now is water tight (There's also a fall-back to the previous version in case the system throws an error). I also added unit tests for this on Mac OSX only.
Review requested! ^_- Cheers, Roland
I'm OK with the changes in the download manager, but have issues with the changes to file_path. Make sure to get Mark's OK with those changes before committing. http://codereview.chromium.org/149796/diff/11008/10004 File base/file_path.cc (right): http://codereview.chromium.org/149796/diff/11008/10004#newcode366 Line 366: // case insensitively. Didn't we answer this question? Our linux guys said yes. http://codereview.chromium.org/149796/diff/11008/10004#newcode465 Line 465: // cf. http://developer.apple.com/mac/library/technotes/tn/tn1150.html#UnicodeSubtle... You're using the TEC? That's incredibly ancient; please don't do that. You can construct CFStrings from byte encodings using CFString APIs. Please do so.
On 2009/09/30 16:20:58, Avi wrote: > I'm OK with the changes in the download manager, but have issues with the > changes to file_path. Make sure to get Mark's OK with those changes before > committing. I'll try to grab him on IRC, but due to the different time zones this may be difficult. Could you please also ping him to have a look? > http://codereview.chromium.org/149796/diff/11008/10004#newcode366 > Line 366: // case insensitively. > Didn't we answer this question? Our linux guys said yes. This is more or less just a leftover from previous versions. And I agree that from a UI perspective it makes sense to handle things case insensitively. The reason I left it in is because I wonder if there can really never be a (non-UI) case where we need to do things the file system way. (However, since such a need apparently hasn't surfaced so far, I guess I'm just over-thinking things. ^_^;) > http://codereview.chromium.org/149796/diff/11008/10004#newcode465 > Line 465: // cf. > http://developer.apple.com/mac/library/technotes/tn/tn1150.html#UnicodeSubtle... > You're using the TEC? That's incredibly ancient; please don't do that. You can > construct CFStrings from byte encodings using CFString APIs. Please do so. First let me say that I'm far from an expert in OS X programming (just started with these things, really), so it's not unlikely that I missed some APIs! The problem here is not just byte encodings, but to get the specific UTF-8 decomposition that HFS uses (and even more so something that actually works, and not just returns a non-descriptive error... :p). I haven't found a suitable API for that in CFString. CFStringNormalize comes close, but that only allows the standard de/composition forms. (Also, while the API may be ancient, the relevant parts of HFS likely don't change, so it seems to me that that may well be close to what the file system actually uses.)
On 2009/09/10 10:03:26, Roland wrote: > On 2009/09/04 21:42:12, Jungshik Shin wrote: > > Well, that does not guarantee that it does what the file system does. Can you > > try experimenting with a filename containing eszett or add it as a part of the > > test? > > I looked at the Eszett issue: AFAICT, vanilla Vista doesn't even have a glyph > for upper case Eszett. Furthermore, according to the unicode standard, > upper-casing lower case Eszett is "SS", *not* upper case Eszett. > In any case, Explorer doesn't complain about 3 Files with the same name and > "ss", <lower eszett> and <upper eszett> as Extension. Gee. You got me wrong (I realized that I was not very explicit about that). Sure, the uppercase of U+00DF (eszett) is 'SS'. That's why I wondered how using a single-char-in-single-char-out version of Uppercase could work. > The unittests I added for eszett reflect this. Did I miss it? You didn't add a test comparing a file with U+00DF with a file with 'SS'? Whether it's regarded as the same file or not by Windows, you have to have a test, don't you? > > On 2009/09/04 21:46:40, Jungshik Shin wrote: > > http://codereview.chromium.org/149796/diff/5001/5002 > > File base/file_path_unittest.cc (right): > > > > http://codereview.chromium.org/149796/diff/5001/5002#newcode811 > > Line 811: TEST_F(FilePathTest, CompareIgnoreCase) { > > I guess this and other compare tests should have tests for 'unnormalized' file > > names for Mac OS X? OS X file system internally uses NFD, but the name > > comparison should succeed regardless of the normalization status of file > names. > > Yes, they should, and boy was that not easy to implement! @_@ (the unicode/text > encoding services had a bad tendency to just throw error codes on me, see also > the code comments). > > Hopefully the way I implemented it now is water tight (There's also a fall-back > to the previous version in case the system throws an error). I also added unit > tests for this on Mac OSX only.
http://codereview.chromium.org/149796/diff/11008/10003 File base/file_path_unittest.cc (right): http://codereview.chromium.org/149796/diff/11008/10003#newcode846 Line 846: // Note that uc(<lowercase eszett>) => "SS", NOT <uppercase eszett>! So, does the Windows consider a file with eszett the same as a file with 'SS'? Whether it does or not, you need a test for that.
On Thu, Oct 1, 2009 at 1:56 PM, Mark Mentovai <mark@chromium.org> wrote: > Try it out, but I bet CFStringGetFileSystemRepresentation does what you need. Nah, can't use that - too obvious! >_< (Ok, I feel officially dumb now for missing that! @_@;) On 2009/10/01 05:51:13, Jungshik Shin wrote: > Sure, the uppercase of U+00DF (eszett) is 'SS'. That's why I wondered how using > a single-char-in-single-char-out version of Uppercase could work. Well, it doesn't convert <eszett> to 'SS', but neither does the Windows file system, nor does HFS. On 2009/10/01 05:51:42, Jungshik Shin wrote: > Whether it does or not, you need a test for that. Yes, I misunderstood you before. I added the missing test, and it did show up an issue in my prior implementation: I used CFStringCompare to compare the decomposed strings, which DOES convert <eszett> -> 'SS'. (It also likes to crash on the file system representation strings :p) As this is not what the file system does I instead now added an adapted version of the 'official' comparison routine (see the same tech note I linked in the comments in various places). This adds considerably to the code (and there I thought the Windows version was cumbersome :p), but should be the most correct, as well as a bit faster. @Mark: now please don't tell me there's a CFStringCompareFileNamesForDummies that I could have used instead all along... ^_-
Fortunately(?) for you, I don't know of any system-provided function that directly implements FastUnicodeCompare. http://codereview.chromium.org/149796/diff/17002/18018 File base/file_path.cc (right): http://codereview.chromium.org/149796/diff/17002/18018#newcode883 Line 883: CFStringRef cfstring1 = Leak. Use scoped_cftyperef. Same for cfstring2. http://codereview.chromium.org/149796/diff/17002/18018#newcode885 Line 885: (const UInt8*) str1.c_str(), Prefer C++-style casts. http://codereview.chromium.org/149796/diff/17002/18018#newcode897 Line 897: CFIndex bufferLength1 = Style dictates buffer_length, not bufferLength. http://codereview.chromium.org/149796/diff/17002/18018#newcode902 Line 902: scoped_array<char> buffer1(new char[bufferLength1]); scoped_array is a pet peeve of mine, because it's really just a vector. Provided that length > 0, this is just a std::vector<char>(length). You can access the underlying storage with &v[0]. http://codereview.chromium.org/149796/diff/17002/18018#newcode910 Line 910: bufferLength2); When you write "a a b b c c d d," you've written too much. You should extract CFStringCreateWithBytesNoCopy/CFStringGetMaximumSizeOfFileSystemRepresentation/CFStringGetFileSystemRepresentation into its own static utility function. It should probably return a StringType. http://codereview.chromium.org/149796/diff/17002/18018#newcode920 Line 920: // http://developer.apple.com/mac/library/technotes/tn/tn1150.html#UnicodeSubtle... You can use #StringComparisonAlgorithm here, it's more specific. http://codereview.chromium.org/149796/diff/17002/18018#newcode925 Line 925: int index1 = 0; I recommend: - breaking everything from here on down into its own static utility function to approximate FastUnicodeCompare, and - going the rest of the way to convert the function to our style. The big differences between our style and Apple's is that we name_variables_like_this insteadOfThis, and that we don't declare variables until their first use. http://codereview.chromium.org/149796/diff/17002/18018#newcode968 Line 968: return (codepoint1 < codepoint2) ? -1 : +1; The + is extraneous and makes me think of floating-point numbers (even though there's no good reason for it). ? -1 : 1 is a common enough idiom that I'd just use that. http://codereview.chromium.org/149796/diff/17002/18018#newcode971 Line 971: #else For a long #ifdef like this one, you should put comments on your #elses and #endifs. http://codereview.chromium.org/149796/diff/17002/18019 File base/file_path.h (right): http://codereview.chromium.org/149796/diff/17002/18019#newcode263 Line 263: static int CompareIgnoreCase(const StringType& str1, I like that you've kept these static, to discourage direct use on FilePaths.
Uploaded a new version based on Mark's comments. On 2009/10/06 16:10:03, Mark Mentovai wrote: > http://codereview.chromium.org/149796/diff/17002/18018#newcode902 > Line 902: scoped_array<char> buffer1(new char[bufferLength1]); > scoped_array is a pet peeve of mine, because it's really just a vector. > Provided that length > 0, this is just a std::vector<char>(length). You can > access the underlying storage with &v[0]. I added 2 helper methods CreateHFSDecomposedForm() and FastUnicodeCompare() as suggested, which indeed makes things much cleaner. In my first version of the rewrite CreateHFSDecomposedForm() used a string to hold the converted result and had CFStringGetFileSystemRepresentation() write directly into the underlying buffer. However, basic_string stores the current string length explicitely, so result.length() would still be 0 after the operation. vector<char> would also suffer from the same problem with size(). Getting a wrong result from length() or size() would probably be confusing and a potential source of bugs down the road. OTOH I wanted to avoid needlessly copying the contents of the result buffer within CreateHFSDecomposedForm(), or for the return value. Therefore I rewrote the code to use a simple new[] allocated buffer instead and return a pointer. However, this brought me back to using scoped_array as the most convenient solution for life-time management of the result buffer within CompareIgnoreCase(). If you still have issues with the use of scoped_array, I can of course change the code to something different (although I believe the current version should be rather efficient).
It sounds like you want to use WriteInto from base/string_util.h. You can call WriteInto on a std::string with the maximum size, and then resize the string with the result of strlen when you're done.
On 2009/10/07 14:14:01, Mark Mentovai wrote: > It sounds like you want to use WriteInto from base/string_util.h. You can call > WriteInto on a std::string with the maximum size, and then resize the string > with the result of strlen when you're done. Yes, that is similar to my first version. But as the comment to WriteInto points out, this also adds some basically unnecessary overhead by filling the string with 0s. However, it also DOES allow for a cleaner interface, so I uploaded a new version of the implementation. I didn't use WriteInto directly however, since it requires a resize afterwards that needs explaining anyway, so I just use the 2 lines (reserve + resize) as I feel that makes it clearer what happens. Since the interface is now "representable" I also expose FastUnicodeCompare() and GetHFSDecomposedForm() in the header file. Finally, I also removed a potential signed shift bug in FastUnicodeCompare() and cleaned up the variable naming a bit.
The FilePath stuff LGTM with these changes. (It looks like a lot of comments but it really isn't.) http://codereview.chromium.org/149796/diff/24006/24008 File base/file_path.cc (right): http://codereview.chromium.org/149796/diff/24006/24008#newcode8 Line 8: #include <CoreFoundation/CFString.h> You don't need this. CoreServices.h should get you all of the CFString declarations you need. http://codereview.chromium.org/149796/diff/24006/24008#newcode22 Line 22: #include "base/scoped_ptr.h" No longer needed. http://codereview.chromium.org/149796/diff/24006/24008#newcode454 Line 454: return +1; +1 comment again here and on line 457 http://codereview.chromium.org/149796/diff/24006/24008#newcode496 Line 496: UInt16 lower_case_table[] = { const? http://codereview.chromium.org/149796/diff/24006/24008#newcode497 Line 497: // High-byte indices ( == 0 iff no case mapping and no ignorables ) Make this table conform to our coding style by giving it a 2-space indent instead of 4. http://codereview.chromium.org/149796/diff/24006/24008#newcode901 Line 901: // comparison, we just use that vaue and flag it with DCHECK. vaue = value http://codereview.chromium.org/149796/diff/24006/24008#newcode916 Line 916: // comparison, we just use that vaue and flag it with DCHECK. Same. I feel like this block and the one above ought to be combined, since they're exactly the same. http://codereview.chromium.org/149796/diff/24006/24008#newcode983 Line 983: // GetHFSDecomposedForm() returns and empty string in an error case. and -> an http://codereview.chromium.org/149796/diff/24006/24009 File base/file_path.h (right): http://codereview.chromium.org/149796/diff/24006/24009#newcode286 Line 286: static int FastUnicodeCompare(const StringType& string1, Since you're making this public, you should put HFS in the name. http://codereview.chromium.org/149796/diff/24006/24007 File base/file_path_unittest.cc (right): http://codereview.chromium.org/149796/diff/24006/24007#newcode811 Line 811: We should add straight ASCII test cases here to see if .txt matches foo.TXT, foo.TxT, and that sort of thing. http://codereview.chromium.org/149796/diff/24006/24007#newcode874 Line 874: { { FPL("foo"), FPL("bar") }, +1}, Pluses? http://codereview.chromium.org/149796/diff/24006/24007#newcode934 Line 934: if (result < 0) I find this curious. Doesn't the interface already guarantee -1, 0, 1, and not just negative, 0, positive? I guess it doesn't, because in the POSIX-not-Mac case, you're just using strcasecmp, which doesn't provide this guarantee. After reading the Windows and Mac code, this surprised me. You should: - Add a small comment in file_path.h, either "returns -1, 0, or 1" or "returns a negative integer, 0, or a positive integer", depending on what you choose. - Optionally transform negatives to -1 and positives to 1 where you need to use strcasecmp or other functions that might not have already restricted return values to these three possibilities.
Uploaded a cleaned up patch set with the requested changes: .) eradicated the remaining '+1's .) moved reading of the next non-ignorable codepoint to a separate 'inline' function (within the anonymous namespace). .) added test cases to the MatchesExtension unit test .) Changed the code to always return -1, 0 or 1. Added a comment to file_path.h. Removed the separate check from the unit test. .) Corrected the rest: const lower_case_table, comment typos, changed method name to 'HFSFastUnicodeCompare', removed unneeded #include-s
The FilePath portions LGTM. I think you've done a great job here. http://codereview.chromium.org/149796/diff/24035/24038 File base/file_path.h (right): http://codereview.chromium.org/149796/diff/24035/24038#newcode282 Line 282: static StringType GetHFSDecomposedForm(const FilePath::StringType& string); FilePath:: is unnecessary on FilePath::StringType because you're inside |class FilePath|.
This hasn't been checked in yet. Is it still waiting for more reviews? Are we going to do this or are we abandoning the approach?
I think I have all the LGMT's necessary, but I was waiting to get my chromium account set up rather than bothering people to check it in for me. As I'm a bit busy today otherwise, I intended to check it in tomorrow. On 2009/10/24 16:01:03, Mark Mentovai wrote: > This hasn't been checked in yet. Is it still waiting for more reviews? Are we > going to do this or are we abandoning the approach?
Committed as rev. 30168
Unfortunately, there was some bit rot: it seems ICU was yanked from under base/, so I need one more code review after all, to check that I adapted the code correctly: .) file_path.cc: changed U8_NEXT -> CBU8_NEXT .) changed include file "unicode/utf8.h" -> "base/third_party/icu/icu_utf.h"
> On Tue, Oct 27, 2009 at 10:44 PM, Mark Mentovai <mark@chromium.org> wrote: > LGTM committed (again :p) as r30323 |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
