|
|
Created:
10 years, 5 months ago by Evan Stade Modified:
9 years, 6 months ago CC:
chromium-reviews, ben+cc_chromium.org, Paul Godavari, Paweł Hajdan Jr., brettw-cc_chromium.org, cbentzel+watch_chromium.org, darin-cc_chromium.org Base URL:
http://src.chromium.org/git/chromium.git Visibility:
Public. |
DescriptionTreat multiple extensions like .tar.gz as a single extension.
The logic is taken from firefox.
BUG=48346
TEST=unit tests; downloading the same .tar.gz file multiple times (see bug)
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=53844
Patch Set 1 #Patch Set 2 : match firefox #Patch Set 3 : update comment #
Total comments: 34
Patch Set 4 : Mark comments #Patch Set 5 : review comments #Patch Set 6 : whitespace #Patch Set 7 : remove naming change #
Total comments: 12
Patch Set 8 : more review comments #
Messages
Total messages: 16 (0 generated)
What will this do with subversion-1.6.12.zip?
On 2010/07/19 21:32:57, Mark Mentovai wrote: > What will this do with subversion-1.6.12.zip? the wrong thing. Firefox appears to handle that correctly. We could use additional heuristics to work around this common case, although it would be hard to get every case correct (for example, .668 is apparently an extension for some music codec, so subversion-1.6.668.zip would be ambiguous). Do you think applying some heuristics is the right solution, or that this is too difficult to get right and the current code is best?
I think that filenames like subversion-1.6.12.zip are common enough that we should do something to handle them properly. If Firefox gets it right (or “righter”), let’s look at what it does. Given the .668s of the world, we’ll probably never get things 100% right in 100% of all cases, but we can probably get really really close. Looking at the .dmg files in my ~/Downloads right now, I see that many of them fall into this pattern. Mark Evan Stade wrote: > On 2010/07/19 21:32:57, Mark Mentovai wrote: >> >> What will this do with subversion-1.6.12.zip? > > the wrong thing. Firefox appears to handle that correctly. We could use > additional heuristics to work around this common case, although it would be > hard > to get every case correct (for example, .668 is apparently an extension for > some > music codec, so subversion-1.6.668.zip would be ambiguous). Do you think > applying some heuristics is the right solution, or that this is too > difficult to > get right and the current code is best? > > http://codereview.chromium.org/3018011/show
ok. Here is the firefox code for reference: /** * Separates the base name from the extension in a file name, recognizing some * double extensions like ".tar.gz". * * @param aLeafName * The full leaf name to be parsed. Be careful when processing names * containing leading or trailing dots or spaces. * @returns [base, ext] * The base name of the file, which can be empty, and its extension, * which always includes the leading dot unless it's an empty string. * Concatenating the two items always results in the original name. */ splitBaseNameAndExtension: function DP_splitBaseNameAndExtension(aLeafName) { // The following regular expression is built from these key parts: // .*? Matches the base name non-greedily. // \.[A-Z0-9]{1,3} Up to three letters or numbers preceding a // double extension. // \.(?:gz|bz2|Z) The second part of common double extensions. // \.[^.]* Matches any extension or a single trailing dot. var [, base, ext] = /(.*?)(\.[A-Z0-9]{1,3}\.(?:gz|bz2|Z)|\.[^.]*)?$/i .exec(aLeafName); // Return an empty string instead of undefined if no extension is found. return [base, ext || ""]; } I'll set about trying to copy the logic.
updated. I applied the logic to Windows as well since I made the logic more restrictive (won't affect foo.dll.exe for example).
Tryserver: check on net_unittests FilterTest.UnsupportedMimeGzip.
Also see previous comment about net_unittests failures detected by the trybot. http://codereview.chromium.org/3018011/diff/10001/9002 File base/file_path.cc (right): http://codereview.chromium.org/3018011/diff/10001/9002#newcode313 base/file_path.cc:313: // See explanation in .h file. This comment will need to change because the other stuff is probably going to come out of the .h file. http://codereview.chromium.org/3018011/diff/10001/9002#newcode321 base/file_path.cc:321: bool is_common_double_extension = For ease of maintenance, I suggest: const char* common_double_extensions[] = {"gz", "z", "bz2"}; ... http://codereview.chromium.org/3018011/diff/10001/9002#newcode332 base/file_path.cc:332: StringType::size_type separator_dot; No need to introduce separator_dot at this point - the function already has enough other “return” points. You should just do: if (thing on next three lines) { return penultimate_dot; } return last_dot; Alternative: maintain a single “return” point at the bottom of the function, introduce the local variable to contain the return value much earlier, and use if/else logic flow instead of multiple returns. http://codereview.chromium.org/3018011/diff/10001/9002#newcode357 base/file_path.cc:357: return DirName().Append(BaseName().path_.substr(0, dot)); What’s with the DirName/BaseName stuff? That’s potentially a behavior change. Why not just return path_.substr(0, dot); http://codereview.chromium.org/3018011/diff/10001/9003 File base/file_path.h (right): http://codereview.chromium.org/3018011/diff/10001/9003#newcode214 base/file_path.h:214: // We allow a second extension component of up to 4 characters when the No “we” in comments. http://codereview.chromium.org/3018011/diff/10001/9003#newcode217 base/file_path.h:217: // '.tar.gz' and '.tar.Z' respectively. The comment should specify that this function returns StringType::npos if no extension is present. http://codereview.chromium.org/3018011/diff/10001/9003#newcode218 base/file_path.h:218: StringType::size_type ExtensionSeparatorPosition() const; This doesn’t need to be part of the public API. In fact, it doesn’t need to be a member function at all. It can follow the pattern of the other private “implementation detail” functions in file_path.cc’s anonymous namespace, and accept a const FilePath::StringType& parameter. http://codereview.chromium.org/3018011/diff/10001/9004 File base/file_path_unittest.cc (right): http://codereview.chromium.org/3018011/diff/10001/9004#newcode701 base/file_path_unittest.cc:701: EXPECT_EQ(FILE_PATH_LITERAL(".jpg"), jpg.Extension()); Thanks! http://codereview.chromium.org/3018011/diff/10001/9004#newcode707 base/file_path_unittest.cc:707: EXPECT_EQ(jpg.RemoveExtension().value(), path_no_ext.value()); Shouldn’t this one be flipped too? http://codereview.chromium.org/3018011/diff/10001/9004#newcode709 base/file_path_unittest.cc:709: EXPECT_EQ(path_no_ext.value(), path_no_ext.RemoveExtension().value()); (not this one) http://codereview.chromium.org/3018011/diff/10001/9004#newcode710 base/file_path_unittest.cc:710: EXPECT_EQ(path_no_ext.Extension(), FILE_PATH_LITERAL("")); but this one should also be flipped? http://codereview.chromium.org/3018011/diff/10001/9004#newcode747 base/file_path_unittest.cc:747: { FPL("/foo.bar/..////"), FPL("") }, I wonder why there are no test cases that don’t use any separators in the source data. For that matter, there should be test cases that look at a filename that’s all extension (beginning with a dot, not "." or "..", since the .cc file takes special care to handle those. http://codereview.chromium.org/3018011/diff/10001/9004#newcode755 base/file_path_unittest.cc:755: } No real set of test cases for RemoveExtension? That seems kind of bad. No wonder nothing flagged you that your RemoveExtension change DirName/BaseName thing might have been a behavior change. :( http://codereview.chromium.org/3018011/diff/10001/9005 File chrome/browser/download/download_manager.cc (right): http://codereview.chromium.org/3018011/diff/10001/9005#newcode1138 chrome/browser/download/download_manager.cc:1138: // 5. The original extension is not ".tgz" & the new extension is not "gz". Comment needs revision now? http://codereview.chromium.org/3018011/diff/10001/9006 File chrome/browser/download/download_util.cc (right): http://codereview.chromium.org/3018011/diff/10001/9006#newcode494 chrome/browser/download/download_util.cc:494: *path = path->InsertBeforeExtensionASCII(StringPrintf("(%d)", number)); This part is unrelated to the rest of this changelist. I think the space is correct on the Mac. I thought Firefox used a space on the Mac, too. Can you check to see if Firefox’s behavior is platform-specific? On systems where you don’t want spaces in filenames because they’re a pain on command lines, wouldn’t you also not want parentheses?
http://codereview.chromium.org/3018011/diff/10001/9006 File chrome/browser/download/download_util.cc (right): http://codereview.chromium.org/3018011/diff/10001/9006#newcode494 chrome/browser/download/download_util.cc:494: *path = path->InsertBeforeExtensionASCII(StringPrintf("(%d)", number)); On 2010/07/20 17:17:00, Mark Mentovai wrote: > On systems where you don’t want spaces in filenames because they’re a pain on > command lines, wouldn’t you also not want parentheses? Not using parentheses sounds good, but I'm not sure if there's a good alternateive .2. -2- _2_? _2_ seems the most reasonable.
http://codereview.chromium.org/3018011/diff/10001/9002 File base/file_path.cc (right): http://codereview.chromium.org/3018011/diff/10001/9002#newcode321 base/file_path.cc:321: bool is_common_double_extension = On 2010/07/20 17:17:00, Mark Mentovai wrote: > For ease of maintenance, I suggest: > > const char* common_double_extensions[] = {"gz", "z", "bz2"}; > ... Done. http://codereview.chromium.org/3018011/diff/10001/9002#newcode332 base/file_path.cc:332: StringType::size_type separator_dot; On 2010/07/20 17:17:00, Mark Mentovai wrote: > No need to introduce separator_dot at this point - the function already has > enough other “return” points. You should just do: > > if (thing on next three lines) { > return penultimate_dot; > } > return last_dot; > > Alternative: maintain a single “return” point at the bottom of the function, > introduce the local variable to contain the return value much earlier, and use > if/else logic flow instead of multiple returns. Done. http://codereview.chromium.org/3018011/diff/10001/9002#newcode357 base/file_path.cc:357: return DirName().Append(BaseName().path_.substr(0, dot)); On 2010/07/20 17:17:00, Mark Mentovai wrote: > What’s with the DirName/BaseName stuff? That’s potentially a behavior change. > Why not just > > return path_.substr(0, dot); because dot is relative to BaseName. http://codereview.chromium.org/3018011/diff/10001/9003 File base/file_path.h (right): http://codereview.chromium.org/3018011/diff/10001/9003#newcode214 base/file_path.h:214: // We allow a second extension component of up to 4 characters when the On 2010/07/20 17:17:00, Mark Mentovai wrote: > No “we” in comments. Done. http://codereview.chromium.org/3018011/diff/10001/9003#newcode217 base/file_path.h:217: // '.tar.gz' and '.tar.Z' respectively. On 2010/07/20 17:17:00, Mark Mentovai wrote: > The comment should specify that this function returns StringType::npos if no > extension is present. Done. http://codereview.chromium.org/3018011/diff/10001/9003#newcode218 base/file_path.h:218: StringType::size_type ExtensionSeparatorPosition() const; On 2010/07/20 17:17:00, Mark Mentovai wrote: > This doesn’t need to be part of the public API. > > In fact, it doesn’t need to be a member function at all. It can follow the > pattern of the other private “implementation detail” functions in file_path.cc’s > anonymous namespace, and accept a const FilePath::StringType& parameter. Sure it *can* be a member of the anonymous namespace, but what's the advantage? The disadvantage is that I have to manually pass a FilePath (or StringType), which is supposed to be a very specific FilePath (i.e. just the base name, not the whole thing), and thus it can be mis-used. Also, I'd have to prepend StringType with FilePath:: like 10 times (unless I add using FilePath::StringType; don't know what the philosophy is on using). Although I agree in can be private. http://codereview.chromium.org/3018011/diff/10001/9004 File base/file_path_unittest.cc (right): http://codereview.chromium.org/3018011/diff/10001/9004#newcode707 base/file_path_unittest.cc:707: EXPECT_EQ(jpg.RemoveExtension().value(), path_no_ext.value()); On 2010/07/20 17:17:00, Mark Mentovai wrote: > Shouldn’t this one be flipped too? Done. http://codereview.chromium.org/3018011/diff/10001/9004#newcode710 base/file_path_unittest.cc:710: EXPECT_EQ(path_no_ext.Extension(), FILE_PATH_LITERAL("")); On 2010/07/20 17:17:00, Mark Mentovai wrote: > but this one should also be flipped? Done. http://codereview.chromium.org/3018011/diff/10001/9004#newcode755 base/file_path_unittest.cc:755: } On 2010/07/20 17:17:00, Mark Mentovai wrote: > No real set of test cases for RemoveExtension? That seems kind of bad. No wonder > nothing flagged you that your RemoveExtension change DirName/BaseName thing > might have been a behavior change. :( ReplaceExtension relies on RemoveExtension, and there are a lot of tests for ReplaceExtension. http://codereview.chromium.org/3018011/diff/10001/9005 File chrome/browser/download/download_manager.cc (right): http://codereview.chromium.org/3018011/diff/10001/9005#newcode1138 chrome/browser/download/download_manager.cc:1138: // 5. The original extension is not ".tgz" & the new extension is not "gz". On 2010/07/20 17:17:00, Mark Mentovai wrote: > Comment needs revision now? Done. http://codereview.chromium.org/3018011/diff/10001/9006 File chrome/browser/download/download_util.cc (right): http://codereview.chromium.org/3018011/diff/10001/9006#newcode494 chrome/browser/download/download_util.cc:494: *path = path->InsertBeforeExtensionASCII(StringPrintf("(%d)", number)); Firefox does it with parens but without a space on Mac, Linux, and Windows. However, I don't feel strongly about the "correct" behavior as these characters are just a minor annoyance.
On 2010/07/20 04:30:52, Mark Mentovai wrote: > Tryserver: check on net_unittests FilterTest.UnsupportedMimeGzip. saw that but I thought it was flake because I couldn't repro locally. Looking at it now.
http://codereview.chromium.org/3018011/diff/10001/9002 File base/file_path.cc (right): http://codereview.chromium.org/3018011/diff/10001/9002#newcode357 base/file_path.cc:357: return DirName().Append(BaseName().path_.substr(0, dot)); Evan Stade wrote: > because dot is relative to BaseName. FPL("abc.txt").RemoveExtension() should return FPL("abc"). As you’ve written it now, it will return FPL("./abc"). This is the behavior change I’m talking about, and it’s wrong. http://codereview.chromium.org/3018011/diff/10001/9003 File base/file_path.h (right): http://codereview.chromium.org/3018011/diff/10001/9003#newcode218 base/file_path.h:218: StringType::size_type ExtensionSeparatorPosition() const; Evan Stade wrote: > Sure it *can* be a member of the anonymous namespace, but what's the advantage? There really isn’t any disadvantage—the disadvantage you cite really isn’t because it’s still implementation-specific and isolated to the .cc file. The small advantage is that it doesn’t clutter the class definition with uninteresting private stuff. The big advantage is that a lot of things depend on "base/file_path.h", and not touching it means that people and bots who sync to your change will not have to rebuild the entire world. This is how everything is done in this class, whenever possible. http://codereview.chromium.org/3018011/diff/10001/9004 File base/file_path_unittest.cc (right): http://codereview.chromium.org/3018011/diff/10001/9004#newcode755 base/file_path_unittest.cc:755: } Evan Stade wrote: > ReplaceExtension relies on RemoveExtension, and there are a lot of tests for > ReplaceExtension. …and it’s why I feel that some minimal testing of RemoveExtension is warranted… http://codereview.chromium.org/3018011/diff/10001/9006 File chrome/browser/download/download_util.cc (right): http://codereview.chromium.org/3018011/diff/10001/9006#newcode494 chrome/browser/download/download_util.cc:494: *path = path->InsertBeforeExtensionASCII(StringPrintf("(%d)", number)); Evan Stade wrote: > However, I don't feel strongly about the "correct" behavior as these characters > are just a minor annoyance. OK. I think that if we change this, we should use - or _ instead of parentheses and spaces. We can probably get away with that on all platforms. I don’t think that a spaceless but parenthesized thing makes sense. You can break this into a distinct change if you prefer.
http://codereview.chromium.org/3018011/diff/10001/9002 File base/file_path.cc (right): http://codereview.chromium.org/3018011/diff/10001/9002#newcode357 base/file_path.cc:357: return DirName().Append(BaseName().path_.substr(0, dot)); On 2010/07/20 20:18:55, Mark Mentovai wrote: > Evan Stade wrote: > > because dot is relative to BaseName. > > FPL("abc.txt").RemoveExtension() should return FPL("abc"). As you’ve written it > now, it will return FPL("./abc"). This is the behavior change I’m talking about, > and it’s wrong. I see. It's actually the reverse (./abc.txt -> abc), but point taken. Fixed. http://codereview.chromium.org/3018011/diff/10001/9004 File base/file_path_unittest.cc (right): http://codereview.chromium.org/3018011/diff/10001/9004#newcode755 base/file_path_unittest.cc:755: } On 2010/07/20 20:18:55, Mark Mentovai wrote: > Evan Stade wrote: > > ReplaceExtension relies on RemoveExtension, and there are a lot of tests for > > ReplaceExtension. > > …and it’s why I feel that some minimal testing of RemoveExtension is warranted… not sure I follow. Individually testing RemoveExtension won't increase code coverage. Nevertheless, tests added. http://codereview.chromium.org/3018011/diff/10001/9006 File chrome/browser/download/download_util.cc (right): http://codereview.chromium.org/3018011/diff/10001/9006#newcode494 chrome/browser/download/download_util.cc:494: *path = path->InsertBeforeExtensionASCII(StringPrintf("(%d)", number)); I'll just leave it alone. No one has actually complained about it as far as I know.
LGTM with these small changes. One question, too. http://codereview.chromium.org/3018011/diff/30001/11002 File base/file_path.cc (right): http://codereview.chromium.org/3018011/diff/30001/11002#newcode39 base/file_path.cc:39: typedef FilePath::StringType StringType; |using FilePath::StringType;| instead? http://codereview.chromium.org/3018011/diff/30001/11002#newcode51 base/file_path.cc:51: const StringType& path) { Since you’ve made StringType a synonym for FilePath::StringType in this file, you can probably move a bunch of these to the lines that precede them. This one, lines 66-67, 71-72, and 223-224 look like candidates. http://codereview.chromium.org/3018011/diff/30001/11002#newcode128 base/file_path.cc:128: // See explanation in .h file. No longer applicable. “See explanation above” would be correct, but since the above explanation is so close, you can probably just drop this line. http://codereview.chromium.org/3018011/diff/30001/11003 File base/file_path_unittest.cc (right): http://codereview.chromium.org/3018011/diff/30001/11003#newcode735 base/file_path_unittest.cc:735: { FPL("/foo.tar.bz2"), FPL(".tar.bz2") }, Add some testcases for something like FPL("/foo.1234.tar.gz"), FPL("/foo.tar.tar.gz"), and FPL("/foo.tar.gz.gz") — the “correct” behavior is harder to pin down, but we should at least monitor the procedure to make sure it’s stable if the implementation changes, or that if its behavior does change, that it will be called to our attention. http://codereview.chromium.org/3018011/diff/30001/11003#newcode825 base/file_path_unittest.cc:825: TEST_F(FilePathTest, RemoveExtension) { Thanks. http://codereview.chromium.org/3018011/diff/30001/11003#newcode863 base/file_path_unittest.cc:863: { { FPL("./foo.dll"), FPL("txt") }, FPL("./foo.txt") }, Nice. http://codereview.chromium.org/3018011/diff/30001/11005 File net/base/filter.cc (right): http://codereview.chromium.org/3018011/diff/30001/11005#newcode111 net/base/filter.cc:111: if (EndsWith(extension, FILE_PATH_LITERAL(".gz"), false) || Good catch. Did you search the entire codebase for ".gz", ".bz2", and ".Z" for other situations like this? Might be a good idea if you haven’t.
http://codereview.chromium.org/3018011/diff/30001/11002 File base/file_path.cc (right): http://codereview.chromium.org/3018011/diff/30001/11002#newcode39 base/file_path.cc:39: typedef FilePath::StringType StringType; On 2010/07/24 04:53:36, Mark Mentovai wrote: > |using FilePath::StringType;| instead? only works for namespaces apparently http://codereview.chromium.org/3018011/diff/30001/11002#newcode51 base/file_path.cc:51: const StringType& path) { On 2010/07/24 04:53:36, Mark Mentovai wrote: > Since you’ve made StringType a synonym for FilePath::StringType in this file, > you can probably move a bunch of these to the lines that precede them. This one, > lines 66-67, 71-72, and 223-224 look like candidates. Done. http://codereview.chromium.org/3018011/diff/30001/11002#newcode128 base/file_path.cc:128: // See explanation in .h file. On 2010/07/24 04:53:36, Mark Mentovai wrote: > No longer applicable. “See explanation above” would be correct, but since the > above explanation is so close, you can probably just drop this line. Done. http://codereview.chromium.org/3018011/diff/30001/11003 File base/file_path_unittest.cc (right): http://codereview.chromium.org/3018011/diff/30001/11003#newcode735 base/file_path_unittest.cc:735: { FPL("/foo.tar.bz2"), FPL(".tar.bz2") }, On 2010/07/24 04:53:36, Mark Mentovai wrote: > Add some testcases for something like FPL("/foo.1234.tar.gz"), > FPL("/foo.tar.tar.gz"), and FPL("/foo.tar.gz.gz") — the “correct” behavior is > harder to pin down, but we should at least monitor the procedure to make sure > it’s stable if the implementation changes, or that if its behavior does change, > that it will be called to our attention. Done. http://codereview.chromium.org/3018011/diff/30001/11005 File net/base/filter.cc (right): http://codereview.chromium.org/3018011/diff/30001/11005#newcode111 net/base/filter.cc:111: if (EndsWith(extension, FILE_PATH_LITERAL(".gz"), false) || On 2010/07/24 04:53:36, Mark Mentovai wrote: > Good catch. Did you search the entire codebase for ".gz", ".bz2", and ".Z" for > other situations like this? Might be a good idea if you haven’t. couldn't find any other instances that needed changing
LGTM. Let’s do it. |