|
|
DescriptionImprove Zip File Scanning on Mac
This CL fixes two aspects of broken ZIP processing on Mac. First, it ensures
that .app files are treated as directories and as such do not break binary
feature extraction, causing analysis to fail. Second, it performs
type-sniffing to identify the existence of executable MachO files that do not
have any file extension, as is the usual case on Mac.
BUG=600392
Review-Url: https://codereview.chromium.org/2961373002
Cr-Commit-Position: refs/heads/master@{#492032}
Committed: https://chromium.googlesource.com/chromium/src/+/034ecb569929e1202f39cf744a32e8deeade06c8
Patch Set 1 : crude magic check working #Patch Set 2 : gonna implement cap first before moving on #Patch Set 3 : removed temp dir stuff #Patch Set 4 : addressing comments #Patch Set 5 : added makefile #Patch Set 6 : changing name of executable in zip #Patch Set 7 : ready for review #Patch Set 8 : improving readability in zip_analyzer.cc #Patch Set 9 : removing check for .app on windows #
Total comments: 4
Patch Set 10 : addressing comments #Patch Set 11 : addressing comments #Patch Set 12 : addressing comments #
Total comments: 7
Patch Set 13 : refactoring in zip_reader #
Total comments: 6
Patch Set 14 : comment on return value #Patch Set 15 : incorporating new zip_reader functionality into existing functions #Patch Set 16 : adding debug statements for android #Patch Set 17 : debugging int64_t on android #
Total comments: 4
Patch Set 18 : removing argument from ExtractCurrentEntryToString() #Patch Set 19 : minor #
Total comments: 10
Patch Set 20 : slight refactoring in zip_reader #
Total comments: 4
Patch Set 21 : testing #Patch Set 22 : avoiding multiple calls to unzReadCurrentFile #
Total comments: 22
Patch Set 23 : addressing comments #
Total comments: 6
Patch Set 24 : final mods #
Total comments: 12
Patch Set 25 : comments #
Total comments: 2
Patch Set 26 : minor #Messages
Total messages: 188 (152 generated)
Patchset #7 (id:120001) has been deleted
Description was changed from ========== trying stuff out BUG=600392 ========== to ========== This CL fixes two aspects of broken ZIP processing on Mac. First, it ensures that .app files are treated as directories and as such do not break binary feature extraction, causing analysis to fail. Second, it performs type-sniffing to identify the existence of executable MachO files that do not have any file extension, as is the usual case on Mac BUG=600392 ==========
mortonm@google.com changed reviewers: + jialiul@chromium.org
Patchset #3 (id:40001) has been deleted
Patchset #7 (id:160001) has been deleted
Patchset #7 (id:180001) has been deleted
Patchset #7 (id:200001) has been deleted
Patchset #7 (id:220001) has been deleted
Patchset #7 (id:240001) has been deleted
Looking good :-) https://codereview.chromium.org/2961373002/diff/260001/chrome/common/safe_bro... File chrome/common/safe_browsing/zip_analyzer.cc (right): https://codereview.chromium.org/2961373002/diff/260001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:70: LOG(ERROR) << "macho?: " << (bytes.compare("\xcf\xfa\xed\xfe") == 0); You can convert this LOG(ERROR) into a DCHECK https://codereview.chromium.org/2961373002/diff/260001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:72: if (bytes.length() != sizeof(uint32_t)) Maybe this should be a DCHECK too. https://codereview.chromium.org/2961373002/diff/260001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:170: results->has_executable = true; Do we need to set has_executable = true here? When we go into the .app directory and encounter a Mach-O file, we can set this value then.
https://codereview.chromium.org/2961373002/diff/260001/chrome/common/safe_bro... File chrome/common/safe_browsing/zip_analyzer.cc (right): https://codereview.chromium.org/2961373002/diff/260001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:70: LOG(ERROR) << "macho?: " << (bytes.compare("\xcf\xfa\xed\xfe") == 0); On 2017/07/12 17:57:10, Jialiu Lin wrote: > You can convert this LOG(ERROR) into a DCHECK Yeah, sorry I should probably just take this out. This is only one example of a valid MachO magic value (see line 79). I just left it in for my own debugging purposes but it doesn't really belong. https://codereview.chromium.org/2961373002/diff/260001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:72: if (bytes.length() != sizeof(uint32_t)) On 2017/07/12 17:57:10, Jialiu Lin wrote: > Maybe this should be a DCHECK too. I changed it to < instead of !=, so the function can be called with longer strings and it will still check only the first 4 bytes. I wanted this check in the code so that the memcpy() doesn't access invalid memory if the string is less than 4 bytes long. https://codereview.chromium.org/2961373002/diff/260001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:170: results->has_executable = true; On 2017/07/12 17:57:10, Jialiu Lin wrote: > Do we need to set has_executable = true here? When we go into the .app directory > and encounter a Mach-O file, we can set this value then. Yeah, I was debating this as well. I figured I'd leave it as is since this mirrors the pre-existing behavior of the code and I didn't want to break any tests. I'll just take it out and hopefully no tests start failing.
Description was changed from ========== This CL fixes two aspects of broken ZIP processing on Mac. First, it ensures that .app files are treated as directories and as such do not break binary feature extraction, causing analysis to fail. Second, it performs type-sniffing to identify the existence of executable MachO files that do not have any file extension, as is the usual case on Mac BUG=600392 ========== to ========== This CL fixes two aspects of broken ZIP processing on Mac. First, it ensures that .app files are treated as directories and as such do not break binary feature extraction, causing analysis to fail. Second, it performs type-sniffing to identify the existence of executable MachO files that do not have any file extension, as is the usual case on Mac. BUG=600392 ==========
On 2017/07/12 18:15:06, mortonm wrote: > https://codereview.chromium.org/2961373002/diff/260001/chrome/common/safe_bro... > File chrome/common/safe_browsing/zip_analyzer.cc (right): > > https://codereview.chromium.org/2961373002/diff/260001/chrome/common/safe_bro... > chrome/common/safe_browsing/zip_analyzer.cc:70: LOG(ERROR) << "macho?: " << > (bytes.compare("\xcf\xfa\xed\xfe") == 0); > On 2017/07/12 17:57:10, Jialiu Lin wrote: > > You can convert this LOG(ERROR) into a DCHECK > > Yeah, sorry I should probably just take this out. This is only one example of a > valid MachO magic value (see line 79). I just left it in for my own debugging > purposes but it doesn't really belong. > > https://codereview.chromium.org/2961373002/diff/260001/chrome/common/safe_bro... > chrome/common/safe_browsing/zip_analyzer.cc:72: if (bytes.length() != > sizeof(uint32_t)) > On 2017/07/12 17:57:10, Jialiu Lin wrote: > > Maybe this should be a DCHECK too. > > I changed it to < instead of !=, so the function can be called with longer > strings and it will still check only the first 4 bytes. I wanted this check in > the code so that the memcpy() doesn't access invalid memory if the string is > less than 4 bytes long. > > https://codereview.chromium.org/2961373002/diff/260001/chrome/common/safe_bro... > chrome/common/safe_browsing/zip_analyzer.cc:170: results->has_executable = true; > On 2017/07/12 17:57:10, Jialiu Lin wrote: > > Do we need to set has_executable = true here? When we go into the .app > directory > > and encounter a Mach-O file, we can set this value then. > > Yeah, I was debating this as well. I figured I'd leave it as is since this > mirrors the pre-existing behavior of the code and I didn't want to break any > tests. I'll just take it out and hopefully no tests start failing. Wow, didn't realize that 'CIGAM' was 'MAGIC' backwards so ignore the explanation for my first comment :) I guess there is only one magic value but those constants check for different endianness and integer width of the architecture.
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: chromeos_amd64-generic_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromeos_amd64-...) linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
mortonm@google.com changed reviewers: + satorux@chromium.org
Patchset #14 (id:400001) has been deleted
Patchset #12 (id:360001) has been deleted
Patchset #11 (id:340001) has been deleted
Patchset #7 (id:260001) has been deleted
Patchset #9 (id:320001) has been deleted
adding satorux@ as OWNER for: third_party/zlib/google/zip_reader.cc third_party/zlib/google/zip_reader.h
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
satorux@chromium.org changed reviewers: + palmer@chromium.org
+palmer Does this scanning code run in the browser process or in a separate process? https://codereview.chromium.org/2961373002/diff/550001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.h (right): https://codereview.chromium.org/2961373002/diff/550001/third_party/zlib/googl... third_party/zlib/google/zip_reader.h:168: size_t num_bytes) const; Could you add a unit test for this function to zip_reader_unittest.cc ? https://codereview.chromium.org/2961373002/diff/550001/third_party/zlib/googl... third_party/zlib/google/zip_reader.h:229: bool ExtractPartOfCurrentEntryToString(size_t max_read_bytes, Ditto.
All of the code in zip_analyzer.cc, including calling zip_reader functionality, runs in a sandboxed utility process. https://codereview.chromium.org/2961373002/diff/550001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.h (right): https://codereview.chromium.org/2961373002/diff/550001/third_party/zlib/googl... third_party/zlib/google/zip_reader.h:168: size_t num_bytes) const; On 2017/07/14 01:00:53, satorux1 wrote: > Could you add a unit test for this function to zip_reader_unittest.cc ? Done. https://codereview.chromium.org/2961373002/diff/550001/third_party/zlib/googl... third_party/zlib/google/zip_reader.h:229: bool ExtractPartOfCurrentEntryToString(size_t max_read_bytes, On 2017/07/14 01:00:53, satorux1 wrote: > Ditto. Done.
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Patchset #14 (id:490001) has been deleted
Patchset #14 (id:510001) has been deleted
Patchset #14 (id:530001) has been deleted
Patchset #18 (id:630001) has been deleted
Patchset #12 (id:460001) has been deleted
https://codereview.chromium.org/2961373002/diff/650001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.cc (right): https://codereview.chromium.org/2961373002/diff/650001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:362: total_num_bytes_read += base::checked_cast<size_t>(num_bytes_read); Should |total_num_bytes_read| be a |base::CheckedNumeric<size_t>| ?
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2961373002/diff/650001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.cc (right): https://codereview.chromium.org/2961373002/diff/650001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:362: total_num_bytes_read += base::checked_cast<size_t>(num_bytes_read); On 2017/07/17 18:44:49, palmer wrote: > Should |total_num_bytes_read| be a |base::CheckedNumeric<size_t>| ? > Yes, I should have considered that |total_num_bytes_read| could overflow for a huge (>4GB) extraction on a system that only uses 32 bits for size_t. More importantly, this comment made me realize a corner-case bug in the code that I hadn't accounted for since this function doesn't necessarily read to the end of the file. The code now calculates |bytes_to_read| on each iteration of the loop to make sure that not too many bytes of the file are read when the loop requires multiple iterations. Thanks!
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
sorry for the belated response. https://codereview.chromium.org/2961373002/diff/690001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.cc (right): https://codereview.chromium.org/2961373002/diff/690001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:328: bool ZipReader::ExtractPartOfCurrentEntry(WriterDelegate* delegate, This function looks similar to ExtractCurrentEntry() above. Can you unify them? I wonder if you can do something like: bool ZipReader::ExtractCurrentEntry(WriterDelegate* delegate) const { return ExtractPartOfCurrentEntry(delegate, -1); // -1 means everything } https://codereview.chromium.org/2961373002/diff/690001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:508: bool ZipReader::ExtractPartOfCurrentEntryToString(size_t num_bytes, ditto. this looks similar to ExtractCurrentEntryToString(). would be good to unify them. https://codereview.chromium.org/2961373002/diff/690001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.h (right): https://codereview.chromium.org/2961373002/diff/690001/third_party/zlib/googl... third_party/zlib/google/zip_reader.h:167: bool ExtractPartOfCurrentEntry(WriterDelegate* delegate, Part -> Head/Prefix/Beinning to make it clear that it extracts the beginning of the entry?
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2961373002/diff/690001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.cc (right): https://codereview.chromium.org/2961373002/diff/690001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:328: bool ZipReader::ExtractPartOfCurrentEntry(WriterDelegate* delegate, On 2017/07/18 07:46:37, satorux1 wrote: > This function looks similar to ExtractCurrentEntry() above. Can you unify them? > I wonder if you can do something like: > > bool ZipReader::ExtractCurrentEntry(WriterDelegate* delegate) const { > return ExtractPartOfCurrentEntry(delegate, -1); // -1 means everything > } Done. https://codereview.chromium.org/2961373002/diff/690001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:508: bool ZipReader::ExtractPartOfCurrentEntryToString(size_t num_bytes, On 2017/07/18 07:46:37, satorux1 wrote: > ditto. this looks similar to ExtractCurrentEntryToString(). would be good to > unify them. Done. https://codereview.chromium.org/2961373002/diff/690001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.h (right): https://codereview.chromium.org/2961373002/diff/690001/third_party/zlib/googl... third_party/zlib/google/zip_reader.h:167: bool ExtractPartOfCurrentEntry(WriterDelegate* delegate, On 2017/07/18 07:46:38, satorux1 wrote: > Part -> Head/Prefix/Beinning to make it clear that it extracts the beginning of > the entry? Done. https://codereview.chromium.org/2961373002/diff/690001/third_party/zlib/googl... third_party/zlib/google/zip_reader.h:167: bool ExtractPartOfCurrentEntry(WriterDelegate* delegate, On 2017/07/18 07:46:38, satorux1 wrote: > Part -> Head/Prefix/Beinning to make it clear that it extracts the beginning of > the entry? Done.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_n5x_swarming_rel on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_...)
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
sorry for the belated response again https://codereview.chromium.org/2961373002/diff/710001/chrome/common/safe_bro... File chrome/common/safe_browsing/zip_analyzer.cc (right): https://codereview.chromium.org/2961373002/diff/710001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:143: reader.ExtractFromBeginningOfCurrentEntryToString(sizeof(uint32_t), false, Sorry for not taking a look at the caller earlier, but I'm trying to understand why you needed to introduce this new function. Seems to me that the following would be OK: reader.ExtractCurrentEntryToString(sizeof(uint32_t), &magic); If the content is larger than 4GB, the function will stop reading at the maximum size. It'll return false but |magic| should contain the 4GB of data. https://codereview.chromium.org/2961373002/diff/710001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.h (right): https://codereview.chromium.org/2961373002/diff/710001/third_party/zlib/googl... third_party/zlib/google/zip_reader.h:162: // Extracts entirety of the current entry in chunks to |delegate|. While you are at it, could you add documentation about the return value?
https://codereview.chromium.org/2961373002/diff/710001/chrome/common/safe_bro... File chrome/common/safe_browsing/zip_analyzer.cc (right): https://codereview.chromium.org/2961373002/diff/710001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:143: reader.ExtractFromBeginningOfCurrentEntryToString(sizeof(uint32_t), false, On 2017/07/21 05:27:02, satorux1 wrote: > Sorry for not taking a look at the caller earlier, but I'm trying to understand > why you needed to introduce this new function. > > Seems to me that the following would be OK: > > reader.ExtractCurrentEntryToString(sizeof(uint32_t), &magic); > > If the content is larger than 4GB, the function will stop reading at the maximum > size. It'll return false but |magic| should contain the 4GB of data. In this case, we only want to read the first 4 bytes of the current entry to determine whether it is a Mac executable (and thus has the MAGIC header for MachO files). We don't want to do an in-memory copy of a large file into a string just to be able to check the first 4 bytes. The existing functionality in zip_reader only allows for reading an entire file while specifying the max length to extract, but we can't just specify |4| here because for all files larger than 4 bytes the extraction would simply fail. In the current implementation, when extraction fails, ExtractCurrentEntryToString() does not copy any data to the output string. Note that this function returns without copying data to output: https://cs.chromium.org/chromium/src/third_party/zlib/google/zip_reader.cc?rc... Moreover, the change isn't as simple as just copying the data to the output even when extraction fails, due to the way ExtractCurrentEntry() is written as well as the WriteBytes function for StringWriterDelegate: https://cs.chromium.org/chromium/src/third_party/zlib/google/zip_reader.cc?rc... Unless i'm missing something, I think the new function is necessary for reading the first 4 bytes of an entry https://codereview.chromium.org/2961373002/diff/710001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.h (right): https://codereview.chromium.org/2961373002/diff/710001/third_party/zlib/googl... third_party/zlib/google/zip_reader.h:162: // Extracts entirety of the current entry in chunks to |delegate|. On 2017/07/21 05:27:02, satorux1 wrote: > While you are at it, could you add documentation about the return value? Done. Didn't add an explanation here, but there are a variety of cases that could cause this function to return false, including not being able to read the current entry, not being able to write to the delegate, etc
https://codereview.chromium.org/2961373002/diff/710001/chrome/common/safe_bro... File chrome/common/safe_browsing/zip_analyzer.cc (right): https://codereview.chromium.org/2961373002/diff/710001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:143: reader.ExtractFromBeginningOfCurrentEntryToString(sizeof(uint32_t), false, On 2017/07/21 16:29:56, mortonm wrote: > On 2017/07/21 05:27:02, satorux1 wrote: > > Sorry for not taking a look at the caller earlier, but I'm trying to > understand > > why you needed to introduce this new function. > > > > Seems to me that the following would be OK: > > > > reader.ExtractCurrentEntryToString(sizeof(uint32_t), &magic); > > > > If the content is larger than 4GB, the function will stop reading at the > maximum > > size. It'll return false but |magic| should contain the 4GB of data. > > In this case, we only want to read the first 4 bytes of the current entry to > determine whether it is a Mac executable (and thus has the MAGIC header for > MachO files). We don't want to do an in-memory copy of a large file into a > string just to be able to check the first 4 bytes. The existing functionality in > zip_reader only allows for reading an entire file while specifying the max > length to extract, but we can't just specify |4| here because for all files > larger than 4 bytes the extraction would simply fail. > > In the current implementation, when extraction fails, > ExtractCurrentEntryToString() does not copy any data to the output string. Note > that this function returns without copying data to output: > https://cs.chromium.org/chromium/src/third_party/zlib/google/zip_reader.cc?rc... > > Moreover, the change isn't as simple as just copying the data to the output even > when extraction fails, due to the way ExtractCurrentEntry() is written as well > as the WriteBytes function for StringWriterDelegate: > https://cs.chromium.org/chromium/src/third_party/zlib/google/zip_reader.cc?rc... > > Unless i'm missing something, I think the new function is necessary for reading > the first 4 bytes of an entry My bad. sizeof(uint32_t) is of course 4, not 4GB. :( Sorry for the confusion. What I meant was that the following would work: reader.ExtractCurrentEntryToString(sizeof(sizeof(uint32_t), &magic); // sizeof(sizeof(uint32_t) == 4 But you are right that it doesn't work currently. I'd rather think that fixing the behavior is better than introducing a new function that looks similar but behaves slightly different.
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2961373002/diff/710001/chrome/common/safe_bro... File chrome/common/safe_browsing/zip_analyzer.cc (right): https://codereview.chromium.org/2961373002/diff/710001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:143: reader.ExtractFromBeginningOfCurrentEntryToString(sizeof(uint32_t), false, On 2017/07/21 23:00:09, satorux1 wrote: > On 2017/07/21 16:29:56, mortonm wrote: > > On 2017/07/21 05:27:02, satorux1 wrote: > > > Sorry for not taking a look at the caller earlier, but I'm trying to > > understand > > > why you needed to introduce this new function. > > > > > > Seems to me that the following would be OK: > > > > > > reader.ExtractCurrentEntryToString(sizeof(uint32_t), &magic); > > > > > > If the content is larger than 4GB, the function will stop reading at the > > maximum > > > size. It'll return false but |magic| should contain the 4GB of data. > > > > In this case, we only want to read the first 4 bytes of the current entry to > > determine whether it is a Mac executable (and thus has the MAGIC header for > > MachO files). We don't want to do an in-memory copy of a large file into a > > string just to be able to check the first 4 bytes. The existing functionality > in > > zip_reader only allows for reading an entire file while specifying the max > > length to extract, but we can't just specify |4| here because for all files > > larger than 4 bytes the extraction would simply fail. > > > > In the current implementation, when extraction fails, > > ExtractCurrentEntryToString() does not copy any data to the output string. > Note > > that this function returns without copying data to output: > > > https://cs.chromium.org/chromium/src/third_party/zlib/google/zip_reader.cc?rc... > > > > Moreover, the change isn't as simple as just copying the data to the output > even > > when extraction fails, due to the way ExtractCurrentEntry() is written as well > > as the WriteBytes function for StringWriterDelegate: > > > https://cs.chromium.org/chromium/src/third_party/zlib/google/zip_reader.cc?rc... > > > > Unless i'm missing something, I think the new function is necessary for > reading > > the first 4 bytes of an entry > > My bad. sizeof(uint32_t) is of course 4, not 4GB. :( Sorry for the confusion. > > What I meant was that the following would work: > > reader.ExtractCurrentEntryToString(sizeof(sizeof(uint32_t), &magic); // > sizeof(sizeof(uint32_t) == 4 > > But you are right that it doesn't work currently. I'd rather think that fixing > the behavior is better than introducing a new function that looks similar but > behaves slightly different. Done. Required adding an extra parameter to each of the existing functions, but this seemed like the most straightforward way.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Thank you for revising the patch. I think it's getting close. I have a request to simplify the code a bit more (i.e. removing a parameter) https://codereview.chromium.org/2961373002/diff/780001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.h (right): https://codereview.chromium.org/2961373002/diff/780001/third_party/zlib/googl... third_party/zlib/google/zip_reader.h:220: // |max_read_bytes|. Otherwise if |read_entire_file| is false, |num_bytes| nit: num_bytes and max_read_bytes are not consistent. https://codereview.chromium.org/2961373002/diff/780001/third_party/zlib/googl... third_party/zlib/google/zip_reader.h:224: bool read_entire_file, zip_analyzier is the only caller of this function, so I think we can drop |read_entire_file| parameter to simplify the function. How about: bool ExtractCurrentEntryToString(size_t max_read_bytes, std::string* output) const; Returns true if the entire content is read. If the content is bigger than |max_read_bytes|, returns false and |output| is filled with|max_read_bytes| of data. If an error occurs, returns false, and |output| is set to the empty string.
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2961373002/diff/780001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.h (right): https://codereview.chromium.org/2961373002/diff/780001/third_party/zlib/googl... third_party/zlib/google/zip_reader.h:220: // |max_read_bytes|. Otherwise if |read_entire_file| is false, |num_bytes| On 2017/07/25 07:55:15, satorux1 wrote: > nit: num_bytes and max_read_bytes are not consistent. Done. https://codereview.chromium.org/2961373002/diff/780001/third_party/zlib/googl... third_party/zlib/google/zip_reader.h:224: bool read_entire_file, On 2017/07/25 07:55:15, satorux1 wrote: > zip_analyzier is the only caller of this function, so I think we can drop > |read_entire_file| parameter to simplify the function. How about: > > bool ExtractCurrentEntryToString(size_t max_read_bytes, > std::string* output) const; > > Returns true if the entire content is read. > > If the content is bigger than |max_read_bytes|, > returns false and |output| is filled with|max_read_bytes| of data. > > If an error occurs, returns false, and |output| is set to the empty string. > Done, I agree, this is a better way of writing this function. I still kept the |num_bytes_to_extract| parameter in ExtractCurrentEntry(), since that function needs to know how many bytes to extract from the entry and there is no member of WriterDelegate that gives this value.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2961373002/diff/820001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.cc (right): https://codereview.chromium.org/2961373002/diff/820001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:302: bool entire_file_extracted = true; start with false and set it to true only when the entire file is read? https://codereview.chromium.org/2961373002/diff/820001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:335: if (total_num_bytes_read == num_bytes_to_extract) { Special-casing this case seems tricky. Maybe we can get rid of this by something like below? while (true) { const int num_bytes_read = unzReadCurrentFile(zip_file_, buf.get(), internal::kZipBufSize); if (num_bytes == 0) { entire_file_extracted = true; break; } else if (num_bytes_read < 0) { // If num_bytes_read < 0, then it's a specific UNZ_* error code. break; } else if (num_bytes_read > 0) { // Some data is read. auto unread_bytes = base::CheckedNumeric<uint64_t>(num_bytes_to_extract) - total_num_bytes_read; uint_64 num_bytes_to_write = std::min(unread_bytes.ValueOrDie(), num_bytes_read); if (!delegate->WriteBytes(buf.get(), num_bytes_to_write)) break; } } https://codereview.chromium.org/2961373002/diff/820001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:448: bool ZipReader::ExtractCurrentEntryToString(uint64_t num_bytes, num_bytes -> max_read_bytes to be consistent with the header file https://codereview.chromium.org/2961373002/diff/820001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:475: // There was an error in extracting entry. I was a bit confused at first. Maybe add something like below? There was an error. If the current entry is smaller than max_read_bytes, ExtractCurrentEntry() should read the whole content and return true.
https://codereview.chromium.org/2961373002/diff/820001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.cc (right): https://codereview.chromium.org/2961373002/diff/820001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:335: if (total_num_bytes_read == num_bytes_to_extract) { On 2017/07/27 07:05:31, satorux1 wrote: > Special-casing this case seems tricky. Maybe we can get rid of this by something > like below? > > while (true) { > const int num_bytes_read = > unzReadCurrentFile(zip_file_, buf.get(), internal::kZipBufSize); > > if (num_bytes == 0) { > entire_file_extracted = true; > break; > } else if (num_bytes_read < 0) { > // If num_bytes_read < 0, then it's a specific UNZ_* error code. > break; > } else if (num_bytes_read > 0) { > // Some data is read. > auto unread_bytes = base::CheckedNumeric<uint64_t>(num_bytes_to_extract) - > total_num_bytes_read; > uint_64 num_bytes_to_write = std::min(unread_bytes.ValueOrDie(), > num_bytes_read); we'd also need if (num_bytes_to_write == 0) break; > if (!delegate->WriteBytes(buf.get(), num_bytes_to_write)) > break; > } > } >
Patchset #27 (id:840001) has been deleted
https://codereview.chromium.org/2961373002/diff/820001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.cc (right): https://codereview.chromium.org/2961373002/diff/820001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:302: bool entire_file_extracted = true; On 2017/07/27 07:05:31, satorux1 wrote: > start with false and set it to true only when the entire file is read? Done. https://codereview.chromium.org/2961373002/diff/820001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:335: if (total_num_bytes_read == num_bytes_to_extract) { On 2017/07/27 07:10:28, satorux1 wrote: > On 2017/07/27 07:05:31, satorux1 wrote: > > Special-casing this case seems tricky. Maybe we can get rid of this by > something > > like below? > > > > while (true) { > > const int num_bytes_read = > > unzReadCurrentFile(zip_file_, buf.get(), internal::kZipBufSize); > > > > if (num_bytes == 0) { > > entire_file_extracted = true; > > break; > > } else if (num_bytes_read < 0) { > > // If num_bytes_read < 0, then it's a specific UNZ_* error code. > > break; > > } else if (num_bytes_read > 0) { > > // Some data is read. > > auto unread_bytes = base::CheckedNumeric<uint64_t>(num_bytes_to_extract) - > > total_num_bytes_read; > > uint_64 num_bytes_to_write = std::min(unread_bytes.ValueOrDie(), > > num_bytes_read); > > we'd also need > > if (num_bytes_to_write == 0) > break; > > > if (!delegate->WriteBytes(buf.get(), num_bytes_to_write)) > > break; > > } > > } > > > Done. https://codereview.chromium.org/2961373002/diff/820001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:335: if (total_num_bytes_read == num_bytes_to_extract) { On 2017/07/27 07:05:31, satorux1 wrote: > Special-casing this case seems tricky. Maybe we can get rid of this by something > like below? > > while (true) { > const int num_bytes_read = > unzReadCurrentFile(zip_file_, buf.get(), internal::kZipBufSize); > > if (num_bytes == 0) { > entire_file_extracted = true; > break; > } else if (num_bytes_read < 0) { > // If num_bytes_read < 0, then it's a specific UNZ_* error code. > break; > } else if (num_bytes_read > 0) { > // Some data is read. > auto unread_bytes = base::CheckedNumeric<uint64_t>(num_bytes_to_extract) - > total_num_bytes_read; > uint_64 num_bytes_to_write = std::min(unread_bytes.ValueOrDie(), > num_bytes_read); > if (!delegate->WriteBytes(buf.get(), num_bytes_to_write)) > break; > } > } > Done. https://codereview.chromium.org/2961373002/diff/820001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:448: bool ZipReader::ExtractCurrentEntryToString(uint64_t num_bytes, On 2017/07/27 07:05:31, satorux1 wrote: > num_bytes -> max_read_bytes to be consistent with the header file Done. https://codereview.chromium.org/2961373002/diff/820001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:475: // There was an error in extracting entry. On 2017/07/27 07:05:31, satorux1 wrote: > I was a bit confused at first. Maybe add something like below? > > There was an error. If the current entry is smaller than max_read_bytes, > ExtractCurrentEntry() should read the whole content and return true. Done. The revised comment now explains the situation more clearly.
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Thanks satorux1@ for your thorough review! Really appreciate your effort. BTW, do we still need palmer@'s review? LGTM from safe_browsing side.
lgtm
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2961373002/diff/860001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.cc (right): https://codereview.chromium.org/2961373002/diff/860001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:329: unzReadCurrentFile(zip_file_, buf.get(), 1) == 0) { I thought we could get rid of the call to unzReadCurrentFile() here, since the while loop will end at the first |if| conditional in the loop, if the entire file is read: 310 if (num_bytes_read == 0) { 311 entire_file_extracted = true; 312 break; 313 } Here, can we go with something like below? auto unread_bytes = ...; if (unread_bytes.ValueOrDie() == 0) break; uint64_t uint64_t num_bytes_to_write = ...; if (!delegate->WriteBytes(buf.get(), num_bytes_to_write) break;
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_n5x_swarming_rel on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_...)
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Hi satorux1@, thanks for the help on this CL! I am close to finishing my internship and am seeking to wrap up this CL as part of a broader project on SafeBrowsing Download Protection for Mac, so hopefully we can get this done soon :) https://codereview.chromium.org/2961373002/diff/860001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.cc (right): https://codereview.chromium.org/2961373002/diff/860001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:329: unzReadCurrentFile(zip_file_, buf.get(), 1) == 0) { On 2017/07/27 23:18:10, satorux1 wrote: > I thought we could get rid of the call to unzReadCurrentFile() here, since the > while loop will end at the first |if| conditional in the loop, if the entire > file is read: > > 310 if (num_bytes_read == 0) { > 311 entire_file_extracted = true; > 312 break; > 313 } > > Here, can we go with something like below? > > auto unread_bytes = ...; > if (unread_bytes.ValueOrDie() == 0) > break; > > uint64_t uint64_t num_bytes_to_write = ...; > if (!delegate->WriteBytes(buf.get(), num_bytes_to_write) > break; Yeah, I think I originally wrote the function this way as well. The problem with this approach is that it will cause the function to return false if the specified |num_bytes_to_extract| parameter equals the size of the entry being extracted. The check on line 325 checks to see if the |num_bytes_to_extract| cap has been hit while performing extraction. If so, we usually want to return false, except when |num_bytes_to_extract| equals the size of the entry, in which case we want to return true. However, in line with your comment, I've re-written the code to avoid doing the second call to unzReadCurrentFile().
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
https://codereview.chromium.org/2961373002/diff/860001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.cc (right): https://codereview.chromium.org/2961373002/diff/860001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:329: unzReadCurrentFile(zip_file_, buf.get(), 1) == 0) { On 2017/07/31 18:13:04, mortonm wrote: > On 2017/07/27 23:18:10, satorux1 wrote: > > I thought we could get rid of the call to unzReadCurrentFile() here, since the > > while loop will end at the first |if| conditional in the loop, if the entire > > file is read: > > > > 310 if (num_bytes_read == 0) { > > 311 entire_file_extracted = true; > > 312 break; > > 313 } > > > > Here, can we go with something like below? > > > > auto unread_bytes = ...; > > if (unread_bytes.ValueOrDie() == 0) > > break; > > > > uint64_t uint64_t num_bytes_to_write = ...; > > if (!delegate->WriteBytes(buf.get(), num_bytes_to_write) > > break; > > Yeah, I think I originally wrote the function this way as well. The problem with > this approach is that it will cause the function to return false if the > specified |num_bytes_to_extract| parameter equals the size of the entry being > extracted. Ah you are right. Sorry for missing that point. > The check on line 325 checks to see if the |num_bytes_to_extract| cap > has been hit while performing extraction. If so, we usually want to return > false, except when |num_bytes_to_extract| equals the size of the entry, in which > case we want to return true. However, in line with your comment, I've re-written > the code to avoid doing the second call to unzReadCurrentFile(). I think the new version with two additional booleans is more complicated than the previous version. :) I think your previous version could be a bit simplified if we keep track of the remaining capacity of the delegate than the total bytes extracted: uint64_t remaining_capacity = num_bytes_to_extract; bool entire_file_extracted = false; while (remaining_capacity > 0) { const int num_bytes_read = unzReadCurrentFile(zip_file_, buf.get(), internal::kZipBufSize); if (num_bytes_read == 0) { entire_file_extracted = true; break; } if (num_bytes_read < 0) { // If num_bytes_read < 0, then it's a specific UNZ_* error code. break; } else if (num_bytes_read > 0) { uint64_t num_bytes_to_write = std::min<uint64_t>(remaining_capacity, base::checked_cast<uint64_t>(num_bytes_read)); if (!delegate->WriteBytes(buf.get(), num_bytes_to_write)) break; if (remaining_capacity == base::checked_cast<uint64_t>(num_bytes_read)) { // Ensures function returns true if the entire file has been read. entire_file_extracted = (unzReadCurrentFile(zip_file_, buf.get(), 1) == 0); } CHECK_GE(remaining_capacity, num_bytes_to_write); remaining_capacity -= num_bytes_to_write; } } https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... File third_party/zlib/google/zip_reader_unittest.cc (right): https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:580: EXPECT_EQ(0, memcmp(contents.c_str(), "0123456", i)); EXPECT_EQ(base::StringPiece("0123456", i).as__string(), contents) ? then you can get rid of EXPECT_EQ(i, contents.size()); https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:586: EXPECT_EQ(0, memcmp(contents.c_str(), "0123456", i)); ditto https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:605: EXPECT_EQ(0, memcmp(contents.c_str(), "", 0)); EXPECT_EQ("", contents)? https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:607: EXPECT_EQ(0, memcmp(contents.c_str(), "", 0)); EXPECT_EQ("", contents)? https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:612: EXPECT_EQ(0, memcmp(contents.c_str(), "", 0)); EXPECT_EQ("", contents)? https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:614: EXPECT_EQ(0, memcmp(contents.c_str(), "0", 1)); EXPECT_EQ("0", contents)? https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:616: EXPECT_EQ(0, memcmp(contents.c_str(), "0", 1)); EXPECT_EQ("0", contents)? https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:620: EXPECT_TRUE(reader.ExtractCurrentEntryToString(0, &contents)); EXPECT_EQ("", contents)? https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:623: EXPECT_EQ(0, memcmp(contents.c_str(), "01", 2)); EXPECT_EQ("01", contents)? https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:625: EXPECT_EQ(0, memcmp(contents.c_str(), "0123", 4)); EXPECT_EQ("0123", contents)? https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:627: EXPECT_EQ(0, memcmp(contents.c_str(), "0123", 0)); 0 here does not look correct. EXPECT_EQ("0123", contents)?
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2961373002/diff/860001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.cc (right): https://codereview.chromium.org/2961373002/diff/860001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:329: unzReadCurrentFile(zip_file_, buf.get(), 1) == 0) { On 2017/08/01 08:01:08, satorux1 wrote: > On 2017/07/31 18:13:04, mortonm wrote: > > On 2017/07/27 23:18:10, satorux1 wrote: > > > I thought we could get rid of the call to unzReadCurrentFile() here, since > the > > > while loop will end at the first |if| conditional in the loop, if the entire > > > file is read: > > > > > > 310 if (num_bytes_read == 0) { > > > 311 entire_file_extracted = true; > > > 312 break; > > > 313 } > > > > > > Here, can we go with something like below? > > > > > > auto unread_bytes = ...; > > > if (unread_bytes.ValueOrDie() == 0) > > > break; > > > > > > uint64_t uint64_t num_bytes_to_write = ...; > > > if (!delegate->WriteBytes(buf.get(), num_bytes_to_write) > > > break; > > > > Yeah, I think I originally wrote the function this way as well. The problem > with > > this approach is that it will cause the function to return false if the > > specified |num_bytes_to_extract| parameter equals the size of the entry being > > extracted. > > Ah you are right. Sorry for missing that point. > > > The check on line 325 checks to see if the |num_bytes_to_extract| cap > > has been hit while performing extraction. If so, we usually want to return > > false, except when |num_bytes_to_extract| equals the size of the entry, in > which > > case we want to return true. However, in line with your comment, I've > re-written > > the code to avoid doing the second call to unzReadCurrentFile(). > > I think the new version with two additional booleans is more complicated than > the previous version. :) > > I think your previous version could be a bit simplified if we keep track of the > remaining capacity of the delegate than the total bytes extracted: > > uint64_t remaining_capacity = num_bytes_to_extract; > bool entire_file_extracted = false; > > while (remaining_capacity > 0) { > const int num_bytes_read = > unzReadCurrentFile(zip_file_, buf.get(), internal::kZipBufSize); > > if (num_bytes_read == 0) { > entire_file_extracted = true; > break; > } > if (num_bytes_read < 0) { > // If num_bytes_read < 0, then it's a specific UNZ_* error code. > > break; > } else if (num_bytes_read > 0) { > uint64_t num_bytes_to_write = > std::min<uint64_t>(remaining_capacity, > base::checked_cast<uint64_t>(num_bytes_read)); > if (!delegate->WriteBytes(buf.get(), num_bytes_to_write)) > break; > if (remaining_capacity == base::checked_cast<uint64_t>(num_bytes_read)) { > // Ensures function returns true if the entire file has been read. > entire_file_extracted = > (unzReadCurrentFile(zip_file_, buf.get(), 1) == 0); > } > CHECK_GE(remaining_capacity, num_bytes_to_write); > remaining_capacity -= num_bytes_to_write; > } > } > > > Done. https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... File third_party/zlib/google/zip_reader_unittest.cc (right): https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:580: EXPECT_EQ(0, memcmp(contents.c_str(), "0123456", i)); On 2017/08/01 08:01:09, satorux1 wrote: > EXPECT_EQ(base::StringPiece("0123456", i).as__string(), contents) ? then you can > get rid of EXPECT_EQ(i, contents.size()); Done. https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:586: EXPECT_EQ(0, memcmp(contents.c_str(), "0123456", i)); On 2017/08/01 08:01:09, satorux1 wrote: > ditto Done. https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:605: EXPECT_EQ(0, memcmp(contents.c_str(), "", 0)); On 2017/08/01 08:01:09, satorux1 wrote: > EXPECT_EQ("", contents)? Done. https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:607: EXPECT_EQ(0, memcmp(contents.c_str(), "", 0)); On 2017/08/01 08:01:09, satorux1 wrote: > EXPECT_EQ("", contents)? Done. https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:612: EXPECT_EQ(0, memcmp(contents.c_str(), "", 0)); On 2017/08/01 08:01:09, satorux1 wrote: > EXPECT_EQ("", contents)? Done. https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:614: EXPECT_EQ(0, memcmp(contents.c_str(), "0", 1)); On 2017/08/01 08:01:09, satorux1 wrote: > EXPECT_EQ("0", contents)? Done. https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:616: EXPECT_EQ(0, memcmp(contents.c_str(), "0", 1)); On 2017/08/01 08:01:09, satorux1 wrote: > EXPECT_EQ("0", contents)? Done. https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:620: EXPECT_TRUE(reader.ExtractCurrentEntryToString(0, &contents)); On 2017/08/01 08:01:09, satorux1 wrote: > EXPECT_EQ("", contents)? Done. https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:623: EXPECT_EQ(0, memcmp(contents.c_str(), "01", 2)); On 2017/08/01 08:01:09, satorux1 wrote: > EXPECT_EQ("01", contents)? Done. https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:625: EXPECT_EQ(0, memcmp(contents.c_str(), "0123", 4)); On 2017/08/01 08:01:09, satorux1 wrote: > EXPECT_EQ("0123", contents)? Done. https://codereview.chromium.org/2961373002/diff/900001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:627: EXPECT_EQ(0, memcmp(contents.c_str(), "0123", 0)); On 2017/08/01 08:01:09, satorux1 wrote: > 0 here does not look correct. > > EXPECT_EQ("0123", contents)? The file is only 4 characters long. I am checking that when |max_read_bytes| is larger than the file, the function still returns true and returns the entire file in the string. I've added a comment noting this.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
zlib stuff lgtm with some requests: https://codereview.chromium.org/2961373002/diff/920001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.cc (right): https://codereview.chromium.org/2961373002/diff/920001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:315: if (num_bytes_read < 0) { nit: else if (sorry, my code sample was bad) https://codereview.chromium.org/2961373002/diff/920001/third_party/zlib/googl... File third_party/zlib/google/zip_reader_unittest.cc (right): https://codereview.chromium.org/2961373002/diff/920001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:743: } I's suggest to remove the three gmock-based tests. These tests seem to closely depend on the current implementation details, that might change in the future. Besides, as discussed on chromium-dev@ time to time, gmock-based tests are difficult to read and maintain.
https://codereview.chromium.org/2961373002/diff/920001/third_party/zlib/googl... File third_party/zlib/google/zip_reader_unittest.cc (right): https://codereview.chromium.org/2961373002/diff/920001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:743: } On 2017/08/03 08:36:48, satorux1 wrote: > I's suggest to remove the three gmock-based tests. These tests seem to closely > depend on the current implementation details, that might change in the future. > > Besides, as discussed on chromium-dev@ time to time, gmock-based tests are > difficult to read and maintain. correction: difficult => often difficult
Patchset #1 (id:1) has been deleted
Patchset #2 (id:60001) has been deleted
Patchset #1 (id:20001) has been deleted
Patchset #6 (id:380001) has been deleted
Patchset #11 (id:590001) has been deleted
Patchset #11 (id:610001) has been deleted
Patchset #11 (id:650001) has been deleted
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Thanks! https://codereview.chromium.org/2961373002/diff/920001/third_party/zlib/googl... File third_party/zlib/google/zip_reader.cc (right): https://codereview.chromium.org/2961373002/diff/920001/third_party/zlib/googl... third_party/zlib/google/zip_reader.cc:315: if (num_bytes_read < 0) { On 2017/08/03 08:36:48, satorux1 wrote: > nit: else if > > (sorry, my code sample was bad) Done. https://codereview.chromium.org/2961373002/diff/920001/third_party/zlib/googl... File third_party/zlib/google/zip_reader_unittest.cc (right): https://codereview.chromium.org/2961373002/diff/920001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:743: } On 2017/08/03 08:47:03, satorux1 wrote: > On 2017/08/03 08:36:48, satorux1 wrote: > > I's suggest to remove the three gmock-based tests. These tests seem to closely > > depend on the current implementation details, that might change in the future. > > > > Besides, as discussed on chromium-dev@ time to time, gmock-based tests are > > difficult to read and maintain. > > correction: difficult => often difficult Acknowledged. https://codereview.chromium.org/2961373002/diff/920001/third_party/zlib/googl... third_party/zlib/google/zip_reader_unittest.cc:743: } On 2017/08/03 08:36:48, satorux1 wrote: > I's suggest to remove the three gmock-based tests. These tests seem to closely > depend on the current implementation details, that might change in the future. > > Besides, as discussed on chromium-dev@ time to time, gmock-based tests are > difficult to read and maintain. Done.
rsesek@chromium.org changed reviewers: + rsesek@chromium.org
Note on the CL description: The "Subject" in Rietveld does not get included in the commit log. Please copy it to be the first line in the "Description" box. https://codereview.chromium.org/2961373002/diff/940001/chrome/browser/safe_br... File chrome/browser/safe_browsing/sandboxed_zip_analyzer_unittest.cc (right): https://codereview.chromium.org/2961373002/diff/940001/chrome/browser/safe_br... chrome/browser/safe_browsing/sandboxed_zip_analyzer_unittest.cc:104: ASSERT_EQ(data.is_signed, binary.has_signature()); nit: Use EXPECT_ instead of ASSERT_. The difference is that EXPECT_ will let the test keep running, while ASSERT_ will abort. https://codereview.chromium.org/2961373002/diff/940001/chrome/browser/safe_br... chrome/browser/safe_browsing/sandboxed_zip_analyzer_unittest.cc:106: ASSERT_LT(0, binary.signature().signed_data_size()); This should stay ASSERT_ so that you can ensure that .signed_data(0) is valid... https://codereview.chromium.org/2961373002/diff/940001/chrome/browser/safe_br... chrome/browser/safe_browsing/sandboxed_zip_analyzer_unittest.cc:107: ASSERT_NE(0U, binary.signature().signed_data(0).size()); But this can be EXPECT_ https://codereview.chromium.org/2961373002/diff/940001/chrome/common/safe_bro... File chrome/common/safe_browsing/zip_analyzer.cc (right): https://codereview.chromium.org/2961373002/diff/940001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:13: #if defined(OS_MACOSX) Platform-specific includes usually come after the common ones, so move this to after line 30. https://codereview.chromium.org/2961373002/diff/940001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:21: #include "build/build_config.h" note: build_config.h is probably required for the OS_MACOSX define above https://codereview.chromium.org/2961373002/diff/940001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:77: return magic == FAT_MAGIC || magic == FAT_CIGAM || magic == MH_MAGIC || Rather than duplicating this, can you use MachOImageReader::IsMachOMagicValue?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2961373002/diff/940001/chrome/browser/safe_br... File chrome/browser/safe_browsing/sandboxed_zip_analyzer_unittest.cc (right): https://codereview.chromium.org/2961373002/diff/940001/chrome/browser/safe_br... chrome/browser/safe_browsing/sandboxed_zip_analyzer_unittest.cc:104: ASSERT_EQ(data.is_signed, binary.has_signature()); On 2017/08/03 15:43:35, Robert Sesek wrote: > nit: Use EXPECT_ instead of ASSERT_. The difference is that EXPECT_ will let the > test keep running, while ASSERT_ will abort. Done. https://codereview.chromium.org/2961373002/diff/940001/chrome/browser/safe_br... chrome/browser/safe_browsing/sandboxed_zip_analyzer_unittest.cc:106: ASSERT_LT(0, binary.signature().signed_data_size()); On 2017/08/03 15:43:35, Robert Sesek wrote: > This should stay ASSERT_ so that you can ensure that .signed_data(0) is valid... Acknowledged. https://codereview.chromium.org/2961373002/diff/940001/chrome/browser/safe_br... chrome/browser/safe_browsing/sandboxed_zip_analyzer_unittest.cc:107: ASSERT_NE(0U, binary.signature().signed_data(0).size()); On 2017/08/03 15:43:36, Robert Sesek wrote: > But this can be EXPECT_ Done. https://codereview.chromium.org/2961373002/diff/940001/chrome/common/safe_bro... File chrome/common/safe_browsing/zip_analyzer.cc (right): https://codereview.chromium.org/2961373002/diff/940001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:13: #if defined(OS_MACOSX) On 2017/08/03 15:43:36, Robert Sesek wrote: > Platform-specific includes usually come after the common ones, so move this to > after line 30. Done. https://codereview.chromium.org/2961373002/diff/940001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:21: #include "build/build_config.h" On 2017/08/03 15:43:36, Robert Sesek wrote: > note: build_config.h is probably required for the OS_MACOSX define above Acknowledged. https://codereview.chromium.org/2961373002/diff/940001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:77: return magic == FAT_MAGIC || magic == FAT_CIGAM || magic == MH_MAGIC || On 2017/08/03 15:43:36, Robert Sesek wrote: > Rather than duplicating this, can you use MachOImageReader::IsMachOMagicValue? Good call, thanks.
Description was changed from ========== This CL fixes two aspects of broken ZIP processing on Mac. First, it ensures that .app files are treated as directories and as such do not break binary feature extraction, causing analysis to fail. Second, it performs type-sniffing to identify the existence of executable MachO files that do not have any file extension, as is the usual case on Mac. BUG=600392 ========== to ========== Improve Zip File Scanning on Mac This CL fixes two aspects of broken ZIP processing on Mac. First, it ensures that .app files are treated as directories and as such do not break binary feature extraction, causing analysis to fail. Second, it performs type-sniffing to identify the existence of executable MachO files that do not have any file extension, as is the usual case on Mac. BUG=600392 ==========
LGTM https://codereview.chromium.org/2961373002/diff/960001/chrome/common/safe_bro... File chrome/common/safe_browsing/zip_analyzer.cc (right): https://codereview.chromium.org/2961373002/diff/960001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:21: #include "chrome/common/safe_browsing/mach_o_image_reader_mac.h" I think this will probably need to move after line 29 for non-Mac.
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2961373002/diff/960001/chrome/common/safe_bro... File chrome/common/safe_browsing/zip_analyzer.cc (right): https://codereview.chromium.org/2961373002/diff/960001/chrome/common/safe_bro... chrome/common/safe_browsing/zip_analyzer.cc:21: #include "chrome/common/safe_browsing/mach_o_image_reader_mac.h" On 2017/08/03 18:36:31, Robert Sesek wrote: > I think this will probably need to move after line 29 for non-Mac. Done.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
The CQ bit was checked by mortonm@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by mortonm@google.com
The patchset sent to the CQ was uploaded after l-g-t-m from palmer@chromium.org, jialiul@chromium.org, satorux@chromium.org, rsesek@chromium.org Link to the patchset: https://codereview.chromium.org/2961373002/#ps980001 (title: "minor")
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 980001, "attempt_start_ts": 1501858430463320, "parent_rev": "c4c3f6423d629ca803771737becb94ccf9073b61", "commit_rev": "034ecb569929e1202f39cf744a32e8deeade06c8"}
Message was sent while issue was closed.
Description was changed from ========== Improve Zip File Scanning on Mac This CL fixes two aspects of broken ZIP processing on Mac. First, it ensures that .app files are treated as directories and as such do not break binary feature extraction, causing analysis to fail. Second, it performs type-sniffing to identify the existence of executable MachO files that do not have any file extension, as is the usual case on Mac. BUG=600392 ========== to ========== Improve Zip File Scanning on Mac This CL fixes two aspects of broken ZIP processing on Mac. First, it ensures that .app files are treated as directories and as such do not break binary feature extraction, causing analysis to fail. Second, it performs type-sniffing to identify the existence of executable MachO files that do not have any file extension, as is the usual case on Mac. BUG=600392 Review-Url: https://codereview.chromium.org/2961373002 Cr-Commit-Position: refs/heads/master@{#492032} Committed: https://chromium.googlesource.com/chromium/src/+/034ecb569929e1202f39cf744a32... ==========
Message was sent while issue was closed.
Committed patchset #26 (id:980001) as https://chromium.googlesource.com/chromium/src/+/034ecb569929e1202f39cf744a32... |