|
|
Chromium Code Reviews|
Created:
4 years, 4 months ago by vakh (use Gerrit instead) Modified:
4 years, 4 months ago CC:
chromium-reviews, noé, palmer, francois_mozilla.com Base URL:
https://chromium.googlesource.com/chromium/src.git@01_checksum Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionPVer4: DecodeHashes needs to sort the output of the Rice decoder
Also:
1. Add tests to validate the Rice-decoder implementation.
2. Also, make the code for the case where the requested number of bits is greater than those left in the current_word_ easier to understand by calling GetNextBits recursively twice.
BUG=543161, 624567
Committed: https://crrev.com/1670eb02e8172d1750b70c50dbe3fb968e1f365f
Cr-Commit-Position: refs/heads/master@{#413027}
Patch Set 1 : PVer4: DecodeHashes needs to sort the output of the Rice decoder #Patch Set 2 : Fix the test to expect sorted list of hashes from DecodeBytes #
Total comments: 20
Patch Set 3 : Fix the corresponding test in v4_store_unittest.cc also #
Total comments: 16
Patch Set 4 : Remove DVLOGs. #include logging.h #Patch Set 5 : Use a vector to decode and sort uint32_t values. re-interpret the memory of that vector as raw hash… #
Total comments: 4
Patch Set 6 : Use htonl/ntohl instead of FlipEndianness (but the effect is the same). Use GetBitsFromCurrentWord … #Patch Set 7 : Use FlipEndianness instead of ntohl/htonl. #error for big endian #Patch Set 8 : rebase #
Total comments: 13
Patch Set 9 : Use htonl. Return from GetBitsFromCurrentWord #
Total comments: 6
Patch Set 10 : DecodeBytes->DecodePrefixes. More detailed comment. #
Dependent Patchsets: Messages
Total messages: 87 (53 generated)
The CQ bit was checked by vakh@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
rebase
The CQ bit was checked by vakh@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_compile_dbg_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
Nit: s/DCHECK_EQ/EXPECT_EQ/
The CQ bit was checked by vakh@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
PVer4: DecodeHashes needs to sort the output of the Rice decoder
The CQ bit was checked by vakh@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== Add tests to validate the Rice-decoder implementation. Also, make the code for the case where the requested number of bits is greater than those left in the current_word_ easier to understand by calling GetNextBits recursively twice. BUG=543161, 624567 ========== to ========== PVer4: DecodeHashes needs to sort the output of the Rice decoder Also: 1. Add tests to validate the Rice-decoder implementation. 2. Also, make the code for the case where the requested number of bits is greater than those left in the current_word_ easier to understand by calling GetNextBits recursively twice. BUG=543161, 624567 ==========
Patchset #1 (id:1) has been deleted
Patchset #1 (id:20001) has been deleted
Patchset #1 (id:40001) has been deleted
Fix the test to expect sorted list of hashes from DecodeBytes
The CQ bit was checked by vakh@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
vakh@chromium.org changed reviewers: + awoz@chromium.org, nparker@chromium.org
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...) win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
Fix the corresponding test in v4_store_unittest.cc also
The CQ bit was checked by vakh@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... File components/safe_browsing_db/v4_rice.cc (right): https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_rice.cc:22: uint32_t FlipEndianness(uint32_t word) { This should be platform-specific. Maybe use C's htonl(), or something from chromium/src/base/big_endian.h https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_rice.cc:29: bytes[3] = (char)(word & mask); Is this ordering endian-dependent as well? https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_rice.cc:119: base::CheckedNumeric<uint32_t> last_value(first_value); You're casting a int64 to a unit32_t. Is that intentional? https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_rice.cc:120: std::set<uint32_t> hash_prefixes; This is going to use at least another 24 bytes of RAM per 4-byte hash (three pointers). I think it'd be better to use a different sorting mechanism, since we've got several copies of the hashes in RAM already. https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... File components/safe_browsing_db/v4_rice_unittest.cc (right): https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_rice_unittest.cc:142: VerifyRiceDecoding(it); If this fails, will it tell you which test cast failed? An idea: log a testcase number when it fails. https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_rice_unittest.cc:257: EXPECT_EQ(std::string("\x5\0\0\0\fL\x93\xADV\x7F\xF6o\xCEo1\x81", 16), out); How did you generate this? (or was it just based on code output?) https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... File components/safe_browsing_db/v4_store.cc (right): https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_store.cc:205: DVLOG(1) << "Update successful: " << *this; These are fairly verbose -- You should plan to remove most of them once the code is stable. https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_store.cc:404: DVLOG(1) << "Failure: ADDITIONS_HAS_EXISTING_PREFIX_FAILURE"; Can you log this further up instead? Just logging the enum # is usually sufficient, since it should be rarely used. Same with those below. https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_store.cc:506: DVLOG(1) << "Failure: Unexpected magic number found in file: " These DVLOGs actually seem like more clutter than they're worth, since they just mean the file is corrupted. We don't need to maintain code to tell a developer which part of the file is corrupted. Same with file version and response type. If someday we start getting those errors, we can easily add a line to log it.
shess@chromium.org changed reviewers: + shess@chromium.org
DVLOG! OK, I'll ignore it. https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... File components/safe_browsing_db/v4_rice.cc (left): https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:5: #include "base/logging.h" I think you still need logging.h for things like NOTREACHED(). Just because safe_math.h has it for you doesn't mean you should skip it here. https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... File components/safe_browsing_db/v4_rice.cc (right): https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:22: uint32_t FlipEndianness(uint32_t word) { Is this right? I don't _think_ we have a mix of little-endian and big-endian platforms running Chrome, but I'm not entirely certain. I know there are MIPS ports out there, and I see evidence that someone was working on cleaning up big-endian support for MIPS. Does this really need to always flip, or is it more like ntohl()? If the latter, then I suspect we have base/ support for that kind of thing. [Maybe base/big_endian.h ? Or perhaps something in net/?] https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:249: // |current_word_|. I'm finding this confusing. I _think_ what it's saying is that the entire span of bits does not exist in current_word_. But below it calls GetNextBits() to read num_bits_left_in_current_word, which _does_ exist in current_word_ (so I think it will always degenerate to GetBitsFromCurrentWord). Then upper has to be read via GetNextBits() to load a new word. [Is the same possibly true for reading lower? Or would num_bits_left_in_current_word be zero in that case?] https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... File components/safe_browsing_db/v4_rice_unittest.cc (right): https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... components/safe_browsing_db/v4_rice_unittest.cc:39: } There's an expectation below for the decoder not having additional values - do you need an expectation for the decoder having enough values? Or does DECODE_SUCCESS cover that? Will that fire repeatedly if the decoder comes up short? And clearly |word != expected| in case of decode failure. [Perhaps ASSERT_EQ() would handle this case for the decode failure, if it can be used in a helper function like this.] https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... components/safe_browsing_db/v4_rice_unittest.cc:142: VerifyRiceDecoding(it); In case of failure this isn't going to give very satisfying diagnostics. It might be nice to have a descriptive value logged with expectation failures. OTOH, if test case #7 fails, I'm not entirely sure how to address it by inspection.
Remove DVLOGs. #include logging.h
The CQ bit was checked by vakh@chromium.org to run a CQ dry run
Thanks for the review. PTAL. DVLOGs - removed. https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... File components/safe_browsing_db/v4_rice.cc (right): https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_rice.cc:22: uint32_t FlipEndianness(uint32_t word) { On 2016/08/12 23:01:49, Nathan Parker wrote: > This should be platform-specific. Maybe use C's htonl(), or something from > chromium/src/base/big_endian.h Waiting for Noe/Alex to confirm. https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_rice.cc:29: bytes[3] = (char)(word & mask); On 2016/08/12 23:01:49, Nathan Parker wrote: > Is this ordering endian-dependent as well? No. https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_rice.cc:119: base::CheckedNumeric<uint32_t> last_value(first_value); On 2016/08/12 23:01:49, Nathan Parker wrote: > You're casting a int64 to a unit32_t. Is that intentional? Yes, I am not sure why the first_value is a int64 when we only expect uint32s. The reason for this change is to capture the value from the proto as-is and let the NumericChecker code in Chrome raise if the value is actually greater than what can fit in 32 bits. https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_rice.cc:120: std::set<uint32_t> hash_prefixes; On 2016/08/12 23:01:49, Nathan Parker wrote: > This is going to use at least another 24 bytes of RAM per 4-byte hash (three > pointers). I think it'd be better to use a different sorting mechanism, since > we've got several copies of the hashes in RAM already. Let me try what we discussed offline: use the string out to store unsorted hash prefixes using reinterpret magic and sort in-place at the end. https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... File components/safe_browsing_db/v4_rice_unittest.cc (right): https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_rice_unittest.cc:142: VerifyRiceDecoding(it); On 2016/08/12 23:01:49, Nathan Parker wrote: > If this fails, will it tell you which test cast failed? An idea: log a testcase > number when it fails. It'll grok what did not match and I can then grep it. Test case number is useful but even with that, it'd require me to count the instances of RiceDecodingTestInfo to see which one did fail. https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_rice_unittest.cc:257: EXPECT_EQ(std::string("\x5\0\0\0\fL\x93\xADV\x7F\xF6o\xCEo1\x81", 16), out); On 2016/08/12 23:01:49, Nathan Parker wrote: > How did you generate this? (or was it just based on code output?) Got the input from a unit test for the server. Verified the output manually and then added it as the expected value. https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... File components/safe_browsing_db/v4_store.cc (right): https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_store.cc:205: DVLOG(1) << "Update successful: " << *this; On 2016/08/12 23:01:49, Nathan Parker wrote: > These are fairly verbose -- You should plan to remove most of them once the code > is stable. Ack. Done. Also: http://crbug.com/637468 https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_store.cc:404: DVLOG(1) << "Failure: ADDITIONS_HAS_EXISTING_PREFIX_FAILURE"; On 2016/08/12 23:01:49, Nathan Parker wrote: > Can you log this further up instead? Just logging the enum # is usually > sufficient, since it should be rarely used. > > Same with those below. Done. https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_store.cc:506: DVLOG(1) << "Failure: Unexpected magic number found in file: " On 2016/08/12 23:01:49, Nathan Parker wrote: > These DVLOGs actually seem like more clutter than they're worth, since they just > mean the file is corrupted. We don't need to maintain code to tell a developer > which part of the file is corrupted. > > Same with file version and response type. If someday we start getting those > errors, we can easily add a line to log it. Done. https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... File components/safe_browsing_db/v4_rice.cc (left): https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:5: #include "base/logging.h" On 2016/08/12 23:25:46, Scott Hess wrote: > I think you still need logging.h for things like NOTREACHED(). Just because > safe_math.h has it for you doesn't mean you should skip it here. I agree. Didn't realize it was needed for that. Added back. https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... File components/safe_browsing_db/v4_rice.cc (right): https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:22: uint32_t FlipEndianness(uint32_t word) { On 2016/08/12 23:25:47, Scott Hess wrote: > Is this right? I don't _think_ we have a mix of little-endian and big-endian > platforms running Chrome, but I'm not entirely certain. I know there are MIPS > ports out there, and I see evidence that someone was working on cleaning up > big-endian support for MIPS. Does this really need to always flip, or is it > more like ntohl()? If the latter, then I suspect we have base/ support for that > kind of thing. > > [Maybe base/big_endian.h ? Or perhaps something in net/?] Is this flip always needed: Waiting to get a confirmation from the SafeBrowsing team on that. cc'd you on that email. https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:249: // |current_word_|. On 2016/08/12 23:25:46, Scott Hess wrote: > I'm finding this confusing. I _think_ what it's saying is that the entire span > of bits does not exist in current_word_. But below it calls GetNextBits() to > read num_bits_left_in_current_word, which _does_ exist in current_word_ (so I > think it will always degenerate to GetBitsFromCurrentWord). Then upper has to > be read via GetNextBits() to load a new word. [Is the same possibly true for > reading lower? Or would num_bits_left_in_current_word be zero in that case?] Let me illustrate with an example: Let's say we want to read 28 bits, but we've already read 23 from current_word_. This means: num_requested_bits = 28 current_word_bit_index_ = 23 num_bits_left_in_current_word = 9 We split the read operation into two parts: read 9 bits, then read 19 bits for a total of 28. lower = GetNextBits(9) => current_word_bit_index_ = 23 + 9 = 32. upper = GetNextBits(19) => but, current_word_bit_index_ == 32 (kMaxBitIndex) => => Read next word into current_word_ and reset current_word_bit_index_ to 0 (see line 233). => Read 19 bits from new current_word_ into upper. => current_word_bit_index_ = 0 + 19 = 19
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... File components/safe_browsing_db/v4_rice_unittest.cc (right): https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... components/safe_browsing_db/v4_rice_unittest.cc:142: VerifyRiceDecoding(it); On 2016/08/12 23:25:47, Scott Hess wrote: > In case of failure this isn't going to give very satisfying diagnostics. It > might be nice to have a descriptive value logged with expectation failures. > > OTOH, if test case #7 fails, I'm not entirely sure how to address it by > inspection. The failure will list what did not match and it is easy to find by using grep.
https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... File components/safe_browsing_db/v4_rice_unittest.cc (right): https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... components/safe_browsing_db/v4_rice_unittest.cc:39: } On 2016/08/12 23:25:47, Scott Hess wrote: > There's an expectation below for the decoder not having additional values - do > you need an expectation for the decoder having enough values? Or does > DECODE_SUCCESS cover that? Will that fire repeatedly if the decoder comes up > short? And clearly |word != expected| in case of decode failure. > > [Perhaps ASSERT_EQ() would handle this case for the decode failure, if it can be > used in a helper function like this.] That case returns DECODE_NO_MORE_ENTRIES_FAILURE so is covered.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Use a vector to decode and sort uint32_t values. re-interpret the memory of that vector as raw hashes bytes
The CQ bit was checked by vakh@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
It feels like there is something being forced in the flippedness area, here. Rice encoding should be happening on a series of fixed-size integers. So the client of the encoder/decoder shouldn't be doing any byte-swapping at all - the integers should go in and come out as integers, not as blobs. Likewise, for local use, there shouldn't be any byte-order involved. For convenience you might encode a series of fixed-size integers into a string blob, but since it's for local use it should all be strictly with local byte order. I'm not entirely certain where things are going awry. You may need to step back one level and look at things with fresh eyes. https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... File components/safe_browsing_db/v4_rice_unittest.cc (right): https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_rice_unittest.cc:142: VerifyRiceDecoding(it); On 2016/08/13 00:15:14, vakh wrote: > On 2016/08/12 23:01:49, Nathan Parker wrote: > > If this fails, will it tell you which test cast failed? An idea: log a > testcase > > number when it fails. > > It'll grok what did not match and I can then grep it. > Test case number is useful but even with that, it'd require me to count the > instances of RiceDecodingTestInfo to see which one did fail. It seems unlikely that the logged output will match the \x-escaped data. https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... File components/safe_browsing_db/v4_rice.cc (right): https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:22: uint32_t FlipEndianness(uint32_t word) { On 2016/08/13 00:15:14, vakh wrote: > On 2016/08/12 23:25:47, Scott Hess wrote: > > Is this right? I don't _think_ we have a mix of little-endian and big-endian > > platforms running Chrome, but I'm not entirely certain. I know there are MIPS > > ports out there, and I see evidence that someone was working on cleaning up > > big-endian support for MIPS. Does this really need to always flip, or is it > > more like ntohl()? If the latter, then I suspect we have base/ support for > that > > kind of thing. > > > > [Maybe base/big_endian.h ? Or perhaps something in net/?] > > Is this flip always needed: Waiting to get a confirmation from the SafeBrowsing > team on that. cc'd you on that email. To be clear - I suspect they are just serving network byte order, so you need to use ntohl() or base/big_endian.h . My guess is that they don't rearrange the byte order based on user-agent request, so the right thing to do would be to conditionally flip (such as those two options do). It probably would be preferable to use something like base/big_endian.h so that the data is simply read out right, rather than have separate steps for read-the-data and conditional-flip, though cache hierarchies on modern CPUs may make the difference moot. Actually, if it's little-endian on the wire, then having a read like your prior reader is probably best, since it's not conditional (it simply put the first byte in the low-order bits, and so on). https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:249: // |current_word_|. On 2016/08/13 00:15:14, vakh wrote: > On 2016/08/12 23:25:46, Scott Hess wrote: > > I'm finding this confusing. I _think_ what it's saying is that the entire > span > > of bits does not exist in current_word_. But below it calls GetNextBits() to > > read num_bits_left_in_current_word, which _does_ exist in current_word_ (so I > > think it will always degenerate to GetBitsFromCurrentWord). Then upper has to > > be read via GetNextBits() to load a new word. [Is the same possibly true for > > reading lower? Or would num_bits_left_in_current_word be zero in that case?] > > Let me illustrate with an example: Let's say we want to read 28 bits, but we've > already read 23 from current_word_. This means: > num_requested_bits = 28 > current_word_bit_index_ = 23 > num_bits_left_in_current_word = 9 > > We split the read operation into two parts: read 9 bits, then read 19 bits for a > total of 28. > > lower = GetNextBits(9) > => current_word_bit_index_ = 23 + 9 = 32. > > upper = GetNextBits(19) > => but, current_word_bit_index_ == 32 (kMaxBitIndex) > => => Read next word into current_word_ and reset current_word_bit_index_ to 0 > (see line 233). > => Read 19 bits from new current_word_ into upper. > => current_word_bit_index_ = 0 + 19 = 19 Yes, but the comment is written in terms of current_word_, which is not a thing the local code is using. Maybe try writing a new comment without reference to the current one, to see if a cleaner description can be made. Like "Combine the rest of current_word_ with additional bits from the next word." I think the first call to GetNextBits() should actually be calling GetBitsFromCurrentWord() to eat the remainder of current_word_. https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... File components/safe_browsing_db/v4_rice_unittest.cc (right): https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... components/safe_browsing_db/v4_rice_unittest.cc:39: } On 2016/08/13 00:19:44, vakh wrote: > On 2016/08/12 23:25:47, Scott Hess wrote: > > There's an expectation below for the decoder not having additional values - do > > you need an expectation for the decoder having enough values? Or does > > DECODE_SUCCESS cover that? Will that fire repeatedly if the decoder comes up > > short? And clearly |word != expected| in case of decode failure. > > > > [Perhaps ASSERT_EQ() would handle this case for the decode failure, if it can > be > > used in a helper function like this.] > > That case returns DECODE_NO_MORE_ENTRIES_FAILURE so is covered. It's going to return DECODE_NO_MORE_ENTRIES_FAILURE, but then have a cascade of expectation failures. So it might be better to use ASSERT_EQ for that case. https://codereview.chromium.org/2228393003/diff/140001/components/safe_browsi... File components/safe_browsing_db/v4_rice.cc (right): https://codereview.chromium.org/2228393003/diff/140001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:132: // https://developers.google.com/safe-browsing/v4/compression The protocol says the data is in network byte order, it doesn't say that it needs to be flipped. If the platform is little-endian, flipping would be incorrect. https://codereview.chromium.org/2228393003/diff/140001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:136: std::sort(out->begin(), out->end()); Whoa! I could see ASSERT_TRUE(std::is_sorted(...)) here, as if these aren't already sorted, I think something is fubar.
On 2016/08/15 21:18:15, Scott Hess wrote: > It feels like there is something being forced in the flippedness area, here. > Rice encoding should be happening on a series of fixed-size integers. So the > client of the encoder/decoder shouldn't be doing any byte-swapping at all - the > integers should go in and come out as integers, not as blobs. Hmm, to clarify - what I mean is that the flipping should be happening when encoding onto the wire and decoding off the wire. If the client of the encoder/decoder is doing the flipping, that implies that something wrong is happening.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Use htonl/ntohl instead of FlipEndianness (but the effect is the same). Use GetBitsFromCurrentWord for lower
The CQ bit was checked by vakh@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2016/08/15 21:19:38, Scott Hess wrote: > On 2016/08/15 21:18:15, Scott Hess wrote: > > It feels like there is something being forced in the flippedness area, here. > > Rice encoding should be happening on a series of fixed-size integers. So the > > client of the encoder/decoder shouldn't be doing any byte-swapping at all - > the > > integers should go in and come out as integers, not as blobs. > > Hmm, to clarify - what I mean is that the flipping should be happening when > encoding onto the wire and decoding off the wire. If the client of the > encoder/decoder is doing the flipping, that implies that something wrong is > happening. Yes, I agree it does look strange. [Rest is repition of what I have said elsewhere.] But what's read off the wire is bits, not bytes. The flipping happens after the decoder has produced the uint32_t.
Out-of-order comments! This one should have gone out before: <quote> Yes, I agree it does look strange. [Rest is repition of what I have said elsewhere.] But what's read off the wire is bits, not bytes. The flipping happens after the decoder has produced the uint32_t. </quote> https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... File components/safe_browsing_db/v4_rice_unittest.cc (right): https://codereview.chromium.org/2228393003/diff/80001/components/safe_browsin... components/safe_browsing_db/v4_rice_unittest.cc:142: VerifyRiceDecoding(it); On 2016/08/15 21:18:15, Scott Hess wrote: > On 2016/08/13 00:15:14, vakh wrote: > > On 2016/08/12 23:01:49, Nathan Parker wrote: > > > If this fails, will it tell you which test cast failed? An idea: log a > > testcase > > > number when it fails. > > > > It'll grok what did not match and I can then grep it. > > Test case number is useful but even with that, it'd require me to count the > > instances of RiceDecodingTestInfo to see which one did fail. > > It seems unlikely that the logged output will match the \x-escaped data. I changed one of the values and the produced error was: ../../components/safe_browsing_db/v4_rice_unittest.cc:39: Failure Value of: word Actual: 922272696 Expected: expected Which is: 924369848 I was then able to grep 924369848 in the file to see which case is failing. However, that requires unique values in the file, which is an unstated assumption so I've added a DVLOG for test case number. https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... File components/safe_browsing_db/v4_rice.cc (right): https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:22: uint32_t FlipEndianness(uint32_t word) { On 2016/08/15 21:18:15, Scott Hess wrote: > On 2016/08/13 00:15:14, vakh wrote: > > On 2016/08/12 23:25:47, Scott Hess wrote: > > > Is this right? I don't _think_ we have a mix of little-endian and > big-endian > > > platforms running Chrome, but I'm not entirely certain. I know there are > MIPS > > > ports out there, and I see evidence that someone was working on cleaning up > > > big-endian support for MIPS. Does this really need to always flip, or is it > > > more like ntohl()? If the latter, then I suspect we have base/ support for > > that > > > kind of thing. > > > > > > [Maybe base/big_endian.h ? Or perhaps something in net/?] > > > > Is this flip always needed: Waiting to get a confirmation from the > SafeBrowsing > > team on that. cc'd you on that email. > > To be clear - I suspect they are just serving network byte order, so you need to > use ntohl() or base/big_endian.h . My guess is that they don't rearrange the > byte order based on user-agent request, so the right thing to do would be to > conditionally flip (such as those two options do). It probably would be > preferable to use something like base/big_endian.h so that the data is simply > read out right, rather than have separate steps for read-the-data and > conditional-flip, though cache hierarchies on modern CPUs may make the > difference moot. > > Actually, if it's little-endian on the wire, then having a read like your prior > reader is probably best, since it's not conditional (it simply put the first > byte in the low-order bits, and so on). Noe -- can you please review this part of the code since you are most aware of the server side implementation? This is what my implementation does for now: 1. decode encoded_data field using Rice decoder. 2. Flip the bits of each output uint32_t value 3. Sort these uint32_t (since flipping the bits destroys sort order). 4. Flip the bits back and reinterpret them as bytes in a string. Can you specifically comment on what parts of this are avoidable and what parts are platform specific? https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:249: // |current_word_|. On 2016/08/15 21:18:15, Scott Hess wrote: > On 2016/08/13 00:15:14, vakh wrote: > > On 2016/08/12 23:25:46, Scott Hess wrote: > > > I'm finding this confusing. I _think_ what it's saying is that the entire > > span > > > of bits does not exist in current_word_. But below it calls GetNextBits() > to > > > read num_bits_left_in_current_word, which _does_ exist in current_word_ (so > I > > > think it will always degenerate to GetBitsFromCurrentWord). Then upper has > to > > > be read via GetNextBits() to load a new word. [Is the same possibly true > for > > > reading lower? Or would num_bits_left_in_current_word be zero in that > case?] > > > > Let me illustrate with an example: Let's say we want to read 28 bits, but > we've > > already read 23 from current_word_. This means: > > num_requested_bits = 28 > > current_word_bit_index_ = 23 > > num_bits_left_in_current_word = 9 > > > > We split the read operation into two parts: read 9 bits, then read 19 bits for > a > > total of 28. > > > > lower = GetNextBits(9) > > => current_word_bit_index_ = 23 + 9 = 32. > > > > upper = GetNextBits(19) > > => but, current_word_bit_index_ == 32 (kMaxBitIndex) > > => => Read next word into current_word_ and reset current_word_bit_index_ to 0 > > (see line 233). > > => Read 19 bits from new current_word_ into upper. > > => current_word_bit_index_ = 0 + 19 = 19 > > Yes, but the comment is written in terms of current_word_, which is not a thing > the local code is using. Maybe try writing a new comment without reference to > the current one, to see if a cleaner description can be made. Like "Combine the > rest of current_word_ with additional bits from the next word." > > I think the first call to GetNextBits() should actually be calling > GetBitsFromCurrentWord() to eat the remainder of current_word_. Done. https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... File components/safe_browsing_db/v4_rice_unittest.cc (right): https://codereview.chromium.org/2228393003/diff/100001/components/safe_browsi... components/safe_browsing_db/v4_rice_unittest.cc:39: } On 2016/08/15 21:18:15, Scott Hess wrote: > On 2016/08/13 00:19:44, vakh wrote: > > On 2016/08/12 23:25:47, Scott Hess wrote: > > > There's an expectation below for the decoder not having additional values - > do > > > you need an expectation for the decoder having enough values? Or does > > > DECODE_SUCCESS cover that? Will that fire repeatedly if the decoder comes > up > > > short? And clearly |word != expected| in case of decode failure. > > > > > > [Perhaps ASSERT_EQ() would handle this case for the decode failure, if it > can > > be > > > used in a helper function like this.] > > > > That case returns DECODE_NO_MORE_ENTRIES_FAILURE so is covered. > > It's going to return DECODE_NO_MORE_ENTRIES_FAILURE, but then have a cascade of > expectation failures. So it might be better to use ASSERT_EQ for that case. Done. Please let me know if you meant something else. https://codereview.chromium.org/2228393003/diff/140001/components/safe_browsi... File components/safe_browsing_db/v4_rice.cc (right): https://codereview.chromium.org/2228393003/diff/140001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:132: // https://developers.google.com/safe-browsing/v4/compression On 2016/08/15 21:18:15, Scott Hess wrote: > The protocol says the data is in network byte order, it doesn't say that it > needs to be flipped. If the platform is little-endian, flipping would be > incorrect. My machine is little-endian and it does require byte flipping. The encoded data (in the form of a string) passed by the server is read bit-by-bit to produce the Rice-decoded values. I believe, that part of the process is independent of the machine's endianness. Once that decoder has produced a uint32_t value, the only way to match the checksum is to flip those bits (at least on a little-endian machine). I am not sure about the expected behavior on big-endian machines so I'd let Noe comment on that. FWIW, I know that the SafeBrowsing code in Android does this bit flipping unconditionally. https://codereview.chromium.org/2228393003/diff/140001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:136: std::sort(out->begin(), out->end()); On 2016/08/15 21:18:15, Scott Hess wrote: > Whoa! I could see ASSERT_TRUE(std::is_sorted(...)) here, as if these aren't > already sorted, I think something is fubar. These aren't already sorted. That's because of the bit flipping I do just before push_back.
On 2016/08/16 00:21:53, vakh wrote: > On 2016/08/15 21:19:38, Scott Hess wrote: > > On 2016/08/15 21:18:15, Scott Hess wrote: > > > It feels like there is something being forced in the flippedness area, here. > > > > Rice encoding should be happening on a series of fixed-size integers. So > the > > > client of the encoder/decoder shouldn't be doing any byte-swapping at all - > > the > > > integers should go in and come out as integers, not as blobs. > > > > Hmm, to clarify - what I mean is that the flipping should be happening when > > encoding onto the wire and decoding off the wire. If the client of the > > encoder/decoder is doing the flipping, that implies that something wrong is > > happening. > > Yes, I agree it does look strange. > [Rest is repition of what I have said elsewhere.] > But what's read off the wire is bits, not bytes. The flipping happens after the > decoder has produced the uint32_t. That doesn't really change anything, though. A class which decodes a stream of integers should generally decode directly to platform integers. The sense of whether the source data is little-endian or big-endian should be encapsulated within the decoder/encoder, not exposed to the client code. I mean, correct me if I'm wrong - my understanding of this is that what we have is a base value plus a cumulative series of increments from that base value. If that is decoded wrong-endian, then the increments are going to be entirely incorrect! Instead of being a series of small integers which increment through a flat distribution, you would have a series of giant integers with repeated overflow. Which ... which under unsigned arithmetic might just result in the expected output after byte-flipping, I'll have to think on that. I can't see where this could be going wrong in the code, though. Is it possible that the datasets you're using are actually encoded incorrectly?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
On 2016/08/16 at 00:36:13, shess wrote: > On 2016/08/16 00:21:53, vakh wrote: > > On 2016/08/15 21:19:38, Scott Hess wrote: > > > On 2016/08/15 21:18:15, Scott Hess wrote: > > > > It feels like there is something being forced in the flippedness area, here. > > > > > > Rice encoding should be happening on a series of fixed-size integers. So > > the > > > > client of the encoder/decoder shouldn't be doing any byte-swapping at all - > > > the > > > > integers should go in and come out as integers, not as blobs. > > > > > > Hmm, to clarify - what I mean is that the flipping should be happening when > > > encoding onto the wire and decoding off the wire. If the client of the > > > encoder/decoder is doing the flipping, that implies that something wrong is > > > happening. > > > > Yes, I agree it does look strange. > > [Rest is repition of what I have said elsewhere.] > > But what's read off the wire is bits, not bytes. The flipping happens after the > > decoder has produced the uint32_t. > > That doesn't really change anything, though. A class which decodes a stream of integers should generally decode directly to platform integers. The sense of whether the source data is little-endian or big-endian should be encapsulated within the decoder/encoder, not exposed to the client code. > > I mean, correct me if I'm wrong - my understanding of this is that what we have is a base value plus a cumulative series of increments from that base value. If that is decoded wrong-endian, then the increments are going to be entirely incorrect! That's correct. > Instead of being a series of small integers which increment through a flat distribution, you would have a series of giant integers with repeated overflow. That's not possible since I use base::CheckedNumeric<uint32_t> to ensure that overflows don't happen. > Which ... which under unsigned arithmetic might just result in the expected output after byte-flipping, I'll have to think on that. That's not possible (see above), but even if it were, that would be a big coincidence. Especially, if it works correctly repeatedly. > > I can't see where this could be going wrong in the code, though. Is it possible that the datasets you're using are actually encoded incorrectly? I am testing this against production servers that have been serving lists to Android users for months now so it shouldn't be wrong.
On 2016/08/16 at 01:33:28, vakh wrote: > On 2016/08/16 at 00:36:13, shess wrote: > > On 2016/08/16 00:21:53, vakh wrote: > > > On 2016/08/15 21:19:38, Scott Hess wrote: > > > > On 2016/08/15 21:18:15, Scott Hess wrote: > > > > > It feels like there is something being forced in the flippedness area, here. > > > > > > > > Rice encoding should be happening on a series of fixed-size integers. So > > > the > > > > > client of the encoder/decoder shouldn't be doing any byte-swapping at all - > > > > the > > > > > integers should go in and come out as integers, not as blobs. > > > > > > > > Hmm, to clarify - what I mean is that the flipping should be happening when > > > > encoding onto the wire and decoding off the wire. If the client of the > > > > encoder/decoder is doing the flipping, that implies that something wrong is > > > > happening. > > > > > > Yes, I agree it does look strange. > > > [Rest is repition of what I have said elsewhere.] > > > But what's read off the wire is bits, not bytes. The flipping happens after the > > > decoder has produced the uint32_t. > > > > That doesn't really change anything, though. A class which decodes a stream of integers should generally decode directly to platform integers. The sense of whether the source data is little-endian or big-endian should be encapsulated within the decoder/encoder, not exposed to the client code. > > > > I mean, correct me if I'm wrong - my understanding of this is that what we have is a base value plus a cumulative series of increments from that base value. If that is decoded wrong-endian, then the increments are going to be entirely incorrect! > That's correct. > > > Instead of being a series of small integers which increment through a flat distribution, you would have a series of giant integers with repeated overflow. > That's not possible since I use base::CheckedNumeric<uint32_t> to ensure that overflows don't happen. > > > Which ... which under unsigned arithmetic might just result in the expected output after byte-flipping, I'll have to think on that. > That's not possible (see above), but even if it were, that would be a big coincidence. Especially, if it works correctly repeatedly. > > > > > I can't see where this could be going wrong in the code, though. Is it possible that the datasets you're using are actually encoded incorrectly? > I am testing this against production servers that have been serving lists to Android users for months now so it shouldn't be wrong. [Noe has replied to the email separately mentioning why the bit flipping needs to be done.] Also, Chrome only supports little-endian architectures so this is not an issue. See: https://cs.chromium.org/chromium/src/build/build_config.h?l=100 I propose two things: 1. Add a #error for big endian platforms and move ahead with the rest of the implementation as-is? 2. Use the FlipBytes method as I had previously, since that's exactly what htonl/ntohl methods do unconditionally anyway. But with htonl/ntohl, I need to depend on an external header file that's different across platforms.
Use FlipEndianness instead of ntohl/htonl. #error for big endian
The CQ bit was checked by vakh@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
rebase
https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... File components/safe_browsing_db/v4_rice.cc (right): https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:28: uint32_t FlipEndianness(uint32_t word) { I feel like there's no point in building in an endianness dependency unless it makes the code faster or a lot cleaner. What's wrong with htonl()? https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:135: // This flipping is done so that the decoded uint32 is interpreted re-flow comment. Make a comment that the ints need to be converted to 4-byte-strings as if they were Big Endian ints, and Chrome runs only on Little Endian. https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:149: (*out)[i] = FlipEndianness((*out)[i]); Wait, now I'm confused again. You're flipping, sorting as ints, then flipping again. OH, I think I understand: You're sorting the array as integers, rather than as 4-byte-strings, but you want them to end up in lexigraphical order. I _think_ you could avoid all the flipping by having sort() compare 4-byte strings instead of ints. I'm not sure if that'll be any faster comparing ints is pretty ideal. But the current code has an extra pass through the array. If what I said make sense, you could add a comment that we might want to compare my proposed method's performance, if performance here turns out to be an issue. It might be more clear if you didn't use uint32_t's, but instead had something like typedef uint32_t Hash32Bits std::vector<Hash32Bits>* out Since really once you've done the Rice decoding, you no longer treat them as ints (and shouldn't). https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:269: void V4RiceDecoder::GetBitsFromCurrentWord(unsigned int num_requested_bits, Could this just return x?
Use htonl. Return from GetBitsFromCurrentWord
https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... File components/safe_browsing_db/v4_rice.cc (right): https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:28: uint32_t FlipEndianness(uint32_t word) { On 2016/08/17 at 20:57:27, Nathan Parker wrote: > I feel like there's no point in building in an endianness dependency unless it makes the code faster or a lot cleaner. What's wrong with htonl()? [I don't feel strongly about it bit have a slight preference so I changed the code.] There are two parts to this statement: 1. endianness dependency: this is irrelevant since Chrome has only been supported on little-endian platforms for the longest time. I am not sure if this code would work correctly on big endian machines so I ifdef'd it out. 2. Nothing wrong with htonl -- It is defined in different header files on different plaforms so defining a simple bit-flipping method that does the same thing seemed cleaner. https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:135: // This flipping is done so that the decoded uint32 is interpreted On 2016/08/17 at 20:57:27, Nathan Parker wrote: > re-flow comment. > Make a comment that the ints need to be converted to 4-byte-strings as if they were Big Endian ints, and Chrome runs only on Little Endian. Please see my other comment about this below. https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:149: (*out)[i] = FlipEndianness((*out)[i]); On 2016/08/17 at 20:57:27, Nathan Parker wrote: > Wait, now I'm confused again. You're flipping, sorting as ints, then flipping again. > > OH, I think I understand: You're sorting the array as integers, rather than as 4-byte-strings, but you want them to end up in lexigraphical order. I _think_ you could avoid all the flipping by having sort() compare 4-byte strings instead of ints. I'm not sure if that'll be any faster comparing ints is pretty ideal. But the current code has an extra pass through the array. > > > > If what I said make sense, you could add a comment that we might want to compare my proposed method's performance, if performance here turns out to be an issue. > > > It might be more clear if you didn't use uint32_t's, but instead had something like > typedef uint32_t Hash32Bits > std::vector<Hash32Bits>* out > > Since really once you've done the Rice decoding, you no longer treat them as ints (and shouldn't). I think the explanation goes like this. The following program, on a little-endian machine produces: std::vector<uint32_t> v2; v2.reserve(3); v2.push_back(0x41424344); v2.push_back(0); cptr = reinterpret_cast<char*>(v2.data()); std::cout << "as string: " << cptr << std::endl; Output: as string: DCBA This shows that when a string (which is what the server meant to send) as re-interpreted as a uint32_t, it reverses the order of the bits. That's why we need to do this whole bit flipping. The server sends a response. Parse the response as uint32_t values. Flip bytes to reinterpret them correctly. Sort them to get the correct lexicographical order Flip the bytes again, since they need to be consumed as bytes not uint32_t values. I think the current comment describes this process fairly accurately but I am super willing to consider suggestions.
The CQ bit was checked by vakh@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
I think I get the situation this is dealing with. I think it was an unfortunate thing for the server data to have a confusing ordering dependency baked in that way. https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... File components/safe_browsing_db/v4_rice.cc (right): https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:28: uint32_t FlipEndianness(uint32_t word) { On 2016/08/17 20:57:27, Nathan Parker wrote: > I feel like there's no point in building in an endianness dependency unless it > makes the code faster or a lot cleaner. What's wrong with htonl()? I agree with this, if it's reasonable to just do it right, do it right. At this point I wouldn't worry excessively about performance, because there will probably be bigger fish to fry in here. That said, I also support the "If it's not little endian, break the compile" strategy. I'm slightly nervous about doing it as macro checks, rather than some sort of compiler-generated check. https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:107: std::vector<uint32_t>* out) { "DecodeBytes" decodes to a vector of uint32_t. Maybe "DecodePrefixes" or something like that. Also this heavily overlaps the DecodeIntegers() function, other than the ordering constraints. I couldn't see an obvious way to steal the underlying storage from the RepeatedField, unfortunately. https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:148: for (unsigned i = 0; i < out->size(); i++) { size_t rather than unsigned, here. https://codereview.chromium.org/2228393003/diff/220001/components/safe_browsi... File components/safe_browsing_db/v4_rice.cc (right): https://codereview.chromium.org/2228393003/diff/220001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:26: #endif Is this still necessary now that it's using htonl? https://codereview.chromium.org/2228393003/diff/220001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:145: std::sort(out->begin(), out->end()); I think at least part of the confusion is that there's a sorting constraint which is being explained from the bottom up (comments probably added as you discovered things), and maybe it would be cleaner if the header comment for this function described how the results should be sorted. AFAICT, the sort order is because someone elsewhere is going to interpret the raw vector::data() bytes. I'm not making a value judgement on _that_, I'm just saying that if you're going to bake that assumption in here, then you need to describe it because it's a completely non-obvious requirement for someone looking at this code. That said ... I'm also not entirely convinced that this should return a vector of integers, given that there's this odd sorting constraint and someone else is accessing it as raw data. If the only reason for the vector is for unit tests, then IMHO it would be reasonable for the unit tests to decode/flip things for purposes of testing. https://codereview.chromium.org/2228393003/diff/220001/components/safe_browsi... File components/safe_browsing_db/v4_rice_unittest.cc (right): https://codereview.chromium.org/2228393003/diff/220001/components/safe_browsi... components/safe_browsing_db/v4_rice_unittest.cc:253: for (unsigned i = 0; i < expected.size(); i++) { size_t here.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
DecodeBytes->DecodePrefixes. More detailed comment.
The CQ bit was checked by vakh@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... File components/safe_browsing_db/v4_rice.cc (right): https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:28: uint32_t FlipEndianness(uint32_t word) { On 2016/08/17 at 22:42:26, Scott Hess wrote: > On 2016/08/17 20:57:27, Nathan Parker wrote: > > I feel like there's no point in building in an endianness dependency unless it > > makes the code faster or a lot cleaner. What's wrong with htonl()? > > I agree with this, if it's reasonable to just do it right, do it right. At this point I wouldn't worry excessively about performance, because there will probably be bigger fish to fry in here. > Done. > That said, I also support the "If it's not little endian, break the compile" strategy. I'm slightly nervous about doing it as macro checks, rather than some sort of compiler-generated check. I really think that if it should fail, it should fail spectacularly. https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:107: std::vector<uint32_t>* out) { On 2016/08/17 at 22:42:26, Scott Hess wrote: > "DecodeBytes" decodes to a vector of uint32_t. Maybe "DecodePrefixes" or something like that. > Done. > Also this heavily overlaps the DecodeIntegers() function, other than the ordering constraints. I couldn't see an obvious way to steal the underlying storage from the RepeatedField, unfortunately. Hmm. https://codereview.chromium.org/2228393003/diff/200001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:148: for (unsigned i = 0; i < out->size(); i++) { On 2016/08/17 at 22:42:26, Scott Hess wrote: > size_t rather than unsigned, here. Done. https://codereview.chromium.org/2228393003/diff/220001/components/safe_browsi... File components/safe_browsing_db/v4_rice.cc (right): https://codereview.chromium.org/2228393003/diff/220001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:26: #endif On 2016/08/17 at 22:42:27, Scott Hess wrote: > Is this still necessary now that it's using htonl? To be frank, I won't be 100% sure until I observe it. I'd much rather it fails loud and then someone can make it work, than it fails silently and no one notices. https://codereview.chromium.org/2228393003/diff/220001/components/safe_browsi... components/safe_browsing_db/v4_rice.cc:145: std::sort(out->begin(), out->end()); On 2016/08/17 at 22:42:26, Scott Hess wrote: > I think at least part of the confusion is that there's a sorting constraint which is being explained from the bottom up (comments probably added as you discovered things), and maybe it would be cleaner if the header comment for this function described how the results should be sorted. > Done. Let me know if that isn't clear enough for a new reader. > AFAICT, the sort order is because someone elsewhere is going to interpret the raw vector::data() bytes. I'm not making a value judgement on _that_, I'm just saying that if you're going to bake that assumption in here, then you need to describe it because it's a completely non-obvious requirement for someone looking at this code. > Done. > That said ... I'm also not entirely convinced that this should return a vector of integers, given that there's this odd sorting constraint and someone else is accessing it as raw data. If the only reason for the vector is for unit tests, then IMHO it would be reasonable for the unit tests to decode/flip things for purposes of testing. The reason it returns a vector is that I am trying to interpret the same memory block some times as std::vector<uint32_t> and other times as std::string. The other way to do this would be to create a std::string, reserve its size appropriately, and then re-interpret that block of memory as vector<uint32_t>. The problem with that approach is that string::data() returns a const char * pointer, so I cannot modify the memory underneath that string. Of course I could cast away the constness but that's probably not a good idea either. So I am between a rock and a hard place. https://codereview.chromium.org/2228393003/diff/220001/components/safe_browsi... File components/safe_browsing_db/v4_rice_unittest.cc (right): https://codereview.chromium.org/2228393003/diff/220001/components/safe_browsi... components/safe_browsing_db/v4_rice_unittest.cc:253: for (unsigned i = 0; i < expected.size(); i++) { On 2016/08/17 at 22:42:27, Scott Hess wrote: > size_t here. Done.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
lgtm
Since most of the major points have been addressed, I am going to commit this but feel free to leave feedback and I'll address it as a follow-up CL.
The CQ bit was checked by vakh@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== PVer4: DecodeHashes needs to sort the output of the Rice decoder Also: 1. Add tests to validate the Rice-decoder implementation. 2. Also, make the code for the case where the requested number of bits is greater than those left in the current_word_ easier to understand by calling GetNextBits recursively twice. BUG=543161, 624567 ========== to ========== PVer4: DecodeHashes needs to sort the output of the Rice decoder Also: 1. Add tests to validate the Rice-decoder implementation. 2. Also, make the code for the case where the requested number of bits is greater than those left in the current_word_ easier to understand by calling GetNextBits recursively twice. BUG=543161, 624567 ==========
Message was sent while issue was closed.
Committed patchset #10 (id:240001)
Message was sent while issue was closed.
Description was changed from ========== PVer4: DecodeHashes needs to sort the output of the Rice decoder Also: 1. Add tests to validate the Rice-decoder implementation. 2. Also, make the code for the case where the requested number of bits is greater than those left in the current_word_ easier to understand by calling GetNextBits recursively twice. BUG=543161, 624567 ========== to ========== PVer4: DecodeHashes needs to sort the output of the Rice decoder Also: 1. Add tests to validate the Rice-decoder implementation. 2. Also, make the code for the case where the requested number of bits is greater than those left in the current_word_ easier to understand by calling GetNextBits recursively twice. BUG=543161, 624567 Committed: https://crrev.com/1670eb02e8172d1750b70c50dbe3fb968e1f365f Cr-Commit-Position: refs/heads/master@{#413027} ==========
Message was sent while issue was closed.
Patchset 10 (id:??) landed as https://crrev.com/1670eb02e8172d1750b70c50dbe3fb968e1f365f Cr-Commit-Position: refs/heads/master@{#413027} |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
