Add the goog-unwanted-shavar list to a new SafeBrowsing PrefixSet.
The logic is the same as the logic for the existing PrefixSet for the malware and phishing lists.
BUG=397754
TEST=Navigate to http://uwd.safebrowsingtest.com and confirm that the unwanted (harmful) software interstial is shown.
Committed: https://crrev.com/5bebfd01ae0090f46d8018dbbd66db6412796da9
Cr-Commit-Position: refs/heads/master@{#303978}
Noe PTAL. Instead of creating a new client-side storage location, I added the prefixes to ...
6 years, 2 months ago
(2014-09-26 20:35:39 UTC)
#2
Noe PTAL.
Instead of creating a new client-side storage location, I added the prefixes to
the existing browse_prefix_set since Adrienne desired to re-use the existing
result from SafeBrowsingResourceThrottle::CheckUrl() -- which is what eventually
triggers the creation of the SafeBrowsingBlockingPage she augmented -- in
https://codereview.chromium.org/584433002/ and simply added a new SB_THREAT_TYPE
now returned from this check with this CL.
I'll modify the existing tests if you agree that this is the way to go.
Thanks!
Gab
ping? (or @mattm, is Noe away? Care to take a look, see initial message for ...
6 years, 2 months ago
(2014-10-01 20:08:30 UTC)
#4
ping?
(or @mattm, is Noe away? Care to take a look, see initial message for details)
mattm
On 2014/09/26 20:35:39, gab wrote: > Noe PTAL. > > Instead of creating a new ...
6 years, 2 months ago
(2014-10-01 21:23:22 UTC)
#5
On 2014/09/26 20:35:39, gab wrote:
> Noe PTAL.
>
> Instead of creating a new client-side storage location, I added the prefixes
to
> the existing browse_prefix_set since Adrienne desired to re-use the existing
> result from SafeBrowsingResourceThrottle::CheckUrl() -- which is what
eventually
> triggers the creation of the SafeBrowsingBlockingPage she augmented -- in
> https://codereview.chromium.org/584433002/ and simply added a new
SB_THREAT_TYPE
> now returned from this check with this CL.
>
> I'll modify the existing tests if you agree that this is the way to go.
>
> Thanks!
> Gab
I don't think this will work as-is. Using the same prefixset I think is fine,
but you can't store more than 2 lists in a single SafeBrowsingStoreFile. It may
appear to work but updates will be broken in weird ways.
Scott Hess - ex-Googler
On 2014/10/01 21:23:22, mattm wrote: > On 2014/09/26 20:35:39, gab wrote: > > Noe PTAL. ...
6 years, 2 months ago
(2014-10-03 00:15:56 UTC)
#6
On 2014/10/01 21:23:22, mattm wrote:
> On 2014/09/26 20:35:39, gab wrote:
> > Noe PTAL.
> >
> > Instead of creating a new client-side storage location, I added the prefixes
> to
> > the existing browse_prefix_set since Adrienne desired to re-use the existing
> > result from SafeBrowsingResourceThrottle::CheckUrl() -- which is what
> eventually
> > triggers the creation of the SafeBrowsingBlockingPage she augmented -- in
> > https://codereview.chromium.org/584433002/ and simply added a new
> SB_THREAT_TYPE
> > now returned from this check with this CL.
> >
> > I'll modify the existing tests if you agree that this is the way to go.
> >
> > Thanks!
> > Gab
>
> I don't think this will work as-is. Using the same prefixset I think is fine,
> but you can't store more than 2 lists in a single SafeBrowsingStoreFile. It
may
> appear to work but updates will be broken in weird ways.
Yes, the list id is encoded into the low-order bit of the chunk id. That bit
combines with the file it's in to determine the actual list. Long ago ppl had a
theory that these files might update out of sync so should be separate, but it
turned out there has never been one updated on a different schedule, but nobody
has gotten around to rewriting the file format to something more sane (like a
single file with a table of contents listing the lists directly, perhaps, rather
than insane bit encoding).
Putting additional items into the PrefixSet might work, but it's not entirely
clear that it will make any sense once you wedge it all in. Right now things
map pretty directly from store file to prefix set. Space wise if you have a lot
of prefixes it won't matter (it will be 16 bits plus a fraction either way).
Unified would be a slight win for lookup performance, though.
gab
On 2014/10/03 00:15:56, Scott Hess - On Leave wrote: > On 2014/10/01 21:23:22, mattm wrote: ...
6 years, 2 months ago
(2014-10-03 20:01:53 UTC)
#7
On 2014/10/03 00:15:56, Scott Hess - On Leave wrote:
> On 2014/10/01 21:23:22, mattm wrote:
> > On 2014/09/26 20:35:39, gab wrote:
> > > Noe PTAL.
> > >
> > > Instead of creating a new client-side storage location, I added the
prefixes
> > to
> > > the existing browse_prefix_set since Adrienne desired to re-use the
existing
> > > result from SafeBrowsingResourceThrottle::CheckUrl() -- which is what
> > eventually
> > > triggers the creation of the SafeBrowsingBlockingPage she augmented -- in
> > > https://codereview.chromium.org/584433002/ and simply added a new
> > SB_THREAT_TYPE
> > > now returned from this check with this CL.
> > >
> > > I'll modify the existing tests if you agree that this is the way to go.
> > >
> > > Thanks!
> > > Gab
> >
> > I don't think this will work as-is. Using the same prefixset I think is
fine,
> > but you can't store more than 2 lists in a single SafeBrowsingStoreFile. It
> may
> > appear to work but updates will be broken in weird ways.
>
> Yes, the list id is encoded into the low-order bit of the chunk id. That bit
> combines with the file it's in to determine the actual list. Long ago ppl had
a
> theory that these files might update out of sync so should be separate, but it
> turned out there has never been one updated on a different schedule, but
nobody
> has gotten around to rewriting the file format to something more sane (like a
> single file with a table of contents listing the lists directly, perhaps,
rather
> than insane bit encoding).
>
> Putting additional items into the PrefixSet might work, but it's not entirely
> clear that it will make any sense once you wedge it all in. Right now things
> map pretty directly from store file to prefix set. Space wise if you have a
lot
> of prefixes it won't matter (it will be 16 bits plus a fraction either way).
> Unified would be a slight win for lookup performance, though.
Ah I see... :-(
I was hoping to put this list in the same PrefixSet for lookup performance and
code simplicity.
It looks like code simplicity wise it might be better for now to have a second
PrefixSet for goog-unwanted-shavar, stored as a separate file (otherwise the
code would have to handle two files merging into a single in-memory
PrefixSet...).
This does mean that SafeBrowsingResourceThrottle::CheckUrl() [1] will have to
perform another check after CheckBrowseUrl(); some new CheckUnwantedUrl() call
on the SafeBrowsingDatabaseManager :-(.
[1]
https://code.google.com/p/chromium/codesearch#chromium/src/chrome/browser/ren...
I'm going on leave now and won't be able to get back to this until the end of
October unfortunately...
Best,
Gab
mattm
On 2014/10/03 20:01:53, gab - On Leave wrote: > On 2014/10/03 00:15:56, Scott Hess - ...
6 years, 2 months ago
(2014-10-03 20:37:51 UTC)
#8
On 2014/10/03 20:01:53, gab - On Leave wrote:
> On 2014/10/03 00:15:56, Scott Hess - On Leave wrote:
> > On 2014/10/01 21:23:22, mattm wrote:
> > > On 2014/09/26 20:35:39, gab wrote:
> > > > Noe PTAL.
> > > >
> > > > Instead of creating a new client-side storage location, I added the
> prefixes
> > > to
> > > > the existing browse_prefix_set since Adrienne desired to re-use the
> existing
> > > > result from SafeBrowsingResourceThrottle::CheckUrl() -- which is what
> > > eventually
> > > > triggers the creation of the SafeBrowsingBlockingPage she augmented --
in
> > > > https://codereview.chromium.org/584433002/ and simply added a new
> > > SB_THREAT_TYPE
> > > > now returned from this check with this CL.
> > > >
> > > > I'll modify the existing tests if you agree that this is the way to go.
> > > >
> > > > Thanks!
> > > > Gab
> > >
> > > I don't think this will work as-is. Using the same prefixset I think is
> fine,
> > > but you can't store more than 2 lists in a single SafeBrowsingStoreFile.
It
> > may
> > > appear to work but updates will be broken in weird ways.
> >
> > Yes, the list id is encoded into the low-order bit of the chunk id. That
bit
> > combines with the file it's in to determine the actual list. Long ago ppl
had
> a
> > theory that these files might update out of sync so should be separate, but
it
> > turned out there has never been one updated on a different schedule, but
> nobody
> > has gotten around to rewriting the file format to something more sane (like
a
> > single file with a table of contents listing the lists directly, perhaps,
> rather
> > than insane bit encoding).
> >
> > Putting additional items into the PrefixSet might work, but it's not
entirely
> > clear that it will make any sense once you wedge it all in. Right now
things
> > map pretty directly from store file to prefix set. Space wise if you have a
> lot
> > of prefixes it won't matter (it will be 16 bits plus a fraction either way).
> > Unified would be a slight win for lookup performance, though.
>
> Ah I see... :-(
>
> I was hoping to put this list in the same PrefixSet for lookup performance and
> code simplicity.
>
> It looks like code simplicity wise it might be better for now to have a second
> PrefixSet for goog-unwanted-shavar, stored as a separate file (otherwise the
> code would have to handle two files merging into a single in-memory
> PrefixSet...).
>
> This does mean that SafeBrowsingResourceThrottle::CheckUrl() [1] will have to
> perform another check after CheckBrowseUrl(); some new CheckUnwantedUrl() call
> on the SafeBrowsingDatabaseManager :-(.
>
> [1]
>
https://code.google.com/p/chromium/codesearch#chromium/src/chrome/browser/ren...
>
> I'm going on leave now and won't be able to get back to this until the end of
> October unfortunately...
>
> Best,
> Gab
What happened to the idea of using metadata in the existing phishing list
instead of adding a new list?
gab
On 2014/10/03 20:37:51, mattm wrote: > On 2014/10/03 20:01:53, gab - On Leave wrote: > ...
6 years, 2 months ago
(2014-10-03 20:41:43 UTC)
#9
On 2014/10/03 20:37:51, mattm wrote:
> On 2014/10/03 20:01:53, gab - On Leave wrote:
> > On 2014/10/03 00:15:56, Scott Hess - On Leave wrote:
> > > On 2014/10/01 21:23:22, mattm wrote:
> > > > On 2014/09/26 20:35:39, gab wrote:
> > > > > Noe PTAL.
> > > > >
> > > > > Instead of creating a new client-side storage location, I added the
> > prefixes
> > > > to
> > > > > the existing browse_prefix_set since Adrienne desired to re-use the
> > existing
> > > > > result from SafeBrowsingResourceThrottle::CheckUrl() -- which is what
> > > > eventually
> > > > > triggers the creation of the SafeBrowsingBlockingPage she augmented --
> in
> > > > > https://codereview.chromium.org/584433002/ and simply added a new
> > > > SB_THREAT_TYPE
> > > > > now returned from this check with this CL.
> > > > >
> > > > > I'll modify the existing tests if you agree that this is the way to
go.
> > > > >
> > > > > Thanks!
> > > > > Gab
> > > >
> > > > I don't think this will work as-is. Using the same prefixset I think is
> > fine,
> > > > but you can't store more than 2 lists in a single SafeBrowsingStoreFile.
> It
> > > may
> > > > appear to work but updates will be broken in weird ways.
> > >
> > > Yes, the list id is encoded into the low-order bit of the chunk id. That
> bit
> > > combines with the file it's in to determine the actual list. Long ago ppl
> had
> > a
> > > theory that these files might update out of sync so should be separate,
but
> it
> > > turned out there has never been one updated on a different schedule, but
> > nobody
> > > has gotten around to rewriting the file format to something more sane
(like
> a
> > > single file with a table of contents listing the lists directly, perhaps,
> > rather
> > > than insane bit encoding).
> > >
> > > Putting additional items into the PrefixSet might work, but it's not
> entirely
> > > clear that it will make any sense once you wedge it all in. Right now
> things
> > > map pretty directly from store file to prefix set. Space wise if you have
a
> > lot
> > > of prefixes it won't matter (it will be 16 bits plus a fraction either
way).
>
> > > Unified would be a slight win for lookup performance, though.
> >
> > Ah I see... :-(
> >
> > I was hoping to put this list in the same PrefixSet for lookup performance
and
> > code simplicity.
> >
> > It looks like code simplicity wise it might be better for now to have a
second
> > PrefixSet for goog-unwanted-shavar, stored as a separate file (otherwise the
> > code would have to handle two files merging into a single in-memory
> > PrefixSet...).
> >
> > This does mean that SafeBrowsingResourceThrottle::CheckUrl() [1] will have
to
> > perform another check after CheckBrowseUrl(); some new CheckUnwantedUrl()
call
> > on the SafeBrowsingDatabaseManager :-(.
> >
> > [1]
> >
>
https://code.google.com/p/chromium/codesearch#chromium/src/chrome/browser/ren...
> >
> > I'm going on leave now and won't be able to get back to this until the end
of
> > October unfortunately...
> >
> > Best,
> > Gab
>
> What happened to the idea of using metadata in the existing phishing list
> instead of adding a new list?
Hmm, not sure, I was only looped into this when the new list was already served
by Google's Safe Browsing servers so I assumed this was the way to go?
+Noe/Moheeb for insights.
noé
On 2014/10/03 20:41:43, gab - On Leave wrote: > On 2014/10/03 20:37:51, mattm wrote: > ...
6 years, 2 months ago
(2014-10-03 21:52:30 UTC)
#10
On 2014/10/03 20:41:43, gab - On Leave wrote:
> On 2014/10/03 20:37:51, mattm wrote:
> > On 2014/10/03 20:01:53, gab - On Leave wrote:
> > > On 2014/10/03 00:15:56, Scott Hess - On Leave wrote:
> > > > On 2014/10/01 21:23:22, mattm wrote:
> > > > > On 2014/09/26 20:35:39, gab wrote:
> > > > > > Noe PTAL.
> > > > > >
> > > > > > Instead of creating a new client-side storage location, I added the
> > > prefixes
> > > > > to
> > > > > > the existing browse_prefix_set since Adrienne desired to re-use the
> > > existing
> > > > > > result from SafeBrowsingResourceThrottle::CheckUrl() -- which is
what
> > > > > eventually
> > > > > > triggers the creation of the SafeBrowsingBlockingPage she augmented
--
> > in
> > > > > > https://codereview.chromium.org/584433002/ and simply added a new
> > > > > SB_THREAT_TYPE
> > > > > > now returned from this check with this CL.
> > > > > >
> > > > > > I'll modify the existing tests if you agree that this is the way to
> go.
> > > > > >
> > > > > > Thanks!
> > > > > > Gab
> > > > >
> > > > > I don't think this will work as-is. Using the same prefixset I think
is
> > > fine,
> > > > > but you can't store more than 2 lists in a single
SafeBrowsingStoreFile.
> > It
> > > > may
> > > > > appear to work but updates will be broken in weird ways.
> > > >
> > > > Yes, the list id is encoded into the low-order bit of the chunk id.
That
> > bit
> > > > combines with the file it's in to determine the actual list. Long ago
ppl
> > had
> > > a
> > > > theory that these files might update out of sync so should be separate,
> but
> > it
> > > > turned out there has never been one updated on a different schedule, but
> > > nobody
> > > > has gotten around to rewriting the file format to something more sane
> (like
> > a
> > > > single file with a table of contents listing the lists directly,
perhaps,
> > > rather
> > > > than insane bit encoding).
> > > >
> > > > Putting additional items into the PrefixSet might work, but it's not
> > entirely
> > > > clear that it will make any sense once you wedge it all in. Right now
> > things
> > > > map pretty directly from store file to prefix set. Space wise if you
have
> a
> > > lot
> > > > of prefixes it won't matter (it will be 16 bits plus a fraction either
> way).
> >
> > > > Unified would be a slight win for lookup performance, though.
> > >
> > > Ah I see... :-(
> > >
> > > I was hoping to put this list in the same PrefixSet for lookup performance
> and
> > > code simplicity.
> > >
> > > It looks like code simplicity wise it might be better for now to have a
> second
> > > PrefixSet for goog-unwanted-shavar, stored as a separate file (otherwise
the
> > > code would have to handle two files merging into a single in-memory
> > > PrefixSet...).
> > >
> > > This does mean that SafeBrowsingResourceThrottle::CheckUrl() [1] will have
> to
> > > perform another check after CheckBrowseUrl(); some new CheckUnwantedUrl()
> call
> > > on the SafeBrowsingDatabaseManager :-(.
> > >
> > > [1]
> > >
> >
>
https://code.google.com/p/chromium/codesearch#chromium/src/chrome/browser/ren...
> > >
> > > I'm going on leave now and won't be able to get back to this until the end
> of
> > > October unfortunately...
> > >
> > > Best,
> > > Gab
> >
> > What happened to the idea of using metadata in the existing phishing list
> > instead of adding a new list?
>
> Hmm, not sure, I was only looped into this when the new list was already
served
> by Google's Safe Browsing servers so I assumed this was the way to go?
>
> +Noe/Moheeb for insights.
For social engineering & phishing we have one list (existing phishing list).
For unwanted software pages we have this new list.
The main reason is because we didn't want all other Safe Browsing users to have
to download the unwanted software list for nothing. Firefox and Safari aren't
planning on using that new list anytime soon.
gab
Patchset #5 (id:80001) has been deleted
6 years, 1 month ago
(2014-11-05 19:44:33 UTC)
#11
Patchset #5 (id:80001) has been deleted
gab
Patchset #5 (id:100001) has been deleted
6 years, 1 month ago
(2014-11-05 19:59:44 UTC)
#12
Matt PTAL, this is still WIP, but I'd like your early feedback as we are ...
6 years, 1 month ago
(2014-11-05 20:10:55 UTC)
#15
Matt PTAL, this is still WIP, but I'd like your early feedback as we are under a
tight deadline.
We switched from re-using the malware/phishing PrefixSet to creating a brand new
one for the unwanted list only.
Things left to do/figure out are highlighted by //FIXME comments.
I'd also like to expand testing further.
Thanks!
Gab
mattm
Haven't looked through the whole thing yet, but one problem jumped out. https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_browsing/database_manager.cc File chrome/browser/safe_browsing/database_manager.cc ...
6 years, 1 month ago
(2014-11-06 01:06:11 UTC)
#16
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_browsing/database_manager.cc File chrome/browser/safe_browsing/database_manager.cc (right): https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_browsing/database_manager.cc#newcode479 chrome/browser/safe_browsing/database_manager.cc:479: checks_.insert(check); On 2014/11/06 01:06:11, mattm wrote: > This is ...
6 years, 1 month ago
(2014-11-06 01:22:36 UTC)
#17
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_bro...
File chrome/browser/safe_browsing/database_manager.cc (right):
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:479: checks_.insert(check);
On 2014/11/06 01:06:11, mattm wrote:
> This is going to be a problem if both browse_prefix_match and
> unwanted_prefix_match are true. The client isn't expecting to get two
callbacks
> for the same check.
Actually there should be no need for the separate checks, a single gethash check
always gets results for all lists, so you can have a single "if
(browse_prefix_match || unwanted_prefix_match)" since expected_threats contains
all three types.
mattm
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_browsing/safe_browsing_database.h File chrome/browser/safe_browsing/safe_browsing_database.h (right): https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_browsing/safe_browsing_database.h#newcode517 chrome/browser/safe_browsing/safe_browsing_database.h:517: //FIXME get one of these for uwslist Given my ...
6 years, 1 month ago
(2014-11-06 01:32:36 UTC)
#18
Thanks for the early feedback, reply/question below. Cheers, Gab https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_browsing/database_manager.cc File chrome/browser/safe_browsing/database_manager.cc (right): https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_browsing/database_manager.cc#newcode479 chrome/browser/safe_browsing/database_manager.cc:479: ...
6 years, 1 month ago
(2014-11-06 17:47:47 UTC)
#19
Thanks for the early feedback, reply/question below.
Cheers,
Gab
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_bro...
File chrome/browser/safe_browsing/database_manager.cc (right):
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:479: checks_.insert(check);
On 2014/11/06 01:22:36, mattm wrote:
> On 2014/11/06 01:06:11, mattm wrote:
> > This is going to be a problem if both browse_prefix_match and
> > unwanted_prefix_match are true. The client isn't expecting to get two
> callbacks
> > for the same check.
>
> Actually there should be no need for the separate checks, a single gethash
check
> always gets results for all lists, so you can have a single "if
> (browse_prefix_match || unwanted_prefix_match)" since expected_threats
contains
> all three types.
I caught on to do the same thing yesterday after sending this, I decided to
split it into two checks made by the client
SafeBrowsingDatabaseManager::CheckBrowseUrl and
SafeBrowsingDatabaseManager::CheckUnwantedSoftwareUrl.
Then the client (SafeBrowsingResourceThrottle) expects two callbacks and
forwards the most severe onwards (or the most severe to-date in the event of a
timeout).
Does that make sense?
I don't quite see how we could check two lists in a single SafeBrowsingCheck
here otherwise?
gab
PTAL at the latest patch set (6) and then we should chat in person about ...
6 years, 1 month ago
(2014-11-07 00:12:09 UTC)
#20
PTAL at the latest patch set (6) and then we should chat in person about how
this should all work.
Thanks!
Gab
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_bro...
File chrome/browser/safe_browsing/database_manager.cc (right):
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:479: checks_.insert(check);
On 2014/11/06 17:47:46, gab wrote:
> On 2014/11/06 01:22:36, mattm wrote:
> > On 2014/11/06 01:06:11, mattm wrote:
> > > This is going to be a problem if both browse_prefix_match and
> > > unwanted_prefix_match are true. The client isn't expecting to get two
> > callbacks
> > > for the same check.
> >
> > Actually there should be no need for the separate checks, a single gethash
> check
> > always gets results for all lists, so you can have a single "if
> > (browse_prefix_match || unwanted_prefix_match)" since expected_threats
> contains
> > all three types.
>
> I caught on to do the same thing yesterday after sending this, I decided to
> split it into two checks made by the client
> SafeBrowsingDatabaseManager::CheckBrowseUrl and
> SafeBrowsingDatabaseManager::CheckUnwantedSoftwareUrl.
> Then the client (SafeBrowsingResourceThrottle) expects two callbacks and
> forwards the most severe onwards (or the most severe to-date in the event of a
> timeout).
>
> Does that make sense?
Implementation in patch set 6.
>
> I don't quite see how we could check two lists in a single SafeBrowsingCheck
> here otherwise?
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_bro...
File chrome/browser/safe_browsing/safe_browsing_database.h (right):
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/safe_browsing_database.h:517: //FIXME get one of
these for uwslist
On 2014/11/06 01:32:36, mattm wrote:
> Given my previous comment about gethash always returning results for all
lists,
> I think it makes more sense to only have a single gethash cache.
I don't think that's correct, if I use the same cache for both, then a prefix
hit in the UNWANTED list is cached and when a query is later made for the
MALWARE list, we get a cache hit...
I tested this scenario and hit this DCHECK [1]; perhaps the DCHECK is incorrect
and we should adjust, but as-is this doesn't work with a single cache.
[1]
https://code.google.com/p/chromium/codesearch#chromium/src/chrome/browser/saf...
mattm
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_browsing/database_manager.cc File chrome/browser/safe_browsing/database_manager.cc (right): https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_browsing/database_manager.cc#newcode479 chrome/browser/safe_browsing/database_manager.cc:479: checks_.insert(check); On 2014/11/06 17:47:46, gab wrote: > On 2014/11/06 ...
6 years, 1 month ago
(2014-11-07 00:27:09 UTC)
#21
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_bro...
File chrome/browser/safe_browsing/database_manager.cc (right):
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:479: checks_.insert(check);
On 2014/11/06 17:47:46, gab wrote:
> On 2014/11/06 01:22:36, mattm wrote:
> > On 2014/11/06 01:06:11, mattm wrote:
> > > This is going to be a problem if both browse_prefix_match and
> > > unwanted_prefix_match are true. The client isn't expecting to get two
> > callbacks
> > > for the same check.
> >
> > Actually there should be no need for the separate checks, a single gethash
> check
> > always gets results for all lists, so you can have a single "if
> > (browse_prefix_match || unwanted_prefix_match)" since expected_threats
> contains
> > all three types.
>
> I caught on to do the same thing yesterday after sending this, I decided to
> split it into two checks made by the client
> SafeBrowsingDatabaseManager::CheckBrowseUrl and
> SafeBrowsingDatabaseManager::CheckUnwantedSoftwareUrl.
> Then the client (SafeBrowsingResourceThrottle) expects two callbacks and
> forwards the most severe onwards (or the most severe to-date in the event of a
> timeout).
>
> Does that make sense?
>
> I don't quite see how we could check two lists in a single SafeBrowsingCheck
> here otherwise?
SafeBrowsingCheck actually checks all lists, then filters the results based on
expected_threats. So you just combine the prefix_hits and cache_hits of the two
ContainsBrowseUrl and ContainsUWSUrl.
The one niggle is the "check_type" of SafeBrowsingCheck, but this is only used
to decide which callback to call, and as the patch to
SafeBrowsingDatabaseManager::Client::OnSafeBrowsingResult above shows, you call
the same method in either case. Therefore using a single SafeBrowsingCheck with
check_type=MALWARE should be fine.
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_bro...
File chrome/browser/safe_browsing/safe_browsing_database.h (right):
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/safe_browsing_database.h:517: //FIXME get one of
these for uwslist
On 2014/11/07 00:12:09, gab wrote:
> On 2014/11/06 01:32:36, mattm wrote:
> > Given my previous comment about gethash always returning results for all
> lists,
> > I think it makes more sense to only have a single gethash cache.
>
> I don't think that's correct, if I use the same cache for both, then a prefix
> hit in the UNWANTED list is cached and when a query is later made for the
> MALWARE list, we get a cache hit...
>
> I tested this scenario and hit this DCHECK [1]; perhaps the DCHECK is
incorrect
> and we should adjust, but as-is this doesn't work with a single cache.
>
> [1]
>
https://code.google.com/p/chromium/codesearch#chromium/src/chrome/browser/saf...
I believe that DCHECK is wrong, the server gethash request always returns
results for all lists, and this could already happen even if you are only trying
to query for one list, just luck that it doesn't.
gab
Alright, I think patch set 8 is now a working solution! PTAL, I'll work on ...
6 years, 1 month ago
(2014-11-07 18:54:17 UTC)
#22
Alright, I think patch set 8 is now a working solution!
PTAL, I'll work on adding more tests in the mean time.
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_bro...
File chrome/browser/safe_browsing/database_manager.cc (right):
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:479: checks_.insert(check);
On 2014/11/07 00:27:09, mattm wrote:
> On 2014/11/06 17:47:46, gab wrote:
> > On 2014/11/06 01:22:36, mattm wrote:
> > > On 2014/11/06 01:06:11, mattm wrote:
> > > > This is going to be a problem if both browse_prefix_match and
> > > > unwanted_prefix_match are true. The client isn't expecting to get two
> > > callbacks
> > > > for the same check.
> > >
> > > Actually there should be no need for the separate checks, a single gethash
> > check
> > > always gets results for all lists, so you can have a single "if
> > > (browse_prefix_match || unwanted_prefix_match)" since expected_threats
> > contains
> > > all three types.
> >
> > I caught on to do the same thing yesterday after sending this, I decided to
> > split it into two checks made by the client
> > SafeBrowsingDatabaseManager::CheckBrowseUrl and
> > SafeBrowsingDatabaseManager::CheckUnwantedSoftwareUrl.
> > Then the client (SafeBrowsingResourceThrottle) expects two callbacks and
> > forwards the most severe onwards (or the most severe to-date in the event of
a
> > timeout).
> >
> > Does that make sense?
> >
> > I don't quite see how we could check two lists in a single SafeBrowsingCheck
> > here otherwise?
>
> SafeBrowsingCheck actually checks all lists, then filters the results based on
> expected_threats. So you just combine the prefix_hits and cache_hits of the
two
> ContainsBrowseUrl and ContainsUWSUrl.
>
> The one niggle is the "check_type" of SafeBrowsingCheck, but this is only used
> to decide which callback to call, and as the patch to
> SafeBrowsingDatabaseManager::Client::OnSafeBrowsingResult above shows, you
call
> the same method in either case. Therefore using a single SafeBrowsingCheck
with
> check_type=MALWARE should be fine.
I see, as discussed yesterday, makes sense. See latest update.
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_bro...
File chrome/browser/safe_browsing/safe_browsing_database.h (right):
https://codereview.chromium.org/611603002/diff/120001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/safe_browsing_database.h:517: //FIXME get one of
these for uwslist
On 2014/11/07 00:27:09, mattm wrote:
> On 2014/11/07 00:12:09, gab wrote:
> > On 2014/11/06 01:32:36, mattm wrote:
> > > Given my previous comment about gethash always returning results for all
> > lists,
> > > I think it makes more sense to only have a single gethash cache.
> >
> > I don't think that's correct, if I use the same cache for both, then a
prefix
> > hit in the UNWANTED list is cached and when a query is later made for the
> > MALWARE list, we get a cache hit...
> >
> > I tested this scenario and hit this DCHECK [1]; perhaps the DCHECK is
> incorrect
> > and we should adjust, but as-is this doesn't work with a single cache.
> >
> > [1]
> >
>
https://code.google.com/p/chromium/codesearch#chromium/src/chrome/browser/saf...
>
> I believe that DCHECK is wrong, the server gethash request always returns
> results for all lists, and this could already happen even if you are only
trying
> to query for one list, just luck that it doesn't.
I see, DCHECK removed.
I can now visit uwd.safebrowsingtest.com and get the unwanted interstitial
multiple times with no problems.
gab
Patchset #9 (id:200001) has been deleted
6 years, 1 month ago
(2014-11-10 04:52:11 UTC)
#23
Patchset #9 (id:200001) has been deleted
mattm
https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_browsing/database_manager.cc File chrome/browser/safe_browsing/database_manager.cc (right): https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_browsing/database_manager.cc#newcode455 chrome/browser/safe_browsing/database_manager.cc:455: unwanted_cache_hits.end()); should avoid duplicates in the merged lists https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_browsing/safe_browsing_database.cc ...
6 years, 1 month ago
(2014-11-11 01:29:11 UTC)
#24
Alright, tests fixed and comments addressed. I'd say I'm finally confident that patch set 11 ...
6 years, 1 month ago
(2014-11-11 23:39:12 UTC)
#25
Alright, tests fixed and comments addressed.
I'd say I'm finally confident that patch set 11 is working as intended.
Thanks a lot for your thorough review thus far: catching potentially subtle
synchronization bugs.
Cheers!
Gab
https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_bro...
File chrome/browser/safe_browsing/database_manager.cc (right):
https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:455:
unwanted_cache_hits.end());
On 2014/11/11 01:29:10, mattm wrote:
> should avoid duplicates in the merged lists
Done, added STL logic and added API requirements which already happened to be
true in the impl (unique+sorted prefix_hits) to get rid of duplicates here. I'll
refactor this in a follow-up CL to avoid so much dependency on the
inner-workings of SafeBrowsingDatabase.
https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_bro...
File chrome/browser/safe_browsing/safe_browsing_database.cc (right):
https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/safe_browsing_database.cc:637:
FAILURE_UNWANTED_SOFTWARE_PREFIX_SET_READ);
On 2014/11/11 01:29:10, mattm wrote:
> I guess this should be protected by lookup_lock too (probably not needed, as
the
> comment above says, but should be consistent).
Done (and moved it up to have it contextually close to similar code).
https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/safe_browsing_database.cc:669: return
PrefixSetContainsUrl(url, browse_prefix_set_.get(), prefix_hits,
On 2014/11/11 01:29:10, mattm wrote:
> the browse_prefix_set_.get() and unwanted_software_prefix_set_.get() needs to
be
> done inside (the same) lookup_lock_ that protects the access, which is the
> reason for the odd ordering of the code in the old version.
Ah I see makes sense now :-)! Tweaked to make it so.
https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/safe_browsing_database.cc:1335: // |prefix_set|,
there is no need to lock.
On 2014/11/11 01:29:10, mattm wrote:
> Comment is (even more) confusing now that prefix_set is a local variable.
> I guess it should at least say |*prefix_set|.
Agreed, re-worded comment.
I also think we can make this code better by using scoped_ptr<const PrefixSet>
instead of scoped_ptr<PrefixSet> to store PrefixSets to have stronger guarantees
that nobody else is touching them, rather than assuming everybody we hand them
to plays by the rules...
I'll follow up with a CL for this.
https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_bro...
File chrome/browser/safe_browsing/safe_browsing_database_unittest.cc (right):
https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/safe_browsing_database_unittest.cc:493: LOG(INFO)
<< "Testing " << test_case.test_list_name;
On 2014/11/11 01:29:11, mattm wrote:
> use SCOPED_TRACE
Oh nice! Didn't know about SCOPED_TRACE, this is exactly what I want :-)!
https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_bro...
File chrome/browser/safe_browsing/safe_browsing_service_browsertest.cc (right):
https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/safe_browsing_service_browsertest.cc:902:
SetupResponseForUrl(bad_url, full_hash_result);
On 2014/11/11 01:29:11, mattm wrote:
> Looks like SetupResponseForUrl just overwrites the previous response, not adds
> to it.
Right, I caught on to that as well while doing further testing (when I
originally wrote CheckUnwantedUrl I thought the last bit would fail, but it
surprisingly didn't... which is why I wrote the CheckBrowseUrl test to confirm I
wasn't crazy but that one failed... which is when I realized the test was hiding
the real behavior with this. After fixing the test: CheckUnwantedUrl failed as
originally expected and then I added product logic to extract the severest
threat as desired).
mattm
https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_browsing/safe_browsing_database.cc File chrome/browser/safe_browsing/safe_browsing_database.cc (right): https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_browsing/safe_browsing_database.cc#newcode1335 chrome/browser/safe_browsing/safe_browsing_database.cc:1335: // |prefix_set|, there is no need to lock. On ...
6 years, 1 month ago
(2014-11-12 03:24:31 UTC)
#26
https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_bro...
File chrome/browser/safe_browsing/safe_browsing_database.cc (right):
https://codereview.chromium.org/611603002/diff/220001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/safe_browsing_database.cc:1335: // |prefix_set|,
there is no need to lock.
On 2014/11/11 23:39:11, gab wrote:
> On 2014/11/11 01:29:10, mattm wrote:
> > Comment is (even more) confusing now that prefix_set is a local variable.
> > I guess it should at least say |*prefix_set|.
>
> Agreed, re-worded comment.
>
> I also think we can make this code better by using scoped_ptr<const PrefixSet>
> instead of scoped_ptr<PrefixSet> to store PrefixSets to have stronger
guarantees
> that nobody else is touching them, rather than assuming everybody we hand them
> to plays by the rules...
>
> I'll follow up with a CL for this.
Sounds reasonable, though I don't think it completely guarantees it (would still
depend on no other thread changing the scoped_ptr itself).
https://codereview.chromium.org/611603002/diff/300001/chrome/browser/safe_bro...
File chrome/browser/safe_browsing/database_manager.cc (right):
https://codereview.chromium.org/611603002/diff/300001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:78:
safe_browsing_util::ListType severest_threat = safe_browsing_util::INVALID;
name is a bit confusing since it's not used for the severest threats actually.
https://codereview.chromium.org/611603002/diff/300001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:485:
std::vector<SBFullHashResult> other_cache_hits;
unused_cache_hits ?
https://codereview.chromium.org/611603002/diff/300001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:492: // format are at least
caught by some bots...
It's not win specific, but it is padding, yes. Ex you can see if fails in the
win x64 bot also.
I don't think there's a good way to do this (you could define your own copy of
what you expect the struct to be, and then compare size of that...). But see my
comment below, anyway.
https://codereview.chromium.org/611603002/diff/300001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:503: }));
Technically the cache hits could be different, since they will be run at
slightly different times, and thus the later one might ignore some cache entries
as being too old. So it's probably not a good idea to DCHECK it. I think it is
okay to just always use the first one, though.
https://codereview.chromium.org/611603002/diff/300001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:1052: IsExpectedThreat(threat,
check->expected_threats)) {
Hmm, I suppose ideally the severest threat should only look for one of the
expected threat list, otherwise we might ignore results because it just happened
to return the value for some threat we aren't looking for, even if the one we
were looking for was also in the list. But I guess that problem already exists,
so up to you if you want to address in this CL.
6 years, 1 month ago
(2014-11-12 16:05:08 UTC)
#27
Comments addressed, PTAL.
Thanks,
Gab
https://codereview.chromium.org/611603002/diff/300001/chrome/browser/safe_bro...
File chrome/browser/safe_browsing/database_manager.cc (right):
https://codereview.chromium.org/611603002/diff/300001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:78:
safe_browsing_util::ListType severest_threat = safe_browsing_util::INVALID;
On 2014/11/12 03:24:30, mattm wrote:
> name is a bit confusing since it's not used for the severest threats actually.
Done.
https://codereview.chromium.org/611603002/diff/300001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:485:
std::vector<SBFullHashResult> other_cache_hits;
On 2014/11/12 03:24:30, mattm wrote:
> unused_cache_hits ?
Done.
https://codereview.chromium.org/611603002/diff/300001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:492: // format are at least
caught by some bots...
On 2014/11/12 03:24:30, mattm wrote:
> It's not win specific, but it is padding, yes. Ex you can see if fails in the
> win x64 bot also.
>
> I don't think there's a good way to do this (you could define your own copy of
> what you expect the struct to be, and then compare size of that...). But see
my
> comment below, anyway.
Removed.
https://codereview.chromium.org/611603002/diff/300001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:503: }));
On 2014/11/12 03:24:30, mattm wrote:
> Technically the cache hits could be different, since they will be run at
> slightly different times, and thus the later one might ignore some cache
entries
> as being too old. So it's probably not a good idea to DCHECK it. I think it is
> okay to just always use the first one, though.
Good catch, removed.
https://codereview.chromium.org/611603002/diff/300001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:509:
DCHECK(std::is_sorted(unwanted_prefix_hits.begin(),
Also removed these two checks as std::is_sorted is a C++11 STL feature which are
not yet supported in Chromium.
Already updated the API definition to enforce this by contract anyways.
https://codereview.chromium.org/611603002/diff/300001/chrome/browser/safe_bro...
chrome/browser/safe_browsing/database_manager.cc:1052: IsExpectedThreat(threat,
check->expected_threats)) {
On 2014/11/12 03:24:30, mattm wrote:
> Hmm, I suppose ideally the severest threat should only look for one of the
> expected threat list, otherwise we might ignore results because it just
happened
> to return the value for some threat we aren't looking for, even if the one we
> were looking for was also in the list. But I guess that problem already
exists,
> so up to you if you want to address in this CL.
Good point. Since this is an existing problem though I added a TODO and will
address in a follow-up CL.
mattm
lgtm with nits https://codereview.chromium.org/611603002/diff/320001/chrome/browser/safe_browsing/safe_browsing_database.cc File chrome/browser/safe_browsing/safe_browsing_database.cc (right): https://codereview.chromium.org/611603002/diff/320001/chrome/browser/safe_browsing/safe_browsing_database.cc#newcode258 chrome/browser/safe_browsing/safe_browsing_database.cc:258: void UpdateChunkRangesForBrowseLists(SafeBrowsingStore* store, I think the ...
6 years, 1 month ago
(2014-11-12 22:26:57 UTC)
#28
https://codereview.chromium.org/611603002/diff/340001/chrome/browser/safe_browsing/safe_browsing_database.h File chrome/browser/safe_browsing/safe_browsing_database.h (right): https://codereview.chromium.org/611603002/diff/340001/chrome/browser/safe_browsing/safe_browsing_database.h#newcode275 chrome/browser/safe_browsing/safe_browsing_database.h:275: FAILURE_UNWANTED_SOFTWARE_DATABASE_UPDATE_BEGIN, I think you also need to update histograms.xml ...
5 years, 9 months ago
(2015-02-27 21:37:54 UTC)
#35
Issue 611603002: Add the goog-unwanted-shavar list to a new SafeBrowsing PrefixSet.
(Closed)
Created 6 years, 2 months ago by gab
Modified 5 years, 9 months ago
Reviewers: noé, mattm, Alexei Svitkine (slow)
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Comments: 40