|
|
DescriptionAdding a SQL implementation of the backing store for previews opt outs
Previews maintains opt outs to determine eligibility of a navigation to
a domain. To persist this information, a SQLite database is created to
store opt out choice, host name, preview type, and time. This portion of
the implementation only implements creating the table, and returning an
empty result to PreviewBlackList (after running a real query).
BUG=653730
Committed: https://crrev.com/2ae78ccf536394379900253a4099b5e089b07084
Cr-Commit-Position: refs/heads/master@{#424319}
Patch Set 1 #Patch Set 2 : formtatting #
Total comments: 4
Patch Set 3 : Split black list initialization and construction #
Total comments: 8
Patch Set 4 : tbansal comments #
Total comments: 4
Patch Set 5 : tbansal comments #
Total comments: 34
Patch Set 6 : rebase #
Total comments: 4
Patch Set 7 : shess comments #
Total comments: 2
Patch Set 8 : rebase #
Total comments: 10
Patch Set 9 : shess comments #
Total comments: 12
Patch Set 10 : rebase #Patch Set 11 : shess comments #Patch Set 12 : table change #
Total comments: 1
Depends on Patchset: Messages
Total messages: 83 (59 generated)
The CQ bit was checked by ryansturm@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by ryansturm@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
ryansturm@chromium.org changed reviewers: + tbansal@chromium.org
tbansal: PTAL @ *
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
The CQ bit was checked by ryansturm@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
https://codereview.chromium.org/2390773003/diff/20001/chrome/browser/previews... File chrome/browser/previews/previews_service.h (right): https://codereview.chromium.org/2390773003/diff/20001/chrome/browser/previews... chrome/browser/previews/previews_service.h:35: const base::FilePath& profile_path); comment on the variables: e.g., |profile_path| https://codereview.chromium.org/2390773003/diff/20001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/20001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:95: "SELECT full_table.host_name, full_table.time, full_table.opt_out" do you have to say: SELECT full_table.host_name AS host_name, .... https://codereview.chromium.org/2390773003/diff/40001/chrome/browser/previews... File chrome/browser/previews/previews_service.h (right): https://codereview.chromium.org/2390773003/diff/40001/chrome/browser/previews... chrome/browser/previews/previews_service.h:31: // Initializes the UI Service. Say something about the function arguments. https://codereview.chromium.org/2390773003/diff/40001/components/previews/cor... File components/previews/core/previews_black_list.cc (right): https://codereview.chromium.org/2390773003/diff/40001/components/previews/cor... components/previews/core/previews_black_list.cc:28: void PreviewsBlackList::Initialize() { Add thread checker. https://codereview.chromium.org/2390773003/diff/40001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/40001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:105: " ORDER BY full_table.host_name, full_table.time DESC"; Can you comment on why this needs to be ordered by host_name and time. Also why descending order. https://codereview.chromium.org/2390773003/diff/40001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.h (right): https://codereview.chromium.org/2390773003/diff/40001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.h:63: base::FilePath db_file_path_; can this be const?
The CQ bit was checked by ryansturm@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
tbansal: PTAL https://codereview.chromium.org/2390773003/diff/40001/chrome/browser/previews... File chrome/browser/previews/previews_service.h (right): https://codereview.chromium.org/2390773003/diff/40001/chrome/browser/previews... chrome/browser/previews/previews_service.h:31: // Initializes the UI Service. On 2016/10/04 15:01:21, tbansal1 wrote: > Say something about the function arguments. Done. https://codereview.chromium.org/2390773003/diff/40001/components/previews/cor... File components/previews/core/previews_black_list.cc (right): https://codereview.chromium.org/2390773003/diff/40001/components/previews/cor... components/previews/core/previews_black_list.cc:28: void PreviewsBlackList::Initialize() { On 2016/10/04 15:01:21, tbansal1 wrote: > Add thread checker. I'm undoing this. It doesn't actually help anything. https://codereview.chromium.org/2390773003/diff/40001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/40001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:105: " ORDER BY full_table.host_name, full_table.time DESC"; On 2016/10/04 15:01:21, tbansal1 wrote: > Can you comment on why this needs to be ordered by host_name and time. > Also why descending order. Removed the ORDER BY. https://codereview.chromium.org/2390773003/diff/40001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.h (right): https://codereview.chromium.org/2390773003/diff/40001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.h:63: base::FilePath db_file_path_; On 2016/10/04 15:01:21, tbansal1 wrote: > can this be const? Done.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2390773003/diff/20001/chrome/browser/previews... File chrome/browser/previews/previews_service.h (right): https://codereview.chromium.org/2390773003/diff/20001/chrome/browser/previews... chrome/browser/previews/previews_service.h:35: const base::FilePath& profile_path); On 2016/10/04 15:01:21, tbansal1 wrote: > comment on the variables: e.g., |profile_path| Done. https://codereview.chromium.org/2390773003/diff/20001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/20001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:95: "SELECT full_table.host_name, full_table.time, full_table.opt_out" On 2016/10/04 15:01:21, tbansal1 wrote: > do you have to say: > SELECT full_table.host_name AS host_name, > .... Missed this comment. The columns are accessed by index not by name.
lgtm https://codereview.chromium.org/2390773003/diff/60001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/60001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:42: OP_HOST_NAME = 0, IIUC, the order of enums here must match the order in which the table is read. Can that be made explicit with a comment. https://codereview.chromium.org/2390773003/diff/60001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.h (right): https://codereview.chromium.org/2390773003/diff/60001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.h:49: static void LoadBlackListSync( Can this be moved to anonymized namespace since it is not accessing any class members?
ryansturm@chromium.org changed reviewers: + mmenke@chromium.org, shess@chromium.org, thestig@chromium.org
thestig: PTAL @ chrome/common/chrome_constants mmenke: PTAL @ ProfileImplIOData.cc shess: PTAL @ new SQL dependency in previews_opt_out_store_sql.* https://codereview.chromium.org/2390773003/diff/60001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/60001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:42: OP_HOST_NAME = 0, On 2016/10/06 15:25:53, tbansal1 wrote: > IIUC, the order of enums here must match the order in which the table is read. > Can that be made explicit with a comment. Done. https://codereview.chromium.org/2390773003/diff/60001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.h (right): https://codereview.chromium.org/2390773003/diff/60001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.h:49: static void LoadBlackListSync( On 2016/10/06 15:25:53, tbansal1 wrote: > Can this be moved to anonymized namespace since it is not accessing any class > members? Done.
The CQ bit was checked by ryansturm@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: chromeos_daisy_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromeos_daisy_...) linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
profile_io_data LGTM
The CQ bit was checked by ryansturm@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2390773003/diff/80001/chrome/common/chrome_co... File chrome/common/chrome_constants.cc (right): https://codereview.chromium.org/2390773003/diff/80001/chrome/common/chrome_co... chrome/common/chrome_constants.cc:160: FPL("Previews/opt_out"); Why a directory rather than a file? https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:34: " type INTEGER NOT NULL)"; Any unique key? Otherwise it's just a bag of data. Are there _any_ circumstances where you'd want to insert NULL into host_name and have it fill with ''. That doesn't make sense to me. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:74: db->set_cache_size(250); How big is all this data going to be? Like average case and at the 90th percentile might be good things to know. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:88: } So if all of this fails, you're just going to give up? How will the situation ever change? https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:106: " ON max_host_table.host_name == full_table.host_name"; This is going to cause multiple full table scans. How big is this table going to be? Would it be materially less efficient to just do a single scan and filter at the C++ level? In terms of memory usage, this is asking SQLite to maintain a hashtable of strings to implement the GROUP BY, and that table is going to have to carry opt-out and time state. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:122: base::WrapUnique(black_list_item); Took me a second to parse this one because of the manual operator[] call. Maybe (*black_list_item_map)[host_name] =? Or black_list_item_map->insert(std::make_pair(host_name, WrapUnique(black_list_item))? https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:145: if (needs_initialization) { Any reason you're using this odd flag-passing rather than just testing db->is_open()? https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:156: LoadBlackListFromDataBase(db, runner, callback); I think that if you implemented LoadBlackListFromDatabase() to return unique_ptr<BLIM>, with a single post back to the callback here, it would work fine. The sql:: layer would throw some error messages and fail some DCHECKS, but the case of "The database was not opened or failed to initialize" is expected to _never_ happen, so that is reasonable. WRT the DCHECKS, there is support for expecting errors in testing, so if that's why you're doing these checks, let me know and I can give you pointers. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:175: } Is there any protection against an outstanding LoadBlackListSync() in progress at this point? https://codereview.chromium.org/2390773003/diff/100001/components/previews/co... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/100001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:75: db->set_histogram_tag("PreviewsOptOut"); This needs a matching histograms.xml entry for SqliteDatabases.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Waiting for shess to approve. https://codereview.chromium.org/2390773003/diff/100001/chrome/browser/preview... File chrome/browser/previews/previews_service.cc (right): https://codereview.chromium.org/2390773003/diff/100001/chrome/browser/preview... chrome/browser/previews/previews_service.cc:40: previews_ui_service_.reset(new previews::PreviewsUIService( base::MakeUnique?
The CQ bit was checked by ryansturm@chromium.org to run a CQ dry run
shess: PTAL https://codereview.chromium.org/2390773003/diff/80001/chrome/common/chrome_co... File chrome/common/chrome_constants.cc (right): https://codereview.chromium.org/2390773003/diff/80001/chrome/common/chrome_co... chrome/common/chrome_constants.cc:160: FPL("Previews/opt_out"); On 2016/10/06 17:24:27, Scott Hess wrote: > Why a directory rather than a file? I was copying the Offline Page metadata approach, Done. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:34: " type INTEGER NOT NULL)"; On 2016/10/06 17:24:28, Scott Hess wrote: > Any unique key? Otherwise it's just a bag of data. > > Are there _any_ circumstances where you'd want to insert NULL into host_name and > have it fill with ''. That doesn't make sense to me. Besides adding a unique key just to add a unique key, I can't really think of a good reason to have one. This is bag of data that will be at most 32 (entries per host) * # number of hosts. While we only return 100 hosts to the in-memory map the current plan is to have it be unbounded in the DB, but it might be useful to keep that limit in the DB as well. I will add you on the code review for that CL. Good point about "". https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:74: db->set_cache_size(250); On 2016/10/06 17:24:28, Scott Hess wrote: > How big is all this data going to be? Like average case and at the 90th > percentile might be good things to know. I added a pretty verbose comment. Comments on it would be appreciated. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:88: } On 2016/10/06 17:24:28, Scott Hess wrote: > So if all of this fails, you're just going to give up? How will the situation > ever change? If it fails for this session, I don't plan for it to fix itself. The next time the user profile is instantiated it will retry. Any ideas on a better solution? https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:106: " ON max_host_table.host_name == full_table.host_name"; On 2016/10/06 17:24:28, Scott Hess wrote: > This is going to cause multiple full table scans. How big is this table going > to be? Would it be materially less efficient to just do a single scan and > filter at the C++ level? > > In terms of memory usage, this is asking SQLite to maintain a hashtable of > strings to implement the GROUP BY, and that table is going to have to carry > opt-out and time state. Thanks. C++ sounds like the way to go. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:122: base::WrapUnique(black_list_item); On 2016/10/06 17:24:28, Scott Hess wrote: > Took me a second to parse this one because of the manual operator[] call. Maybe > (*black_list_item_map)[host_name] =? Or > black_list_item_map->insert(std::make_pair(host_name, > WrapUnique(black_list_item))? I've moved to using a static in PreviewsBlackList; let me know if this syntax bothers you there, and I can fix it. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:145: if (needs_initialization) { On 2016/10/06 17:24:28, Scott Hess wrote: > Any reason you're using this odd flag-passing rather than just testing > db->is_open()? Done. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:156: LoadBlackListFromDataBase(db, runner, callback); On 2016/10/06 17:24:28, Scott Hess wrote: > I think that if you implemented LoadBlackListFromDatabase() to return > unique_ptr<BLIM>, with a single post back to the callback here, it would work > fine. The sql:: layer would throw some error messages and fail some DCHECKS, > but the case of "The database was not opened or failed to initialize" is > expected to _never_ happen, so that is reasonable. > > WRT the DCHECKS, there is support for expecting errors in testing, so if that's > why you're doing these checks, let me know and I can give you pointers. I think I understand what you are saying, so let me know if the new approach looks good. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:175: } On 2016/10/06 17:24:28, Scott Hess wrote: > Is there any protection against an outstanding LoadBlackListSync() in progress > at this point? My understanding of task runners is that the LoadBlackListSync call will complete before the Delete happens on that thread. https://codereview.chromium.org/2390773003/diff/100001/chrome/browser/preview... File chrome/browser/previews/previews_service.cc (right): https://codereview.chromium.org/2390773003/diff/100001/chrome/browser/preview... chrome/browser/previews/previews_service.cc:40: previews_ui_service_.reset(new previews::PreviewsUIService( On 2016/10/06 19:16:50, Lei Zhang wrote: > base::MakeUnique? Done. https://codereview.chromium.org/2390773003/diff/100001/components/previews/co... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/100001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:75: db->set_histogram_tag("PreviewsOptOut"); On 2016/10/06 17:24:28, Scott Hess wrote: > This needs a matching histograms.xml entry for SqliteDatabases. Done.
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...) mac_chromium_compile_dbg_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_comp...) mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
The CQ bit was checked by ryansturm@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
ryansturm@chromium.org changed reviewers: + asvitkine@chromium.org
asvitkine: PTAL @ histograms
Description was changed from ========== Adding a SQL implementation of the backing store for previews opt outs Previews maintains opt outs to determine eligibility of a navigation to a domain. To persist this information, a SQLite database is created to store opt out choice, host name, preview type, and time. This portion of the implementation only implements creating the table, and returning an empty result to PreviewBlackList (after running a real query). BUG=639087 ========== to ========== Adding a SQL implementation of the backing store for previews opt outs Previews maintains opt outs to determine eligibility of a navigation to a domain. To persist this information, a SQLite database is created to store opt out choice, host name, preview type, and time. This portion of the implementation only implements creating the table, and returning an empty result to PreviewBlackList (after running a real query). BUG=653730 ==========
I'll try to give it another pass with fresh eyes in a few hours, in case there are other comments. https://codereview.chromium.org/2390773003/diff/80001/chrome/common/chrome_co... File chrome/common/chrome_constants.cc (right): https://codereview.chromium.org/2390773003/diff/80001/chrome/common/chrome_co... chrome/common/chrome_constants.cc:160: FPL("Previews/opt_out"); On 2016/10/06 20:05:39, Ryan Sturm wrote: > On 2016/10/06 17:24:27, Scott Hess wrote: > > Why a directory rather than a file? > > I was copying the Offline Page metadata approach, Done. In that case ... I'll also question your use of kPreviewsColumns, TableInfo, and OP_* column identifiers. IMHO, those are not best practices. In other areas of the code, I find that that sort of thing somewhat obscures what is actually going on, to no benefit. [Yes, I have had this discussion with the Offline Page CLs. They elected to go forward as-is.] https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:34: " type INTEGER NOT NULL)"; On 2016/10/06 20:05:39, Ryan Sturm wrote: > On 2016/10/06 17:24:28, Scott Hess wrote: > > Any unique key? Otherwise it's just a bag of data. > > > > Are there _any_ circumstances where you'd want to insert NULL into host_name > and > > have it fill with ''. That doesn't make sense to me. > > Besides adding a unique key just to add a unique key, I can't really think of a > good reason to have one. > > This is bag of data that will be at most 32 (entries per host) * # number of > hosts. While we only return 100 hosts to the in-memory map the current plan is > to have it be unbounded in the DB, but it might be useful to keep that limit in > the DB as well. I will add you on the code review for that CL. > > Good point about "". I'm not really sure what "unbounded in the db" means in this case. It feels like intentionally building a latent N^2 path, which is probably not what you wanted me to think :-). I'm not going to require you to add keys if you don't need keys, but if you don't use keys, then you're committing to do full-table-scans for all sorts of things. That might be reasonable if the database is going to remain small, but then you need to have systems in place to keep it small. It's hard to tell where things are going, though, since this intentionally has no updates implemented. You may be hoping to avoid going there to make things easier on yourself, but I can assure you from harsh experience that once you've got data on user disks, some changes become _way_ more annoying to accomplish. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:88: } On 2016/10/06 20:05:39, Ryan Sturm wrote: > On 2016/10/06 17:24:28, Scott Hess wrote: > > So if all of this fails, you're just going to give up? How will the situation > > ever change? > > If it fails for this session, I don't plan for it to fix itself. The next time > the user profile is instantiated it will retry. Any ideas on a better solution? It's persistent on disk, so most failures will be persistent. If the schema is broken, it will remain broken, if the database is corrupt, it will remain corrupt. There are systems for detecting and responding to errors. DatabaseErrorCallback() in components/omnibox/browser/shortcuts_database.cc is probably the closest to where I'm trying to go with that. The incidence of corruption is pretty low, call it 1%/year, but when multiplied by our userbase is can means many millions, and it accumulates over time (if nothing fixes it). Another option is nuke-from-orbit. Like an error callback calling sql::Connection::RazeAndClose(). net/extras/sqlite/sqlite_persistent_cookie_store.cc does this (though I see the RazeAndClose part is still TODO, sigh). https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:156: LoadBlackListFromDataBase(db, runner, callback); On 2016/10/06 20:05:39, Ryan Sturm wrote: > On 2016/10/06 17:24:28, Scott Hess wrote: > > I think that if you implemented LoadBlackListFromDatabase() to return > > unique_ptr<BLIM>, with a single post back to the callback here, it would work > > fine. The sql:: layer would throw some error messages and fail some DCHECKS, > > but the case of "The database was not opened or failed to initialize" is > > expected to _never_ happen, so that is reasonable. > > > > WRT the DCHECKS, there is support for expecting errors in testing, so if > that's > > why you're doing these checks, let me know and I can give you pointers. > > I think I understand what you are saying, so let me know if the new approach > looks good. I was thinking more like having LoadBlackListFromDatabase() as a standalone helper which converts sql into a unique_ptr<BlackListItemMap>, then with LoadBlackListSync() a wrapper which knows about initializing databases and calling callbacks. But it's six of one, half a dozen of the other. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:175: } On 2016/10/06 20:05:39, Ryan Sturm wrote: > On 2016/10/06 17:24:28, Scott Hess wrote: > > Is there any protection against an outstanding LoadBlackListSync() in progress > > at this point? > > My understanding of task runners is that the LoadBlackListSync call will > complete before the Delete happens on that thread. Yeah. I found it - the LoadBlackList() callback targets a weak pointer, so if there's an outstanding LoadBlackListSync(), the response will just drain. https://codereview.chromium.org/2390773003/diff/120001/components/previews/co... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/120001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:123: } Here I find myself wondering why not GetOrCreateBlackListItem() from the get-go. https://codereview.chromium.org/2390773003/diff/140001/components/previews/co... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/140001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:68: } As a for-instance of what I mean, if this was written as: if (!db->DoestTableExist("previews_v1")) { if (!db->Execute("CREATE TABLE previews_v1 ...")) return; } then the first time through I'd have suggested you could just use "CREATE TABLE IF NOT EXISTS", which would let you remove the DoesTableExist() call. You could also remove the transaction in that case, but it's unclear if that's desirable in the long term (since it has to come back if you ever make schema changes). https://codereview.chromium.org/2390773003/diff/140001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:80: // or very long host names. This kind of sounds like you intend to offload your memory usage to the SQLite page cache. If you're not willing to spend the megabyte, then you shouldn't be spending the megabyte! That said ... you probably don't need that much cache. The cache needs to be big enough to handle storing data during a statement, and also to handle the pages you touch on updates. Touching a large number of pages on update is bad regardless of the cache size, so hopefully that won't come up. All that out of the way, I'm not going to ask for any changes, because the default setup for sql::Connection uses memory-mapping and some update wrappers to make sure that the cache mostly uses no space between transactions. SQLite's default is 2MB, and this is smaller than 2MB, so I'm happy. https://codereview.chromium.org/2390773003/diff/140001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:96: db->Preload(); I don't think this is probably justified. Just let the system work. https://codereview.chromium.org/2390773003/diff/140001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:109: " FROM " PREVIEWS_TABLE_NAME; It feels like the rewrite lost a layer? I think the LIMIT bit? Is that now at a higher level? I see the DCHECK against MaxInMemoryHostsInBlacklist() - does that mean you're just deferring the problem? [I think that's fine, just want to make sure I'm following you.] https://codereview.chromium.org/2390773003/diff/140001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:178: db_.reset(new sql::Connection()); Is the idiom these days: db_ = base::MakeUnique<sql::Connection>()? to avoid the bare new?
histograms.xml lgtm
The CQ bit was checked by ryansturm@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
shess: PTAL https://codereview.chromium.org/2390773003/diff/80001/chrome/common/chrome_co... File chrome/common/chrome_constants.cc (right): https://codereview.chromium.org/2390773003/diff/80001/chrome/common/chrome_co... chrome/common/chrome_constants.cc:160: FPL("Previews/opt_out"); On 2016/10/07 00:08:45, Scott Hess wrote: > On 2016/10/06 20:05:39, Ryan Sturm wrote: > > On 2016/10/06 17:24:27, Scott Hess wrote: > > > Why a directory rather than a file? > > > > I was copying the Offline Page metadata approach, Done. > > In that case ... I'll also question your use of kPreviewsColumns, TableInfo, and > OP_* column identifiers. IMHO, those are not best practices. In other areas of > the code, I find that that sort of thing somewhat obscures what is actually > going on, to no benefit. > > [Yes, I have had this discussion with the Offline Page CLs. They elected to go > forward as-is.] sgtm. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:34: " type INTEGER NOT NULL)"; On 2016/10/07 00:08:46, Scott Hess wrote: > On 2016/10/06 20:05:39, Ryan Sturm wrote: > > On 2016/10/06 17:24:28, Scott Hess wrote: > > > Any unique key? Otherwise it's just a bag of data. > > > > > > Are there _any_ circumstances where you'd want to insert NULL into host_name > > and > > > have it fill with ''. That doesn't make sense to me. > > > > Besides adding a unique key just to add a unique key, I can't really think of > a > > good reason to have one. > > > > This is bag of data that will be at most 32 (entries per host) * # number of > > hosts. While we only return 100 hosts to the in-memory map the current plan is > > to have it be unbounded in the DB, but it might be useful to keep that limit > in > > the DB as well. I will add you on the code review for that CL. > > > > Good point about "". > > I'm not really sure what "unbounded in the db" means in this case. It feels > like intentionally building a latent N^2 path, which is probably not what you > wanted me to think :-). > > I'm not going to require you to add keys if you don't need keys, but if you > don't use keys, then you're committing to do full-table-scans for all sorts of > things. That might be reasonable if the database is going to remain small, but > then you need to have systems in place to keep it small. It's hard to tell > where things are going, though, since this intentionally has no updates > implemented. You may be hoping to avoid going there to make things easier on > yourself, but I can assure you from harsh experience that once you've got data > on user disks, some changes become _way_ more annoying to accomplish. Good point. The deleting query I wrote seem like they could be inefficient. Let me know if you see a good way to use keys to make them more efficient. The basic idea is that when a new entry comes in, if there are already 32 entries for the host, the oldest one is evicted. Here are the queries: " DELETE FROM " PREVIEWS_TABLE_NAME " WHERE host_name == ? AND" " time NOT IN (SELECT time FROM " PREVIEWS_TABLE_NAME " where host_name == ?" " ORDER BY time DESC" " LIMIT ?)"; "INSERT INTO " PREVIEWS_TABLE_NAME " (host_name, time, opt_out, type)" " VALUES " " (?, ?, ?, ?)"; "DELETE FROM " PREVIEWS_TABLE_NAME " WHERE time > ? and time < ?" I now plan to also add a query that keeps the total entry count below 3200 based on entry time. Thoughts on that as well? https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:88: } On 2016/10/07 00:08:45, Scott Hess wrote: > On 2016/10/06 20:05:39, Ryan Sturm wrote: > > On 2016/10/06 17:24:28, Scott Hess wrote: > > > So if all of this fails, you're just going to give up? How will the > situation > > > ever change? > > > > If it fails for this session, I don't plan for it to fix itself. The next time > > the user profile is instantiated it will retry. Any ideas on a better > solution? > > It's persistent on disk, so most failures will be persistent. If the schema is > broken, it will remain broken, if the database is corrupt, it will remain > corrupt. > > There are systems for detecting and responding to errors. > DatabaseErrorCallback() in components/omnibox/browser/shortcuts_database.cc is > probably the closest to where I'm trying to go with that. The incidence of > corruption is pretty low, call it 1%/year, but when multiplied by our userbase > is can means many millions, and it accumulates over time (if nothing fixes it). > > Another option is nuke-from-orbit. Like an error callback calling > sql::Connection::RazeAndClose(). > net/extras/sqlite/sqlite_persistent_cookie_store.cc does this (though I see the > RazeAndClose part is still TODO, sigh). I've gone ahead and used The DatabaseErrorCallback() approach. I don't understand how the db gets reopened in after the recover call in components/omnibox/browser/shortcuts_database.cc. Is it just left to fail the db calls after an error until a new session? I'm fine with that approach, but I'm not sure if that is exactly what is happening. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:156: LoadBlackListFromDataBase(db, runner, callback); On 2016/10/07 00:08:45, Scott Hess wrote: > On 2016/10/06 20:05:39, Ryan Sturm wrote: > > On 2016/10/06 17:24:28, Scott Hess wrote: > > > I think that if you implemented LoadBlackListFromDatabase() to return > > > unique_ptr<BLIM>, with a single post back to the callback here, it would > work > > > fine. The sql:: layer would throw some error messages and fail some > DCHECKS, > > > but the case of "The database was not opened or failed to initialize" is > > > expected to _never_ happen, so that is reasonable. > > > > > > WRT the DCHECKS, there is support for expecting errors in testing, so if > > that's > > > why you're doing these checks, let me know and I can give you pointers. > > > > I think I understand what you are saying, so let me know if the new approach > > looks good. > > I was thinking more like having LoadBlackListFromDatabase() as a standalone > helper which converts sql into a unique_ptr<BlackListItemMap>, then with > LoadBlackListSync() a wrapper which knows about initializing databases and > calling callbacks. But it's six of one, half a dozen of the other. Acknowledged. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:175: } On 2016/10/07 00:08:45, Scott Hess wrote: > On 2016/10/06 20:05:39, Ryan Sturm wrote: > > On 2016/10/06 17:24:28, Scott Hess wrote: > > > Is there any protection against an outstanding LoadBlackListSync() in > progress > > > at this point? > > > > My understanding of task runners is that the LoadBlackListSync call will > > complete before the Delete happens on that thread. > > Yeah. I found it - the LoadBlackList() callback targets a weak pointer, so if > there's an outstanding LoadBlackListSync(), the response will just drain. Acknowledged. https://codereview.chromium.org/2390773003/diff/120001/components/previews/co... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/120001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:123: } On 2016/10/07 00:08:46, Scott Hess wrote: > Here I find myself wondering why not GetOrCreateBlackListItem() from the get-go. I'll add one. Either way, I wanted a const getter as well, but GetOrCreateBlackListItem() does make it a little simpler. https://codereview.chromium.org/2390773003/diff/140001/components/previews/co... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/140001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:68: } On 2016/10/07 00:08:46, Scott Hess wrote: > As a for-instance of what I mean, if this was written as: > if (!db->DoestTableExist("previews_v1")) { > if (!db->Execute("CREATE TABLE previews_v1 ...")) > return; > } > then the first time through I'd have suggested you could just use "CREATE TABLE > IF NOT EXISTS", which would let you remove the DoesTableExist() call. You could > also remove the transaction in that case, but it's unclear if that's desirable > in the long term (since it has to come back if you ever make schema changes). I'll implement what is right for the current design, and add it back later. https://codereview.chromium.org/2390773003/diff/140001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:80: // or very long host names. On 2016/10/07 00:08:46, Scott Hess wrote: > This kind of sounds like you intend to offload your memory usage to the SQLite > page cache. If you're not willing to spend the megabyte, then you shouldn't be > spending the megabyte! > > That said ... you probably don't need that much cache. The cache needs to be > big enough to handle storing data during a statement, and also to handle the > pages you touch on updates. Touching a large number of pages on update is bad > regardless of the cache size, so hopefully that won't come up. > > All that out of the way, I'm not going to ask for any changes, because the > default setup for sql::Connection uses memory-mapping and some update wrappers > to make sure that the cache mostly uses no space between transactions. SQLite's > default is 2MB, and this is smaller than 2MB, so I'm happy. Done. https://codereview.chromium.org/2390773003/diff/140001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:96: db->Preload(); On 2016/10/07 00:08:46, Scott Hess wrote: > I don't think this is probably justified. Just let the system work. Done. https://codereview.chromium.org/2390773003/diff/140001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:109: " FROM " PREVIEWS_TABLE_NAME; On 2016/10/07 00:08:46, Scott Hess wrote: > It feels like the rewrite lost a layer? I think the LIMIT bit? Is that now at > a higher level? I see the DCHECK against MaxInMemoryHostsInBlacklist() - does > that mean you're just deferring the problem? > > [I think that's fine, just want to make sure I'm following you.] Yes it moved to the PreviewsBlackList::[GetOr]CreateBlackListItem implementation. I'm slightly worried that EvictOldestOptOut in previews_black_list iterates through the hosts when there are 100 to remove one, but it doesn't always iterate through them all. All said and done adding these entries to the C++ map is O(n*k) (where k is # of hosts), but limited to 100 hosts, so O(n*100). WDYT? WDYT? https://codereview.chromium.org/2390773003/diff/140001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:178: db_.reset(new sql::Connection()); On 2016/10/07 00:08:46, Scott Hess wrote: > Is the idiom these days: > db_ = base::MakeUnique<sql::Connection>()? > to avoid the bare new? Done.
Let me know what you think about the primary key bit. The "id INTEGER PRIMARY KEY" is essentially free, since it just exposes something already in there. Using the timestamp as the primary key is cheaper-than-free, but might put unacceptable requirements on your timestamp, and is probably silly (on the other hand, it would make delete-by-time pretty efficient). https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:34: " type INTEGER NOT NULL)"; On 2016/10/07 20:07:05, Ryan Sturm wrote: > On 2016/10/07 00:08:46, Scott Hess wrote: > > On 2016/10/06 20:05:39, Ryan Sturm wrote: > > > On 2016/10/06 17:24:28, Scott Hess wrote: > > > > Any unique key? Otherwise it's just a bag of data. > > > > > > > > Are there _any_ circumstances where you'd want to insert NULL into > host_name > > > and > > > > have it fill with ''. That doesn't make sense to me. > > > > > > Besides adding a unique key just to add a unique key, I can't really think > of > > a > > > good reason to have one. > > > > > > This is bag of data that will be at most 32 (entries per host) * # number of > > > hosts. While we only return 100 hosts to the in-memory map the current plan > is > > > to have it be unbounded in the DB, but it might be useful to keep that limit > > in > > > the DB as well. I will add you on the code review for that CL. > > > > > > Good point about "". > > > > I'm not really sure what "unbounded in the db" means in this case. It feels > > like intentionally building a latent N^2 path, which is probably not what you > > wanted me to think :-). > > > > I'm not going to require you to add keys if you don't need keys, but if you > > don't use keys, then you're committing to do full-table-scans for all sorts of > > things. That might be reasonable if the database is going to remain small, > but > > then you need to have systems in place to keep it small. It's hard to tell > > where things are going, though, since this intentionally has no updates > > implemented. You may be hoping to avoid going there to make things easier on > > yourself, but I can assure you from harsh experience that once you've got data > > on user disks, some changes become _way_ more annoying to accomplish. > > Good point. The deleting query I wrote seem like they could be inefficient. Let > me know if you see a good way to use keys to make them more efficient. The basic > idea is that when a new entry comes in, if there are already 32 entries for the > host, the oldest one is evicted. Here are the queries: > > " DELETE FROM " PREVIEWS_TABLE_NAME > " WHERE host_name == ? AND" > " time NOT IN (SELECT time FROM " PREVIEWS_TABLE_NAME > " where host_name == ?" > " ORDER BY time DESC" > " LIMIT ?)"; How often will a new entry come in? If it's frequent, this could probably be a bit horrible. An index on host_name would help for grouping things, but the table data wouldn't show locality. I think an index on (host_name,time) would allow this query to calculate the set of rows to delete using only the index, and then send off specific deletes, which I think is as good as you'll do. Better might be to notice when things are being deleted in memory and then delete by id. That would imply having a unique id, though, which might not be reasonable to add for only this, and you might not have a place to carry that data in your store abstraction. [I mean like "id INTEGER PRIMARY KEY", which would be a 64-bit value assigned by SQLite that uniquely identifies every row. It's an alias for the rowid which is the b-tree key SQLite uses internally, so it's index-alike.] One possibility I've seen is to use the 64-bit timestamp as the INTEGER PRIMARY KEY. > "INSERT INTO " PREVIEWS_TABLE_NAME > " (host_name, time, opt_out, type)" > " VALUES " > " (?, ?, ?, ?)"; > > > "DELETE FROM " PREVIEWS_TABLE_NAME " WHERE time > ? and time < ?" Is this something that would be infrequent? Like only at startup (or a short delay after startup)? An index on (host_name,time) will not satisfy this kind of time query directly, but I'm not sure if it's worth an entire index. If it only happens at startup, and you're already doing most of a full table scan, it might be easier to accumulate ids to delete. > I now plan to also add a query that keeps the total entry count below 3200 based > on entry time. Thoughts on that as well? The total count for the entire database? In that case you're talking about a db with maybe 300k of data, right? In that case it maybe doesn't matter much about adding multiple indices, and it will put a cap on growth. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:88: } On 2016/10/07 20:07:06, Ryan Sturm wrote: > I've gone ahead and used The DatabaseErrorCallback() approach. I don't > understand how the db gets reopened in after the recover call in > components/omnibox/browser/shortcuts_database.cc. Is it just left to fail the db > calls after an error until a new session? I'm fine with that approach, but I'm > not sure if that is exactly what is happening. Things are wired so that all future calls fail. There's not really a clean way to re-open the database handle. Since the contents of the database may have changed, you'd have to tear down all of the state the client derived from the database, and recreate it. Clients tend not to be written with that in mind, and it's also maybe impossible anyhow. As-is, what happens is that changes are lost until a restart, which is weird, but not generally locally remarkable. The alternative without recovery is generally that changes are lost forever, and a restart doesn't help. https://codereview.chromium.org/2390773003/diff/160001/chrome/browser/preview... File chrome/browser/previews/previews_service.cc (right): https://codereview.chromium.org/2390773003/diff/160001/chrome/browser/preview... chrome/browser/previews/previews_service.cc:38: new previews::PreviewsOptOutStoreSQL( Did MakeUnique<>() not work for this one? https://codereview.chromium.org/2390773003/diff/160001/chrome/common/chrome_c... File chrome/common/chrome_constants.cc (right): https://codereview.chromium.org/2390773003/diff/160001/chrome/common/chrome_c... chrome/common/chrome_constants.cc:160: FPL("Previews/opt_out.db"); Would this directory need to exist otherwise? I think "Offline Pages" has various other things in there. Most dbs like History and Top Sites just live in the profile rather than adding a new directory layer. I think that's preferred, as long as you're naming in a way which makes it easy to find you (like "Previews Opt Out"). [Yes, many databases have spaces in their name, yes they use Leading Caps, and yes they don't tend to have a suffix. Predates my input. Just don't name them .sqlite, the SQLite team hates having people bug them for other people's software.] https://codereview.chromium.org/2390773003/diff/160001/components/previews/co... File components/previews/core/previews_black_list.cc (right): https://codereview.chromium.org/2390773003/diff/160001/components/previews/co... components/previews/core/previews_black_list.cc:20: void EvictOldestOptOut(BlackListItemMap* black_list_item_map) { This is a bit involved, but I can't, offhand, think of something better. About all I can think of is something where you maintain an ordered vector with an index into it, which is probably also painful. std::min_element() would work, with appropriate predicate, but won't be any more efficient. https://codereview.chromium.org/2390773003/diff/160001/components/previews/co... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/160001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:83: db->set_error_callback(base::Bind(&DatabaseErrorCallback, db, path)); Please circle back with a later CL that tests this case. An unfortunately common pattern is to never test the error handler and then find code doing poorly in the wild (filesystem corruptions basically never happen to developers). There are examples in tests for the code using sql::Recovery. https://codereview.chromium.org/2390773003/diff/160001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:106: sql::Statement statement(db->GetCachedStatement(SQL_FROM_HERE, kSql)); This is unlikely to be called multiple times in a Chrome execution, right? Can probably dial that back to GetUniqueStatement().
The CQ bit was checked by ryansturm@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by ryansturm@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Shess: PTAL. I went with the primary key for timestamp. I don't think there is a large risk of collisions with the time stamp keys (occaisionally someone's clock might not be incrementing for whatever reason); it helps with at least one query and the table overall, so it seems worth it. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:34: " type INTEGER NOT NULL)"; > > > > " DELETE FROM " PREVIEWS_TABLE_NAME > > " WHERE host_name == ? AND" > > " time NOT IN (SELECT time FROM " PREVIEWS_TABLE_NAME > > " where host_name == ?" > > " ORDER BY time DESC" > > " LIMIT ?)"; > > How often will a new entry come in? > > If it's frequent, this could probably be a bit horrible. An index on host_name > would help for grouping things, but the table data wouldn't show locality. I > think an index on (host_name,time) would allow this query to calculate the set > of rows to delete using only the index, and then send off specific deletes, > which I think is as good as you'll do. > It's a per navigation under certain conditions (2G slow network right now), so not super frequently, but common enough for a user. > Better might be to notice when things are being deleted in memory and then > delete by id. That would imply having a unique id, though, which might not be > reasonable to add for only this, and you might not have a place to carry that > data in your store abstraction. Right now, the store abstraction won't carry these ID's in memory as there is a policy of only remembering x of the last preview navigations to the host (where x varies by field trial and will probably be below 32). > [I mean like "id INTEGER PRIMARY KEY", which > would be a 64-bit value assigned by SQLite that uniquely identifies every row. > It's an alias for the rowid which is the b-tree key SQLite uses internally, so > it's index-alike.] > > One possibility I've seen is to use the 64-bit timestamp as the INTEGER PRIMARY > KEY. That seems like it would help with the "clear black list" query below, so it seems worth it. > > > "INSERT INTO " PREVIEWS_TABLE_NAME > > " (host_name, time, opt_out, type)" > > " VALUES " > > " (?, ?, ?, ?)"; > > > > > > "DELETE FROM " PREVIEWS_TABLE_NAME " WHERE time > ? and time < ?" > > Is this something that would be infrequent? Like only at startup (or a short > delay after startup)? An index on (host_name,time) will not satisfy this kind > of time query directly, but I'm not sure if it's worth an entire index. If it > only happens at startup, and you're already doing most of a full table scan, it > might be easier to accumulate ids to delete. This is pretty infrequent. It happens when a user clears history, so probably less than once per session for most users. > > > I now plan to also add a query that keeps the total entry count below 3200 > based > > on entry time. Thoughts on that as well? > > The total count for the entire database? In that case you're talking about a db > with maybe 300k of data, right? In that case it maybe doesn't matter much about > adding multiple indices, and it will put a cap on growth. Considering the size of the DB, I think I will add the unique key for timestamp and leave it at that. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:88: } On 2016/10/08 00:04:21, Scott Hess wrote: > On 2016/10/07 20:07:06, Ryan Sturm wrote: > > I've gone ahead and used The DatabaseErrorCallback() approach. I don't > > understand how the db gets reopened in after the recover call in > > components/omnibox/browser/shortcuts_database.cc. Is it just left to fail the > db > > calls after an error until a new session? I'm fine with that approach, but I'm > > not sure if that is exactly what is happening. > > Things are wired so that all future calls fail. > > There's not really a clean way to re-open the database handle. Since the > contents of the database may have changed, you'd have to tear down all of the > state the client derived from the database, and recreate it. Clients tend not > to be written with that in mind, and it's also maybe impossible anyhow. > > As-is, what happens is that changes are lost until a restart, which is weird, > but not generally locally remarkable. The alternative without recovery is > generally that changes are lost forever, and a restart doesn't help. Acknowledged. https://codereview.chromium.org/2390773003/diff/160001/chrome/browser/preview... File chrome/browser/previews/previews_service.cc (right): https://codereview.chromium.org/2390773003/diff/160001/chrome/browser/preview... chrome/browser/previews/previews_service.cc:38: new previews::PreviewsOptOutStoreSQL( On 2016/10/08 00:04:22, Scott Hess wrote: > Did MakeUnique<>() not work for this one? Done. https://codereview.chromium.org/2390773003/diff/160001/chrome/common/chrome_c... File chrome/common/chrome_constants.cc (right): https://codereview.chromium.org/2390773003/diff/160001/chrome/common/chrome_c... chrome/common/chrome_constants.cc:160: FPL("Previews/opt_out.db"); On 2016/10/08 00:04:22, Scott Hess wrote: > Would this directory need to exist otherwise? > > I think "Offline Pages" has various other things in there. Most dbs like > History and Top Sites just live in the profile rather than adding a new > directory layer. I think that's preferred, as long as you're naming in a way > which makes it easy to find you (like "Previews Opt Out"). > > [Yes, many databases have spaces in their name, yes they use Leading Caps, and > yes they don't tend to have a suffix. Predates my input. Just don't name them > .sqlite, the SQLite team hates having people bug them for other people's > software.] Done. https://codereview.chromium.org/2390773003/diff/160001/components/previews/co... File components/previews/core/previews_black_list.cc (right): https://codereview.chromium.org/2390773003/diff/160001/components/previews/co... components/previews/core/previews_black_list.cc:20: void EvictOldestOptOut(BlackListItemMap* black_list_item_map) { On 2016/10/08 00:04:22, Scott Hess wrote: > This is a bit involved, but I can't, offhand, think of something better. About > all I can think of is something where you maintain an ordered vector with an > index into it, which is probably also painful. std::min_element() would work, > with appropriate predicate, but won't be any more efficient. That is the same conclusion I came to. I don't think the confusion of doubly indexing this collection to be accessed quick by multiple metrics will help in the long run. https://codereview.chromium.org/2390773003/diff/160001/components/previews/co... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/160001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:83: db->set_error_callback(base::Bind(&DatabaseErrorCallback, db, path)); On 2016/10/08 00:04:22, Scott Hess wrote: > Please circle back with a later CL that tests this case. An unfortunately > common pattern is to never test the error handler and then find code doing > poorly in the wild (filesystem corruptions basically never happen to > developers). There are examples in tests for the code using sql::Recovery. Acknowledged. Opened a bug on myself; didn't add a todo however, crbug.com/654464 https://codereview.chromium.org/2390773003/diff/160001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:106: sql::Statement statement(db->GetCachedStatement(SQL_FROM_HERE, kSql)); On 2016/10/08 00:04:22, Scott Hess wrote: > This is unlikely to be called multiple times in a Chrome execution, right? Can > probably dial that back to GetUniqueStatement(). The clearblacklist function clears a certain time frame from the data base, after which the in-memory map is refreshed from the data base, so this will be called at profile startup and after a clear history event.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
I'm going to go verify that a SELECT against something which the index can satisfy will only consult the index. But I'm pretty sure that's the case (I'm just nervous if the ORDER BY is considered when making that decision). If the compound index doesn't make sense, let me know. It's a bit of a degenerate use geared to fit specific queries. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:34: " type INTEGER NOT NULL)"; On 2016/10/10 18:50:35, Ryan Sturm wrote: > > > > > > > " DELETE FROM " PREVIEWS_TABLE_NAME > > > " WHERE host_name == ? AND" > > > " time NOT IN (SELECT time FROM " PREVIEWS_TABLE_NAME > > > " where host_name == ?" > > > " ORDER BY time DESC" > > > " LIMIT ?)"; > > > > How often will a new entry come in? > > > > If it's frequent, this could probably be a bit horrible. An index on > host_name > > would help for grouping things, but the table data wouldn't show locality. I > > think an index on (host_name,time) would allow this query to calculate the set > > of rows to delete using only the index, and then send off specific deletes, > > which I think is as good as you'll do. > > > > It's a per navigation under certain conditions (2G slow network right now), so > not super frequently, but common enough for a user. OK, then a mixed bag, I guess. The dominant cost when the sub-select is not empty is going to be the transaction overhead, locking and fsync type things. But if the sub-select is empty, then the cost is locking (basically free given set_exclusive_locking), plus the overhead to calculate the sub-select. With no indexing, that overhead is a full table scan, which basically going to read all the data into memory, which is not preferred for a DELETE. I think it may make sense to have a (host_name, time) index. The main reason I'm reluctant to go there is because the index would basically double the database size. But trading database size for memory usage probably is worthwhile. I think you'd probably want something like: CREATE TABLE previews_v1 ( host_name VARCHAR NOT NULL, time INTEGER NOT NULL, opt_out INTEGER NOT NULL, type INTEGER NOT NULL, PRIMARY KEY(host_name, time DESC, opt_out, type) ) With this, the DELETE above would be efficient in the sub-select and then only delete the rows necessary. SELECT with ORDER BY host_name, time DESC can read from the index and rely on that ordering being efficient without taking up additional memory. INSERT will be somewhat more expensive, having to update two pages instead of one. The history-clearing DELETE will lose out, but I don't think optimizing it is worth de-optimizing the other statements. [I'm going to log a bug about this WRT https://www.sqlite.org/withoutrowid.html and sql::Recovery. I think WITHOUT ROWID would be a excellent fit for this case because it collapses the index and table together, but sql::Recovery does not support such tables at this time. Maybe I can go out and fix it, then adapt it back to this case.] > > Better might be to notice when things are being deleted in memory and then > > delete by id. That would imply having a unique id, though, which might not be > > reasonable to add for only this, and you might not have a place to carry that > > data in your store abstraction. > > Right now, the store abstraction won't carry these ID's in memory as there is a > policy of only remembering x of the last preview navigations to the host (where > x varies by field trial and will probably be below 32). Does the in-memory abstraction know when the DELETE would not even be necessary? Like if there are 10 host_name entries in memory, will it be able to short-circuit and not send the DELETE to the store? That could be pretty relevant, because it might mean the DELETE load would be substantially reduced. Unfortunately it's probably impossible to figure out good estimates for how much that would help (I'd not want to code in an expensive DELETE for heavy users). > > > "DELETE FROM " PREVIEWS_TABLE_NAME " WHERE time > ? and time < ?" > > > > Is this something that would be infrequent? Like only at startup (or a short > > delay after startup)? An index on (host_name,time) will not satisfy this kind > > of time query directly, but I'm not sure if it's worth an entire index. If it > > only happens at startup, and you're already doing most of a full table scan, > it > > might be easier to accumulate ids to delete. > > This is pretty infrequent. It happens when a user clears history, so probably > less than once per session for most users. Just in case ... when you say "clears history", do you mean a Chrome-level history clear, or do you mean clearing just the history for this feature? If this is tied to Chrome-level history clearing, then IMHO there's so much going on that optimizing your history-clearing at the expensive of more-common operations is probably pointless. https://codereview.chromium.org/2390773003/diff/160001/components/previews/co... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/160001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:106: sql::Statement statement(db->GetCachedStatement(SQL_FROM_HERE, kSql)); On 2016/10/10 18:50:35, Ryan Sturm wrote: > On 2016/10/08 00:04:22, Scott Hess wrote: > > This is unlikely to be called multiple times in a Chrome execution, right? > Can > > probably dial that back to GetUniqueStatement(). > > The clearblacklist function clears a certain time frame from the data base, > after which the in-memory map is refreshed from the data base, so this will be > called at profile startup and after a clear history event. GetCachedStatement() pins the prepared statement into a cache, which is probably not worthwhile for this use case. I think this should be GetUniqueStatement(). Note that the "Unique" refers to the sql::Statement, it doesn't mean the SQL itself has to be unique or anything.
The CQ bit was checked by ryansturm@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Makes sense. Let me know if anything seems awry or if the select will consult things beside the index. https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/80001/components/previews/cor... components/previews/core/previews_opt_out_store_sql.cc:34: " type INTEGER NOT NULL)"; On 2016/10/10 19:57:45, Scott Hess wrote: > On 2016/10/10 18:50:35, Ryan Sturm wrote: > > > > > > > > > > " DELETE FROM " PREVIEWS_TABLE_NAME > > > > " WHERE host_name == ? AND" > > > > " time NOT IN (SELECT time FROM " PREVIEWS_TABLE_NAME > > > > " where host_name == ?" > > > > " ORDER BY time DESC" > > > > " LIMIT ?)"; > > > > > > How often will a new entry come in? > > > > > > If it's frequent, this could probably be a bit horrible. An index on > > host_name > > > would help for grouping things, but the table data wouldn't show locality. > I > > > think an index on (host_name,time) would allow this query to calculate the > set > > > of rows to delete using only the index, and then send off specific deletes, > > > which I think is as good as you'll do. > > > > > > > It's a per navigation under certain conditions (2G slow network right now), so > > not super frequently, but common enough for a user. > > OK, then a mixed bag, I guess. The dominant cost when the sub-select is not > empty is going to be the transaction overhead, locking and fsync type things. > But if the sub-select is empty, then the cost is locking (basically free given > set_exclusive_locking), plus the overhead to calculate the sub-select. With no > indexing, that overhead is a full table scan, which basically going to read all > the data into memory, which is not preferred for a DELETE. > > I think it may make sense to have a (host_name, time) index. The main reason > I'm reluctant to go there is because the index would basically double the > database size. But trading database size for memory usage probably is > worthwhile. I think you'd probably want something like: > > CREATE TABLE previews_v1 ( > host_name VARCHAR NOT NULL, > time INTEGER NOT NULL, > opt_out INTEGER NOT NULL, > type INTEGER NOT NULL, > PRIMARY KEY(host_name, time DESC, opt_out, type) > ) > > With this, the DELETE above would be efficient in the sub-select and then only > delete the rows necessary. SELECT with ORDER BY host_name, time DESC can read > from the index and rely on that ordering being efficient without taking up > additional memory. INSERT will be somewhat more expensive, having to update two > pages instead of one. The history-clearing DELETE will lose out, but I don't > think optimizing it is worth de-optimizing the other statements. > > [I'm going to log a bug about this WRT https://www.sqlite.org/withoutrowid.html > and sql::Recovery. I think WITHOUT ROWID would be a excellent fit for this case > because it collapses the index and table together, but sql::Recovery does not > support such tables at this time. Maybe I can go out and fix it, then adapt it > back to this case.] > Makes sense to me. > > > Better might be to notice when things are being deleted in memory and then > > > delete by id. That would imply having a unique id, though, which might not > be > > > reasonable to add for only this, and you might not have a place to carry > that > > > data in your store abstraction. > > > > Right now, the store abstraction won't carry these ID's in memory as there is > a > > policy of only remembering x of the last preview navigations to the host > (where > > x varies by field trial and will probably be below 32). > > Does the in-memory abstraction know when the DELETE would not even be necessary? > Like if there are 10 host_name entries in memory, will it be able to > short-circuit and not send the DELETE to the store? That could be pretty > relevant, because it might mean the DELETE load would be substantially reduced. > Unfortunately it's probably impossible to figure out good estimates for how much > that would help (I'd not want to code in an expensive DELETE for heavy users). Unfortunately, when a host is evicted from the map and then re-added later, it might not have enough information to determine that, so there are cases where this is possible, but they might not be generic enough to consider sending this signal for. For all intents and purposes, the map will only hold 4 navigations per host and the DB holds 32, so we really might not save that much with this approach anyway. > > > > > "DELETE FROM " PREVIEWS_TABLE_NAME " WHERE time > ? and time < ?" > > > > > > Is this something that would be infrequent? Like only at startup (or a > short > > > delay after startup)? An index on (host_name,time) will not satisfy this > kind > > > of time query directly, but I'm not sure if it's worth an entire index. If > it > > > only happens at startup, and you're already doing most of a full table scan, > > it > > > might be easier to accumulate ids to delete. > > > > This is pretty infrequent. It happens when a user clears history, so probably > > less than once per session for most users. > > Just in case ... when you say "clears history", do you mean a Chrome-level > history clear, or do you mean clearing just the history for this feature? If > this is tied to Chrome-level history clearing, then IMHO there's so much going > on that optimizing your history-clearing at the expensive of more-common > operations is probably pointless. Chrome level clear history. https://codereview.chromium.org/2390773003/diff/160001/components/previews/co... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/160001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:106: sql::Statement statement(db->GetCachedStatement(SQL_FROM_HERE, kSql)); On 2016/10/10 19:57:45, Scott Hess wrote: > On 2016/10/10 18:50:35, Ryan Sturm wrote: > > On 2016/10/08 00:04:22, Scott Hess wrote: > > > This is unlikely to be called multiple times in a Chrome execution, right? > > Can > > > probably dial that back to GetUniqueStatement(). > > > > The clearblacklist function clears a certain time frame from the data base, > > after which the in-memory map is refreshed from the data base, so this will be > > called at profile startup and after a clear history event. > > GetCachedStatement() pins the prepared statement into a cache, which is probably > not worthwhile for this use case. I think this should be GetUniqueStatement(). > Note that the "Unique" refers to the sql::Statement, it doesn't mean the SQL > itself has to be unique or anything. Done.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...)
The CQ bit was checked by ryansturm@chromium.org to run a CQ dry run
Patchset #12 (id:220001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
LGTM. I'll probably find three more complaints in the future, but you have to declare success at some point! https://codereview.chromium.org/2390773003/diff/240001/components/previews/co... File components/previews/core/previews_opt_out_store_sql.cc (right): https://codereview.chromium.org/2390773003/diff/240001/components/previews/co... components/previews/core/previews_opt_out_store_sql.cc:107: sql::Statement statement(db->GetUniqueStatement(kSql)); Note that with the index change, this could now be: SELECT host_name, time, opt_out FROM previews_v1 ORDER BY host_name, time DESC; And then you could do some sort of accumulation/pruning inline in the Step() loop. That would probably be notably faster than applying EvictOldestOptOut() on every entry. Like you could accumulate for a host_name then just ignore remaining items until host_name changes. But save it for a later CL :-). [Note that the sub-select from earlier would need two separate O(N) passes, which I think is still worth avoiding since it could lead to thrashing.]
lgtm
On 2016/10/10 19:57:45, Scott Hess wrote: > I'm going to go verify that a SELECT against something which the index can > satisfy will only consult the index. But I'm pretty sure that's the case (I'm > just nervous if the ORDER BY is considered when making that decision). BTW, AFAICT the answer is yes, the ORDER BY will cause it to use the index, and it can satisfy all columns from the index.
The CQ bit was checked by ryansturm@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from tbansal@chromium.org, mmenke@chromium.org, asvitkine@chromium.org Link to the patchset: https://codereview.chromium.org/2390773003/#ps240001 (title: "table change")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Adding a SQL implementation of the backing store for previews opt outs Previews maintains opt outs to determine eligibility of a navigation to a domain. To persist this information, a SQLite database is created to store opt out choice, host name, preview type, and time. This portion of the implementation only implements creating the table, and returning an empty result to PreviewBlackList (after running a real query). BUG=653730 ========== to ========== Adding a SQL implementation of the backing store for previews opt outs Previews maintains opt outs to determine eligibility of a navigation to a domain. To persist this information, a SQLite database is created to store opt out choice, host name, preview type, and time. This portion of the implementation only implements creating the table, and returning an empty result to PreviewBlackList (after running a real query). BUG=653730 ==========
Message was sent while issue was closed.
Committed patchset #12 (id:240001)
Message was sent while issue was closed.
Description was changed from ========== Adding a SQL implementation of the backing store for previews opt outs Previews maintains opt outs to determine eligibility of a navigation to a domain. To persist this information, a SQLite database is created to store opt out choice, host name, preview type, and time. This portion of the implementation only implements creating the table, and returning an empty result to PreviewBlackList (after running a real query). BUG=653730 ========== to ========== Adding a SQL implementation of the backing store for previews opt outs Previews maintains opt outs to determine eligibility of a navigation to a domain. To persist this information, a SQLite database is created to store opt out choice, host name, preview type, and time. This portion of the implementation only implements creating the table, and returning an empty result to PreviewBlackList (after running a real query). BUG=653730 Committed: https://crrev.com/2ae78ccf536394379900253a4099b5e089b07084 Cr-Commit-Position: refs/heads/master@{#424319} ==========
Message was sent while issue was closed.
Patchset 12 (id:??) landed as https://crrev.com/2ae78ccf536394379900253a4099b5e089b07084 Cr-Commit-Position: refs/heads/master@{#424319} |