|
|
Created:
5 years, 9 months ago by Roger McFarlane (Chromium) Modified:
5 years, 8 months ago CC:
chromium-reviews, browser-components-watch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionAdd last_requested field to the favicon_bitmaps table of the favicons database.
This CL adds a last_requested timestamp for each favicon bitmap. In a coming change, large icons will be fetched for display on the New Tab page. To avoid having the database storage used for large bitmaps grow unreasonably large, the last_requested field will be used to implement a cache eviction policy for these large icons.
The cache cleanup policy applied to regular-sized favicons remains unchanged.
As of 2015/03/17, the number of icons in the icons DB of reporting users, is distributed approximately as follows:
* 50% have less than 100
* 75% have less than 200
* 85% have less than 300
* 90% have less than 400
* 98% have less than 1000
* No users exceed 10000 icons.
Adding an additional integer field to the table for such small database sizes should not significantly affect the resources consumed by the database.
Design Doc:
https://docs.google.com/document/d/1rv4x3goTWFJzQu5a_h94xfNhSUG3sU98lfxa1Prys1w/edit
BUG=467712
Committed: https://crrev.com/982ef2c11eb828b3c3728d1e42784d4b3056b251
Cr-Commit-Position: refs/heads/master@{#323176}
Patch Set 1 #
Total comments: 6
Patch Set 2 : Initial CL for review. #
Total comments: 25
Patch Set 3 : Address review comments #
Total comments: 16
Patch Set 4 : Address 2nd round of comments. #
Total comments: 9
Patch Set 5 : Address comments and rebase for changes exported to other CLs #
Total comments: 2
Patch Set 6 : fix expect/assert #
Total comments: 2
Patch Set 7 : Merge GetFaviconBitmapLastRequestedtime into GetFaviconBitmap. #
Messages
Total messages: 65 (13 generated)
huangs@chromium.org changed reviewers: + huangs@chromium.org
A preliminary look. I presume you'll add design docs to the descriptions. https://chromiumcodereview.appspot.com/1004373002/diff/1/components/history/c... File components/history/core/browser/thumbnail_database.cc (right): https://chromiumcodereview.appspot.com/1004373002/diff/1/components/history/c... components/history/core/browser/thumbnail_database.cc:70: // favicon_bitmap_usage I think |bitmap_usage| would be cleaner. Although the filename is thumbnail_database, the DB name is 'Favicon'. https://chromiumcodereview.appspot.com/1004373002/diff/1/components/history/c... components/history/core/browser/thumbnail_database.cc:72: // favicon bitmap was requested via the various icon s/favicon/icon/ (|favicon| is overloaded, but it's good to blur if possible). https://chromiumcodereview.appspot.com/1004373002/diff/1/components/history/c... components/history/core/browser/thumbnail_database.cc:1387: return true; Would you need to add the table by executing |kFaviconBitmapUsageSql| ?
Patchset #2 (id:20001) has been deleted
https://codereview.chromium.org/1004373002/diff/1/components/history/core/bro... File components/history/core/browser/thumbnail_database.cc (right): https://codereview.chromium.org/1004373002/diff/1/components/history/core/bro... components/history/core/browser/thumbnail_database.cc:70: // favicon_bitmap_usage On 2015/03/14 02:15:26, huangs wrote: > I think |bitmap_usage| would be cleaner. Although the filename is > thumbnail_database, the DB name is 'Favicon'. Done. https://codereview.chromium.org/1004373002/diff/1/components/history/core/bro... components/history/core/browser/thumbnail_database.cc:72: // favicon bitmap was requested via the various icon On 2015/03/14 02:15:26, huangs wrote: > s/favicon/icon/ (|favicon| is overloaded, but it's good to blur if possible). Done. https://codereview.chromium.org/1004373002/diff/1/components/history/core/bro... components/history/core/browser/thumbnail_database.cc:1387: return true; On 2015/03/14 02:15:26, huangs wrote: > Would you need to add the table by executing |kFaviconBitmapUsageSql| ? No, that's already done by this point. The tables get initialized in InitTables() and InitIndices() (called around line 1254) so I only have to do any lingering alterations here.
rogerm@chromium.org changed reviewers: + shess@chromium.org, sky@chromium.org
sky@ - you were suggested by presubmit as an OWNER shess@ - you seem to have done a lot of the previous work here. PTAL?
Note: I've deferred the pruning to a follow-on CL. On Mar 17, 2015 4:58 PM, <rogerm@chromium.org> wrote: > sky@ - you were suggested by presubmit as an OWNER > shess@ - you seem to have done a lot of the previous work here. > > PTAL? > > https://codereview.chromium.org/1004373002/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
How SQLite performs an update is to read the affected pages and rewrite them. It cannot update a partial row, so a "large" thumbnail is going to touch multiple database pages to update a single integer. This need not be crushing, depending on how often you'll be querying this value, and how often the code will already be modifying other columns anyhow (to refresh the bitmap, maybe). The two things that would concern me are if you're going to update frequently, and if you're going to update many rows. Updating only when someone actually brings up the new tab page seems reasonable to me, but if you're going to preemptively update less so, since it could happen many times before the user receives the payoff. If it's only going to update like 8 images per new tab page, that also seems reasonable, but if it's going to query and then update the top 40 images or something, less so. The alternative would be a table of {id, last_requested} with id as a foreign key onto this table. In that case the entire table will only take a few pages, so the difference between updating one row and updating 100 rows is minimal. Like I said, it's not entirely obvious to me where this sits. You know your intended usage best, so if the above makes you concerned, then maybe I'll be concerned :-). My entire database is under 10M so it seems like the worst case isn't appallingly horrible (and anything which results in the database getting smaller is probably a bigger win than optimizing table layout). https://codereview.chromium.org/1004373002/diff/40001/chrome/browser/history/... File chrome/browser/history/thumbnail_database_unittest.cc (right): https://codereview.chromium.org/1004373002/diff/40001/chrome/browser/history/... chrome/browser/history/thumbnail_database_unittest.cc:850: EXPECT_TRUE(CreateDatabaseFromSQL(file_name_, "Favicons.v8.sql")); Create test for Recovery7 which runs the recovery from a version 7 file. I think it would be reasonable to just copy Recovery test and tweak the contents appropriately to init from the right golden file. In some of these things it might seem like it would be "better" to consolidate/rewrite/refactor to make the code smaller/easier/something. I have found this to not be generally true WRT persistence. The databases in the wild are spread across multiple states, so having a straightforward view of the code can be really helpful if/when things go awry. Not that they're going to go awry here. But I'm sure every time I've had to reconstruct things from revision control the original author didn't think things would go awry, either :-). https://codereview.chromium.org/1004373002/diff/40001/components/history/core... File components/history/core/browser/thumbnail_database.cc (right): https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:83: // Version 8: ????????/r?????? by rogerm@chromium on 2015-??-?? Don't worry too much about filling this in, usually the most recent change is easy to find, it's the one that happened four years ago which takes forever. No more svn means the /r? section isn't sensible any longer. Also chromium.org. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:315: "last_requested INTEGER DEFAULT 0," Prefer to put the added column at the end, since that's where it will be for the ALTER TABLE version. One might argue "But this is how I would rather it be", but reality is that a billion users (or so) will have it at the end, so it's better to just let them all match. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:325: // column. Pull this out to UpgradeToVersion8() like the other cases. Also, try not to use conditional logic, if the column already exists in a version-7 database, that implies that transactional integrity has been broken and the correct response is to recover the database, not to hope that everything is just fine. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:366: return false; My earlier statement means this will have to be duplicated in the UpgradeToVersion8() function. That's fine, version 9 might remove this column, in which case this code will go away, but the v8 upgrade transition will still work to transition v7 databases to the v9 upgrade function. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:474: if (version != 8) { This should be < 7, in case someone's database was left corrupt just before an upgrade. This recovery code probably means that's a very small number, but it's free... https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:647: SQL_FROM_HERE, "SELECT COUNT(*) FROM favicons WHERE icon_type > ?")); Is there any reason to re-use the same sql::Statement variable, rather than using a different one per metric? sql::Statement::Assign() is not heavily used (though I wasn't quite able to get rid of it last time I looked at it), so I'd tend to prefer using separate statements in case there's some subtle edge case. It would make sense to put each in a {} scope, probably. [As an aside, you make me want to write some sort of GetSingleResultInt() helper for sql::Connection.] https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:651: count_statement.Step() ? count_statement.ColumnInt(0) : 0); These new histograms don't seem directly related to the CL, but I assume they're to inform future decisions? They probably need to have histograms.xml changes, though? https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:946: "DELETE FROM favicons WHERE id=?")); Egregious whitespace change is egregious. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:952: "DELETE FROM favicon_bitmaps WHERE icon_id=?")); Also here. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:1259: // detritus. That's just embarrassing! https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:1403: bool ThumbnailDatabase::UpgradeToVersion8() { This should include the schema to do the upgrade.
On 2015/03/17 22:07:46, Scott Hess wrote: > How SQLite performs an update is to read the affected pages and rewrite them. > It cannot update a partial row, so a "large" thumbnail is going to touch > multiple database pages to update a single integer. This need not be crushing, > depending on how often you'll be querying this value, and how often the code > will already be modifying other columns anyhow (to refresh the bitmap, maybe). > > The two things that would concern me are if you're going to update frequently, > and if you're going to update many rows. Updating only when someone actually > brings up the new tab page seems reasonable to me, but if you're going to > preemptively update less so, since it could happen many times before the user > receives the payoff. If it's only going to update like 8 images per new tab > page, that also seems reasonable, but if it's going to query and then update the > top 40 images or something, less so. The last_requested timestamp would only be updated for tiles shown to the user when user actually brings up the NTP. > The alternative would be a table of {id, last_requested} with id as a foreign > key onto this table. In that case the entire table will only take a few pages, > so the difference between updating one row and updating 100 rows is minimal. The initial iteration of the CL (still visible in the history) is set up exactly as you just described. After looking at the distribution of the number of icons stored in the wild, it didn't seem worthwhile to maintain this as as separate table along side the favicon_bitmap table. Maintaining a separate table might also be marginally faster whine pruning the large icons. At any given time, the large icons should be a small subset of the icons. So sorting and culling based on a table that only contained the (probably tens of) large icons would be quite fast. All of that said, we're only talking about hundreds to low thousands for the even the largest icon sets. So I would expect this difference to be almost immaterial. > Like I said, it's not entirely obvious to me where this sits. You know your > intended usage best, so if the above makes you concerned, then maybe I'll be > concerned :-). My entire database is under 10M so it seems like the worst case > isn't appallingly horrible (and anything which results in the database getting > smaller is probably a bigger win than optimizing table layout). Embedding the field in the record for every bitmap consumes represents less than 100K of data for even the largest of icon DB currently reporting stats. We'd be adding up the the pruning threshold (say 40 or so) new large icons to the DB, which would be the only ones potentially having a non-zero last_requested field value.
On 2015/03/18 18:28:02, Roger McFarlane (Chromium) wrote: > On 2015/03/17 22:07:46, Scott Hess wrote: > > How SQLite performs an update is to read the affected pages and rewrite them. > > It cannot update a partial row, so a "large" thumbnail is going to touch > > multiple database pages to update a single integer. This need not be > crushing, > > depending on how often you'll be querying this value, and how often the code > > will already be modifying other columns anyhow (to refresh the bitmap, maybe). > > > > The two things that would concern me are if you're going to update frequently, > > and if you're going to update many rows. Updating only when someone actually > > brings up the new tab page seems reasonable to me, but if you're going to > > preemptively update less so, since it could happen many times before the user > > receives the payoff. If it's only going to update like 8 images per new tab > > page, that also seems reasonable, but if it's going to query and then update > the > > top 40 images or something, less so. > > The last_requested timestamp would only be updated for tiles shown to the user > when user actually brings up the NTP. > > > The alternative would be a table of {id, last_requested} with id as a foreign > > key onto this table. In that case the entire table will only take a few > pages, > > so the difference between updating one row and updating 100 rows is minimal. > > The initial iteration of the CL (still visible in the history) is set up exactly > as you just described. After looking at the distribution of the number of icons > stored in the wild, it didn't seem worthwhile to maintain this as as separate > table along side the favicon_bitmap table. > > Maintaining a separate table might also be marginally faster whine pruning the > large icons. At any given time, the large icons should be a small subset of the > icons. So sorting and culling based on a table that only contained the (probably > tens of) large icons would be quite fast. > > All of that said, we're only talking about hundreds to low thousands for the > even the largest icon sets. So I would expect this difference to be almost > immaterial. > > > Like I said, it's not entirely obvious to me where this sits. You know your > > intended usage best, so if the above makes you concerned, then maybe I'll be > > concerned :-). My entire database is under 10M so it seems like the worst > case > > isn't appallingly horrible (and anything which results in the database getting > > smaller is probably a bigger win than optimizing table layout). > > Embedding the field in the record for every bitmap consumes represents less than > 100K of data for even the largest of icon DB currently reporting stats. We'd be > adding up the the pruning threshold (say 40 or so) new large icons to the DB, > which would be the only ones potentially having a non-zero last_requested field > value. I think this all sounds good, but that last paragraph makes me worried we're talking past each other. My point was that the update cost isn't sizeof(int64) with high locality between rows, it's sizeof(entire row) with low locality between rows (because many/most are already page size), so updating a single row is probably going to touch 6-8k of data in the database, and updating N rows will touch N* that. But I think that N<=8 is probably not big enough to make it worth changing this. SQLite uses a varint encoding for integers, so 0 will be stored as a single byte for rows which don't have a specific value, so the total space footprint probably won't be large. For a database storing bitmaps it's not worth worrying about.
rogerm@chromium.org changed reviewers: + beaudoin@chromium.org
On 2015/03/18 19:29:30, Scott Hess wrote: > On 2015/03/18 18:28:02, Roger McFarlane (Chromium) wrote: > > On 2015/03/17 22:07:46, Scott Hess wrote: > > > How SQLite performs an update is to read the affected pages and rewrite > them. > > > It cannot update a partial row, so a "large" thumbnail is going to touch > > > multiple database pages to update a single integer. This need not be > > crushing, > > > depending on how often you'll be querying this value, and how often the code > > > will already be modifying other columns anyhow (to refresh the bitmap, > maybe). > > > > > > The two things that would concern me are if you're going to update > frequently, > > > and if you're going to update many rows. Updating only when someone > actually > > > brings up the new tab page seems reasonable to me, but if you're going to > > > preemptively update less so, since it could happen many times before the > user > > > receives the payoff. If it's only going to update like 8 images per new tab > > > page, that also seems reasonable, but if it's going to query and then update > > the > > > top 40 images or something, less so. > > > > The last_requested timestamp would only be updated for tiles shown to the user > > when user actually brings up the NTP. > > > > > The alternative would be a table of {id, last_requested} with id as a > foreign > > > key onto this table. In that case the entire table will only take a few > > pages, > > > so the difference between updating one row and updating 100 rows is minimal. > > > > The initial iteration of the CL (still visible in the history) is set up > exactly > > as you just described. After looking at the distribution of the number of > icons > > stored in the wild, it didn't seem worthwhile to maintain this as as separate > > table along side the favicon_bitmap table. > > > > Maintaining a separate table might also be marginally faster whine pruning the > > large icons. At any given time, the large icons should be a small subset of > the > > icons. So sorting and culling based on a table that only contained the > (probably > > tens of) large icons would be quite fast. > > > > All of that said, we're only talking about hundreds to low thousands for the > > even the largest icon sets. So I would expect this difference to be almost > > immaterial. > > > > > Like I said, it's not entirely obvious to me where this sits. You know your > > > intended usage best, so if the above makes you concerned, then maybe I'll be > > > concerned :-). My entire database is under 10M so it seems like the worst > > case > > > isn't appallingly horrible (and anything which results in the database > getting > > > smaller is probably a bigger win than optimizing table layout). > > > > Embedding the field in the record for every bitmap consumes represents less > than > > 100K of data for even the largest of icon DB currently reporting stats. We'd > be > > adding up the the pruning threshold (say 40 or so) new large icons to the DB, > > which would be the only ones potentially having a non-zero last_requested > field > > value. > > I think this all sounds good, but that last paragraph makes me worried we're > talking past each other. My point was that the update cost isn't sizeof(int64) > with high locality between rows, it's sizeof(entire row) with low locality > between rows (because many/most are already page size), so updating a single row > is probably going to touch 6-8k of data in the database, and updating N rows > will touch N* that. But I think that N<=8 is probably not big enough to make it > worth changing this. > > SQLite uses a varint encoding for integers, so 0 will be stored as a single byte > for rows which don't have a specific value, so the total space footprint > probably won't be large. For a database storing bitmaps it's not worth worrying > about. Hi Shess, Thanks for that careful analysis of the sqlite memory model. I read a bit more about it and I agree we have to be conscious of the number of additional row updates we'll impose on favicon_bitmaps. Here are these additional updates (or delete): 1) As the user browses, update 1 row when the large icon is received 2) As the user browses, if the number of large icons grows beyond the high water mark (set at 100 for now based on the small favicon distribution above) ...perform a compaction operation, deleting least recently requested rows to bring it back to the low water mark (set at 50) 3) When the user visits the NTP, update 8 rows for the 8 requested icons. Each update is an independent operation. Of these, only operation (2) is slightly worrying to me, but given that in the worst case this will happen once every 50 webpage visit (if the user visits only new webpages), I believe it's infrequent enough not to worry about. Given that I'm confident we can go ahead with the proposed change.
On 2015/03/18 21:29:53, beaudoin wrote: > Given that I'm confident we can go ahead with the proposed change. OK, your analysis on that front sounds right. My general rule is that if it's a human-sized operation caused by human action, it's _probably_ not worth going overboard on. I was mostly concerned about automated/background type things causing some sort of amplification, and your #1 and #2 sound like they'll already touch that data and the new field won't change that.
LGTM modulo Scott's comments.
Thanks. I've addressed the review comments and dropped the last_requested index. Another look? sky@: are you an appropriate OWNER reviewer? Care to weigh in, or should I tap someone else? https://codereview.chromium.org/1004373002/diff/40001/chrome/browser/history/... File chrome/browser/history/thumbnail_database_unittest.cc (right): https://codereview.chromium.org/1004373002/diff/40001/chrome/browser/history/... chrome/browser/history/thumbnail_database_unittest.cc:850: EXPECT_TRUE(CreateDatabaseFromSQL(file_name_, "Favicons.v8.sql")); On 2015/03/17 22:07:44, Scott Hess wrote: > Create test for Recovery7 which runs the recovery from a version 7 file. I > think it would be reasonable to just copy Recovery test and tweak the contents > appropriately to init from the right golden file. > > In some of these things it might seem like it would be "better" to > consolidate/rewrite/refactor to make the code smaller/easier/something. I have > found this to not be generally true WRT persistence. The databases in the wild > are spread across multiple states, so having a straightforward view of the code > can be really helpful if/when things go awry. > > Not that they're going to go awry here. But I'm sure every time I've had to > reconstruct things from revision control the original author didn't think things > would go awry, either :-). Done. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... File components/history/core/browser/thumbnail_database.cc (right): https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:83: // Version 8: ????????/r?????? by rogerm@chromium on 2015-??-?? On 2015/03/17 22:07:45, Scott Hess wrote: > Don't worry too much about filling this in, usually the most recent change is > easy to find, it's the one that happened four years ago which takes forever. > > No more svn means the /r? section isn't sensible any longer. Also http://chromium.org. Fair enough. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:315: "last_requested INTEGER DEFAULT 0," On 2015/03/17 22:07:45, Scott Hess wrote: > Prefer to put the added column at the end, since that's where it will be for the > ALTER TABLE version. One might argue "But this is how I would rather it be", > but reality is that a billion users (or so) will have it at the end, so it's > better to just let them all match. Done. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:325: // column. On 2015/03/17 22:07:46, Scott Hess wrote: > Pull this out to UpgradeToVersion8() like the other cases. Also, try not to use > conditional logic, if the column already exists in a version-7 database, that > implies that transactional integrity has been broken and the correct response is > to recover the database, not to hope that everything is just fine. I've clobbered the index over favicon_bitmaps(last_requested), so moving this to the version 8 upgrade is now fine. The "problem" that made me do it this way initially was that the code is structured as: InitTablesOrFail(); InitIndicesOrFail(); if (previous inits didn't get you to the right version) VersionDeltaUpgrad(); InitTables() might be a NOP, which would cause InitIndices() to fail for missing columns. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:366: return false; On 2015/03/17 22:07:46, Scott Hess wrote: > My earlier statement means this will have to be duplicated in the > UpgradeToVersion8() function. That's fine, version 9 might remove this column, > in which case this code will go away, but the v8 upgrade transition will still > work to transition v7 databases to the v9 upgrade function. I ended up removing this index entirely. I was going to use it to help limit the range of bitmaps considered then doing garbage collection, but I don't think it actually helps do that. As an alternative, I could conceivably keep the index but not set a default value for regular icon bitmaps (i.e., leave it as NULL) and set 0 for to signify never-requested large icon bitmaps. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:474: if (version != 8) { On 2015/03/17 22:07:45, Scott Hess wrote: > This should be < 7, in case someone's database was left corrupt just before an > upgrade. This recovery code probably means that's a very small number, but it's > free... Ah, quite right. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:647: SQL_FROM_HERE, "SELECT COUNT(*) FROM favicons WHERE icon_type > ?")); On 2015/03/17 22:07:46, Scott Hess wrote: > Is there any reason to re-use the same sql::Statement variable, rather than > using a different one per metric? sql::Statement::Assign() is not heavily used > (though I wasn't quite able to get rid of it last time I looked at it), so I'd > tend to prefer using separate statements in case there's some subtle edge case. > It would make sense to put each in a {} scope, probably. > > [As an aside, you make me want to write some sort of GetSingleResultInt() helper > for sql::Connection.] Done. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:651: count_statement.Step() ? count_statement.ColumnInt(0) : 0); On 2015/03/17 22:07:45, Scott Hess wrote: > These new histograms don't seem directly related to the CL, but I assume they're > to inform future decisions? They probably need to have histograms.xml changes, > though? I'll pull out the large favicon histograms to a follow-on CL. Thanks for the histograms.xml reminder. Actually, it looks like the previous author also forgot to update histograms.xml... History.NumFaviconsInDB is missing an entry. I'll fix that at the same time. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:946: "DELETE FROM favicons WHERE id=?")); On 2015/03/17 22:07:45, Scott Hess wrote: > Egregious whitespace change is egregious. Done. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:952: "DELETE FROM favicon_bitmaps WHERE icon_id=?")); On 2015/03/17 22:07:44, Scott Hess wrote: > Also here. Done. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:1259: // detritus. On 2015/03/17 22:07:46, Scott Hess wrote: > That's just embarrassing! :) https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:1403: bool ThumbnailDatabase::UpgradeToVersion8() { On 2015/03/17 22:07:44, Scott Hess wrote: > This should include the schema to do the upgrade. Done.
Few more NITs. https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/browser/h... File chrome/browser/history/thumbnail_database_unittest.cc (right): https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/browser/h... chrome/browser/history/thumbnail_database_unittest.cc:244: EXPECT_TRUE(db.SetFaviconBitmapLastRequestedTime(id, now)); NIT: add redundant last_requested = base::Time::UnixEpoch(); to make this case independent of the last? https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/test/data... File chrome/test/data/History/Favicons.v8.sql (right): https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/test/data... chrome/test/data/History/Favicons.v8.sql:12: CREATE TABLE favicon_bitmaps(id INTEGER PRIMARY KEY,icon_id INTEGER NOT NULL,last_updated INTEGER DEFAULT 0, image_data BLOB,width INTEGER DEFAULT 0,height INTEGER DEFAULT 0,last_requested INTEGER DEFAULT 0); NIT: extra space before comma. https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/test/data... chrome/test/data/History/Favicons.v8.sql:13: INSERT INTO "favicon_bitmaps" VALUES(1,1,1287424416,X'313233343631303233353631323033393437353136333435313635393133343837313034373831323336343931363534313932333435313932333435313233343931333400',32,32, 0); NIT: same. https://chromiumcodereview.appspot.com/1004373002/diff/60001/components/histo... File components/history/core/browser/thumbnail_database.cc (right): https://chromiumcodereview.appspot.com/1004373002/diff/60001/components/histo... components/history/core/browser/thumbnail_database.cc:1244: // thumbnails table has been obsolete for a long time, remove any NIT: unwrap? https://chromiumcodereview.appspot.com/1004373002/diff/60001/components/histo... components/history/core/browser/thumbnail_database.cc:1396: NIT: extra space. I recently learned about "git cl lint", maybe try that?
Has this gone through eng review? -Scott On Thu, Mar 19, 2015 at 12:14 PM, <huangs@chromium.org> wrote: > Few more NITs. > > > https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/browser/h... > File chrome/browser/history/thumbnail_database_unittest.cc (right): > > https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/browser/h... > chrome/browser/history/thumbnail_database_unittest.cc:244: > EXPECT_TRUE(db.SetFaviconBitmapLastRequestedTime(id, now)); > NIT: add redundant > last_requested = base::Time::UnixEpoch(); > > to make this case independent of the last? > > https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/test/data... > File chrome/test/data/History/Favicons.v8.sql (right): > > https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/test/data... > chrome/test/data/History/Favicons.v8.sql:12: CREATE TABLE > favicon_bitmaps(id INTEGER PRIMARY KEY,icon_id INTEGER NOT > NULL,last_updated INTEGER DEFAULT 0, image_data BLOB,width INTEGER > DEFAULT 0,height INTEGER DEFAULT 0,last_requested INTEGER DEFAULT 0); > NIT: extra space before comma. > > https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/test/data... > chrome/test/data/History/Favicons.v8.sql:13: INSERT INTO > "favicon_bitmaps" > VALUES(1,1,1287424416,X'313233343631303233353631323033393437353136333435313635393133343837313034373831323336343931363534313932333435313932333435313233343931333400',32,32, > 0); > NIT: same. > > https://chromiumcodereview.appspot.com/1004373002/diff/60001/components/histo... > File components/history/core/browser/thumbnail_database.cc (right): > > https://chromiumcodereview.appspot.com/1004373002/diff/60001/components/histo... > components/history/core/browser/thumbnail_database.cc:1244: // > thumbnails table has been obsolete for a long time, remove any > NIT: unwrap? > > https://chromiumcodereview.appspot.com/1004373002/diff/60001/components/histo... > components/history/core/browser/thumbnail_database.cc:1396: > NIT: extra space. I recently learned about "git cl lint", maybe try > that? > > https://chromiumcodereview.appspot.com/1004373002/ To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2015/03/19 19:34:31, sky wrote: > Has this gone through eng review? > > -Scott > > On Thu, Mar 19, 2015 at 12:14 PM, <mailto:huangs@chromium.org> wrote: > > Few more NITs. > > > > > > > https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/browser/h... > > File chrome/browser/history/thumbnail_database_unittest.cc (right): > > > > > https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/browser/h... > > chrome/browser/history/thumbnail_database_unittest.cc:244: > > EXPECT_TRUE(db.SetFaviconBitmapLastRequestedTime(id, now)); > > NIT: add redundant > > last_requested = base::Time::UnixEpoch(); > > > > to make this case independent of the last? > > > > > https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/test/data... > > File chrome/test/data/History/Favicons.v8.sql (right): > > > > > https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/test/data... > > chrome/test/data/History/Favicons.v8.sql:12: CREATE TABLE > > favicon_bitmaps(id INTEGER PRIMARY KEY,icon_id INTEGER NOT > > NULL,last_updated INTEGER DEFAULT 0, image_data BLOB,width INTEGER > > DEFAULT 0,height INTEGER DEFAULT 0,last_requested INTEGER DEFAULT 0); > > NIT: extra space before comma. > > > > > https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/test/data... > > chrome/test/data/History/Favicons.v8.sql:13: INSERT INTO > > "favicon_bitmaps" > > > VALUES(1,1,1287424416,X'313233343631303233353631323033393437353136333435313635393133343837313034373831323336343931363534313932333435313932333435313233343931333400',32,32, > > 0); > > NIT: same. > > > > > https://chromiumcodereview.appspot.com/1004373002/diff/60001/components/histo... > > File components/history/core/browser/thumbnail_database.cc (right): > > > > > https://chromiumcodereview.appspot.com/1004373002/diff/60001/components/histo... > > components/history/core/browser/thumbnail_database.cc:1244: // > > thumbnails table has been obsolete for a long time, remove any > > NIT: unwrap? > > > > > https://chromiumcodereview.appspot.com/1004373002/diff/60001/components/histo... > > components/history/core/browser/thumbnail_database.cc:1396: > > NIT: extra space. I recently learned about "git cl lint", maybe try > > that? > > > > https://chromiumcodereview.appspot.com/1004373002/ > > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org. Hi Scott, This went through launch review (which included eng leads discussing the proposed design). The initial design docs have been reviewed and discussed (the design doc linked to that CL will point you to the older ones with traces of these discussions). The LRU cache this CL will enable is a result of these discussions.
sky@chromium.org changed reviewers: + pkotwicz@chromium.org
Excellent. I'm going to redirect this to Peter. He should really be in the owners file for all of these files (he was prior to refactoring to components).
On 2015/03/19 19:50:10, sky wrote: > Excellent. > > I'm going to redirect this to Peter. He should really be in the owners file for > all of these files (he was prior to refactoring to components). Ooops. Got my PM's messages mixed-up. :) This proposal did NOT go through launch review. It has been discussed quite a bit with michaelbai and others, but may be worth having a formal eng review if you think it deserves it.
I will look at this tomorrow.
LGTM. I'm going to be OOO until Monday, so apologies if you need to loop me in again and I'm not responsive. https://codereview.chromium.org/1004373002/diff/40001/components/history/core... File components/history/core/browser/thumbnail_database.cc (right): https://codereview.chromium.org/1004373002/diff/40001/components/history/core... components/history/core/browser/thumbnail_database.cc:325: // column. On 2015/03/19 17:55:51, Roger McFarlane (Chromium) wrote: > On 2015/03/17 22:07:46, Scott Hess wrote: > > Pull this out to UpgradeToVersion8() like the other cases. Also, try not to > use > > conditional logic, if the column already exists in a version-7 database, that > > implies that transactional integrity has been broken and the correct response > is > > to recover the database, not to hope that everything is just fine. > > I've clobbered the index over favicon_bitmaps(last_requested), so moving this to > the version 8 upgrade is now fine. > > The "problem" that made me do it this way initially was that the code is > structured as: > > InitTablesOrFail(); > InitIndicesOrFail(); > if (previous inits didn't get you to the right version) > VersionDeltaUpgrad(); > > InitTables() might be a NOP, which would cause InitIndices() to fail for missing > columns. Grr. I really hate our system. I want to have it only call Init*() on fresh databases and always do migrate on versioned databases. Your point sounds similar to if your database version N needs to be upgraded to N+2 where N+1 adds a table and N+2 alters the table. InitTables() adds the N+2 table directly then the migrate script fails. Unfortunately I haven't been brave enough to make this change. Maybe it's time. https://codereview.chromium.org/1004373002/diff/60001/components/history/core... File components/history/core/browser/thumbnail_database.cc (right): https://codereview.chromium.org/1004373002/diff/60001/components/history/core... components/history/core/browser/thumbnail_database.cc:86: // Version 5: e2ee8ae9/r105004 by groby@chromium.org on 2011-10-12 (deprecated) Cool beans. I was going to suggest that earlier, but it seemed mostly unrelated to this CL. Certainly fine with doing maintenance as you go, though :-).
lgtm https://codereview.chromium.org/1004373002/diff/60001/components/history/core... File components/history/core/browser/thumbnail_database.cc (right): https://codereview.chromium.org/1004373002/diff/60001/components/history/core... components/history/core/browser/thumbnail_database.cc:1244: // thumbnails table has been obsolete for a long time, remove any On 2015/03/19 19:14:11, huangs wrote: > NIT: unwrap? I'm fine leaving it wrapped, 90% of the multi-line comments in this file could be unwrapped. It would almost look off if this one was. :)
Yes please, send to eng review. On Thu, Mar 19, 2015 at 12:53 PM, <beaudoin@chromium.org> wrote: > On 2015/03/19 19:50:10, sky wrote: >> >> Excellent. > > >> I'm going to redirect this to Peter. He should really be in the owners >> file > > for >> >> all of these files (he was prior to refactoring to components). > > > Ooops. Got my PM's messages mixed-up. :) This proposal did NOT go through > launch > review. It has been discussed quite a bit with michaelbai and others, but > may be > worth having a formal eng review if you think it deserves it. > > https://chromiumcodereview.appspot.com/1004373002/ To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/browser/h... File chrome/browser/history/thumbnail_database_unittest.cc (right): https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/browser/h... chrome/browser/history/thumbnail_database_unittest.cc:244: EXPECT_TRUE(db.SetFaviconBitmapLastRequestedTime(id, now)); On 2015/03/19 19:14:11, huangs wrote: > NIT: add redundant > last_requested = base::Time::UnixEpoch(); > > to make this case independent of the last? Done. Not strictly necessary since last_requested should otherwise be either UnixEpoch or null, but not "now". https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/test/data... File chrome/test/data/History/Favicons.v8.sql (right): https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/test/data... chrome/test/data/History/Favicons.v8.sql:12: CREATE TABLE favicon_bitmaps(id INTEGER PRIMARY KEY,icon_id INTEGER NOT NULL,last_updated INTEGER DEFAULT 0, image_data BLOB,width INTEGER DEFAULT 0,height INTEGER DEFAULT 0,last_requested INTEGER DEFAULT 0); On 2015/03/19 19:14:11, huangs wrote: > NIT: extra space before comma. Done. https://chromiumcodereview.appspot.com/1004373002/diff/60001/chrome/test/data... chrome/test/data/History/Favicons.v8.sql:13: INSERT INTO "favicon_bitmaps" VALUES(1,1,1287424416,X'313233343631303233353631323033393437353136333435313635393133343837313034373831323336343931363534313932333435313932333435313233343931333400',32,32, 0); On 2015/03/19 19:14:11, huangs wrote: > NIT: same. Done. https://chromiumcodereview.appspot.com/1004373002/diff/60001/components/histo... File components/history/core/browser/thumbnail_database.cc (right): https://chromiumcodereview.appspot.com/1004373002/diff/60001/components/histo... components/history/core/browser/thumbnail_database.cc:86: // Version 5: e2ee8ae9/r105004 by groby@chromium.org on 2011-10-12 (deprecated) On 2015/03/19 21:32:22, Scott Hess wrote: > Cool beans. I was going to suggest that earlier, but it seemed mostly unrelated > to this CL. Certainly fine with doing maintenance as you go, though :-). With they way recovery/upgrade is done, it was messy to support 5. So, this wasn't *only* maintenance/cleanup. :) https://chromiumcodereview.appspot.com/1004373002/diff/60001/components/histo... components/history/core/browser/thumbnail_database.cc:1244: // thumbnails table has been obsolete for a long time, remove any On 2015/03/19 19:14:11, huangs wrote: > NIT: unwrap? Done. https://chromiumcodereview.appspot.com/1004373002/diff/60001/components/histo... components/history/core/browser/thumbnail_database.cc:1244: // thumbnails table has been obsolete for a long time, remove any On 2015/03/19 21:36:54, beaudoin wrote: > On 2015/03/19 19:14:11, huangs wrote: > > NIT: unwrap? > > I'm fine leaving it wrapped, 90% of the multi-line comments in this file could > be unwrapped. It would almost look off if this one was. :) Acknowledged. https://chromiumcodereview.appspot.com/1004373002/diff/60001/components/histo... components/history/core/browser/thumbnail_database.cc:1396: On 2015/03/19 19:14:11, huangs wrote: > NIT: extra space. I recently learned about "git cl lint", maybe try that? Done.
Looks good. Can you please split the portion of this CL which deprecates version 5 of the database into a different CL? https://codereview.chromium.org/1004373002/diff/60001/components/history/core... File components/history/core/browser/thumbnail_database.cc (right): https://codereview.chromium.org/1004373002/diff/60001/components/history/core... components/history/core/browser/thumbnail_database.cc:450: // the code simple. http://crbug.com/327485 for numbers. Nit: Can you please update the comment? https://codereview.chromium.org/1004373002/diff/80001/components/history/core... File components/history/core/browser/thumbnail_database.cc (right): https://codereview.chromium.org/1004373002/diff/80001/components/history/core... components/history/core/browser/thumbnail_database.cc:638: bitmap_count.Step() ? bitmap_count.ColumnInt(0) : 0); Can you explain why you are adding this UMA stat? My understanding is that there is at most two favicon bitmaps per entry in the favicons database. If you are in fact adding the stat, you should add an entry to tools/metrics/histograms/histograms.xml https://codereview.chromium.org/1004373002/diff/80001/components/history/core... components/history/core/browser/thumbnail_database.cc:1136: // value (0). We should copy the last_requested field over. Having special behavior is confusing. https://codereview.chromium.org/1004373002/diff/80001/components/history/core... components/history/core/browser/thumbnail_database.cc:1290: if (cur_version == 5) { Given that version 5 is now deprecated shouldn't this be removed? https://codereview.chromium.org/1004373002/diff/80001/components/history/core... components/history/core/browser/thumbnail_database.cc:1393: return false; I wonder whether adding an extra index would be useful to make the compaction query quicker. Unfortunately, shess@ would be the person who would know whether this is desirable.
Ping me when you are ready for another review
Patchset #5 (id:100001) has been deleted
Patchset #5 (id:120001) has been deleted
Thanks. Another look? https://codereview.chromium.org/1004373002/diff/60001/components/history/core... File components/history/core/browser/thumbnail_database.cc (right): https://codereview.chromium.org/1004373002/diff/60001/components/history/core... components/history/core/browser/thumbnail_database.cc:450: // the code simple. http://crbug.com/327485 for numbers. On 2015/03/20 19:51:55, pkotwicz wrote: > Nit: Can you please update the comment? See: https://codereview.chromium.org/1010973011 https://codereview.chromium.org/1004373002/diff/80001/components/history/core... File components/history/core/browser/thumbnail_database.cc (right): https://codereview.chromium.org/1004373002/diff/80001/components/history/core... components/history/core/browser/thumbnail_database.cc:638: bitmap_count.Step() ? bitmap_count.ColumnInt(0) : 0); On 2015/03/20 19:51:55, pkotwicz wrote: > Can you explain why you are adding this UMA stat? My understanding is that there > is at most two favicon bitmaps per entry in the favicons database. > > If you are in fact adding the stat, you should add an entry to > tools/metrics/histograms/histograms.xml This change was moved to https://codereview.chromium.org/1032763002/ Let's continue the discussion there. https://codereview.chromium.org/1004373002/diff/80001/components/history/core... components/history/core/browser/thumbnail_database.cc:1136: // value (0). On 2015/03/20 19:51:55, pkotwicz wrote: > We should copy the last_requested field over. Having special behavior is > confusing. Done. https://codereview.chromium.org/1004373002/diff/80001/components/history/core... components/history/core/browser/thumbnail_database.cc:1290: if (cur_version == 5) { On 2015/03/20 19:51:55, pkotwicz wrote: > Given that version 5 is now deprecated shouldn't this be removed? Done. See: https://codereview.chromium.org/1010973011 https://codereview.chromium.org/1004373002/diff/80001/components/history/core... components/history/core/browser/thumbnail_database.cc:1393: return false; On 2015/03/20 19:51:55, pkotwicz wrote: > I wonder whether adding an extra index would be useful to make the compaction > query quicker. Unfortunately, shess@ would be the person who would know whether > this is desirable. Yes, that's something I wanted to discuss more as well. I don't think an index over the last_requested column would help. We're only compacting large icons, of which we expect to have very few compared to the number of regular favicons. The latter gets culled on a history deletion, modulo items referenced by bookmarks. I.e., we're deleting a limited number large bitmaps whose IDs are in the set of large bitmaps ordered by their last used time. AFAICT, this would still require a full scan even if we had the index. A multi-column index over the icon type and the last_requested time would help. As might putting the last_requested time in a separate id/value table used only for NTP large icons.
On 2015/03/20 19:51:55, pkotwicz wrote: > Looks good. > > Can you please split the portion of this CL which deprecates version 5 of the > database into a different CL? Done. https://codereview.chromium.org/1010973011/
LGTM Thanks for your patience! https://codereview.chromium.org/1004373002/diff/140001/chrome/browser/history... File chrome/browser/history/thumbnail_database_unittest.cc (right): https://codereview.chromium.org/1004373002/diff/140001/chrome/browser/history... chrome/browser/history/thumbnail_database_unittest.cc:229: EXPECT_NE(0, id); Nit: EXPECT_NE -> ASSERT_NE ASSERT_NE will cause the remainder of the test to be skipped if it fails
https://codereview.chromium.org/1004373002/diff/80001/components/history/core... File components/history/core/browser/thumbnail_database.cc (right): https://codereview.chromium.org/1004373002/diff/80001/components/history/core... components/history/core/browser/thumbnail_database.cc:1393: return false; On 2015/03/27 14:40:18, Roger McFarlane (Chromium) wrote: > On 2015/03/20 19:51:55, pkotwicz wrote: > > I wonder whether adding an extra index would be useful to make the compaction > > query quicker. Unfortunately, shess@ would be the person who would know > whether > > this is desirable. > > Yes, that's something I wanted to discuss more as well. > > I don't think an index over the last_requested column would help. > > We're only compacting large icons, of which we expect to have very few compared > to the number of regular favicons. The latter gets culled on a history deletion, > modulo items referenced by bookmarks. I.e., we're deleting a limited number > large bitmaps whose IDs are in the set of large bitmaps ordered by their last > used time. AFAICT, this would still require a full scan even if we had the > index. A multi-column index over the icon type and the last_requested time would > help. As might putting the last_requested time in a separate id/value table used > only for NTP large icons. The CL doesn't contain any information about which icons are considered large, nor which icons have their last_requested set (large only), nor who wants to make the decision about what to trim from the cache, nor whether the decision is "older than T" or "keep the first N newest". As-is, it feels like the caller is tucking the last_requested in and then pulling it back out and making the decision itself, in which case the only index that matters is the id index. If the last_requested were only set for large icons, then an index on last_requested would allow writing something like: -- partial index. Could also just allow NULL in last_requested and use "IS NOT NULL". CREATE INDEX last_requested_index ON favicon_bitmaps(last_requested) WHERE last_requested > 0; DELETE FROM favicon_bitmaps WHERE id IN (SELECT id FROM favicon_bitmaps WHERE last_requested > 0 ORDER BY last_requested DESC LIMIT 1000000 OFFSET ?N) The LIMIT constant is a hack to say "All the id's after N". "Older than 30 days" would be cleaner. I think the real question is when you're going to do this cleaning? If you're going to do it once per browser session, then it probably doesn't matter. If you're going to do it incrementally all the time, then it might matter.
On 2015/03/27 17:59:33, Scott Hess wrote: > https://codereview.chromium.org/1004373002/diff/80001/components/history/core... > File components/history/core/browser/thumbnail_database.cc (right): > > https://codereview.chromium.org/1004373002/diff/80001/components/history/core... > components/history/core/browser/thumbnail_database.cc:1393: return false; > On 2015/03/27 14:40:18, Roger McFarlane (Chromium) wrote: > > On 2015/03/20 19:51:55, pkotwicz wrote: > > > I wonder whether adding an extra index would be useful to make the > compaction > > > query quicker. Unfortunately, shess@ would be the person who would know > > whether > > > this is desirable. > > > > Yes, that's something I wanted to discuss more as well. > > > > I don't think an index over the last_requested column would help. > > > > We're only compacting large icons, of which we expect to have very few > compared > > to the number of regular favicons. The latter gets culled on a history > deletion, > > modulo items referenced by bookmarks. I.e., we're deleting a limited number > > large bitmaps whose IDs are in the set of large bitmaps ordered by their last > > used time. AFAICT, this would still require a full scan even if we had the > > index. A multi-column index over the icon type and the last_requested time > would > > help. As might putting the last_requested time in a separate id/value table > used > > only for NTP large icons. > > The CL doesn't contain any information about which icons are considered large, > nor which icons have their last_requested set (large only), nor who wants to > make the decision about what to trim from the cache, nor whether the decision is > "older than T" or "keep the first N newest". As-is, it feels like the caller is > tucking the last_requested in and then pulling it back out and making the > decision itself, in which case the only index that matters is the id index. > > If the last_requested were only set for large icons, then an index on > last_requested would allow writing something like: > -- partial index. Could also just allow NULL in last_requested and use "IS > NOT NULL". > CREATE INDEX last_requested_index ON favicon_bitmaps(last_requested) WHERE > last_requested > 0; > > DELETE FROM favicon_bitmaps WHERE id IN (SELECT id FROM favicon_bitmaps WHERE > last_requested > 0 ORDER BY last_requested DESC LIMIT 1000000 OFFSET ?N) > > The LIMIT constant is a hack to say "All the id's after N". "Older than 30 > days" would be cleaner. > > I think the real question is when you're going to do this cleaning? If you're > going to do it once per browser session, then it probably doesn't matter. If > you're going to do it incrementally all the time, then it might matter. Hmm. Maybe scratch some of that. Partial indices are in SQLite as of 3.8, we are currently at 3.8.7.4, but sqlite.gyp has required_sqlite_version = 3.6.1. I don't know what that is there go cover, maybe iOS?
On 2015/03/27 18:02:43, Scott Hess wrote: > On 2015/03/27 17:59:33, Scott Hess wrote: > > > https://codereview.chromium.org/1004373002/diff/80001/components/history/core... > > File components/history/core/browser/thumbnail_database.cc (right): > > > > > https://codereview.chromium.org/1004373002/diff/80001/components/history/core... > > components/history/core/browser/thumbnail_database.cc:1393: return false; > > On 2015/03/27 14:40:18, Roger McFarlane (Chromium) wrote: > > > On 2015/03/20 19:51:55, pkotwicz wrote: > > > > I wonder whether adding an extra index would be useful to make the > > compaction > > > > query quicker. Unfortunately, shess@ would be the person who would know > > > whether > > > > this is desirable. > > > > > > Yes, that's something I wanted to discuss more as well. > > > > > > I don't think an index over the last_requested column would help. > > > > > > We're only compacting large icons, of which we expect to have very few > > compared > > > to the number of regular favicons. The latter gets culled on a history > > deletion, > > > modulo items referenced by bookmarks. I.e., we're deleting a limited number > > > large bitmaps whose IDs are in the set of large bitmaps ordered by their > last > > > used time. AFAICT, this would still require a full scan even if we had the > > > index. A multi-column index over the icon type and the last_requested time > > would > > > help. As might putting the last_requested time in a separate id/value table > > used > > > only for NTP large icons. > > > > The CL doesn't contain any information about which icons are considered large, > > nor which icons have their last_requested set (large only), nor who wants to > > make the decision about what to trim from the cache, nor whether the decision > is > > "older than T" or "keep the first N newest". As-is, it feels like the caller > is > > tucking the last_requested in and then pulling it back out and making the > > decision itself, in which case the only index that matters is the id index. > > > > If the last_requested were only set for large icons, then an index on > > last_requested would allow writing something like: > > -- partial index. Could also just allow NULL in last_requested and use "IS > > NOT NULL". > > CREATE INDEX last_requested_index ON favicon_bitmaps(last_requested) WHERE > > last_requested > 0; > > > > DELETE FROM favicon_bitmaps WHERE id IN (SELECT id FROM favicon_bitmaps > WHERE > > last_requested > 0 ORDER BY last_requested DESC LIMIT 1000000 OFFSET ?N) > > > > The LIMIT constant is a hack to say "All the id's after N". "Older than 30 > > days" would be cleaner. > > > > I think the real question is when you're going to do this cleaning? If you're > > going to do it once per browser session, then it probably doesn't matter. If > > you're going to do it incrementally all the time, then it might matter. > > Hmm. Maybe scratch some of that. Partial indices are in SQLite as of 3.8, we > are currently at 3.8.7.4, but sqlite.gyp has required_sqlite_version = 3.6.1. I > don't know what that is there go cover, maybe iOS? Well, and the above will still work with a non-partial index, the index size will just be bigger. SQLite encodes 0, 1, and NULL very compactly, so even that might not be a big deal.
Sorry I did not respond to the comment. My concern was that the query to determine whether compaction is necessary would be expensive if we did it on every page load. (My understanding is that we are not planning to do this) I suspect that the query to determine whether compaction is necessary would be: "SELECT COUNT(*) FROM favicons WHERE icon_type != 1 In this case, I was wondering whether an index on "icon_type" would be useful. Based on offline discussion (I don't remember who I had the offline discussion with) it seems that the current plan is to: (A) Query how many large icons there are (and whether compaction is required) at the beginning of the browser session. (B) Keep track of how many large icons there are during the browser session (and whether compaction is necessary) in RAM. If this is the case, then an extra index would not be useful. (B) seems kind of hard to do
On 2015/03/27 18:42:10, pkotwicz wrote: > Sorry I did not respond to the comment. My concern was that the query to > determine whether compaction is necessary would be expensive if we did it on > every page load. (My understanding is that we are not planning to do this) > > I suspect that the query to determine whether compaction is necessary would be: > "SELECT COUNT(*) FROM favicons WHERE icon_type != 1 > In this case, I was wondering whether an index on "icon_type" would be useful. > > Based on offline discussion (I don't remember who I had the offline discussion > with) it seems that the current plan is to: > (A) Query how many large icons there are (and whether compaction is required) at > the beginning of the browser session. > (B) Keep track of how many large icons there are during the browser session (and > whether compaction is necessary) in RAM. > If this is the case, then an extra index would not be useful. (B) seems kind of > hard to do I believe the conversation was with me. The plan you describe is correct: query the number of large icons once, cache it, update it as new large icons are added, and clean-up when this number gets too big. I considered it implementation detail but now realize it's worth spelling out in the design doc, which I've done here: go/icon-ntp
On 2015/03/27 18:58:11, beaudoin wrote: > On 2015/03/27 18:42:10, pkotwicz wrote: > > Sorry I did not respond to the comment. My concern was that the query to > > determine whether compaction is necessary would be expensive if we did it on > > every page load. (My understanding is that we are not planning to do this) > > > > I suspect that the query to determine whether compaction is necessary would > be: > > "SELECT COUNT(*) FROM favicons WHERE icon_type != 1 > > In this case, I was wondering whether an index on "icon_type" would be useful. > > > > Based on offline discussion (I don't remember who I had the offline discussion > > with) it seems that the current plan is to: > > (A) Query how many large icons there are (and whether compaction is required) > at > > the beginning of the browser session. > > (B) Keep track of how many large icons there are during the browser session > (and > > whether compaction is necessary) in RAM. > > If this is the case, then an extra index would not be useful. (B) seems kind > of > > hard to do > > I believe the conversation was with me. The plan you describe is correct: query > the number of large icons once, cache it, update it as new large icons are > added, and clean-up when this number gets too big. > > I considered it implementation detail but now realize it's worth spelling out in > the design doc, which I've done here: go/icon-ntp Yeah, that doesn't sound like an index is warranted. It would make most things slower, and only help with the COUNT(*) query (which I'm assuming has a WHERE clause, with w/o an index it will be a full table scan). My guess is that at startup we're already going to fault enough of the pages in that avoiding the full table scan isn't going to gain a ton.
On 2015/03/27 19:03:48, Scott Hess wrote: > On 2015/03/27 18:58:11, beaudoin wrote: > > On 2015/03/27 18:42:10, pkotwicz wrote: > > > Sorry I did not respond to the comment. My concern was that the query to > > > determine whether compaction is necessary would be expensive if we did it on > > > every page load. (My understanding is that we are not planning to do this) > > > > > > I suspect that the query to determine whether compaction is necessary would > > be: > > > "SELECT COUNT(*) FROM favicons WHERE icon_type != 1 > > > In this case, I was wondering whether an index on "icon_type" would be > useful. > > > > > > Based on offline discussion (I don't remember who I had the offline > discussion > > > with) it seems that the current plan is to: > > > (A) Query how many large icons there are (and whether compaction is > required) > > at > > > the beginning of the browser session. > > > (B) Keep track of how many large icons there are during the browser session > > (and > > > whether compaction is necessary) in RAM. > > > If this is the case, then an extra index would not be useful. (B) seems kind > > of > > > hard to do > > > > I believe the conversation was with me. The plan you describe is correct: > query > > the number of large icons once, cache it, update it as new large icons are > > added, and clean-up when this number gets too big. > > > > I considered it implementation detail but now realize it's worth spelling out > in > > the design doc, which I've done here: go/icon-ntp > > Yeah, that doesn't sound like an index is warranted. It would make most things > slower, and only help with the COUNT(*) query (which I'm assuming has a WHERE > clause, with w/o an index it will be a full table scan). My guess is that at > startup we're already going to fault enough of the pages in that avoiding the > full table scan isn't going to gain a ton. Right. Also, we already do that COUNT(*) (with the same WHERE :)) on 1% of the startup to log a UMA stat. We'll just do it on 100% of the startups. :)
So, it doesn't actually need to be that bad. We only need to know how many "large-and-disposable" icons are when we add a new one to the set. We can actually count them the first time (in browsing session) that we add such an icon to the DB and just keep a running counter in RAM after that. On passing the cull threshold, we do the delete shess@ described earlier. DELETE FROM favicon_bitmaps WHERE id IN (SELECT id FROM favicon_bitmaps WHERE last_requested > 0 ORDER BY last_requested DESC LIMIT 1000000 OFFSET ?N) i.e., delete all but the most recent N On Fri, Mar 27, 2015 at 3:07 PM, <beaudoin@chromium.org> wrote: > On 2015/03/27 19:03:48, Scott Hess wrote: > >> On 2015/03/27 18:58:11, beaudoin wrote: >> > On 2015/03/27 18:42:10, pkotwicz wrote: >> > > Sorry I did not respond to the comment. My concern was that the query >> to >> > > determine whether compaction is necessary would be expensive if we >> did it >> > on > >> > > every page load. (My understanding is that we are not planning to do >> this) >> > > >> > > I suspect that the query to determine whether compaction is necessary >> > would > >> > be: >> > > "SELECT COUNT(*) FROM favicons WHERE icon_type != 1 >> > > In this case, I was wondering whether an index on "icon_type" would be >> useful. >> > > >> > > Based on offline discussion (I don't remember who I had the offline >> discussion >> > > with) it seems that the current plan is to: >> > > (A) Query how many large icons there are (and whether compaction is >> required) >> > at >> > > the beginning of the browser session. >> > > (B) Keep track of how many large icons there are during the browser >> > session > >> > (and >> > > whether compaction is necessary) in RAM. >> > > If this is the case, then an extra index would not be useful. (B) >> seems >> > kind > >> > of >> > > hard to do >> > >> > I believe the conversation was with me. The plan you describe is >> correct: >> query >> > the number of large icons once, cache it, update it as new large icons >> are >> > added, and clean-up when this number gets too big. >> > >> > I considered it implementation detail but now realize it's worth >> spelling >> > out > >> in >> > the design doc, which I've done here: go/icon-ntp >> <https://goto.google.com/icon-ntp> >> > > Yeah, that doesn't sound like an index is warranted. It would make most >> > things > >> slower, and only help with the COUNT(*) query (which I'm assuming has a >> WHERE >> clause, with w/o an index it will be a full table scan). My guess is >> that at >> startup we're already going to fault enough of the pages in that avoiding >> the >> full table scan isn't going to gain a ton. >> > > Right. Also, we already do that COUNT(*) (with the same WHERE :)) on 1% of > the > startup to log a UMA stat. We'll just do it on 100% of the startups. :) > > https://codereview.chromium.org/1004373002/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2015/03/27 19:16:52, Roger McFarlane (Chromium) wrote: > On passing the cull threshold, we do the delete shess@ described earlier. > > DELETE FROM favicon_bitmaps WHERE id IN (SELECT id FROM > favicon_bitmaps WHERE last_requested > 0 ORDER BY last_requested DESC > LIMIT 1000000 OFFSET ?N) > > i.e., delete all but the most recent N The above statement will do a full table scan to generate a set of ids, then delete them in ascending order. If the client code already knows the id values it should just delete them by id in a loop. In id order would be best. Note that this database has a fairly small cache size set, so a large update has potential to thrash the cache.
On "what is a large icon?"... For our purposes (showing NTP recommendations as icons) large icons are those that we downloaded (on demand) for the purpose of adding them to the NTP. These may be of various types (large favicon, touch, manifest) depending on what the site provides. Further, we don't want to take ownership of icons that were downloaded for another purpose (bookmarks, send to homescreen, tab/history icons, etc). So, we update the last_requested time when we fetch an icon because we didn't find an appropriate icon already in the DB that we could just return. I.e.: on mobile, this implies that we should mostly just be returning already cached touch icons for sites the user has already visited. On mobile and desktop, it would imply picking up large favicons for those sites that don't expose a touch icon but do expose various favicon sizes (current large favicons are ignored by the DB) and picking the touch icon for desktop sites that elect to provide one. On Fri, Mar 27, 2015 at 3:16 PM, Roger McFarlane <rogerm@chromium.org> wrote: > So, it doesn't actually need to be that bad. We only need to know how many > "large-and-disposable" icons are when we add a new one to the set. We can > actually count them the first time (in browsing session) that we add such > an icon to the DB and just keep a running counter in RAM after that. > > On passing the cull threshold, we do the delete shess@ described earlier. > > DELETE FROM favicon_bitmaps WHERE id IN (SELECT id FROM > favicon_bitmaps WHERE last_requested > 0 ORDER BY last_requested DESC > LIMIT 1000000 OFFSET ?N) > > i.e., delete all but the most recent N > > > > > > On Fri, Mar 27, 2015 at 3:07 PM, <beaudoin@chromium.org> wrote: > >> On 2015/03/27 19:03:48, Scott Hess wrote: >> >>> On 2015/03/27 18:58:11, beaudoin wrote: >>> > On 2015/03/27 18:42:10, pkotwicz wrote: >>> > > Sorry I did not respond to the comment. My concern was that the >>> query to >>> > > determine whether compaction is necessary would be expensive if we >>> did it >>> >> on >> >>> > > every page load. (My understanding is that we are not planning to do >>> this) >>> > > >>> > > I suspect that the query to determine whether compaction is necessary >>> >> would >> >>> > be: >>> > > "SELECT COUNT(*) FROM favicons WHERE icon_type != 1 >>> > > In this case, I was wondering whether an index on "icon_type" would >>> be >>> useful. >>> > > >>> > > Based on offline discussion (I don't remember who I had the offline >>> discussion >>> > > with) it seems that the current plan is to: >>> > > (A) Query how many large icons there are (and whether compaction is >>> required) >>> > at >>> > > the beginning of the browser session. >>> > > (B) Keep track of how many large icons there are during the browser >>> >> session >> >>> > (and >>> > > whether compaction is necessary) in RAM. >>> > > If this is the case, then an extra index would not be useful. (B) >>> seems >>> >> kind >> >>> > of >>> > > hard to do >>> > >>> > I believe the conversation was with me. The plan you describe is >>> correct: >>> query >>> > the number of large icons once, cache it, update it as new large icons >>> are >>> > added, and clean-up when this number gets too big. >>> > >>> > I considered it implementation detail but now realize it's worth >>> spelling >>> >> out >> >>> in >>> > the design doc, which I've done here: go/icon-ntp >>> <https://goto.google.com/icon-ntp> >>> >> >> Yeah, that doesn't sound like an index is warranted. It would make most >>> >> things >> >>> slower, and only help with the COUNT(*) query (which I'm assuming has a >>> WHERE >>> clause, with w/o an index it will be a full table scan). My guess is >>> that at >>> startup we're already going to fault enough of the pages in that >>> avoiding the >>> full table scan isn't going to gain a ton. >>> >> >> Right. Also, we already do that COUNT(*) (with the same WHERE :)) on 1% >> of the >> startup to log a UMA stat. We'll just do it on 100% of the startups. :) >> >> https://codereview.chromium.org/1004373002/ >> > > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
> > The above statement will do a full table scan to generate a set of ids, > then > delete them in ascending order. If the client code already knows the id > values > it should just delete them by id in a loop. In id order would be best. Just to clarify: are you suggesting running the nested query first then doing the delete in a separate loop, or keeping track of the id:last_requested pair in a separate structure (in mem, or say in a separate table). To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On Fri, Mar 27, 2015 at 1:51 PM, Roger McFarlane <rogerm@chromium.org> wrote: >> The above statement will do a full table scan to generate a set of ids, >> then >> delete them in ascending order. If the client code already knows the id >> values >> it should just delete them by id in a loop. In id order would be best. > > Just to clarify: are you suggesting running the nested query first then > doing the delete in a separate loop, or keeping track of the > id:last_requested pair in a separate structure (in mem, or say in a separate > table). Nevermind. I thought that since you were able to track a garbage-collection threshold in memory, then that implied you had a glob of metadata. Now I realize that you can update the COUNT(*) info based on whether an image came from the database or was constructed anew and added to the database. So I don't think you can get away from the full table scan. -scott To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
So, it sounds like everyone is supportive of this landing? Can I consider it approved? Were there any questions raised that I/we failed to address? On Fri, Mar 27, 2015 at 4:57 PM, Scott Hess <shess@chromium.org> wrote: > On Fri, Mar 27, 2015 at 1:51 PM, Roger McFarlane <rogerm@chromium.org> > wrote: > >> The above statement will do a full table scan to generate a set of ids, > >> then > >> delete them in ascending order. If the client code already knows the id > >> values > >> it should just delete them by id in a loop. In id order would be best. > > > > Just to clarify: are you suggesting running the nested query first then > > doing the delete in a separate loop, or keeping track of the > > id:last_requested pair in a separate structure (in mem, or say in a > separate > > table). > > Nevermind. I thought that since you were able to track a > garbage-collection threshold in memory, then that implied you had a > glob of metadata. Now I realize that you can update the COUNT(*) info > based on whether an image came from the database or was constructed > anew and added to the database. So I don't think you can get away > from the full table scan. > > -scott > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2015/03/27 21:09:19, Roger McFarlane (Chromium) wrote: > So, it sounds like everyone is supportive of this landing? > > Can I consider it approved? Were there any questions raised that I/we > failed to address? I think everyone was LGTM, w/pkotwicz nit in thumbnail_database_unittest.cc
https://codereview.chromium.org/1004373002/diff/140001/chrome/browser/history... File chrome/browser/history/thumbnail_database_unittest.cc (right): https://codereview.chromium.org/1004373002/diff/140001/chrome/browser/history... chrome/browser/history/thumbnail_database_unittest.cc:229: EXPECT_NE(0, id); On 2015/03/27 17:27:33, pkotwicz wrote: > Nit: EXPECT_NE -> ASSERT_NE > ASSERT_NE will cause the remainder of the test to be skipped if it fails Done.
The CQ bit was checked by rogerm@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from beaudoin@chromium.org, pkotwicz@chromium.org, shess@chromium.org Link to the patchset: https://codereview.chromium.org/1004373002/#ps160001 (title: "fix expect/assert")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1004373002/160001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
On 2015/03/28 02:54:20, I haz the power (commit-bot) wrote: > Try jobs failed on following builders: > chromium_presubmit on tryserver.chromium.linux (JOB_FAILED, > http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...) sky@... I need your OWNERS stamp of approval to let the CQ land this. Cheers, RogerM
https://codereview.chromium.org/1004373002/diff/160001/components/history/cor... File components/history/core/browser/thumbnail_database.cc (right): https://codereview.chromium.org/1004373002/diff/160001/components/history/cor... components/history/core/browser/thumbnail_database.cc:736: bool ThumbnailDatabase::GetFaviconBitmapLastRequestedTime( Generally the history code has tried to have functions that return all related state, rather than a slew of functions for standalone data that was need at one time or another. Rather than a new function, can you change GetFaviconBitmap() to return the last_requested time as well.
On 2015/03/30 14:51:42, sky wrote: > https://codereview.chromium.org/1004373002/diff/160001/components/history/cor... > File components/history/core/browser/thumbnail_database.cc (right): > > https://codereview.chromium.org/1004373002/diff/160001/components/history/cor... > components/history/core/browser/thumbnail_database.cc:736: bool > ThumbnailDatabase::GetFaviconBitmapLastRequestedTime( > Generally the history code has tried to have functions that return all related > state, rather than a slew of functions for standalone data that was need at one > time or another. Rather than a new function, can you change GetFaviconBitmap() > to return the last_requested time as well. I don't fundamentally object to this. But... I wouldn't expect a caller who's just gotten an icon bitmap to care about the last time at which they requested the same bitmap (assuming it returns the previous time). Nor would they be interested in knowing the current time (one could argue it would more correct to always return "now" for the last_requested time if it's inline with an actual request?). This function is mostly exposed as a testing seam; the real user of that field will be the compaction code (coming to a CL near you). What say you?
You could apply those arguments to other functions in the history classes. I prefer consistency. -Scott On Mon, Mar 30, 2015 at 10:45 AM, <rogerm@chromium.org> wrote: > On 2015/03/30 14:51:42, sky wrote: > > https://codereview.chromium.org/1004373002/diff/160001/components/history/cor... >> >> File components/history/core/browser/thumbnail_database.cc (right): > > > > https://codereview.chromium.org/1004373002/diff/160001/components/history/cor... >> >> components/history/core/browser/thumbnail_database.cc:736: bool >> ThumbnailDatabase::GetFaviconBitmapLastRequestedTime( >> Generally the history code has tried to have functions that return all >> related >> state, rather than a slew of functions for standalone data that was need >> at > > one >> >> time or another. Rather than a new function, can you change >> GetFaviconBitmap() >> to return the last_requested time as well. > > > I don't fundamentally object to this. But... > > I wouldn't expect a caller who's just gotten an icon bitmap to care about > the > last time at which they requested the same bitmap (assuming it returns the > previous time). Nor would they be interested in knowing the current time > (one > could argue it would more correct to always return "now" for the > last_requested > time if it's inline with an actual request?). > > This function is mostly exposed as a testing seam; the real user of that > field > will be the compaction code (coming to a CL near you). > > What say you? > > > https://codereview.chromium.org/1004373002/ To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Patchset #7 (id:180001) has been deleted
Thanks, sky! Another look? https://codereview.chromium.org/1004373002/diff/160001/components/history/cor... File components/history/core/browser/thumbnail_database.cc (right): https://codereview.chromium.org/1004373002/diff/160001/components/history/cor... components/history/core/browser/thumbnail_database.cc:736: bool ThumbnailDatabase::GetFaviconBitmapLastRequestedTime( On 2015/03/30 14:51:42, sky wrote: > Generally the history code has tried to have functions that return all related > state, rather than a slew of functions for standalone data that was need at one > time or another. Rather than a new function, can you change GetFaviconBitmap() > to return the last_requested time as well. Done.
Thanks, LGTM
The CQ bit was checked by rogerm@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from beaudoin@chromium.org, pkotwicz@chromium.org, shess@chromium.org Link to the patchset: https://codereview.chromium.org/1004373002/#ps200001 (title: "Merge GetFaviconBitmapLastRequestedtime into GetFaviconBitmap.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1004373002/200001
Message was sent while issue was closed.
Committed patchset #7 (id:200001)
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/982ef2c11eb828b3c3728d1e42784d4b3056b251 Cr-Commit-Position: refs/heads/master@{#323176} |