Created: 3 years, 10 months ago by qinmin
Modified: 3 years, 10 months ago
Reviewers: asanka, David Trainor- moved to gerrit, sky, Steven Holte, xingliu, Scott Hess - ex-Googler
CC: chromium-reviews, grt+watch_chromium.org
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public.
Description: add a download slices table into history download db
To support parallel downloads, a download item may have multiple jobs.
Each job writes to the target file at a different offset. We need to store the information for each slice so that download resumption can work.
This CL adds a new table to store the slice information.
This change also increases the history db version by 1.
BUG=644352
Review-Url: https://codereview.chromium.org/2665243003
Cr-Commit-Position: refs/heads/master@{#450606}
Committed: https://chromium.googlesource.com/chromium/src/+/b2925e8fc7cacf09215b6c9c12b975fb7a9bdee5
Patch Set 1 #
Total comments: 2
Patch Set 2 : increase history db version #
Total comments: 18
Patch Set 3 : addressing comments #
Total comments: 26
Patch Set 4 : addressing comments #
Total comments: 4
Patch Set 5 : use Select changes() instead of select count(*) #
Total comments: 22
Patch Set 6 : addressing comments #
Total comments: 29
Patch Set 7 : addressing comments #
Patch Set 8 : Simplify DownloadJobInfo #
Total comments: 1
Patch Set 9 : rename download_job_info to download_slice_info #
Total comments: 14
Patch Set 10 : nits #
Total comments: 2
Patch Set 11 : nit #
Patch Set 12 : add test file to BUILD.gn #
Messages
Total messages: 83 (32 generated)
qinmin@chromium.org changed reviewers: + asanka@chromium.org, dtrainor@chromium.org, sdefresne@chromium.org, xingliu@chromium.org
PTAL
Patchset #1 (id:1) has been deleted
The CQ bit was checked by qinmin@chromium.org to run a CQ dry run
Dry run: CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Is it possible to come up with a scheme where the shard state can be discovered by what's on the disk? It seems reliance on this table will make parallel downloads very fragile.

While we keep the primary download information in the history DB, I'm a bit apprehensive about storing shard info there due to how fickle I think shard states are going to be. We are already dealing with issues related to on-disk state diverging from what's kept in downloads history. Adding shard state would exacerbate this. Ultimately, regardless of what the history DB has to say about the individual download jobs, we'd need to:

* Examine the disk to locate the download job artifacts for the purpose of cleanup.
* Examine the on-disk artifacts to know how much progress was made by each job. We are unlikely to be able to update the job state for all shards on browser kill / shutdown.

So for example, if all the download jobs for the download with GUID xxx-yyy-zzz were named:

xxx-yyy-zzz.<hhh>.crdownload

where <hhh> is the hex encoded start offset, then by listing and stat()ing such files, we'd know the offset and length of each job without needing to worry about whether the history DB is up-to-date. Also, the intended size of the job file is irrelevant. The only thing that matters is how far along the job got before it was killed/interrupted.

We don't need to store the state/interrupt reason for these jobs. If the download is not continuable, we'd need to summarily remove all the job artifacts and the primary partial file. The specific reason doesn't affect how we handle the download jobs.

https://codereview.chromium.org/2665243003/diff/20001/components/history/core...
File components/history/core/browser/download_database.cc (right):

https://codereview.chromium.org/2665243003/diff/20001/components/history/core...
components/history/core/browser/download_database.cc:487: if (IsParallelDownloadingEnabled()) {
The DB schema shouldn't depend on a feature state.
It's okay to rev up the schema version. Doing so lets us know when we introduced this table and will be important if we want to modify/remove it later. Also, it doesn't matter that the feature isn't available by default on the stable channel.
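As a rough illustration of the strawman above, discovering shard state purely from the disk could look like the following. This is an illustrative Python sketch, not Chromium code; the `<guid>.<hex offset>.crdownload` naming is the hypothetical scheme from the comment, and `discover_shards` is an invented helper name.

```python
import os
import re

# Hypothetical naming scheme from the strawman above (not Chromium's actual
# implementation): <guid>.<hex start offset>.crdownload
SHARD_RE = re.compile(
    r"^(?P<guid>[0-9a-zA-Z-]+)\.(?P<offset>[0-9a-fA-F]+)\.crdownload$")

def discover_shards(directory):
    """Recover (guid, start offset, bytes written) for each shard on disk."""
    shards = []
    for name in os.listdir(directory):
        m = SHARD_RE.match(name)
        if not m:
            continue
        # stat() alone tells us how far the job got; no history DB needed.
        size = os.stat(os.path.join(directory, name)).st_size
        shards.append((m.group("guid"), int(m.group("offset"), 16), size))
    return sorted(shards)
```

The point of the scheme is visible here: the filename encodes which download and which offset a shard belongs to, and the file size is the progress, so the on-disk state is self-describing even if the DB is stale.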
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...)
Our current plan is to use a single file, not stitching shards together, for parallel download. So instead of introducing lots of crdownload files, we just create a big file (UC browser creates a file with 60% of its original size), and ask each download job to write to the target offset in that file. This is also the behavior for UC browser: https://docs.google.com/document/d/18KAE4KACcXrnGD860w1Rsvjqk1527YOXAl0HmNyc4...

Using small shards can probably save some space, but it will introduce name conflicts and disk failure issues when stitching shards together. And a large number of downloads don't have strong validators, so having the GUID as the intermediate file name seems very weird.

Using a single file can also be problematic if Chrome crashes. We have to roll back the received bytes for each job, but that should be OK, since most downloads won't see a lot of browser crashes, and losing 10 seconds of data might be acceptable. (I am working on UMA to check how much extra data we download.)

On 2017/02/01 22:34:10, asanka wrote:
> Is it possible to come up with a scheme where the shard state can be discovered
> by what's on the disk? It seems reliance on this table will make parallel
> downloads very fragile.
>
> While we keep the primary download information in the history DB, I'm a bit
> apprehensive about storing shard info there due to how fickle I think shard
> states are going to be. We are already dealing with issues related to on-disk
> state diverging from what's kept in downloads history. Adding shard state would
> exacerbate this. Ultimately regardless of what the history DB has to say about
> the individual download jobs, we'd need to:
>
> * Examine the disk to locate the download job artifacts for the purpose of
> cleanup.
> * Examine the on disk artifacts to know how much progress was made by each job.
> We are unlikely to be able to update the job state for all shards on browser
> kill / shutdown.
> So for example, if all the download jobs for download with GUID xxx-yyy-zzz was
> named:
>
> xxx-yyy-zzz.<hhh>.crdownload
>
> where <hhh> is the hex encoded start offset, then by listing and stat()ing such
> files, we'd know the offset and length of each job without needing to worry
> about whether the history DB is up-to-date. Also, the intended size of the job
> file is irrelevant. The only thing that matters is how far along the job got
> before it was killed/interrupted.
>
> We don't need to store the state/interrupt reason for these jobs. If the
> download is not continuable, we'd need to summarily remove all the job artifacts
> and the primary partial file. The specific reason doesn't affect how we handle
> reason WRT download jobs.
>
> https://codereview.chromium.org/2665243003/diff/20001/components/history/core...
> File components/history/core/browser/download_database.cc (right):
>
> https://codereview.chromium.org/2665243003/diff/20001/components/history/core...
> components/history/core/browser/download_database.cc:487: if
> (IsParallelDownloadingEnabled()) {
> The DB schema shouldn't depend on a feature state.
>
> It's okay to rev up the schema version. Doing so lets us know when we introduced
> this table and will be important if we want to modify/remove it later.
>
> Also, it doesn't matter that the feature isn't available by default on the
> stable channel.
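The single-file approach described above can be sketched in outline. This is illustrative Python under POSIX assumptions, not the Chromium implementation; `preallocate` and `job_write` are invented helper names.

```python
import os

def preallocate(path, size):
    # Create the big target file up front so every job has a place to write.
    with open(path, "wb") as f:
        f.truncate(size)

def job_write(path, offset, data):
    # Each download job writes its received bytes at its own target offset.
    # os.pwrite (POSIX) does not move a shared file position, so concurrent
    # jobs don't race on seek+write.
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, data, offset)
    finally:
        os.close(fd)
```

This is also where the crash-recovery concern comes from: the file size no longer tells you how much each job received, so per-job received-byte counts have to be persisted (and possibly rolled back) elsewhere, which is what the new table is for.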
https://codereview.chromium.org/2665243003/diff/20001/components/history/core...
File components/history/core/browser/download_database.cc (right):

https://codereview.chromium.org/2665243003/diff/20001/components/history/core...
components/history/core/browser/download_database.cc:487: if (IsParallelDownloadingEnabled()) {
On 2017/02/01 22:34:10, asanka wrote:
> The DB schema shouldn't depend on a feature state.
>
> It's okay to rev up the schema version. Doing so lets us know when we introduced
> this table and will be important if we want to modify/remove it later.
>
> Also, it doesn't matter that the feature isn't available by default on the
> stable channel.
Removed all feature-state checks in this file, so the downloads_jobs table will now always be created. However, the table and DownloadRow.download_job_info will remain empty until we launch this feature.
Description was changed from
==========
add an optional download jobs table into history download db

To support parallel downloads, a download item may have multiple jobs. For each job, we need to store their information so download resumption can work. This CL adds a new table to store the job information.

The feature is controlled by a flag and is still WIP. Since the table is optional, no database version change for now. Will increase the history db version once this feature goes to public.

BUG=644352
==========
to
==========
add an optional download jobs table into history download db

To support parallel downloads, a download item may have multiple jobs. For each job, we need to store their information so download resumption can work. This CL adds a new table to store the job information.

This change also increases the history db version by 1

BUG=644352
==========
asanka@chromium.org changed reviewers: + sky@chromium.org
https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
File components/history/core/browser/download_constants.h (right):

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_constants.h:8: #include "base/feature_list.h"
Not needed.

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
File components/history/core/browser/download_database.cc (right):

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_database.cc:323: !GetDB().DoesTableExist("downloads_jobs") &&
This condition isn't relevant here, since neither kSchema nor kUrlChainSchema assumes downloads_jobs doesn't exist.

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_database.cc:326: ret = ret && EnsureDownloadJobsTableExists();
Roll EnsureDownloadJobsTableExists in here for consistency. Or pull out downloads_url_chain initialization into a separate function. Either is fine.

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_database.cc:507: if (!base::ContainsKey(info_map, id))
Might as well front-load this check. There's no point populating the rest of DownloadJobInfo if this condition is false. Could you also add metrics to measure how often this happens?

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_database.cc:758: "id INTEGER NOT NULL," // downloads.id.
Probably call this |download_id| since it doesn't uniquely identify a record in this table.

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_database.cc:766: "interrupt_reason INTEGER NOT NULL, " // DownloadInterruptReason
How do you plan to use |state| and |interrupt_reason|?

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_database.cc:785: job_exists = statement_query.ColumnInt(0) > 0;
Separate UpdateDownloadJob and CreateDownloadJob. An overwhelming majority of the time we are just trying to update an existing download job (possibly tens of times a second). It's much more efficient to deal with missing |downloads_jobs| entries by handling the failure of the "UPDATE" query instead of doing a "SELECT count(*)" before every update.

Also, can't new download jobs be created at any point in the lifetime of a download? The |is_new_download| condition doesn't really capture the set of cases where we don't expect the job to already exist in the DB.

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_database.cc:786: // There should not be any jobs in downloads_jobs for this id. If there
This doesn't really follow from the (!is_new_download && job_exists) case. A download job created at any point of the download's lifetime should assert that there isn't a conflicting download job in the table.

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
File components/history/core/browser/download_database.h (right):

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_database.h:32: static bool IsParallelDownloadingEnabled();
Not needed.
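For illustration, a downloads_jobs schema along the lines being discussed (a |download_id| column that does not uniquely identify a row, with a job keyed by its offset) might look like the following. This is a sketch in Python/SQLite; the exact column set is an assumption, not the Chromium schema.

```python
import sqlite3

# Illustrative schema along the lines discussed: the column is named
# download_id because it does not uniquely identify a row; a job is
# identified by (download_id, offset). The column set is an assumption,
# not the exact Chromium schema.
JOBS_SCHEMA = """
CREATE TABLE downloads_jobs (
  download_id INTEGER NOT NULL,     -- downloads.id
  offset INTEGER NOT NULL,          -- starting byte offset of this job
  received_bytes INTEGER NOT NULL,  -- bytes written so far
  PRIMARY KEY (download_id, offset)
)"""

db = sqlite3.connect(":memory:")
db.execute(JOBS_SCHEMA)
# One download (id 1) with two parallel jobs at different offsets.
db.executemany("INSERT INTO downloads_jobs VALUES (?, ?, ?)",
               [(1, 0, 4096), (1, 1 << 20, 1024)])
```

Making (download_id, offset) the identity of a job is what allows several rows per download while still letting an UPDATE or a conflict check target one specific job.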
On 2017/02/01 at 23:07:10, qinmin wrote:
> Our current plan is to use a single file, not stitching shards together for parallel download.
> So instead of introducing lots of crdownload files, we just create a big file (UC browser creates a file with 60% of its original size), and ask each download job to write to the target offset in that file.
> This is also the behavior for UC browser: https://docs.google.com/document/d/18KAE4KACcXrnGD860w1Rsvjqk1527YOXAl0HmNyc4...

This isn't a feature where compatibility is an issue, so let's evaluate technical decisions on their own merits.

Can you double check the frequency of database updates that this change is going to introduce? It's probably fine, but I've grown a bit apprehensive about relying on DB operations being fast. The 10 second delay in transaction commit may not be your problem if DB updates are scheduled with an increasingly longer delay as we go. We don't want to overwhelm the DB thread if the DB is running slow, so it's worth checking with folks about whether a sustained stream of updates at a high frequency is OK.

> Using small shards can probably save some space, but it will introduce name conflict and disk failure issues when stitching shards together. And a large number of downloads don't have strong validators, so having GUID as intermediate file name seems very weird.

If GUIDs start colliding, we are going to have other problems :) My comment was merely a strawman, but the use of GUID in the name was so that the filename + stat() would immediately tell us exactly which download a shard file corresponds to, its starting offset, and its length (so far). Its intended length is irrelevant to intermediate state.

I understand that some things are easier said than done, so ultimately ease of implementation is something I can't comment on. So I'm okay with your decision either way.

> Using a single file can also be problematic if chrome crashes.
We have to rollback the received bytes for each job, but that should be OK since most download won't have a lot of browser crashes and losing 10 seconds of data might be ok. (I am working on UMA to check how much extra data we download). > > > > > On 2017/02/01 22:34:10, asanka wrote: > > Is it possible to come up with a scheme where the shard state can be discovered > > by what's on the disk? It seems reliance on this table will make parallel > > downloads very fragile. > > > > While we keep the primary download information in the history DB, I'm a bit > > apprehensive about storing shard info there due to how fickle I think shard > > states are going to be. We are already dealing with issues related to on-disk > > state diverging from what's kept in downloads history. Adding shard state would > > exacerbate this. Ultimately regardless of what the history DB has to say about > > the individual download jobs, we'd need to: > > > > * Examine the disk to locate the download job artifacts for the purpose of > > cleanup. > > * Examine the on disk artifacts to know how much progress was made by each job. > > We are unlikely to be able to update the job state for all shards on browser > > kill / shutdown. > > > > So for example, if all the download jobs for download with GUID xxx-yyy-zzz was > > named: > > > > xxx-yyy-zzz.<hhh>.crdownload > > > > where <hhh> is the hex encoded start offset, then by listing and stat()ing such > > files, we'd know the offset and length of each job without needing to worry > > about whether the history DB is up-to-date. Also, the intended size of the job > > file is irrelevant. The only thing that matters is how far along the job got > > before it was killed/interrupted. > > > > We don't need to store the state/interrupt reason for these jobs. If the > > download is not continuable, we'd need to summarily remove all the job artifacts > > and the primary partial file. 
The specific reason doesn't affect how we handle > > reason WRT download jobs. > > > > https://codereview.chromium.org/2665243003/diff/20001/components/history/core... > > File components/history/core/browser/download_database.cc (right): > > > > https://codereview.chromium.org/2665243003/diff/20001/components/history/core... > > components/history/core/browser/download_database.cc:487: if > > (IsParallelDownloadingEnabled()) { > > The DB schema shouldn't depend on a feature state. > > > > It's okay to rev up the schema version. Doing so lets us know when we introduced > > this table and will be important if we want to modify/remove it later. > > > > Also, it doesn't matter that the feature isn't available by default on the > > stable channel.
+sky: for /components/history/
https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
File components/history/core/browser/download_constants.h (right):

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_constants.h:8: #include "base/feature_list.h"
On 2017/02/02 19:06:27, asanka wrote:
> Not needed.
Done. Removed.

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
File components/history/core/browser/download_database.cc (right):

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_database.cc:323: !GetDB().DoesTableExist("downloads_jobs") &&
On 2017/02/02 19:06:27, asanka wrote:
> This condition isn't relevant here since neither kSchema nor kUrlChainSchema
> assume downloads_jobs doesn't exist.
Done. Removed; it just feels weird that the downloads_jobs table exists even if both downloads and downloads_url_chain don't exist.

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_database.cc:326: ret = ret && EnsureDownloadJobsTableExists();
On 2017/02/02 19:06:27, asanka wrote:
> Roll EnsureDownloadJobsTableExists here for consistency. Or pull out
> downloads_url_chain initialization into a separate function. Either is fine.
Done. Moved all the code here.

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_database.cc:507: if (!base::ContainsKey(info_map, id))
On 2017/02/02 19:06:27, asanka wrote:
> Might as well front load this check. There's no point populating the rest of
> DownloadJobInfo if this condition is false.
>
> Could you also add metrics to measure how often this happens?
Done. Added a new UMA entry.

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_database.cc:758: "id INTEGER NOT NULL," // downloads.id.
On 2017/02/02 19:06:27, asanka wrote:
> Probably call this |download_id| since it doesn't uniquely identify a record in
> this table.
Done.

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_database.cc:766: "interrupt_reason INTEGER NOT NULL, " // DownloadInterruptReason
On 2017/02/02 19:06:27, asanka wrote:
> How do you plan to use |state| and |interrupt_reason| ?
I was thinking about whether we should keep the state for later resumption. For example, if a download has 2 jobs, and the 2nd job is interrupted with a SERVER_FORBIDDEN reason and the received bytes are 0, we probably don't want to create the 2nd job when resuming the download later on. Of course we can always create all the jobs on resumption and let them fail, but I am wondering whether we should avoid doing that. Having that information around could also help us change the logic later on, and add some UMAs to study whether creating multiple jobs is desired for certain interrupt reasons.

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_database.cc:785: job_exists = statement_query.ColumnInt(0) > 0;
On 2017/02/02 19:06:28, asanka wrote:
> Separate UpdateDownloadJob and CreateDownloadJob.
>
> An overwhelming majority of the time we are just trying to update an existing
> download job (possibly tens of times a second). It's much more efficient to deal
> with missing |downloads_jobs| entries by handling the failure of the "UPDATE"
> query instead of doing a "SELECT count(*)" before every update.
>
> Also can't new download jobs be created at any point in the lifetime of a
> download? The |is_new_download| condition doesn't really capture the set of
> cases where we don't expect the job to already exist in the DB.
Done.
Separated UpdateDownloadJob from CreateDownloadJob. If the SQL in UpdateDownloadJob fails, it will call CreateDownloadJob to create a new one.

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_database.cc:786: // There should not be any jobs in downloads_jobs for this id. If there
On 2017/02/02 19:06:27, asanka wrote:
> This doesn't really follow from the (!is_new_download && job_exists) case. A
> download job created at any point of the download's lifetime should assert that
> there isn't a conflicting download job in the table.
Done.

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
File components/history/core/browser/download_database.h (right):

https://codereview.chromium.org/2665243003/diff/40001/components/history/core...
components/history/core/browser/download_database.h:32: static bool IsParallelDownloadingEnabled();
On 2017/02/02 19:06:28, asanka wrote:
> Not needed.
Done. Removed.
On 2017/02/02 19:29:08, asanka wrote:
> On 2017/02/01 at 23:07:10, qinmin wrote:
> > Our current plan is to use a single file, not stitching shards together for
> parallel download.
> > So instead of introducing lots of crdownload files, we just create a big
> file(UC browser create a file with 60% of its original size), and ask each
> download job to write to the target offset in that file.
> > This is also the behavior for UC browser:
> https://docs.google.com/document/d/18KAE4KACcXrnGD860w1Rsvjqk1527YOXAl0HmNyc4...
>
> This isn't a feature where compatibility is an issue, so let's evaluate
> technical decisions on their own merits.
>
> Can you double check the frequency of database updates that this change is going
> to introduce? It's probably fine, but I've grown a bit apprehensive about
> relying on DB operations being fast. The 10 second delay in transaction commit
> may not be your problem if DB updates are scheduled with an increasingly longer
> delay as we go. We don't want to overwhelm the DB thread if the DB is running
> slow, so it's worth checking with folks about whether a sustained stream of
> updates at a high frequency is OK.

I captured some traces of this on an Android One device. Since we don't have the parallel downloading implementation yet, I created 4 separate downloads to simulate 4 download jobs running at the same time. The result looks OK. We have about 5 db updates each second, and each update takes an average of 0.3 ms; the history thread is mostly idle. So I think we will not be impacted by this change later on when we start supporting parallel download.

> > Using small shards can probably save some space, but it will introduce name
> conflict and disk failure issues when stitching shards together. And a large
> number of downloads don't have strong validators, so having GUID as intermediate
> file name seems very weird.
> > If GUIDs start colliding we are going to have other problems :) My comment was > merely a strawman, but the use of GUID in the name was so that the filename + > stat() will immediately tell us exactly which download a shard file corresponds > to, its starting offset, and its length (so far). Its intended length is > irrelevant to intermediate state. > > I understand that some things are easier said than done, so ultimately ease of > implementation is something I can't comment on. So I'm okay with your decision > either way. > > > Using a single file can also be problematic if chrome crashes. We have to > rollback the received bytes for each job, but that should be OK since most > download won't have a lot of browser crashes and losing 10 seconds of data might > be ok. (I am working on UMA to check how much extra data we download). > > > > > > > > > > On 2017/02/01 22:34:10, asanka wrote: > > > Is it possible to come up with a scheme where the shard state can be > discovered > > > by what's on the disk? It seems reliance on this table will make parallel > > > downloads very fragile. > > > > > > While we keep the primary download information in the history DB, I'm a bit > > > apprehensive about storing shard info there due to how fickle I think shard > > > states are going to be. We are already dealing with issues related to > on-disk > > > state diverging from what's kept in downloads history. Adding shard state > would > > > exacerbate this. Ultimately regardless of what the history DB has to say > about > > > the individual download jobs, we'd need to: > > > > > > * Examine the disk to locate the download job artifacts for the purpose of > > > cleanup. > > > * Examine the on disk artifacts to know how much progress was made by each > job. > > > We are unlikely to be able to update the job state for all shards on browser > > > kill / shutdown. 
> > > > > > So for example, if all the download jobs for download with GUID xxx-yyy-zzz > was > > > named: > > > > > > xxx-yyy-zzz.<hhh>.crdownload > > > > > > where <hhh> is the hex encoded start offset, then by listing and stat()ing > such > > > files, we'd know the offset and length of each job without needing to worry > > > about whether the history DB is up-to-date. Also, the intended size of the > job > > > file is irrelevant. The only thing that matters is how far along the job got > > > before it was killed/interrupted. > > > > > > We don't need to store the state/interrupt reason for these jobs. If the > > > download is not continuable, we'd need to summarily remove all the job > artifacts > > > and the primary partial file. The specific reason doesn't affect how we > handle > > > reason WRT download jobs. > > > > > > > https://codereview.chromium.org/2665243003/diff/20001/components/history/core... > > > File components/history/core/browser/download_database.cc (right): > > > > > > > https://codereview.chromium.org/2665243003/diff/20001/components/history/core... > > > components/history/core/browser/download_database.cc:487: if > > > (IsParallelDownloadingEnabled()) { > > > The DB schema shouldn't depend on a feature state. > > > > > > It's okay to rev up the schema version. Doing so lets us know when we > introduced > > > this table and will be important if we want to modify/remove it later. > > > > > > Also, it doesn't matter that the feature isn't available by default on the > > > stable channel.
https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
File components/history/core/browser/download_database.cc (right):

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_database.cc:324: "state INTEGER NOT NULL," // To be determined.
Is this comment right? It seems like you're using it.

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_database.cc:493: sql::Statement statement_download_job(GetDB().GetCachedStatement(
DownloadDatabase::QueryDownload is incredibly long. Can you move querying the jobs into its own function? QueryDownload can still call it, but it's easier to reason about smaller functions.

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_database.cc:504: continue;
Wouldn't this indicate corruption? Should we log it?

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_database.cc:596: return false;
Is it possible to get test coverage of this code path?

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_database.cc:616: sql::Statement statement_download_job(GetDB().GetCachedStatement(
Is there test coverage of this code path?

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_database.cc:751: RemoveDownloadJobs(id);
Is there test coverage of this?

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
File components/history/core/browser/download_database.h (right):

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_database.h:17: #include "components/history/core/browser/download_job_info.h"
Forward declare DownloadJobInfo?

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
File components/history/core/browser/download_job_info.cc (right):

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_job_info.cc:1: // Copyright (c) 2017 The Chromium Authors. All rights reserved.
no (c)

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_job_info.cc:12: : download_id(0),
kInvalidDownloadId?

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_job_info.cc:16: state(DownloadState::IN_PROGRESS),
You sure you don't want interrupted?

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
File components/history/core/browser/download_job_info.h (right):

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_job_info.h:17: // multiple download job associated with it, with each job has its own
'download job' -> 'download jobs'
'job has' -> 'job having'

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_job_info.h:31: bool operator==(const DownloadJobInfo&) const;
Do you have test coverage of this?

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
File components/history/core/browser/history_backend_db_unittest.cc (right):

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/history_backend_db_unittest.cc:1019: id, 0, 0, 1000, received, DownloadState::INTERRUPTED,
Can you change the two zeros to non-zero values that aren't identical? Better to verify the two aren't transposed in code and/or using a default of 0.
https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
File components/history/core/browser/download_database.cc (right):

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_database.cc:324: "state INTEGER NOT NULL," // To be determined.
On 2017/02/03 16:02:33, sky wrote:
> Is this comment right? It seems like you're using it.
Done.

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_database.cc:493: sql::Statement statement_download_job(GetDB().GetCachedStatement(
On 2017/02/03 16:02:33, sky wrote:
> DownloadDatabase::QueryDownload is incredibly long. Can you move querying the
> jobs into its own function? QueryDownload can still call it, but it's easier to
> reason about smaller functions.
Done.

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_database.cc:504: continue;
On 2017/02/03 16:02:33, sky wrote:
> Wouldn't this indicate corruption? Should we log it?
Done.

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_database.cc:596: return false;
On 2017/02/03 16:02:33, sky wrote:
> Is it possible to get test coverage of this code path?
Done. Added the HistoryBackendDBTest.UpdateDownloadWithNewJob() test.

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_database.cc:616: sql::Statement statement_download_job(GetDB().GetCachedStatement(
On 2017/02/03 16:02:33, sky wrote:
> Is there test coverage of this code path?
Done. HistoryBackendDBTest.ConfirmDownloadInProgressCleanup() now tests this.

https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_database.cc:751: RemoveDownloadJobs(id); On 2017/02/03 16:02:33, sky wrote: > Is there test coverage of this? Done. HistoryBackendDBTest.ConfirmDownloadRowCreateAndDelete() now tests this https://codereview.chromium.org/2665243003/diff/60001/components/history/core... File components/history/core/browser/download_database.h (right): https://codereview.chromium.org/2665243003/diff/60001/components/history/core... components/history/core/browser/download_database.h:17: #include "components/history/core/browser/download_job_info.h" On 2017/02/03 16:02:33, sky wrote: > Forward declare DownloadJobInfo? Done. https://codereview.chromium.org/2665243003/diff/60001/components/history/core... File components/history/core/browser/download_job_info.cc (right): https://codereview.chromium.org/2665243003/diff/60001/components/history/core... components/history/core/browser/download_job_info.cc:1: // Copyright (c) 2017 The Chromium Authors. All rights reserved. On 2017/02/03 16:02:34, sky wrote: > no (c) Done. https://codereview.chromium.org/2665243003/diff/60001/components/history/core... components/history/core/browser/download_job_info.cc:12: : download_id(0), On 2017/02/03 16:02:34, sky wrote: > kInvalidDownloadId? Done. https://codereview.chromium.org/2665243003/diff/60001/components/history/core... components/history/core/browser/download_job_info.cc:16: state(DownloadState::IN_PROGRESS), On 2017/02/03 16:02:34, sky wrote: > You sure you don't want interrupted? When a job is created, it should always be in the state of IN_PROGRESS. So using IN_PROGRESS is justified. https://codereview.chromium.org/2665243003/diff/60001/components/history/core... File components/history/core/browser/download_job_info.h (right): https://codereview.chromium.org/2665243003/diff/60001/components/history/core...
components/history/core/browser/download_job_info.h:17: // multiple download job associated with it, with each job has its own On 2017/02/03 16:02:34, sky wrote: > 'download job' -> 'download jobs' > 'job has' -> ;job having' Done. https://codereview.chromium.org/2665243003/diff/60001/components/history/core... components/history/core/browser/download_job_info.h:31: bool operator==(const DownloadJobInfo&) const; On 2017/02/03 16:02:34, sky wrote: > Do you have test coverage of this? added download_job_info_unittest.cc to test this https://codereview.chromium.org/2665243003/diff/60001/components/history/core... File components/history/core/browser/history_backend_db_unittest.cc (right): https://codereview.chromium.org/2665243003/diff/60001/components/history/core... components/history/core/browser/history_backend_db_unittest.cc:1019: id, 0, 0, 1000, received, DownloadState::INTERRUPTED, On 2017/02/03 16:02:34, sky wrote: > Can you change the two zeros to non-zero values that aren't identical? Better to > verify the two aren't transposed in code and/or using a default of 0. Done. https://codereview.chromium.org/2665243003/diff/80001/components/history/core... File components/history/core/browser/download_database.cc (right): https://codereview.chromium.org/2665243003/diff/80001/components/history/core... components/history/core/browser/download_database.cc:760: sql::Statement statement_query(GetDB().GetCachedStatement( Unfortunately, update statement always return true. So I have to use a separate select count(*) statement here. Given that we don't have too much db updates per second (only 5 updates if 4 jobs are created), this shouldn't cause too much issue. Or a more efficient approach would be ask HistoryService to specifically call CreateDowloadJob() whenever a download job is added. So later calls to UpdateDownload() will no longer need to check if the job exists or not, wdyt?
sky@chromium.org changed reviewers: + shess@chromium.org
+shess for following question https://codereview.chromium.org/2665243003/diff/80001/components/history/core... File components/history/core/browser/download_database.cc (right): https://codereview.chromium.org/2665243003/diff/80001/components/history/core... components/history/core/browser/download_database.cc:760: sql::Statement statement_query(GetDB().GetCachedStatement( On 2017/02/04 00:06:57, qinmin wrote: > Unfortunately, update statement always return true. So I have to use a separate > select count(*) statement here. Given that we don't have too much db updates per > second (only 5 updates if 4 jobs are created), this shouldn't cause too much > issue. > Or a more efficient approach would be ask HistoryService to specifically call > CreateDowloadJob() whenever a download job is added. So later calls to > UpdateDownload() will no longer need to check if the job exists or not, wdyt? http://stackoverflow.com/questions/15277373/sqlite-upsert-update-or-insert mentions INSERT OR IGNORE. But I'm not sure what is best. Adding shess for his thoughts.
https://codereview.chromium.org/2665243003/diff/80001/components/history/core... File components/history/core/browser/download_database.cc (right): https://codereview.chromium.org/2665243003/diff/80001/components/history/core... components/history/core/browser/download_database.cc:760: sql::Statement statement_query(GetDB().GetCachedStatement( On 2017/02/06 16:29:04, sky wrote: > On 2017/02/04 00:06:57, qinmin wrote: > > Unfortunately, update statement always return true. So I have to use a > separate > > select count(*) statement here. Given that we don't have too much db updates > per > > second (only 5 updates if 4 jobs are created), this shouldn't cause too much > > issue. > > Or a more efficient approach would be ask HistoryService to specifically call > > CreateDowloadJob() whenever a download job is added. So later calls to > > UpdateDownload() will no longer need to check if the job exists or not, wdyt? > > http://stackoverflow.com/questions/15277373/sqlite-upsert-update-or-insert > mentions INSERT OR IGNORE. But I'm not sure what is best. Adding shess for his > thoughts. INSERT OR IGNORE doesn't improve the performance of update because we will have to call INSERT OR IGNORE first, then call UPDATE. So each UPDATE now becomes 2 operations. What we really need is an INSERT OR UPDATE statement, but that is missing in sqlite. Another alternative is to call "SELECT changes()" after each UPDATE statement, which will return the number of rows that were updated by the last update call. This should be a very efficient call since it doesn't require any query into the database and all the history db operations are on the same thread.
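The UPDATE-then-check pattern described above can be sketched in plain SQLite. This is an illustration only, using Python's sqlite3 module and a hypothetical jobs table; the real Chromium code goes through sql::Statement in C++ and the actual schema differs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (download_id INTEGER, job_id INTEGER, "
             "received_bytes INTEGER, PRIMARY KEY (download_id, job_id))")

def update_or_insert_job(download_id, job_id, received_bytes):
    # Try the UPDATE first. "SELECT changes()" reports how many rows the
    # most recent INSERT/UPDATE/DELETE on this connection modified, without
    # re-querying the table (unlike a SELECT count(*)).
    conn.execute(
        "UPDATE jobs SET received_bytes=? WHERE download_id=? AND job_id=?",
        (received_bytes, download_id, job_id))
    (changed,) = conn.execute("SELECT changes()").fetchone()
    if changed == 0:
        # No existing row for this (download_id, job_id): fall back to INSERT.
        conn.execute(
            "INSERT INTO jobs (download_id, job_id, received_bytes) "
            "VALUES (?, ?, ?)", (download_id, job_id, received_bytes))

update_or_insert_job(1, 0, 100)   # first call inserts the row
update_or_insert_job(1, 0, 500)   # second call updates it in place
```

Since changes() only inspects connection-local state, the second round trip is cheap; this matches the tracing numbers quoted below, where "select changes()" was measurably faster than "select count(*)".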
https://codereview.chromium.org/2665243003/diff/80001/components/history/core... File components/history/core/browser/download_database.cc (right): https://codereview.chromium.org/2665243003/diff/80001/components/history/core... components/history/core/browser/download_database.cc:760: sql::Statement statement_query(GetDB().GetCachedStatement( On 2017/02/06 16:41:53, qinmin wrote: > On 2017/02/06 16:29:04, sky wrote: > > On 2017/02/04 00:06:57, qinmin wrote: > > > Unfortunately, update statement always return true. So I have to use a > > separate > > > select count(*) statement here. Given that we don't have too much db updates > > per > > > second (only 5 updates if 4 jobs are created), this shouldn't cause too much > > > issue. > > > Or a more efficient approach would be ask HistoryService to specifically > call > > > CreateDowloadJob() whenever a download job is added. So later calls to > > > UpdateDownload() will no longer need to check if the job exists or not, > wdyt? > > > > http://stackoverflow.com/questions/15277373/sqlite-upsert-update-or-insert > > mentions INSERT OR IGNORE. But I'm not sure what is best. Adding shess for his > > thoughts. > > INSERT OR IGNORE doesn't improve the performance of update because we will have > to call INSERT OR IGNORE first, then call UPDATE. So each UPDATE now becomes 2 > operation. What we really need is INSERT OR UPDATE statement, but that is > missing in sqlite. > Another alternative is to call "SELECT changes()" after each UPDATE statement, > that will return the number of rows that was updated during the last update > call. This should be a very efficient call since it doesn't require any query > into the database and all the history db operations are on the same thread. Compared the performance of "SELECT changes" and "SELECT count(*)" through tracing. "SELECT Count(*)" takes about 0.15 ms, while "select changes()" takes an average of 0.05 ms. So select changes() is much more efficient. 
The UPDATE statement takes about 0.3ms. Uploaded a new patchset to use "select changes()".
The CQ bit was checked by qinmin@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_cronet on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_cron...) ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-simulator-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator-xco...)
https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... File components/history/core/browser/download_database.cc (right): https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_database.cc:341: GetDB().Execute(kJobsSchema)); IMHO this is getting convoluted. Someone coming along later is going to have to decode what this all logically means before they can make their change. I think it would be worthwhile to invest the time now to straighten out the logic of what gets created when. To be honest, I'm not entirely clear the previous code is even correct. The if clause seems to allow the downloads table to exist without requiring a downloads_url_chains table, while the else clause seems to imply that they both should exist. I don't know what to make of that at all, and the comment doesn't help me understand. https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_database.cc:716: RemoveDownloadJobs(id); Are these statements all in a containing transaction? Note that sql::Transaction is basically free if nested. So there's no downside to having extra transactions around. But this code doesn't seem to use them, so I figure there's already a transaction out there, so just wanted to verify. DCHECK(db().transaction_nesting()) would require a transaction... https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_database.cc:792: std::map<uint32_t, DownloadRow*>* download_row_map) { If you keep the out pointer, you should DCHECK it, because you won't touch it (and generate a NULL deref) if the table contains no data. https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_database.cc:801: // See the comment above about SQLITE lacking unsigned integers. 
404 - Comment Not Found That said - why does it need to be unsigned? I would expect that DownloadId would be literally the id in the downloads table, in which case it should mostly be an opaque thing generated by SQLite which you carry from one SQLite query to another SQLite query to reference that row. The higher-level code shouldn't really be making assumptions about structure, only uniqueness. [Though now I see that someone higher-level wants to create the ids ahead of time. Oh, well, hopefully both sources of truth stay in sync.] https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_database.cc:808: int download_id = IntToDownloadId(signed_id); This should either be DownloadId to match the function you're calling, or uint32_t to match the map you're keying. https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_database.cc:810: // record. The record is still in the database, right? Is there code elsewhere which brings this back in line, or does the record just live unloved forever? I guess the same applies to the invalid download id case above. https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... File components/history/core/browser/download_database.h (right): https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_database.h:126: void RemoveDownloadJobs(uint32_t id); Seems like this should be DownloadId. https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_database.h:130: void QueryDownloadJobs(std::map<uint32_t, DownloadRow*>* download_row_map); This too. Also, you should be able to return a map<>, right? Though I haven't dug around related code to see if that's the style in these parts. 
https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... File components/history/core/browser/download_job_info.h (right): https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_job_info.h:17: // multiple download jobis associated with it, with each job having its own probably s/jobis/job ids/ or something like. https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... File components/history/core/browser/history_database.cc (right): https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/history_database.cc:543: } maintain the whitespace line between clauses.
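The signed/unsigned concern raised above comes from SQLite storing every integer as a signed 64-bit value. A uint32 download id therefore round-trips without wrapping, but a value read back must still be validated before narrowing. A minimal sketch (Python sqlite3, hypothetical table; the validation mirrors but is not the actual IntToDownloadId logic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE downloads (id INTEGER PRIMARY KEY)")

# SQLite integers are signed 64-bit, so a value near the top of the uint32
# range (up to 2**32 - 1) is stored exactly, not truncated to signed 32-bit.
big_uint32 = 2**32 - 5
conn.execute("INSERT INTO downloads (id) VALUES (?)", (big_uint32,))
(signed_id,) = conn.execute("SELECT id FROM downloads").fetchone()

def to_uint32_id(value):
    # Validate before narrowing: values outside the uint32 range, or equal
    # to a hypothetical invalid-id sentinel of 0, indicate a corrupt row.
    if value <= 0 or value > 2**32 - 1:
        return None
    return value
```

The higher-level point stands regardless: the id should mostly be treated as an opaque key carried between queries, with range checks only at the DB boundary.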
https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... File components/history/core/browser/download_database.cc (right): https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_database.cc:560: !CreateDownloadJob(data.download_job_info[i])) { Oh, as I noted elsewhere, since update is touching all columns (either overwriting them or using them in WHERE clause), you could perhaps use INSERT OR REPLACE and consolidate update/create in one function.
On 2017/02/06 19:25:02, qinmin wrote: > https://codereview.chromium.org/2665243003/diff/80001/components/history/core... > File components/history/core/browser/download_database.cc (right): > > https://codereview.chromium.org/2665243003/diff/80001/components/history/core... > components/history/core/browser/download_database.cc:760: sql::Statement > statement_query(GetDB().GetCachedStatement( > On 2017/02/06 16:41:53, qinmin wrote: > > On 2017/02/06 16:29:04, sky wrote: > > > On 2017/02/04 00:06:57, qinmin wrote: > > > > Unfortunately, update statement always return true. So I have to use a > > > separate > > > > select count(*) statement here. Given that we don't have too much db > updates > > > per > > > > second (only 5 updates if 4 jobs are created), this shouldn't cause too > much > > > > issue. > > > > Or a more efficient approach would be ask HistoryService to specifically > > call > > > > CreateDowloadJob() whenever a download job is added. So later calls to > > > > UpdateDownload() will no longer need to check if the job exists or not, > > wdyt? > > > > > > http://stackoverflow.com/questions/15277373/sqlite-upsert-update-or-insert > > > mentions INSERT OR IGNORE. But I'm not sure what is best. Adding shess for > his > > > thoughts. > > > > INSERT OR IGNORE doesn't improve the performance of update because we will > have > > to call INSERT OR IGNORE first, then call UPDATE. So each UPDATE now becomes 2 > > operation. What we really need is INSERT OR UPDATE statement, but that is > > missing in sqlite. > > Another alternative is to call "SELECT changes()" after each UPDATE statement, > > that will return the number of rows that was updated during the last update > > call. This should be a very efficient call since it doesn't require any query > > into the database and all the history db operations are on the same thread. > > Compared the performance of "SELECT changes" and "SELECT count(*)" through > tracing. 
"SELECT Count(*)" takes about 0.15 ms, while "select changes()" takes > an average of 0.05 ms. So select changes() is much more efficient. The UPDATE > statement takes about 0.3ms. Uploaded a new patchset to use "select changes()". I wouldn't let micro-benchmarks drive this. The biggest consideration is disk I/O, and whether you write things as "select then insert or update" or "insert then update if fail" or "update then insert if fail" doesn't really matter. Each variant is going to touch the same db pages, so as long as you're in a transaction the cache will keep them all approximately the same cost. I think your best options are: - INSERT OR REPLACE if you're always updating the entire row. You won't find out if the row was pre-existing, but in most cases you don't really need to know that. - INSERT, on fail UPDATE if you expect not to have an existing row. - UPDATE, on fail INSERT if you expect to have an existing row. But, like I said, I don't think there's a strong argument versus SELECT then INSERT or UPDATE, especially if they make things harder to read. WRT the insert stream, I don't think the idleness of the history thread is the best thing to consider. Generating frequent COMMIT calls is not great simply because it can have poor interactions with the overall system, both at the filesystem level, and down in the flash controller (cheap flash controllers can cause horrible results with lots of file I/O and frequent fsync calls). Is this code in the long-running history transaction? If so, it doesn't much matter. If it's not in the long-running history transaction, then I think you shouldn't update more frequently than once every 5s or 10s (or even every 30s or 60s). Obviously there's a tradeoff in terms of duplicate downloading, here, but it seems likely to me that the majority bins are "Downloads happen fast and you hardly need to even track the jobs" and "Downloads absolutely suck and are slow with lots of retries". 
In the first case, checkpointing frequency hardly matters, in the second case checkpoints hardly demonstrate progress.
On 2017/02/06 21:40:48, Scott Hess wrote: > On 2017/02/06 19:25:02, qinmin wrote: > > > https://codereview.chromium.org/2665243003/diff/80001/components/history/core... > > File components/history/core/browser/download_database.cc (right): > > > > > https://codereview.chromium.org/2665243003/diff/80001/components/history/core... > > components/history/core/browser/download_database.cc:760: sql::Statement > > statement_query(GetDB().GetCachedStatement( > > On 2017/02/06 16:41:53, qinmin wrote: > > > On 2017/02/06 16:29:04, sky wrote: > > > > On 2017/02/04 00:06:57, qinmin wrote: > > > > > Unfortunately, update statement always return true. So I have to use a > > > > separate > > > > > select count(*) statement here. Given that we don't have too much db > > updates > > > > per > > > > > second (only 5 updates if 4 jobs are created), this shouldn't cause too > > much > > > > > issue. > > > > > Or a more efficient approach would be ask HistoryService to specifically > > > call > > > > > CreateDowloadJob() whenever a download job is added. So later calls to > > > > > UpdateDownload() will no longer need to check if the job exists or not, > > > wdyt? > > > > > > > > http://stackoverflow.com/questions/15277373/sqlite-upsert-update-or-insert > > > > mentions INSERT OR IGNORE. But I'm not sure what is best. Adding shess for > > his > > > > thoughts. > > > > > > INSERT OR IGNORE doesn't improve the performance of update because we will > > have > > > to call INSERT OR IGNORE first, then call UPDATE. So each UPDATE now becomes > 2 > > > operation. What we really need is INSERT OR UPDATE statement, but that is > > > missing in sqlite. > > > Another alternative is to call "SELECT changes()" after each UPDATE > statement, > > > that will return the number of rows that was updated during the last update > > > call. This should be a very efficient call since it doesn't require any > query > > > into the database and all the history db operations are on the same thread. 
> > > > Compared the performance of "SELECT changes" and "SELECT count(*)" through > > tracing. "SELECT Count(*)" takes about 0.15 ms, while "select changes()" takes > > an average of 0.05 ms. So select changes() is much more efficient. The UPDATE > > statement takes about 0.3ms. Uploaded a new patchset to use "select > changes()". > > I wouldn't let micro-benchmarks drive this. The biggest consideration is disk > I/O, and whether you write things as "select then insert or update" or "insert > then update if fail" or "update then insert if fail" doesn't really matter. > Each variant is going to touch the same db pages, so as long as you're in a > transaction the cache will keep them all approximately the same cost. > > I think your best options are: > > - INSERT OR REPLACE if you're always updating the entire row. You won't find > out if the row was pre-existing, but in most cases you don't really need to know > that. > - INSERT, on fail UPDATE if you expect not to have an existing row. > - UPDATE, on fail INSERT if you expect to have an existing row. > > But, like I said, I don't think there's a strong argument versus SELECT then > INSERT or UPDATE, especially if they make things harder to read. > > WRT the insert stream, I don't think the idleness of the history thread is the > best thing to consider. Generating frequent COMMIT calls is not great simply > because it can have poor interactions with the overall system, both at the > filesystem level, and down in the flash controller (cheap flash controllers can > cause horrible results with lots of file I/O and frequent fsync calls). Is this > code in the long-running history transaction? If so, it doesn't much matter. > If it's not in the long-running history transaction, then I think you shouldn't > update more frequently than once every 5s or 10s (or even every 30s or 60s). 
> Obviously there's a tradeoff in terms of duplicate downloading, here, but it > seems likely to me that the majority bins are "Downloads happen fast and you > hardly need to even track the jobs" and "Downloads absolutely suck and are slow > with lots of retries". In the first case, checkpointing frequency hardly > matters, in the second case checkpoints hardly demonstrate progress. Thanks Scott for the information. Yes, the new code will be part of the long-running history transaction. History db has a commit schedule of every 10 seconds, so this is unlikely to cause a lot of I/O interactions.
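Scott's first option, a single INSERT OR REPLACE over the full row, works whenever every column is either part of the uniqueness constraint or being overwritten. A sketch under the same hypothetical schema as before (Python sqlite3; not the actual Chromium statement):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (download_id INTEGER, job_id INTEGER, "
             "received_bytes INTEGER, PRIMARY KEY (download_id, job_id))")

def create_or_update_job(download_id, job_id, received_bytes):
    # Every column is either in the primary key or supplied here, so one
    # INSERT OR REPLACE serves as both create and update: on a primary-key
    # conflict the old row is deleted and the new one inserted.
    conn.execute(
        "INSERT OR REPLACE INTO jobs (download_id, job_id, received_bytes) "
        "VALUES (?, ?, ?)", (download_id, job_id, received_bytes))

create_or_update_job(7, 0, 0)      # first write creates the row
create_or_update_job(7, 0, 4096)   # second write replaces it
```

One caveat to keep in mind: because OR REPLACE deletes the conflicting row and inserts a fresh one, any column not supplied falls back to its default instead of keeping its old value, and delete triggers (if any) fire. Neither matters when the caller always writes the whole row, which is the case Scott describes.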
https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... File components/history/core/browser/download_database.cc (right): https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_database.cc:341: GetDB().Execute(kJobsSchema)); On 2017/02/06 21:22:16, Scott Hess wrote: > IMHO this is getting convoluted. Someone coming along later is going to have to > decode what this all logically means before they can make their change. I think > it would be worthwhile to invest the time now to straighten out the logic of > what gets created when. > > To be honest, I'm not entirely clear the previous code is even correct. The if > clause seems to allow the downloads table to exist without requiring a > downloads_url_chains table, while the else clause seems to imply that they both > should exist. I don't know what to make of that at all, and the comment doesn't > help me understand. The issue is related to database versions. The downloads_url_chains table was introduced in version 24, so it is possible that the downloads table exists while downloads_url_chains doesn't. There are migration functions to reintroduce this table when users upgrade from v23 to something >24. I added comments here to describe the logic about downloads_url_chains and downloads_jobs. https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_database.cc:560: !CreateDownloadJob(data.download_job_info[i])) { On 2017/02/06 21:23:51, Scott Hess wrote: > Oh, as I noted elsewhere, since update is touching all columns (either > overwriting them or using them in WHERE clause), you could perhaps use INSERT OR > REPLACE and consolidate update/create in one function. Done. Switched to use REPLACE (short form of INSERT OR REPLACE) and changed the function name to CreateOrUpdateDownloadJob(). https://codereview.chromium.org/2665243003/diff/100001/components/history/cor...
components/history/core/browser/download_database.cc:716: RemoveDownloadJobs(id); On 2017/02/06 21:22:16, Scott Hess wrote: > Are these statements all in a containing transaction? > > Note that sql::Transaction is basically free if nested. So there's no downside > to having extra transactions around. But this code doesn't seem to use them, so > I figure there's already a transaction out there, so just wanted to verify. > > DCHECK(db().transaction_nesting()) would require a transaction... Yes, all these statements are in a long-running transaction. https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_database.cc:792: std::map<uint32_t, DownloadRow*>* download_row_map) { On 2017/02/06 21:22:16, Scott Hess wrote: > If you keep the out pointer, you should DCHECK it, because you won't touch it > (and generate a NULL deref) if the table contains no data. Done. https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_database.cc:801: // See the comment above about SQLITE lacking unsigned integers. Changed the comment to make more sense. The comment just refers to some sentences above saying that sqlite ints are signed, so we need to convert it to unsigned DownloadId. On 2017/02/06 21:22:16, Scott Hess wrote: > 404 - Comment Not Found > > That said - why does it need to be unsigned? I would expect that DownloadId > would be literally the id in the downloads table, in which case it should mostly > be an opaque thing generated by SQLite which you carry from one SQLite query to > another SQLite query to reference that row. The higher-level code shouldn't > really be making assumptions about structure, only uniqueness. [Though now I > see that someone higher-level wants to create the ids ahead of time. Oh, well, > hopefully both sources of truth stay in sync.] https://codereview.chromium.org/2665243003/diff/100001/components/history/cor...
components/history/core/browser/download_database.cc:808: int download_id = IntToDownloadId(signed_id); On 2017/02/06 21:22:16, Scott Hess wrote: > This should either be DownloadId to match the function you're calling, or > uint32_t to match the map you're keying. Done. https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_database.cc:810: // record. On 2017/02/06 21:22:16, Scott Hess wrote: > The record is still in the database, right? Is there code elsewhere which > brings this back in line, or does the record just live unloved forever? > > I guess the same applies to the invalid download id case above. Call RemoveDownloadJobs() to remove the job from db. For the negative download id above, added a TODO. Fixing that will require changing the RemoveDownloadJobs() definition to support negative ids. https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... File components/history/core/browser/download_database.h (right): https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_database.h:126: void RemoveDownloadJobs(uint32_t id); On 2017/02/06 21:22:16, Scott Hess wrote: > Seems like this should be DownloadId. Done. https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_database.h:130: void QueryDownloadJobs(std::map<uint32_t, DownloadRow*>* download_row_map); On 2017/02/06 21:22:16, Scott Hess wrote: > This too. > > Also, you should be able to return a map<>, right? Though I haven't dug around > related code to see if that's the style in these parts. Done changing it to DownloadId. For the return values, we have a "void QueryDownloads(std::vector<DownloadRow>* results)" at the top of this file, so I think it is ok to use the input param for output. This is also more efficient than making a new map as output.
https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... File components/history/core/browser/download_job_info.h (right): https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/download_job_info.h:17: // multiple download jobis associated with it, with each job having its own On 2017/02/06 21:22:16, Scott Hess wrote: > probably s/jobis/job ids/ or something like. Done. https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... File components/history/core/browser/history_database.cc (right): https://codereview.chromium.org/2665243003/diff/100001/components/history/cor... components/history/core/browser/history_database.cc:543: } On 2017/02/06 21:22:16, Scott Hess wrote: > maintain the whitespace line between clauses. Done.
The CQ bit was checked by qinmin@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
reviewers, PTAL again. @asanka, I added some comments in DownloadDatabase::InitDownloadTable(), please check if they are correct.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
qinmin@chromium.org changed reviewers: - sdefresne@chromium.org
LGTM % comments https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... File components/history/core/browser/download_database.cc (right): https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_database.cc:331: // If the "downloads" table exist, "download_url_chain" might not be there nit: s/exist/exists/ https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_database.cc:781: if (signed_id <= static_cast<int64_t>(kInvalidDownloadId)) { Nit: Introduce IsValidDownloadId() or somesuch and move this check there. download_types.h implies that kInvalidDownloadId should be treated as opaque. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_database.cc:794: DCHECK(found); Nit: remove DCHECK. This is an inconsistency in the DB, not one in the browser. And we've already allowed for the possibility of negative download IDs. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... File components/history/core/browser/download_job_info.cc (right): https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_job_info.cc:12: : download_id(kInvalidDownloadId), initialize job_id https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... File components/history/core/browser/history_backend_db_unittest.cc (right): https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/history_backend_db_unittest.cc:1049: // Create the DB. nit: remove. here and below. https://codereview.chromium.org/2665243003/diff/120001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2665243003/diff/120001/tools/metrics/histogra... 
tools/metrics/histograms/histograms.xml:12502: + Whether a download can be found for a download job in the DB. Should mention that the histogram is recorded for each download job row each time the database is loaded.
lgtm % nits https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... File components/history/core/browser/download_database.cc (right): https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_database.cc:565: if (!CreateOrUpdateDownloadJob(data.download_job_info[i])) Do we have to clean up any other state we persisted if this fails? https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_database.cc:758: return statement_replace.Run();; Remove extra ;
lgtm % nits
https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... File components/history/core/browser/download_database.cc (right): https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_database.cc:333: // create it later. Sigh. This is why we can't have nice things. Having this code probe the schema while other code blindly accepts the version isn't great. Does this code have unit tests to verify the schema transitions, so at least we know when a change breaks the world? Also, I don't see any tests specifically targeting this migration. Those tests need to be around so that future changes don't break you. I don't know what the standards are in this code (*), but you should definitely have a test which verifies that the v32->v33 transition works. (*) IMHO the best case is having golden files so that you can reasonably test migration chains and have high fidelity with actual user schema. Synthetic schema are attractive to developers, but if the synthetic schema departs from the actual schema embedded in user profiles, it makes the tests pointless. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_database.cc:565: if (!CreateOrUpdateDownloadJob(data.download_job_info[i])) On 2017/02/08 07:54:05, David Trainor wrote: > Do we have to clean up any other state we persisted if this fails? Unfortunately, since this is in the long-running transaction, I'm not sure that there is anything in place to do that. But that's intrinsic to the design of this module, I think. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_database.cc:781: if (signed_id <= static_cast<int64_t>(kInvalidDownloadId)) { On 2017/02/08 03:50:50, asanka wrote: > Nit: Introduce IsValidDownloadId() or somesuch and move this check there. 
> download_types.h implies that kInvalidDownloadId should be treated as opaque. It might even make sense to have the signature be: bool ConvertIntToDownloadId(int64_t int_version, DownloadId* out) WARN_UNUSED_RESULT; Then callers can't simply assume it all works. That said ... I'm not entirely clear what the use case is, here. Presumably this code is not inserting invalid ids in the first place, so this should never happen. If it does happen ... what's special about this case which deserves special handling, in a module where nothing else does much in the way of sanitization? Incomplete ad-hoc error handling is almost worse than nothing at all, sometimes. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_database.cc:791: bool found = base::ContainsKey(*download_row_map, download_id); Reading this and the (*download_row_map)[download_id]-> at the bottom makes me wonder if it would make sense to rearrange this code a bit. One option would be to use find() here and save the iterator for there, rather than having separate calls. Another option would be to optimistically create the DownloadJobInfo, the put all of the test-and-set code in one place. Actually, this check also can subsume the invalid-id check (which I assume you expect to never happen and are only coding out of an abundance of caution). Then you could rearrange this block to look like: - Build |info| - find() the row iter - if end(), error handling - else add |info| to the row. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... File components/history/core/browser/download_database.h (right): https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... 
components/history/core/browser/download_database.h:127: void QueryDownloadJobs(std::map<uint32_t, DownloadRow*>* download_row_map); I still want this key to be a DownloadId, but I can see that the overall file isn't consistent in use of DownloadId versus uint32_t, so I'll just encourage a someday-if-you-have-time cleanup. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... File components/history/core/browser/download_job_info.h (right): https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_job_info.h:33: // The id of the download in the database. Not changed by UpdateDownload(). "Not changed by UpdateDownload()" seem irrelevant. It's passed as a const&, so UpdateDownload() can't change it. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_job_info.h:37: int job_id; AFAICT, this CL isn't wired in to anything, so I can't really see how you plan to use it, but based in the tuple (start_position, length, received_bytes), I would guess you're going to do something like: - Setup a bunch of sub-jobs for the overall download. - Make progress, updating received_bytes and state. - Finish and remove the sub-jobs. In that case, start_position and length should uniquely identify the job, and, indeed, start_position alone would too. In fact, start_position probably must be unique. So do you need a job_id, or is start_position sufficient? [I don't feel super strongly about this, it's a trivial change to the database space. I'm actually more concerned about whether the data model fits usage well.]
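[Editor's sketch] The `ConvertIntToDownloadId()` signature shess proposes above could look roughly like this. Everything here is a stand-in for illustration: `DownloadId` is assumed to be a `uint32_t` typedef with `kInvalidDownloadId` as its zero value (mirroring what the check `signed_id <= static_cast<int64_t>(kInvalidDownloadId)` implies), and `[[nodiscard]]` stands in for Chromium's `WARN_UNUSED_RESULT`:

```cpp
#include <cassert>
#include <cstdint>

// Assumed stand-ins for the history component's types.
using DownloadId = uint32_t;
constexpr DownloadId kInvalidDownloadId = 0;

// Returns false (leaving |out| at kInvalidDownloadId) when the stored 64-bit
// value cannot be a valid id, so callers are forced to check the result
// instead of assuming the conversion always works.
[[nodiscard]] bool ConvertIntToDownloadId(int64_t int_version,
                                          DownloadId* out) {
  *out = kInvalidDownloadId;
  if (int_version <= static_cast<int64_t>(kInvalidDownloadId) ||
      int_version > static_cast<int64_t>(UINT32_MAX)) {
    return false;  // Negative, zero, or out-of-range for uint32_t.
  }
  *out = static_cast<DownloadId>(int_version);
  return true;
}
```

A failed conversion leaves the id at `kInvalidDownloadId`, so a subsequent map lookup keyed on valid ids naturally misses, which matches the error-handling flow discussed below.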
https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... File components/history/core/browser/download_database.cc (right): https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_database.cc:331: // If the "downloads" table exist, "download_url_chain" might not be there On 2017/02/08 03:50:50, asanka wrote: > nit: s/exist/exists/ Done. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_database.cc:333: // create it later. On 2017/02/08 22:01:39, Scott Hess wrote: > Sigh. This is why we can't have nice things. Having this code probe the schema > while other code blindly accepts the version isn't great. Does this code have > unit tests to verify the schema transitions, so at least we know when a change > breaks the world? > > Also, I don't see any tests specifically targeting this migration. Those tests > need to be around so that future changes don't break you. I don't know what the > standards are in this code (*), but you should definitely have a test which > verifies that the v32->v33 transition works. > > (*) IMHO the best case is having golden files so that you can reasonably test > migration chains and have high fidelity with actual user schema. Synthetic > schema are attractive to developers, but if the synthetic schema departs from > the actual schema embedded in user profiles, it makes the tests pointless. I have added HistoryBackendDBTest.MigrateDownloadsJobsTable() to test the migration from V32->V33. There are also some other tests in that file to test earlier migrations. For example, the introduction of the downloads_url_chain table was tested in HistoryBackendDBTest.MigrateDownloadsReasonPathsAndDangerType. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... 
components/history/core/browser/download_database.cc:565: if (!CreateOrUpdateDownloadJob(data.download_job_info[i])) On 2017/02/08 22:01:39, Scott Hess wrote: > On 2017/02/08 07:54:05, David Trainor wrote: > > Do we have to clean up any other state we persisted if this fails? > > Unfortunately, since this is in the long-running transaction, I'm not sure that > there is anything in place to do that. But that's intrinsic to the design of > this module, I think. If some of the jobs fail to update/create, the worst case is that we have to discard those jobs or get out-of-date information when querying the downloads table. In both cases, the impact is limited (the user will probably have to redownload some of the content when resuming the download). https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_database.cc:758: return statement_replace.Run();; On 2017/02/08 07:54:05, David Trainor wrote: > Remove extra ; Done. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_database.cc:781: if (signed_id <= static_cast<int64_t>(kInvalidDownloadId)) { On 2017/02/08 22:01:39, Scott Hess wrote: > On 2017/02/08 03:50:50, asanka wrote: > > Nit: Introduce IsValidDownloadId() or somesuch and move this check there. > > download_types.h implies that kInvalidDownloadId should be treated as opaque. > > It might even make sense to have the signature be: > > bool ConvertIntToDownloadId(int64_t int_version, DownloadId* out) > WARN_UNUSED_RESULT; > > Then callers can't simply assume it all works. > > That said ... I'm not entirely clear what the use case is, here. Presumably > this code is not inserting invalid ids in the first place, so this should never > happen. If it does happen ... what's special about this case which deserves > special handling, in a module where nothing else does much in the way of > sanitization?
Incomplete ad-hoc error handling is almost worse than nothing at > all, sometimes. Followed Scott's suggestion: changed IntToDownloadId() to bool ConvertIntToDownloadId(int64_t, DownloadId*). If that function returns false here, the download id will be equal to kInvalidDownloadId, which will cause the subsequent map.find() to fail. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_database.cc:791: bool found = base::ContainsKey(*download_row_map, download_id); On 2017/02/08 22:01:39, Scott Hess wrote: > Reading this and the (*download_row_map)[download_id]-> at the bottom makes me > wonder if it would make sense to rearrange this code a bit. One option would be > to use find() here and save the iterator for there, rather than having separate > calls. Another option would be to optimistically create the DownloadJobInfo, > then put all of the test-and-set code in one place. > > Actually, this check also can subsume the invalid-id check (which I assume you > expect to never happen and are only coding out of an abundance of caution). > Then you could rearrange this block to look like: > - Build |info| > - find() the row iter > - if end(), error handling > - else add |info| to the row. Done. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_database.cc:794: DCHECK(found); On 2017/02/08 03:50:50, asanka wrote: > Nit: remove DCHECK. This is an inconsistency in the DB, not one in the browser. > And we've already allowed for the possibility of negative download IDs. Done. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... File components/history/core/browser/download_database.h (right): https://codereview.chromium.org/2665243003/diff/120001/components/history/cor...
components/history/core/browser/download_database.h:127: void QueryDownloadJobs(std::map<uint32_t, DownloadRow*>* download_row_map); On 2017/02/08 22:01:39, Scott Hess wrote: > I still want this key to be a DownloadId, but I can see that the overall file > isn't consistent in use of DownloadId versus uint32_t, so I'll just encourage a > someday-if-you-have-time cleanup. Done. Changed all member functions that were using uint32_t to use DownloadId. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... File components/history/core/browser/download_job_info.cc (right): https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_job_info.cc:12: : download_id(kInvalidDownloadId), On 2017/02/08 03:50:50, asanka wrote: > initialize job_id Done. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... File components/history/core/browser/download_job_info.h (right): https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_job_info.h:33: // The id of the download in the database. Not changed by UpdateDownload(). On 2017/02/08 22:01:40, Scott Hess wrote: > "Not changed by UpdateDownload()" seems irrelevant. It's passed as a const&, so > UpdateDownload() can't change it. Removed. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_job_info.h:37: int job_id; On 2017/02/08 22:01:40, Scott Hess wrote: > AFAICT, this CL isn't wired in to anything, so I can't really see how you plan > to use it, but based on the tuple (start_position, length, received_bytes), I > would guess you're going to do something like: > - Setup a bunch of sub-jobs for the overall download. > - Make progress, updating received_bytes and state. > - Finish and remove the sub-jobs.
> > In that case, start_position and length should uniquely identify the job, and, > indeed, start_position alone would too. In fact, start_position probably must > be unique. So do you need a job_id, or is start_position sufficient? > > [I don't feel super strongly about this, it's a trivial change to the database > space. I'm actually more concerned about whether the data model fits usage > well.] Using the start position might not be enough. For example, suppose Chrome always creates 4 jobs to split the data download evenly, the first job finishes 50% of its share while the other 3 jobs make no progress, and the browser crashes at that moment. When resuming the download later, we need to update the starting position for all jobs so that the remaining 7/8 of the data is split evenly between the 4 jobs. That will change the start_position of each job, so using an id to identify the jobs makes such tasks easier. https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... File components/history/core/browser/history_backend_db_unittest.cc (right): https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/history_backend_db_unittest.cc:1049: // Create the DB. On 2017/02/08 03:50:50, asanka wrote: > nit: remove. here and below. Done. https://codereview.chromium.org/2665243003/diff/120001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2665243003/diff/120001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:12502: + Whether a download can be found for a download job in the DB. On 2017/02/08 03:50:50, asanka wrote: > Should mention that the histogram is recorded for each download job row each > time the database is loaded. Done.
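[Editor's sketch] shess's suggested rearrangement (build |info|, find() the row iterator once, handle the missing-row case in one place) might look roughly like the following. DownloadRow and DownloadJobInfo here are simplified stand-ins for the real history types, which carry more fields:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Simplified stand-ins for the real history types.
struct DownloadJobInfo {
  uint32_t download_id;
  int64_t received_bytes;
};
struct DownloadRow {
  std::vector<DownloadJobInfo> download_job_info;
};

// One find() replaces the ContainsKey() check plus operator[] lookup, and a
// dangling job row (no matching download) is handled in a single branch,
// where the caller can record a histogram and drop the row.
bool AttachJobToRow(std::map<uint32_t, DownloadRow*>* download_row_map,
                    uint32_t download_id,
                    int64_t received_bytes) {
  DownloadJobInfo info{download_id, received_bytes};

  auto it = download_row_map->find(download_id);
  if (it == download_row_map->end())
    return false;  // No download with this id; discard the job row.
  it->second->download_job_info.push_back(info);
  return true;
}
```

The invalid-id case falls out for free: an id that was never inserted simply misses the map.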
qinmin@chromium.org changed reviewers: + holte@chromium.org
+holte for tools/metrics OWNER
histograms lgtm
https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... File components/history/core/browser/download_job_info.h (right): https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... components/history/core/browser/download_job_info.h:37: int job_id; On 2017/02/09 at 01:01:42, qinmin wrote: > On 2017/02/08 22:01:40, Scott Hess wrote: > > AFAICT, this CL isn't wired in to anything, so I can't really see how you plan > > to use it, but based in the tuple (start_position, length, received_bytes), I > > would guess you're going to do something like: > > - Setup a bunch of sub-jobs for the overall download. > > - Make progress, updating received_bytes and state. > > - Finish and remove the sub-jobs. > > > > In that case, start_position and length should uniquely identify the job, and, > > indeed, start_position alone would too. In fact, start_position probably must > > be unique. So do you need a job_id, or is start_position sufficient? > > > > [I don't feel super strongly about this, it's a trivial change to the database > > space. I'm actually more concerned about whether the data model fits usage > > well.] > > Using start position might not be enough. For example, if Chrome always create 4 jobs to evenly handle the data downloading, and the first job finishes 50% while the other 3 jobs didn't make any progress. Suppose browser crashes at that moment, when resuming the download later, we need to update the starting position for all jobs so that the remaining 7/8 data is splitted evenly between the 4 jobs. That will cause us to change the start_position for each job. Using an id to identify the jobs make it easier to achieve such tasks. I think it's worth fleshing out how you are going to use it. All you really need is a list of slices of the file that have been successfully downloaded. Those slices can be uniquely identified by download_id and offset as shess mentioned. 
The additional bit you need is the count of bytes downloaded from that offset. The list of download jobs is an ephemeral thing that each download session needs to figure out based on the distribution of the holes in the file and network conditions. I don't think persisting the download jobs to disk is going to help.
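[Editor's sketch] asanka's model (persist only the slices; derive the jobs per session) can be illustrated by computing the unfetched holes from the stored (offset, received_bytes) pairs. This is not Chromium code; it assumes the slices are sorted by offset and non-overlapping:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// A persisted slice: a contiguous range of the file already written to disk.
struct Slice {
  int64_t offset;
  int64_t received_bytes;
};

// Given the persisted slices (sorted by offset, non-overlapping) and the
// total file size, return the (offset, length) holes a new download session
// would need to fetch. The "jobs" are recomputed from these holes each
// session rather than being stored themselves.
std::vector<std::pair<int64_t, int64_t>> FindHoles(
    const std::vector<Slice>& slices, int64_t total_bytes) {
  std::vector<std::pair<int64_t, int64_t>> holes;
  int64_t pos = 0;
  for (const Slice& s : slices) {
    if (s.offset > pos)
      holes.emplace_back(pos, s.offset - pos);  // Gap before this slice.
    pos = s.offset + s.received_bytes;
  }
  if (pos < total_bytes)
    holes.emplace_back(pos, total_bytes - pos);  // Tail not yet downloaded.
  return holes;
}
```

Under this model a crash mid-session loses nothing but the in-memory job bookkeeping; the next session recomputes its jobs from whatever slices made it to disk.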
On 2017/02/09 02:19:47, asanka wrote: > https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... > File components/history/core/browser/download_job_info.h (right): > > https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... > components/history/core/browser/download_job_info.h:37: int job_id; > On 2017/02/09 at 01:01:42, qinmin wrote: > > On 2017/02/08 22:01:40, Scott Hess wrote: > > > AFAICT, this CL isn't wired in to anything, so I can't really see how you > plan > > > to use it, but based in the tuple (start_position, length, received_bytes), > I > > > would guess you're going to do something like: > > > - Setup a bunch of sub-jobs for the overall download. > > > - Make progress, updating received_bytes and state. > > > - Finish and remove the sub-jobs. > > > > > > In that case, start_position and length should uniquely identify the job, > and, > > > indeed, start_position alone would too. In fact, start_position probably > must > > > be unique. So do you need a job_id, or is start_position sufficient? > > > > > > [I don't feel super strongly about this, it's a trivial change to the > database > > > space. I'm actually more concerned about whether the data model fits usage > > > well.] > > > > Using start position might not be enough. For example, if Chrome always create > 4 jobs to evenly handle the data downloading, and the first job finishes 50% > while the other 3 jobs didn't make any progress. Suppose browser crashes at > that moment, when resuming the download later, we need to update the starting > position for all jobs so that the remaining 7/8 data is splitted evenly between > the 4 jobs. That will cause us to change the start_position for each job. Using > an id to identify the jobs make it easier to achieve such tasks. > > > I think it's worth fleshing out how you are going to use it. > > All you really need is a list of slices of the file that have been successfully > downloaded. 
Those slices can be uniquely identified by download_id and offset as > shess mentioned. The additional bit you need is the count of bytes downloaded > from that offset. > > The list of download jobs is an ephemeral thing that each download session needs > to figure out based on the distribution of the holes in the file and network > conditions. I don't think persisting the download jobs to disk is going to help. Yes, each download session is ephemeral, but that doesn't mean download_id and offset(bytes downloaded) can uniquely identify such a session. The problem is that if a download session is created with offset x, and length y, the session can immediately got killed due to a browser crash. When user resumes that download, the information(x, y) of that session is loaded into memory. Unfortunately, Chrome might recreate another download session with same offset x and length y based on the current downloaded pieces. So we need either another id to uniquely identify the 2 different sessions, or reuse the previous session. That's why a job_id might be helpful. Anyway, I think querying the download jobs table by ordering on job_id can give us a lot of information on how each piece is created over the time. This can help us debug potential incorrect logics in session creation later on.
On 2017/02/09 at 06:58:11, qinmin wrote: > On 2017/02/09 02:19:47, asanka wrote: > > https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... > > File components/history/core/browser/download_job_info.h (right): > > > > https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... > > components/history/core/browser/download_job_info.h:37: int job_id; > > On 2017/02/09 at 01:01:42, qinmin wrote: > > > On 2017/02/08 22:01:40, Scott Hess wrote: > > > > AFAICT, this CL isn't wired in to anything, so I can't really see how you > > plan > > > > to use it, but based in the tuple (start_position, length, received_bytes), > > I > > > > would guess you're going to do something like: > > > > - Setup a bunch of sub-jobs for the overall download. > > > > - Make progress, updating received_bytes and state. > > > > - Finish and remove the sub-jobs. > > > > > > > > In that case, start_position and length should uniquely identify the job, > > and, > > > > indeed, start_position alone would too. In fact, start_position probably > > must > > > > be unique. So do you need a job_id, or is start_position sufficient? > > > > > > > > [I don't feel super strongly about this, it's a trivial change to the > > database > > > > space. I'm actually more concerned about whether the data model fits usage > > > > well.] > > > > > > Using start position might not be enough. For example, if Chrome always create > > 4 jobs to evenly handle the data downloading, and the first job finishes 50% > > while the other 3 jobs didn't make any progress. Suppose browser crashes at > > that moment, when resuming the download later, we need to update the starting > > position for all jobs so that the remaining 7/8 data is splitted evenly between > > the 4 jobs. That will cause us to change the start_position for each job. Using > > an id to identify the jobs make it easier to achieve such tasks. > > > > > > I think it's worth fleshing out how you are going to use it. 
> > > > All you really need is a list of slices of the file that have been successfully > > downloaded. Those slices can be uniquely identified by download_id and offset as > > shess mentioned. The additional bit you need is the count of bytes downloaded > > from that offset. > > > > The list of download jobs is an ephemeral thing that each download session needs > > to figure out based on the distribution of the holes in the file and network > > conditions. I don't think persisting the download jobs to disk is going to help. > > Yes, each download session is ephemeral, but that doesn't mean download_id and offset(bytes downloaded) can uniquely identify such a session. > The problem is that if a download session is created with offset x, and length y, the session can immediately got killed due to a browser crash. > When user resumes that download, the information(x, y) of that session is loaded into memory. > Unfortunately, Chrome might recreate another download session with same offset x and length y based on the current downloaded pieces. > So we need either another id to uniquely identify the 2 different sessions, or reuse the previous session. That's why a job_id might be helpful. Those are problems that we'll have if we store download jobs. My point is that you don't need to store download jobs. :-) The slices -- which you do need to store -- only record what's written to disk, not how they got there. > Anyway, I think querying the download jobs table by ordering on job_id can give us a lot of information on how each piece is created over the time. > This can help us debug potential incorrect logics in session creation later on. But do you need to store it in history for all users? Debugging logic like this is what we use netlogs for since the behavior depends on lots of factors that aren't captured here. Either way, I think you have an implementation plan in mind. Do you have a doc we can look at?
On 2017/02/09 21:17:21, asanka wrote: > On 2017/02/09 at 06:58:11, qinmin wrote: > > On 2017/02/09 02:19:47, asanka wrote: > > > > https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... > > > File components/history/core/browser/download_job_info.h (right): > > > > > > > https://codereview.chromium.org/2665243003/diff/120001/components/history/cor... > > > components/history/core/browser/download_job_info.h:37: int job_id; > > > On 2017/02/09 at 01:01:42, qinmin wrote: > > > > On 2017/02/08 22:01:40, Scott Hess wrote: > > > > > AFAICT, this CL isn't wired in to anything, so I can't really see how > you > > > plan > > > > > to use it, but based in the tuple (start_position, length, > received_bytes), > > > I > > > > > would guess you're going to do something like: > > > > > - Setup a bunch of sub-jobs for the overall download. > > > > > - Make progress, updating received_bytes and state. > > > > > - Finish and remove the sub-jobs. > > > > > > > > > > In that case, start_position and length should uniquely identify the > job, > > > and, > > > > > indeed, start_position alone would too. In fact, start_position > probably > > > must > > > > > be unique. So do you need a job_id, or is start_position sufficient? > > > > > > > > > > [I don't feel super strongly about this, it's a trivial change to the > > > database > > > > > space. I'm actually more concerned about whether the data model fits > usage > > > > > well.] > > > > > > > > Using start position might not be enough. For example, if Chrome always > create > > > 4 jobs to evenly handle the data downloading, and the first job finishes 50% > > > while the other 3 jobs didn't make any progress. Suppose browser crashes at > > > that moment, when resuming the download later, we need to update the > starting > > > position for all jobs so that the remaining 7/8 data is splitted evenly > between > > > the 4 jobs. That will cause us to change the start_position for each job. 
> Using > > > an id to identify the jobs make it easier to achieve such tasks. > > > > > > > > > I think it's worth fleshing out how you are going to use it. > > > > > > All you really need is a list of slices of the file that have been > successfully > > > downloaded. Those slices can be uniquely identified by download_id and > offset as > > > shess mentioned. The additional bit you need is the count of bytes > downloaded > > > from that offset. > > > > > > The list of download jobs is an ephemeral thing that each download session > needs > > > to figure out based on the distribution of the holes in the file and network > > > conditions. I don't think persisting the download jobs to disk is going to > help. > > > > Yes, each download session is ephemeral, but that doesn't mean download_id > and offset(bytes downloaded) can uniquely identify such a session. > > The problem is that if a download session is created with offset x, and length > y, the session can immediately got killed due to a browser crash. > > When user resumes that download, the information(x, y) of that session is > loaded into memory. > > Unfortunately, Chrome might recreate another download session with same offset > x and length y based on the current downloaded pieces. > > So we need either another id to uniquely identify the 2 different sessions, or > reuse the previous session. That's why a job_id might be helpful. > > Those are problems that we'll have if we store download jobs. My point is that > you don't need to store download jobs. :-) The slices -- which you do need to > store -- only record what's written to disk, not how they got there. > > > Anyway, I think querying the download jobs table by ordering on job_id can > give us a lot of information on how each piece is created over the time. > > This can help us debug potential incorrect logics in session creation later > on. > > But do you need to store it in history for all users? 
Debugging logic like this > is what we use netlogs for since the behavior depends on lots of factors that > aren't captured here. > > Either way, I think you have an implementation plan in mind. Do you have a doc > we can look at? Only storing the slice info sounds like a reasonable idea. In that case, we only need 3 fields for DownloadJobInfo (id, start_position, bytes_received). I have made the change in the new patch set and updated the WIP doc here: https://docs.google.com/document/d/1w8jgAs77VqR1QH1LzD-LbExJzSBE6xhtEa_9drdBn... @shess, @sky, PTAL again, thanks!
nit: job => slice https://codereview.chromium.org/2665243003/diff/160001/components/history/cor... File components/history/core/browser/download_database.cc (right): https://codereview.chromium.org/2665243003/diff/160001/components/history/cor... components/history/core/browser/download_database.cc:559: for (size_t i = 0; i < data.download_job_info.size(); ++i) { Either with jobs or slices, we'd need to deal with the cases where we discard the intermediate file. This could happen, for example, if the server responds with a precondition failure indicating that the resource has changed. If that happens we'd need to discard the file and start from scratch. The code as written would leave the old slices around. If the browser is restarted, in the new session it would consider parts of the new file as being valid leading to a corrupt download file. It's worth considering that the slices can only mutate in three ways: 1. All slices disappear. 2. New slices are introduced. 3. Existing slices update with a new value for 'received_bytes' 2 and 3 are handled here. We just need to worry about 1. I.e. if there are no slices in data.download_job_info (or download_slice_info), then just nuke the slices matching the download id. Also, consider not writing slices to the DB unless they have a non-zero 'received_bytes'. Those records are just clutter if they turn out to be DOA and can complicate assumption 1 above.
Oh and thank you for dealing with all these comments!
Description was changed from ========== add an optional download jobs table into history download db To support parallel downloads, a download item may have multiple jobs For each job, we need to store their information so download resumption can work. This CL adds a new table to store the job information. This change also increases the history db version by 1 BUG=644352 ========== to ========== add an download slices table into history download db To support parallel downloads, a download item may have multiple jobs. For each job, it will write to the target file with a different offset. we need to store the information for each piece of slice so download resumption can work. This CL adds a new table to store the slice information. This change also increases the history db version by 1 BUG=644352 ==========
On 2017/02/10 16:30:30, asanka wrote: > nit: job => slice > > https://codereview.chromium.org/2665243003/diff/160001/components/history/cor... > File components/history/core/browser/download_database.cc (right): > > https://codereview.chromium.org/2665243003/diff/160001/components/history/cor... > components/history/core/browser/download_database.cc:559: for (size_t i = 0; i < > data.download_job_info.size(); ++i) { Either with jobs or slices, we'd need to deal with the cases where we discard > the intermediate file. This could happen, for example, if the server responds > with a precondition failure indicating that the resource has changed. If that > happens we'd need to discard the file and start from scratch. > > The code as written would leave the old slices around. If the browser is > restarted, in the new session it would consider parts of the new file as being > valid leading to a corrupt download file. > > It's worth considering that the slices can only mutate in three ways: > > 1. All slices disappear. > 2. New slices are introduced. > 3. Existing slices update with a new value for 'received_bytes' > > 2 and 3 are handled here. We just need to worry about 1. I.e. if there are no > slices in data.download_job_info (or download_slice_info), then just nuke the > slices matching the download id. > > Also, consider not writing slices to the DB unless they have a non-zero > 'received_bytes'. Those records are just clutter if they turn out to be DOA and > can complicate assumption 1 above. For 1, we can also pass a download_slice_info vector in which all the received_bytes are set to 0. In that case, the database should also delete those entries. In the new PS, I have added the logic to delete all the slice information whenever the download_slice_info vector is empty. The new PS will also skip inserting a slice into the db if received_bytes is 0.
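Patch set 5 is titled "use Select changes() instead of select count(*)": after a DELETE, SQLite's changes() reports how many rows the last statement touched, so verifying the removal needs no separate count. A hedged sqlite3 sketch (schema assumed for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE downloads_slices ("
             "download_id INTEGER NOT NULL, offset INTEGER NOT NULL, "
             "received_bytes INTEGER NOT NULL, "
             "PRIMARY KEY (download_id, offset))")
conn.executemany("INSERT INTO downloads_slices VALUES (?, ?, ?)",
                 [(1, 0, 10), (1, 4096, 20), (2, 0, 30)])

conn.execute("DELETE FROM downloads_slices WHERE download_id=?", (1,))
# changes() reports the rows touched by the most recent statement on
# this connection, avoiding the extra scan a SELECT COUNT(*) would do.
deleted = conn.execute("SELECT changes()").fetchone()[0]
print(deleted)  # 2
```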
Description was changed from ========== add an download slices table into history download db To support parallel downloads, a download item may have multiple jobs. For each job, it will write to the target file with a different offset. we need to store the information for each piece of slice so download resumption can work. This CL adds a new table to store the slice information. This change also increases the history db version by 1 BUG=644352 ========== to ========== add a download slices table into history download db To support parallel downloads, a download item may have multiple jobs. For each job, it will write to the target file with a different offset. we need to store the information for each piece of slice so download resumption can work. This CL adds a new table to store the slice information. This change also increases the history db version by 1 BUG=644352 ==========
PTAL again. Renamed DownloadJobInfo to DownloadSliceInfo, and changed the CL description. The job status is no longer needed; only the slice information needs to be stored. Added a new test for the behavior of deleting all slices when the input is empty. On 2017/02/10 18:29:42, qinmin wrote: > On 2017/02/10 16:30:30, asanka wrote: > > nit: job => slice > > > > > https://codereview.chromium.org/2665243003/diff/160001/components/history/cor... > > File components/history/core/browser/download_database.cc (right): > > > > > https://codereview.chromium.org/2665243003/diff/160001/components/history/cor... > > components/history/core/browser/download_database.cc:559: for (size_t i = 0; i > < > > data.download_job_info.size(); ++i) { > > Either with jobs or slices, we'd need to deal with the cases where we discard > > the intermediate file. This could happen, for example, if the server responds > > with a precondition failure indicating that the resource has changed. If that > > happens we'd need to discard the file and start from scratch. > > > > The code as written would leave the old slices around. If the browser is > > restarted, in the new session it would consider parts of the new file as being > > valid leading to a corrupt download file. > > > > It's worth considering that the slices can only mutate in three ways: > > > > 1. All slices disappear. > > 2. New slices are introduced. > > 3. Existing slices update with a new value for 'received_bytes' > > > > 2 and 3 are handled here. We just need to worry about 1. I.e. if there are no > > slices in data.download_job_info (or download_slice_info), then just nuke the > > slices matching the download id. > > > > Also, consider not writing slices to the DB unless they have a non-zero > > 'received_bytes'. Those records are just clutter if they turn out to be DOA > and > > can complicate assumption 1 above. > > For 1, we can also pass a download_slice_info vector that all the received_bytes > are set 0. 
> In that case, the database should also delete those entries. > > In the new PS, I have added the logic to delete all the slice information > whenever download_slice_info vector is empty. > The new PS will also skip inserting a slice into the db if received_bytes is 0.
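After the rename, each slice only needs to record its download, its starting offset, and how many bytes were written. A plausible shape for the downloads_slices table, sketched in Python/sqlite3 (the real column list lives in download_database.cc; this one is an assumption for illustration), along with reading the rows back grouped per download:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE downloads_slices (
  download_id INTEGER NOT NULL,
  offset INTEGER NOT NULL,
  received_bytes INTEGER NOT NULL,
  PRIMARY KEY (download_id, offset));
INSERT INTO downloads_slices VALUES (1, 0, 4096);
INSERT INTO downloads_slices VALUES (1, 8192, 100);
INSERT INTO downloads_slices VALUES (2, 0, 55);
""")

# Group slice rows per download, the way a query path could populate an
# in-memory map keyed by download id before attaching slices to items.
slices = defaultdict(list)
for did, off, rb in conn.execute(
        "SELECT download_id, offset, received_bytes "
        "FROM downloads_slices ORDER BY download_id, offset"):
    slices[did].append((off, rb))
print(dict(slices))  # {1: [(0, 4096), (8192, 100)], 2: [(0, 55)]}
```

A composite primary key on (download_id, offset) both prevents duplicate slices and gives the per-download lookup an index for free.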
I think my suggestions are all in the "Obvious suggestion, you got this" or "Suggestion, you can ignore" bucket, so LGTM. BUT, and I feel really bad for bringing this up so late in the game, but I don't see anyone in the prior comments asking why we are laying down persistence data for a performance feature whose effect hasn't really been quantified yet? I'm not asking this because I have any opinion on the actual feature, but rather because once the data is persisted on disk, we have no control over cleanup, so we have to consider it forever (or at least for 2-3 years). So if this is just groundwork to do an experiment, I wonder if there's any way to get some/most of the data we need without persisting state information to disk? I'm fine with "Yes, we considered that, but couldn't think of a way around it." I don't really need to know the reasons, I just want to make sure you're appropriately reluctant to commit to long-term ownership of this schema change! https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... File components/history/core/browser/download_database.cc (right): https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... components/history/core/browser/download_database.cc:26: #include "download_slice_info.h" This needs the entire path. https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... components/history/core/browser/download_database.cc:497: it != info_map.end(); ++it) { This line is showing a whitespace-only change. https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... components/history/core/browser/download_database.cc:560: RemoveDownloadSlices(data.id); I would rather see this as an if/else where the else contains the for loop, or always call the Remove* call (so it's always remove-and-rewrite rather than overwrite-in-place). 
I'm not so much worried about efficiency as about having multiple phrasings of the condition being brittle to maintenance. [I guess I would lean towards if/else, because update-in-place is slightly more efficient WRT the index.] https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... components/history/core/browser/download_database.cc:772: info.received_bytes = statement_query.ColumnInt64(column++); Of course, now that this has half as many fields, it hardly matters what order things are in! https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... File components/history/core/browser/history_backend_db_unittest.cc (right): https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... components/history/core/browser/history_backend_db_unittest.cc:756: } I see that this follows the style in this file, so it's fine, but just FYI, both of these cases mimic the implementation of: ASSERT_TRUE(db.Execute(sql)); Full statements are really only needed for bind parameters or reading off results (other than success). https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... components/history/core/browser/history_backend_db_unittest.cc:785: ASSERT_TRUE(s2.Run()); This could also be Execute(), unless the style of this file suggests otherwise. https://codereview.chromium.org/2665243003/diff/180001/components/test/data/h... File components/test/data/history/history.32.sql (right): https://codereview.chromium.org/2665243003/diff/180001/components/test/data/h... components/test/data/history/history.32.sql:11: CREATE TABLE downloads_url_chains (id INTEGER NOT NULL,chain_index INTEGER NOT NULL,url LONGVARCHAR NOT NULL, PRIMARY KEY (id, chain_index) ); Perhaps I am missing something, but why does the migration test use manually-created schema and data, rather than just leveraging this golden file?
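Scott's note that full statement objects are only needed for bind parameters or for reading results carries over to any SQLite wrapper. A small sqlite3 sketch of the distinction (table shape assumed for illustration; executescript stands in for a bare db.Execute(sql)):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Literal SQL with no bind parameters and no result rows to read:
# the moral equivalent of a plain Execute(sql) call.
conn.executescript("""
CREATE TABLE downloads_slices (
  download_id INTEGER NOT NULL,
  offset INTEGER NOT NULL,
  received_bytes INTEGER NOT NULL,
  PRIMARY KEY (download_id, offset));
INSERT INTO downloads_slices VALUES (1, 0, 512);
""")
# A full prepared statement earns its keep once there are bind
# parameters or results to read off.
row = conn.execute(
    "SELECT received_bytes FROM downloads_slices "
    "WHERE download_id=? AND offset=?", (1, 0)).fetchone()
print(row)  # (512,)
```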
Thanks Scott for your thoughts. Based on our experiment with parallel downloading, the results are quite promising in many situations. Yes, we have considered not introducing any schema changes for end users. But based on the user feedback and the results we have, we believe this is something we should pursue in the near future. If things really go against us (very unlikely, but there is always the possibility), we can drop this table with a schema change and it won't have much impact on users since downloads can always restart from the beginning. On 2017/02/10 22:41:12, Scott Hess wrote: > I think my suggestions are all in the "Obvious suggestion, you got this" or > "Suggestion, you can ignore" bucket, so LGTM. > > BUT, and I feel really bad for bringing this up so late in the game, but I don't > see anyone in the prior comments asking about why are we laying down persistence > data for a performance feature who's effect hasn't really been quantified, yet? > I'm not asking this because I have any opinion on the actual feature, but rather > because once the data is persisted on disk, we have no control over cleanup, so > we have to consider it forever (or at least for 2-3 years). So if this is just > groundwork to do an experiment, I wonder if there's any way to get some/most of > the data we need without persisting state information to disk? > > I'm fine with "Yes, we considered that, but couldn't think of a way around it." > I don't really need to know the reasons, I just want to make sure you're > appropriately reluctant to commit to long-term ownership of this schema change! > > > https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... > File components/history/core/browser/download_database.cc (right): > > > https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... > components/history/core/browser/download_database.cc:26: #include > "download_slice_info.h" > This needs the entire path. 
> > https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... > components/history/core/browser/download_database.cc:497: it != info_map.end(); > ++it) { > This line is showing a whitespace-only change. > > https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... > components/history/core/browser/download_database.cc:560: > RemoveDownloadSlices(data.id); > I would rather see this as an if/else where the else contains the for loop, or > always call the Remove* call (so it's always remove-and-rewrite rather than > overwrite-in-place). I'm not so much worried about efficiency as about having > multiple phrasings of the condition being brittle to maintenance. > > [I guess I would lean towards if/else, because update-in-place is slightly more > efficient WRT the index.] > > https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... > components/history/core/browser/download_database.cc:772: info.received_bytes = > statement_query.ColumnInt64(column++); > Of course, now that this has half as many fields, it hardly matters what order > things are in! > > https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... > File components/history/core/browser/history_backend_db_unittest.cc (right): > > https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... > components/history/core/browser/history_backend_db_unittest.cc:756: } > I see that this follows the style in this file, so it's fine, but just FYI, both > of these cases mimic the implementation of: > ASSERT_TRUE(db.Execute(sql)); > Full statements are really only needed for bind parameters or reading off > results (other than success). > > https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... > components/history/core/browser/history_backend_db_unittest.cc:785: > ASSERT_TRUE(s2.Run()); > This could also be Execute(), unless the style of this file suggests otherwise. 
> > https://codereview.chromium.org/2665243003/diff/180001/components/test/data/h... > File components/test/data/history/history.32.sql (right): > > https://codereview.chromium.org/2665243003/diff/180001/components/test/data/h... > components/test/data/history/history.32.sql:11: CREATE TABLE > downloads_url_chains (id INTEGER NOT NULL,chain_index INTEGER NOT NULL,url > LONGVARCHAR NOT NULL, PRIMARY KEY (id, chain_index) ); > Perhaps I am missing something, but why does the migration test use > manually-created schema and data, rather than just leveraging this golden file?
https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... File components/history/core/browser/download_database.cc (right): https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... components/history/core/browser/download_database.cc:26: #include "download_slice_info.h" On 2017/02/10 22:41:11, Scott Hess wrote: > This needs the entire path. Fixed. Strange, the path is there in the previous PS; I must have accidentally hit the backspace key while working on this. https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... components/history/core/browser/download_database.cc:497: it != info_map.end(); ++it) { On 2017/02/10 22:41:11, Scott Hess wrote: > This line is showing a whitespace-only change. Done. https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... components/history/core/browser/download_database.cc:560: RemoveDownloadSlices(data.id); On 2017/02/10 22:41:12, Scott Hess wrote: > I would rather see this as an if/else where the else contains the for loop, or > always call the Remove* call (so it's always remove-and-rewrite rather than > overwrite-in-place). I'm not so much worried about efficiency as about having > multiple phrasings of the condition being brittle to maintenance. > > [I guess I would lean towards if/else, because update-in-place is slightly more > efficient WRT the index.] Done. https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... File components/history/core/browser/history_backend_db_unittest.cc (right): https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... 
components/history/core/browser/history_backend_db_unittest.cc:756: } On 2017/02/10 22:41:12, Scott Hess wrote: > I see that this follows the style in this file, so it's fine, but just FYI, both > of these cases mimic the implementation of: > ASSERT_TRUE(db.Execute(sql)); > Full statements are really only needed for bind parameters or reading off > results (other than success). Switched to use execute. https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... components/history/core/browser/history_backend_db_unittest.cc:785: ASSERT_TRUE(s2.Run()); On 2017/02/10 22:41:12, Scott Hess wrote: > This could also be Execute(), unless the style of this file suggests otherwise. Done. https://codereview.chromium.org/2665243003/diff/180001/components/test/data/h... File components/test/data/history/history.32.sql (right): https://codereview.chromium.org/2665243003/diff/180001/components/test/data/h... components/test/data/history/history.32.sql:11: CREATE TABLE downloads_url_chains (id INTEGER NOT NULL,chain_index INTEGER NOT NULL,url LONGVARCHAR NOT NULL, PRIMARY KEY (id, chain_index) ); On 2017/02/10 22:41:12, Scott Hess wrote: > Perhaps I am missing something, but why does the migration test use > manually-created schema and data, rather than just leveraging this golden file? Done. Moved the insert statements here.
Changes LGTM, with a very minor nit that you can either ignore or just change and land it before I look at things again! https://codereview.chromium.org/2665243003/diff/180001/components/test/data/h... File components/test/data/history/history.32.sql (right): https://codereview.chromium.org/2665243003/diff/180001/components/test/data/h... components/test/data/history/history.32.sql:11: CREATE TABLE downloads_url_chains (id INTEGER NOT NULL,chain_index INTEGER NOT NULL,url LONGVARCHAR NOT NULL, PRIMARY KEY (id, chain_index) ); On 2017/02/11 00:23:54, qinmin wrote: > On 2017/02/10 22:41:12, Scott Hess wrote: > > Perhaps I am missing something, but why does the migration test use > > manually-created schema and data, rather than just leveraging this golden > file? > > Done. Moved the insert statements here. OK, that's great. I was torn about suggesting converting to a golden file, since some of the other code is already using inline insert statements and I dislike inconsistency. But I do think golden files provide a stronger signal for "If you're editing this, you're probably doing it wrong." https://codereview.chromium.org/2665243003/diff/200001/components/history/cor... File components/history/core/browser/history_backend_db_unittest.cc (right): https://codereview.chromium.org/2665243003/diff/200001/components/history/cor... components/history/core/browser/history_backend_db_unittest.cc:763: ASSERT_TRUE(db.Execute(insert_statement.c_str())); Just use a const char[] rather than round-tripping to a std::string.
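The golden-file pattern Scott endorses is to snapshot the pre-migration schema in a history.NN.sql file, load it verbatim, run the migration, and assert on the result. A self-contained Python/sqlite3 sketch of that pattern (the schema fragment and version bookkeeping here are simplified assumptions, not the actual history DB contents):

```python
import sqlite3

# Stand-in for a golden file such as history.31.sql: the frozen
# pre-migration state (hypothetical fragment, not the real contents).
GOLDEN_V31 = """
CREATE TABLE meta (key LONGVARCHAR NOT NULL UNIQUE PRIMARY KEY,
                   value LONGVARCHAR);
INSERT INTO meta VALUES ('version', '31');
"""

def migrate_to_32(conn):
    # The migration under test: add the new table and bump the version.
    conn.executescript("""
    CREATE TABLE downloads_slices (
      download_id INTEGER NOT NULL,
      offset INTEGER NOT NULL,
      received_bytes INTEGER NOT NULL,
      PRIMARY KEY (download_id, offset));
    UPDATE meta SET value='32' WHERE key='version';
    """)

conn = sqlite3.connect(":memory:")
conn.executescript(GOLDEN_V31)  # the real test would read the file
migrate_to_32(conn)
version = conn.execute(
    "SELECT value FROM meta WHERE key='version'").fetchone()[0]
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(version, "downloads_slices" in tables)  # 32 True
```

Keeping the old state in a golden file rather than inline statements gives a stronger "if you're editing this, you're probably doing it wrong" signal, as noted above.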
On 2017/02/11 00:21:45, qinmin wrote: > Thanks Scott for your thoughts. Based on our experiment with parallel > downloading, the results > are quite promising in many situations. Yes, we have considered not introducing > any > schema changes for end users. But based on the user feedbacksand the results we > have, > we believe this is something we should pursue in the near future. If things > really go against us ( > very unlikely but there is always the possibility), we can drop this table with > a schema change > and it won't have much impact on users since download can always restart from > the beginning. > > > On 2017/02/10 22:41:12, Scott Hess wrote: > > I think my suggestions are all in the "Obvious suggestion, you got this" or > > "Suggestion, you can ignore" bucket, so LGTM. > > > > BUT, and I feel really bad for bringing this up so late in the game, but I > don't > > see anyone in the prior comments asking about why are we laying down > persistence > > data for a performance feature who's effect hasn't really been quantified, > yet? > > I'm not asking this because I have any opinion on the actual feature, but > rather > > because once the data is persisted on disk, we have no control over cleanup, > so > > we have to consider it forever (or at least for 2-3 years). So if this is > just > > groundwork to do an experiment, I wonder if there's any way to get some/most > of > > the data we need without persisting state information to disk? > > > > I'm fine with "Yes, we considered that, but couldn't think of a way around > it." > > I don't really need to know the reasons, I just want to make sure you're > > appropriately reluctant to commit to long-term ownership of this schema > change! > > > > > https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... > > File components/history/core/browser/download_database.cc (right): > > > > > https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... 
> > components/history/core/browser/download_database.cc:26: #include > > "download_slice_info.h" > > This needs the entire path. > > > > > https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... > > components/history/core/browser/download_database.cc:497: it != > info_map.end(); > > ++it) { > > This line is showing a whitespace-only change. > > > > > https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... > > components/history/core/browser/download_database.cc:560: > > RemoveDownloadSlices(data.id); > > I would rather see this as an if/else where the else contains the for loop, or > > always call the Remove* call (so it's always remove-and-rewrite rather than > > overwrite-in-place). I'm not so much worried about efficiency as about having > > multiple phrasings of the condition being brittle to maintenance. > > > > [I guess I would lean towards if/else, because update-in-place is slightly > more > > efficient WRT the index.] > > > > > https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... > > components/history/core/browser/download_database.cc:772: info.received_bytes > = > > statement_query.ColumnInt64(column++); > > Of course, now that this has half as many fields, it hardly matters what order > > things are in! > > > > > https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... > > File components/history/core/browser/history_backend_db_unittest.cc (right): > > > > > https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... > > components/history/core/browser/history_backend_db_unittest.cc:756: } > > I see that this follows the style in this file, so it's fine, but just FYI, > both > > of these cases mimic the implementation of: > > ASSERT_TRUE(db.Execute(sql)); > > Full statements are really only needed for bind parameters or reading off > > results (other than success). 
> > > > > https://codereview.chromium.org/2665243003/diff/180001/components/history/cor... > > components/history/core/browser/history_backend_db_unittest.cc:785: > > ASSERT_TRUE(s2.Run()); > > This could also be Execute(), unless the style of this file suggests > otherwise. > > > > > https://codereview.chromium.org/2665243003/diff/180001/components/test/data/h... > > File components/test/data/history/history.32.sql (right): > > > > > https://codereview.chromium.org/2665243003/diff/180001/components/test/data/h... > > components/test/data/history/history.32.sql:11: CREATE TABLE > > downloads_url_chains (id INTEGER NOT NULL,chain_index INTEGER NOT NULL,url > > LONGVARCHAR NOT NULL, PRIMARY KEY (id, chain_index) ); > > Perhaps I am missing something, but why does the migration test use > > manually-created schema and data, rather than just leveraging this golden > file? OK, thanks for the explainer. Apologies if I seem over-sensitive. Usually the pattern is someone lands something, then goes on to work on something else, then something something and three years later I'm trying to figure things out. So I try to act as a bit of friction up front if I can.
LGTM
The CQ bit was checked by qinmin@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_tsan_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...) mac_chromium_compile_dbg_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_comp...) mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
https://codereview.chromium.org/2665243003/diff/200001/components/history/cor... File components/history/core/browser/history_backend_db_unittest.cc (right): https://codereview.chromium.org/2665243003/diff/200001/components/history/cor... components/history/core/browser/history_backend_db_unittest.cc:763: ASSERT_TRUE(db.Execute(insert_statement.c_str())); On 2017/02/11 03:31:12, Scott Hess wrote: > Just use a const char[] rather than round-tripping to a std::string. Done.
The CQ bit was checked by qinmin@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from asanka@chromium.org, dtrainor@chromium.org, holte@chromium.org, sky@chromium.org, shess@chromium.org Link to the patchset: https://codereview.chromium.org/2665243003/#ps220001 (title: "nit")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
On 2017/02/14 18:13:51, commit-bot: I haz the power wrote: > Try jobs failed on following builders: > ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, > http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...) Doh! I seem to recall that iOS has an issue with test data files. I suspect there's either a BUILD.gn stanza which lists the data files, or there's special-purpose code in the tests around iOS (perhaps disabling certain tests on iOS).
On 2017/02/14 at 18:16:43, shess wrote: > On 2017/02/14 18:13:51, commit-bot: I haz the power wrote: > > Try jobs failed on following builders: > > ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, > > http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...) > > Doh! > > I seem to recall that iOS has an issue with test data files. I suspect there's either a BUILD.gn stanza which lists the data files, or there's special-purpose code in the tests around iOS (perhaps disabling certain tests on iOS). https://cs.chromium.org/chromium/src/components/history/core/browser/BUILD.gn...
On 2017/02/14 21:31:28, asanka wrote: > On 2017/02/14 at 18:16:43, shess wrote: > > On 2017/02/14 18:13:51, commit-bot: I haz the power wrote: > > > Try jobs failed on following builders: > > > ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, > > > > http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...) > > > > Doh! > > > > I seem to recall that iOS has an issue with test data files. I suspect > there's either a BUILD.gn stanza which lists the data files, or there's > special-purpose code in the tests around iOS (perhaps disabling certain tests on > iOS). > > https://cs.chromium.org/chromium/src/components/history/core/browser/BUILD.gn... added the data file to BUILD.gn
lgtm
The CQ bit was checked by qinmin@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from asanka@chromium.org, dtrainor@chromium.org, sky@chromium.org, holte@chromium.org Link to the patchset: https://codereview.chromium.org/2665243003/#ps240001 (title: "add test file to BUILD.gn")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 240001, "attempt_start_ts": 1487117228048560, "parent_rev": "d182a1a7bfdd22c1d166df4ecbda86af18ca68e5", "commit_rev": "b2925e8fc7cacf09215b6c9c12b975fb7a9bdee5"}
Message was sent while issue was closed.
Description was changed from ========== add a download slices table into history download db To support parallel downloads, a download item may have multiple jobs. For each job, it will write to the target file with a different offset. we need to store the information for each piece of slice so download resumption can work. This CL adds a new table to store the slice information. This change also increases the history db version by 1 BUG=644352 ========== to ========== add a download slices table into history download db To support parallel downloads, a download item may have multiple jobs. For each job, it will write to the target file with a different offset. we need to store the information for each piece of slice so download resumption can work. This CL adds a new table to store the slice information. This change also increases the history db version by 1 BUG=644352 Review-Url: https://codereview.chromium.org/2665243003 Cr-Commit-Position: refs/heads/master@{#450606} Committed: https://chromium.googlesource.com/chromium/src/+/b2925e8fc7cacf09215b6c9c12b9... ==========
Message was sent while issue was closed.
Committed patchset #12 (id:240001) as https://chromium.googlesource.com/chromium/src/+/b2925e8fc7cacf09215b6c9c12b9... |