|
|
Chromium Code Reviews
Description[Sync] Start implementation of migration for USS.
This approach draws data and metadata from the server half of the
directory and injects it into USS as the initial GetUpdates response.
This CL does not actually enable migration in production because
an empty migrator function is passed instead of the real one.
BUG=658002
Committed: https://crrev.com/822cfab0275712dbd7d69673f68a32461b53c1bc
Cr-Commit-Position: refs/heads/master@{#428467}
Patch Set 1 #Patch Set 2 : Fix other tests. #
Total comments: 73
Patch Set 3 : Address comments. #Patch Set 4 : Fix broken test. #Patch Set 5 : Address comments and disable migration. #Patch Set 6 : Rebase. #
Total comments: 6
Messages
Total messages: 50 (34 generated)
The CQ bit was checked by maxbogue@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
The CQ bit was checked by maxbogue@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
maxbogue@chromium.org changed reviewers: + pavely@chromium.org, skym@chromium.org
maxbogue@chromium.org changed required reviewers: + pavely@chromium.org
Sky, Pavel, PTAL! I'm not very familiar with the directory so let me know if there's a better way to do any of these things...
This is way more simple than I thought it would be, awesome! https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... File components/sync/engine_impl/model_type_registry.cc (right): https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/model_type_registry.cc:147: activation_context->model_type_state.initial_sync_done(); Do we have anything that watches for having metadata but initial_sync_done to false and warns/complains? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/model_type_registry.cc:148: bool has_directory_data = directory()->InitialSyncEndedForType(type); So how expensive is this? If all 25 data types create their own read tx here on startup, does that matter at all? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/model_type_registry.cc:149: bool needs_migration = !initial_sync_done && has_directory_data; Should we have a finch feature param or something that lets us toggle if we want to attempt to migrate certain data types? Or would we just turn off USS for that type if something went wrong? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/model_type_registry.cc:184: if (needs_migration) { Can we get a TODO where you think we'd check to see if we can delete directory data? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/model_type_registry.cc:185: if (MigrateDirectoryData(type, user_share_, worker_ptr)) { Seems calling a static method that does lots of work makes it difficult to unit test the registry. (Also would be good to have a test case for this). https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/model_type_registry.cc:186: UMA_HISTOGRAM_BOOLEAN("Sync.USSMigrationResult", true); Any other metrics we care about? Time it takes? Memory it uses? Counts of objects? Type that was migrated? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/model_type_registry.cc:189: // Nudge for a real initial GetUpdates if migration failed. Can you elaborate on the state we're now in, how bad this is, etc. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... File components/sync/engine_impl/test_entry_factory.cc (right): https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/test_entry_factory.cc:157: base::Time now = base::Time::Now(); Time::Now() is difficult to test. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/test_entry_factory.cc:161: syncable::ReadTransaction trans(FROM_HERE, directory_); Why not move the WriteTransaction up higher and re-use it here? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/test_entry_factory.cc:199: if (specifics.ByteSize() > 0) { How can specifics have the model type marker and yet can be 0 bytes? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... File components/sync/engine_impl/uss_migrator.cc (right): https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:26: bool ExtractSyncEntity(ReadTransaction* trans, Should this be a const ref? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:30: if (read_node.InitByIdLookup(id) != BaseNode::INIT_OK) So why could this fail? How does this handle tombstones? How does this handle encrypted data? Is it possible the directory could be hanging onto some unapplied data that has not reached the model yet? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:41: entity->set_client_defined_unique_tag(entry.GetUniqueClientTag()); There's a lot of fields in Entry and/or SyncEntity that you're not copying (I'm assuming since the worker doesn't read them). Can you put a comment in here explaining this? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:44: // out of Entry. Do we need to special-case them when we ship those types? DCHECK not passwords? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:45: entity->mutable_specifics()->CopyFrom(entry.GetSpecifics()); Can you add a comment for why not ServerSpecifics or BaseServerSpecifics? Also, when we've enabled passphrase encryption, we still store them unencrypted on disk (directory), right? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:56: ReadTransaction trans(FROM_HERE, user_share); So what's the difference between syncer::syncable::ReadTransaction and syncer::ReadTransaction? I don't get it. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:60: LOG(ERROR) << "Missing root node for " << type_name << "."; I would drop the trailing period, don't need correct sentences in logging messages. Also, having a type_name variable kind of increases mental load reading this method. Might want to consider in-lining it in both places it is used. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:67: user_share->directory->GetDownloadProgress(type, &progress); These can't fail? How does ScopdKernelLock interact with transactions? And why does GetDataTypeContext take a trans, but GetDownloadedProgress doesn't? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:72: root.GetChildIds(&child_ids); DCHECK not bookmarks? Does look like this is handling more than one layer of children. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:81: // Process |batch_size| entities at a time to reduce memory usage. I think this is the wrong place for this comment. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:83: while (i < batch_limit && i < child_ids.size()) { Can you use 'for' here? I think it'd make it more clear what's going on. With two while loops that iterate over essentially the same thing, things are confusing who is actually iterating until you get to the i++ part. I think having that be higher/more obvious would be good. Also maybe min? for (; i < std::min(batch_limit, child_ids.size()); i++) { https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:87: return false; Can you comment about why you don't need to strip any data out of the worker. It is potentiality filled with lots of partial directory data. And then we're going to perform an initial download. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:94: worker->ProcessGetUpdatesResponse(progress, context, entity_ptrs, nullptr); It is kind of odd that we update progress/context every time. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:97: worker->PassiveApplyUpdates(nullptr); I'm worried about all the different places responsible for nudging/applying initial updates to the worker. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... File components/sync/engine_impl/uss_migrator_unittest.cc (right): https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator_unittest.cc:51: class UssMigratorTest : public ::testing::Test { Test a failure case? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator_unittest.cc:96: MigrateDirectoryData(kModelType, user_share(), worker()); Check return value. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator_unittest.cc:110: EXPECT_FALSE(entity.creation_time.is_null()); Would be good to verify that time type conversions are not messing things up. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator_unittest.cc:124: MigrateDirectoryData(kModelType, user_share(), worker()); Check return value. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator_unittest.cc:141: MigrateDirectoryData(kModelType, user_share(), worker(), 2); Check return value. https://codereview.chromium.org/2442583003/diff/20001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2442583003/diff/20001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:63456: + <summary>Whether a directory to USS migration attempt succeeded.</summary> Whether!!!!!!!
The CQ bit was checked by maxbogue@chromium.org to run a CQ dry run
PTAL! https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... File components/sync/engine_impl/model_type_registry.cc (right): https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/model_type_registry.cc:147: activation_context->model_type_state.initial_sync_done(); On 2016/10/21 15:34:19, skym wrote: > Do we have anything that watches for having metadata but initial_sync_done to > false and warns/complains? Probably not? There could be something in SMTP::OnMetadataLoaded probs. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/model_type_registry.cc:148: bool has_directory_data = directory()->InitialSyncEndedForType(type); On 2016/10/21 15:34:19, skym wrote: > So how expensive is this? If all 25 data types create their own read tx here on > startup, does that matter at all? I mean, we might have to wait for the lock? But this is the sync thread so it doesn't matter. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/model_type_registry.cc:149: bool needs_migration = !initial_sync_done && has_directory_data; On 2016/10/21 15:34:19, skym wrote: > Should we have a finch feature param or something that lets us toggle if we want > to attempt to migrate certain data types? Or would we just turn off USS for that > type if something went wrong? If something was crashing we'd just turn off USS for the type. If migration is failing a lot we'd probably pause the rollout until we fix it, but since we fall back to a normal download I don't think there would be a benefit to rolling back. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/model_type_registry.cc:184: if (needs_migration) { On 2016/10/21 15:34:19, skym wrote: > Can we get a TODO where you think we'd check to see if we can delete directory > data? Done. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/model_type_registry.cc:185: if (MigrateDirectoryData(type, user_share_, worker_ptr)) { On 2016/10/21 15:34:19, skym wrote: > Seems calling a static method that does lots of work makes it difficult to unit > test the registry. (Also would be good to have a test case for this). Done: injected as a callback and test case on the registry added. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/model_type_registry.cc:186: UMA_HISTOGRAM_BOOLEAN("Sync.USSMigrationResult", true); On 2016/10/21 15:34:19, skym wrote: > Any other metrics we care about? Time it takes? Memory it uses? Counts of > objects? Type that was migrated? Time should be trivial because everything is in memory. Count of objects doesn't really help us. The type seems important, so I've switched to using that. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/model_type_registry.cc:189: // Nudge for a real initial GetUpdates if migration failed. On 2016/10/21 15:34:19, skym wrote: > Can you elaborate on the state we're now in, how bad this is, etc. Done. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... File components/sync/engine_impl/test_entry_factory.cc (right): https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/test_entry_factory.cc:157: base::Time now = base::Time::Now(); On 2016/10/21 15:34:20, skym wrote: > Time::Now() is difficult to test. Testing that it's not null seems fine. You'd rather I set up some time constants in this class? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/test_entry_factory.cc:161: syncable::ReadTransaction trans(FROM_HERE, directory_); On 2016/10/21 15:34:20, skym wrote: > Why not move the WriteTransaction up higher and re-use it here? Didn't occur to me that you could use a WriteTransaction to read... done. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/test_entry_factory.cc:199: if (specifics.ByteSize() > 0) { On 2016/10/21 15:34:20, skym wrote: > How can specifics have the model type marker and yet can be 0 bytes? The default specifics already in the entry have the model type marker. This is checking if the new ones passed in are set or not to maintain backward compatibility with other tests. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... File components/sync/engine_impl/uss_migrator.cc (right): https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:26: bool ExtractSyncEntity(ReadTransaction* trans, On 2016/10/21 15:34:20, skym wrote: > Should this be a const ref? ReadNode takes a non-const transaction pointer, so no. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:30: if (read_node.InitByIdLookup(id) != BaseNode::INIT_OK) On 2016/10/21 15:34:20, skym wrote: > So why could this fail? How does this handle tombstones? How does this handle > encrypted data? Is it possible the directory could be hanging onto some > unapplied data that has not reached the model yet? This line is just a failure to look up data. The rest of your comments probably apply below. Tombstones should just have IsDel set to true. Encrypted data... I'm not sure whether it's still encrypted or not in the specifics I'm getting, but it also shouldn't matter right? The worker can decrypt. If data was unapplied we still want to pass it along here, so the USS merge can apply it. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:41: entity->set_client_defined_unique_tag(entry.GetUniqueClientTag()); On 2016/10/21 15:34:20, skym wrote: > There's a lot of fields in Entry and/or SyncEntity that you're not copying (I'm > assuming since the worker doesn't read them). Can you put a comment in here > explaining this? Done. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:44: // out of Entry. Do we need to special-case them when we ship those types? On 2016/10/21 15:34:20, skym wrote: > DCHECK not passwords? Done in the main function. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:45: entity->mutable_specifics()->CopyFrom(entry.GetSpecifics()); On 2016/10/21 15:34:20, skym wrote: > Can you add a comment for why not ServerSpecifics or BaseServerSpecifics? > > Also, when we've enabled passphrase encryption, we still store them unencrypted > on disk (directory), right? Mmm... I think I mean to do server specifics here, whoops. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:56: ReadTransaction trans(FROM_HERE, user_share); On 2016/10/21 15:34:20, skym wrote: > So what's the difference between syncer::syncable::ReadTransaction and > syncer::ReadTransaction? I don't get it. ¯\_(ツ)_/¯ I also do not quite get it. It might just be a layering thing that my reorganization has completely outdated. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:60: LOG(ERROR) << "Missing root node for " << type_name << "."; On 2016/10/21 15:34:20, skym wrote: > I would drop the trailing period, don't need correct sentences in logging > messages. Also, having a type_name variable kind of increases mental load > reading this method. Might want to consider in-lining it in both places it is > used. Done. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:67: user_share->directory->GetDownloadProgress(type, &progress); On 2016/10/21 15:34:20, skym wrote: > These can't fail? How does ScopdKernelLock interact with transactions? And why > does GetDataTypeContext take a trans, but GetDownloadedProgress doesn't? They don't seem to fail, no. It seems to interact fine since it works; I don't know exactly though. And I don't know; it looks like GetDataTypeContext literally doesn't use the transaction: https://cs.chromium.org/chromium/src/components/sync/syncable/directory.cc?l=912 https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:72: root.GetChildIds(&child_ids); On 2016/10/21 15:34:20, skym wrote: > DCHECK not bookmarks? Does look like this is handling more than one layer of > children. Done. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:81: // Process |batch_size| entities at a time to reduce memory usage. On 2016/10/21 15:34:20, skym wrote: > I think this is the wrong place for this comment. Moved above the loop... hopefully that's better? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:83: while (i < batch_limit && i < child_ids.size()) { On 2016/10/21 15:34:20, skym wrote: > Can you use 'for' here? I think it'd make it more clear what's going on. With > two while loops that iterate over essentially the same thing, things are > confusing who is actually iterating until you get to the i++ part. I think > having that be higher/more obvious would be good. Also maybe min? > > for (; i < std::min(batch_limit, child_ids.size()); i++) { Done. Also I just did the min to set batch_limit :P https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:87: return false; On 2016/10/21 15:34:20, skym wrote: > Can you comment about why you don't need to strip any data out of the worker. It > is potentiality filled with lots of partial directory data. And then we're going > to perform an initial download. I've added an AbortMigration method to the worker that's called in the failure cases here. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:94: worker->ProcessGetUpdatesResponse(progress, context, entity_ptrs, nullptr); On 2016/10/21 15:34:20, skym wrote: > It is kind of odd that we update progress/context every time. Sure is... I don't really feel comfortable changing the interface to fix it though ¯\_(ツ)_/¯ https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:97: worker->PassiveApplyUpdates(nullptr); On 2016/10/21 15:34:20, skym wrote: > I'm worried about all the different places responsible for nudging/applying > initial updates to the worker. All the places? There's just this place and the normal place, isn't there? https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... File components/sync/engine_impl/uss_migrator_unittest.cc (right): https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator_unittest.cc:51: class UssMigratorTest : public ::testing::Test { On 2016/10/21 15:34:21, skym wrote: > Test a failure case? Done. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator_unittest.cc:96: MigrateDirectoryData(kModelType, user_share(), worker()); On 2016/10/21 15:34:21, skym wrote: > Check return value. Done. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator_unittest.cc:110: EXPECT_FALSE(entity.creation_time.is_null()); On 2016/10/21 15:34:21, skym wrote: > Would be good to verify that time type conversions are not messing things up. Seems out of scope for this CL, though I've made a note that our time.h doesn't seem to have unit tests... https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator_unittest.cc:124: MigrateDirectoryData(kModelType, user_share(), worker()); On 2016/10/21 15:34:21, skym wrote: > Check return value. Done. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator_unittest.cc:141: MigrateDirectoryData(kModelType, user_share(), worker(), 2); On 2016/10/21 15:34:21, skym wrote: > Check return value. Done. https://codereview.chromium.org/2442583003/diff/20001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2442583003/diff/20001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:63456: + <summary>Whether a directory to USS migration attempt succeeded.</summary> On 2016/10/21 15:34:21, skym wrote: > Whether!!!!!!! IT'S EVERYWHERE
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
The CQ bit was checked by maxbogue@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... File components/sync/engine_impl/model_type_registry.cc (right): https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/model_type_registry.cc:149: bool needs_migration = !initial_sync_done && has_directory_data; On 2016/10/24 19:02:17, maxbogue wrote: > On 2016/10/21 15:34:19, skym wrote: > > Should we have a finch feature param or something that lets us toggle if we > want > > to attempt to migrate certain data types? Or would we just turn off USS for > that > > type if something went wrong? > > If something was crashing we'd just turn off USS for the type. If migration is > failing a lot we'd probably pause the rollout until we fix it, but since we fall > back to a normal download I don't think there would be a benefit to rolling > back. Okay, how do we pause the roll out? We don't really have a knob that allows to keep everyone that has migrated to USS migrated, but don't move anyone new forward, right? We'd stop increasing the % for the finch experiment, but there'd still be people migrating as they restarted their clients. It might make more sense to store some data in prefs to remember if this client has been migrated instead of relying solely on finch params. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... File components/sync/engine_impl/test_entry_factory.cc (right): https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/test_entry_factory.cc:157: base::Time now = base::Time::Now(); On 2016/10/24 19:02:18, maxbogue wrote: > On 2016/10/21 15:34:20, skym wrote: > > Time::Now() is difficult to test. > > Testing that it's not null seems fine. You'd rather I set up some time constants > in this class? Yes, I think I have another comment about wanting them tested. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/test_entry_factory.cc:199: if (specifics.ByteSize() > 0) { On 2016/10/24 19:02:18, maxbogue wrote: > On 2016/10/21 15:34:20, skym wrote: > > How can specifics have the model type marker and yet can be 0 bytes? > > The default specifics already in the entry have the model type marker. This is > checking if the new ones passed in are set or not to maintain backward > compatibility with other tests. Gotcha. Can you update this comment to mention where the default specifics live. I thought you were referring to the case where |specifics| was not initialized, and thus had default values. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... File components/sync/engine_impl/uss_migrator.cc (right): https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:30: if (read_node.InitByIdLookup(id) != BaseNode::INIT_OK) On 2016/10/24 19:02:18, maxbogue wrote: > On 2016/10/21 15:34:20, skym wrote: > > So why could this fail? How does this handle tombstones? How does this handle > > encrypted data? Is it possible the directory could be hanging onto some > > unapplied data that has not reached the model yet? > > This line is just a failure to look up data. The rest of your comments probably > apply below. Tombstones should just have IsDel set to true. Encrypted data... > I'm not sure whether it's still encrypted or not in the specifics I'm getting, > but it also shouldn't matter right? The worker can decrypt. If data was > unapplied we still want to pass it along here, so the USS merge can apply it. I guess what I was suggesting, poorly, was lets look at https://cs.chromium.org/chromium/src/components/sync/syncable/write_node.cc?q... and think about the reasons we can get into != BaseNode::INIT_OK https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:94: worker->ProcessGetUpdatesResponse(progress, context, entity_ptrs, nullptr); On 2016/10/24 19:02:18, maxbogue wrote: > On 2016/10/21 15:34:20, skym wrote: > > It is kind of odd that we update progress/context every time. > > Sure is... I don't really feel comfortable changing the interface to fix it > though ¯\_(ツ)_/¯ That's reasonable. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:97: worker->PassiveApplyUpdates(nullptr); On 2016/10/24 19:02:18, maxbogue wrote: > On 2016/10/21 15:34:20, skym wrote: > > I'm worried about all the different places responsible for nudging/applying > > initial updates to the worker. > > All the places? There's just this place and the normal place, isn't there? And on migration failure in the worker. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... File components/sync/engine_impl/uss_migrator_unittest.cc (right): https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator_unittest.cc:110: EXPECT_FALSE(entity.creation_time.is_null()); On 2016/10/24 19:02:19, maxbogue wrote: > On 2016/10/21 15:34:21, skym wrote: > > Would be good to verify that time type conversions are not messing things up. > > Seems out of scope for this CL, though I've made a note that our time.h doesn't > seem to have unit tests... I disagree. You are using the time conversions. If your use of them is messed up somehow, things would break. I think it's reasonable for both the code and the unittests to use the same underlying functions, and I agree with you that testing them is out of scope.
The CQ bit was checked by maxbogue@chromium.org to run a CQ dry run
PTAL. I've updated this CL to no longer actually perform migration until I can address the initial GU issues in a follow-up. This is done by passing an empty migrator function into the registry. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... File components/sync/engine_impl/model_type_registry.cc (right): https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/model_type_registry.cc:149: bool needs_migration = !initial_sync_done && has_directory_data; On 2016/10/25 15:48:11, skym wrote: > On 2016/10/24 19:02:17, maxbogue wrote: > > On 2016/10/21 15:34:19, skym wrote: > > > Should we have a finch feature param or something that lets us toggle if we > > want > > > to attempt to migrate certain data types? Or would we just turn off USS for > > that > > > type if something went wrong? > > > > If something was crashing we'd just turn off USS for the type. If migration is > > failing a lot we'd probably pause the rollout until we fix it, but since we > fall > > back to a normal download I don't think there would be a benefit to rolling > > back. > > Okay, how do we pause the roll out? We don't really have a knob that allows to > keep everyone that has migrated to USS migrated, but don't move anyone new > forward, right? We'd stop increasing the % for the finch experiment, but there'd > still be people migrating as they restarted their clients. > > It might make more sense to store some data in prefs to remember if this client > has been migrated instead of relying solely on finch params. I've added a TODO for this. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... File components/sync/engine_impl/test_entry_factory.cc (right): https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/test_entry_factory.cc:157: base::Time now = base::Time::Now(); On 2016/10/25 15:48:12, skym wrote: > On 2016/10/24 19:02:18, maxbogue wrote: > > On 2016/10/21 15:34:20, skym wrote: > > > Time::Now() is difficult to test. > > > > Testing that it's not null seems fine. You'd rather I set up some time > constants > > in this class? > > Yes, I think I have another comment about wanting them tested. Acknowledged. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/test_entry_factory.cc:199: if (specifics.ByteSize() > 0) { On 2016/10/25 15:48:11, skym wrote: > On 2016/10/24 19:02:18, maxbogue wrote: > > On 2016/10/21 15:34:20, skym wrote: > > > How can specifics have the model type marker and yet can be 0 bytes? > > > > The default specifics already in the entry have the model type marker. This is > > checking if the new ones passed in are set or not to maintain backward > > compatibility with other tests. > > Gotcha. Can you update this comment to mention where the default specifics live. > I thought you were referring to the case where |specifics| was not initialized, > and thus had default values. Done. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... File components/sync/engine_impl/uss_migrator.cc (right): https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:94: worker->ProcessGetUpdatesResponse(progress, context, entity_ptrs, nullptr); On 2016/10/25 15:48:12, skym wrote: > On 2016/10/24 19:02:18, maxbogue wrote: > > On 2016/10/21 15:34:20, skym wrote: > > > It is kind of odd that we update progress/context every time. > > > > Sure is... I don't really feel comfortable changing the interface to fix it > > though ¯\_(ツ)_/¯ > > That's reasonable. Acknowledged. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator.cc:97: worker->PassiveApplyUpdates(nullptr); On 2016/10/25 15:48:12, skym wrote: > On 2016/10/24 19:02:18, maxbogue wrote: > > On 2016/10/21 15:34:20, skym wrote: > > > I'm worried about all the different places responsible for nudging/applying > > > initial updates to the worker. > > > > All the places? There's just this place and the normal place, isn't there? > > And on migration failure in the worker. Ah right, we nudge there. That's a different problem because the nudge doesn't do anything, so I think for now it's fine to leave it be. https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... File components/sync/engine_impl/uss_migrator_unittest.cc (right): https://codereview.chromium.org/2442583003/diff/20001/components/sync/engine_... components/sync/engine_impl/uss_migrator_unittest.cc:110: EXPECT_FALSE(entity.creation_time.is_null()); On 2016/10/25 15:48:12, skym wrote: > On 2016/10/24 19:02:19, maxbogue wrote: > > On 2016/10/21 15:34:21, skym wrote: > > > Would be good to verify that time type conversions are not messing things > up. > > > > Seems out of scope for this CL, though I've made a note that our time.h > doesn't > > seem to have unit tests... > > I disagree. You are using the time conversions. If your use of them is messed up > somehow, things would break. I think it's reasonable for both the code and the > unittests to use the same underlying functions, and I agree with you that > testing them is out of scope. I added better tests here. I just read the ctime from the directory and check that it equals the ctime and mtime the processor received.
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== [Sync] Implement migration for USS. This approach draws data and metadata from the server half of the directory and injects it into USS as the initial GetUpdates response. BUG=658002 ========== to ========== [Sync] Start implement of migration for USS. This approach draws data and metadata from the server half of the directory and injects it into USS as the initial GetUpdates response. This CL does not actually enable migration in production because an empty migrator function is passed instead of the real one. BUG=658002 ==========
Description was changed from ========== [Sync] Start implement of migration for USS. This approach draws data and metadata from the server half of the directory and injects it into USS as the initial GetUpdates response. This CL does not actually enable migration in production because an empty migrator function is passed instead of the real one. BUG=658002 ========== to ========== [Sync] Start implementation of migration for USS. This approach draws data and metadata from the server half of the directory and injects it into USS as the initial GetUpdates response. This CL does not actually enable migration in production because an empty migrator function is passed instead of the real one. BUG=658002 ==========
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
The CQ bit was checked by maxbogue@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
lgtm
lgtm % comment in uss_migrator.cc https://codereview.chromium.org/2442583003/diff/100001/components/sync/engine... File components/sync/engine_impl/uss_migrator.cc (right): https://codereview.chromium.org/2442583003/diff/100001/components/sync/engine... components/sync/engine_impl/uss_migrator.cc:49: entity->mutable_specifics()->CopyFrom(entry.GetServerSpecifics()); If you were offline for a while then your local specifics would be newer than server specifics. Taking server specifics for Migration would cause data loss in this case. I would take local specifics. https://codereview.chromium.org/2442583003/diff/100001/components/sync/engine... File components/sync/engine_impl/uss_migrator_unittest.cc (right): https://codereview.chromium.org/2442583003/diff/100001/components/sync/engine... components/sync/engine_impl/uss_migrator_unittest.cc:100: base::MessageLoop message_loop_; I don't think you need message loop.
https://codereview.chromium.org/2442583003/diff/100001/components/sync/engine... File components/sync/engine_impl/uss_migrator.cc (right): https://codereview.chromium.org/2442583003/diff/100001/components/sync/engine... components/sync/engine_impl/uss_migrator.cc:49: entity->mutable_specifics()->CopyFrom(entry.GetServerSpecifics()); On 2016/10/27 23:20:19, pavely wrote: > If you were offline for a while then your local specifics would be newer than > server specifics. > Taking server specifics for Migration would cause data loss in this case. I > would take local specifics. Wouldn't any local changes be in the model type's local storage and need to be migrated separately? The purpose of this migration is to prevent the need for an initial GetUpdates from the server, so I figured having the data align with the progress marker was most important. https://codereview.chromium.org/2442583003/diff/100001/components/sync/engine... File components/sync/engine_impl/uss_migrator_unittest.cc (right): https://codereview.chromium.org/2442583003/diff/100001/components/sync/engine... components/sync/engine_impl/uss_migrator_unittest.cc:100: base::MessageLoop message_loop_; On 2016/10/27 23:20:19, pavely wrote: > I don't think you need message loop. Sadly, I do: http://pastebin.com/raw/3twtnzck
https://codereview.chromium.org/2442583003/diff/100001/components/sync/engine... File components/sync/engine_impl/uss_migrator.cc (right): https://codereview.chromium.org/2442583003/diff/100001/components/sync/engine... components/sync/engine_impl/uss_migrator.cc:49: entity->mutable_specifics()->CopyFrom(entry.GetServerSpecifics()); On 2016/10/28 16:17:41, maxbogue wrote: > On 2016/10/27 23:20:19, pavely wrote: > > If you were offline for a while then your local specifics would be newer than > > server specifics. > > Taking server specifics for Migration would cause data loss in this case. I > > would take local specifics. > > Wouldn't any local changes be in the model type's local storage and need to be > migrated separately? The purpose of this migration is to prevent the need for an > initial GetUpdates from the server, so I figured having the data align with the > progress marker was most important. You are right. I agree. https://codereview.chromium.org/2442583003/diff/100001/components/sync/engine... File components/sync/engine_impl/uss_migrator_unittest.cc (right): https://codereview.chromium.org/2442583003/diff/100001/components/sync/engine... components/sync/engine_impl/uss_migrator_unittest.cc:100: base::MessageLoop message_loop_; On 2016/10/28 16:17:41, maxbogue wrote: > On 2016/10/27 23:20:19, pavely wrote: > > I don't think you need message loop. > > Sadly, I do: http://pastebin.com/raw/3twtnzck Got it.
The CQ bit was checked by maxbogue@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
maxbogue@chromium.org changed reviewers: + isherman@chromium.org
+isherman for histograms.xml
metrics lgtm
The CQ bit was checked by maxbogue@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== [Sync] Start implementation of migration for USS. This approach draws data and metadata from the server half of the directory and injects it into USS as the initial GetUpdates response. This CL does not actually enable migration in production because an empty migrator function is passed instead of the real one. BUG=658002 ========== to ========== [Sync] Start implementation of migration for USS. This approach draws data and metadata from the server half of the directory and injects it into USS as the initial GetUpdates response. This CL does not actually enable migration in production because an empty migrator function is passed instead of the real one. BUG=658002 ==========
Message was sent while issue was closed.
Committed patchset #6 (id:100001)
Message was sent while issue was closed.
Description was changed from ========== [Sync] Start implementation of migration for USS. This approach draws data and metadata from the server half of the directory and injects it into USS as the initial GetUpdates response. This CL does not actually enable migration in production because an empty migrator function is passed instead of the real one. BUG=658002 ========== to ========== [Sync] Start implementation of migration for USS. This approach draws data and metadata from the server half of the directory and injects it into USS as the initial GetUpdates response. This CL does not actually enable migration in production because an empty migrator function is passed instead of the real one. BUG=658002 Committed: https://crrev.com/822cfab0275712dbd7d69673f68a32461b53c1bc Cr-Commit-Position: refs/heads/master@{#428467} ==========
Message was sent while issue was closed.
Patchset 6 (id:??) landed as https://crrev.com/822cfab0275712dbd7d69673f68a32461b53c1bc Cr-Commit-Position: refs/heads/master@{#428467} |
