|
|
Chromium Code Reviews
DescriptionRedirect handling in resource prefetch predictor
We are going to save information about redirects in resource prefetch predictor to use it later and to prefetch resources associated with the end of redirect chain immediately.
BUG=
Patch Set 1 #Patch Set 2 : test #
Total comments: 10
Patch Set 3 : Intermediate redirects aren't tracked #Patch Set 4 : Database for redirects + tests for it #Patch Set 5 : Update some comments #
Total comments: 12
Patch Set 6 : git cl format #Patch Set 7 : Redirects database schema changed #
Total comments: 20
Messages
Total messages: 26 (3 generated)
alexilin@chromium.org changed reviewers: + mattcary@chromium.org
Hi Matt, I've started working on redirect handling in prefetch predictor. I haven't written tests for it yet but it'll be great if you check out the way of collecting data about redirections in resource_prefetch_predictor.cc
A brief comment, I need to spend some time with the code before verifying correctness. https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... File chrome/browser/predictors/resource_prefetch_predictor.cc (right): https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor.cc:440: VLOG(1) << "Requested URL: " << request.navigation_id.main_frame_url; These logging messages are probably too verbose for real code and should be removed when you're done. https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... File chrome/browser/predictors/resource_prefetch_predictor.h (right): https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor.h:196: typedef std::map<NavigationID, linked_ptr<std::vector<GURL> > > RedirectsMap; Should use unique_ptr instead, see https://www.chromium.org/developers/smart-pointer-guidelines For bonus points, update other linked_ptr usages to unique_ptr.
A couple more comments. I'll wait until you've finished the CL, or at least the non-test code, before commenting more as it looks like there's a lot in flux. https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... File chrome/browser/predictors/resource_prefetch_predictor.cc (right): https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor.cc:1037: const std::vector<std::string>& keys, I wonder if it's necessary to store data on all the redirect steps. In the following I'll use the example of a redirect chain A -> B -> C (so A is the first url and C is the final one). A redirect map entry will be referred to as {url, final_redirect}. If I understand the code correctly, you update both {A, C} and {B, C}. That's correct, but possibly excessive. The intermediate redirects may not be used much in practice; if they are navigated to directly then they'll add their own cache entry. So I propose just updating {A, C}; in other words, LearnRedirects just takes a const string& key and you strip off the outer loop, below. Indeed, with the current method of recording all steps in the chain as keys we kind of inflate things: if a user navigates once to A and another time to B, the redirect B -> C will get counted twice. I think that may make it harder to reason about counts if we have to consider this double-counting. https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... File chrome/browser/predictors/resource_prefetch_predictor.h (right): https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor.h:193: typedef std::map<NavigationID, linked_ptr<std::vector<URLRequestSummary> > > To be consistent with other code, should remove spaces between angle brackets (...URLRequestSummary>>>) https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... File chrome/browser/predictors/resource_prefetch_predictor_tables.cc (right): https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor_tables.cc:475: // TODO: UMA? Yes! https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... File chrome/browser/predictors/resource_prefetch_predictor_tables.h (right): https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor_tables.h:110: typedef std::vector<RedirectStatRow> RedirectStatRows; Would prefer not to typedef the vector. You don't save that much space versus the explicit type, and it obscures things like having a vector in the copyable RedirectData structure. It's a little different than typedef'ing the map type, because that encodes additional type information---eg, what the key is. In other words, there's multiple ways to make a map of PrefetchData, so it's nice to standardize. That's not true with a vector.
Some thoughts about handling intermediate redirects. I'll add information about it to discussion https://docs.google.com/a/google.com/document/d/16xZ_DbaXXJK60_zS5oVZdRXJkOka.... https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... File chrome/browser/predictors/resource_prefetch_predictor.cc (right): https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor.cc:440: VLOG(1) << "Requested URL: " << request.navigation_id.main_frame_url; On 2016/09/09 09:16:11, mattcary wrote: > These logging messages are probably too verbose for real code and should be > removed when you're done. Sure, I added them just to find out what's what. https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor.cc:1037: const std::vector<std::string>& keys, On 2016/09/09 13:19:21, mattcary wrote: > I wonder if it's necessary to store data on all the redirect steps. In the > following I'll use the example of a redirect chain A -> B -> C (so A is the > first url and C is the final one). A redirect map entry will be referred to as > {url, final_redirect}. > > If I understand the code correctly, you update both {A, C} and {B, C}. That's > correct, but possibly excessive. The intermediate redirects may not be used much > in practice; if they are navigated to directly then they'll add their own cache > entry. So I propose just updating {A, C}; in other words, LearnRedirects just > takes a const string& key and you strip off the outer loop, below. > > Indeed, with the current method of recording all steps in the chain as keys we > kind of inflate things: if a user navigates once to A and another time to B, the > redirect B -> C will get counted twice. I think that may make it harder to > reason about counts if we have to consider this double-counting. Yes, you understand the code correctly. But in the case when we ignore {B, C}, we'll probably lose useful information. For example, consider following chain A = http://fb.com B = http://facebook.com C = https://facebook.com As I think, user could use both of A,B in practice. However, I'm not very familiar with users behavior patterns. I don't see problems with double-counting because this will reflect reality. In your example we actually have B -> C redirect encountered twice and we can assume this redirect is more "permanent". I agree that ignoring intermediate redirects will make the code more clear, so for V1 we could use simpler model and figure out later if it's worth adding this functionality. https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... File chrome/browser/predictors/resource_prefetch_predictor.h (right): https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor.h:196: typedef std::map<NavigationID, linked_ptr<std::vector<GURL> > > RedirectsMap; On 2016/09/09 09:16:11, mattcary wrote: > Should use unique_ptr instead, see > > https://www.chromium.org/developers/smart-pointer-guidelines > > For bonus points, update other linked_ptr usages to unique_ptr. Yep, I used linked_ptr like in a rest of code but I've already seen that it's deprecated now. Ok, I'll update all pointers.
https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... File chrome/browser/predictors/resource_prefetch_predictor.cc (right): https://codereview.chromium.org/2321343002/diff/20001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor.cc:1037: const std::vector<std::string>& keys, On 2016/09/09 13:59:19, alexilin wrote: > On 2016/09/09 13:19:21, mattcary wrote: > > I wonder if it's necessary to store data on all the redirect steps. In the > > following I'll use the example of a redirect chain A -> B -> C (so A is the > > first url and C is the final one). A redirect map entry will be referred to as > > {url, final_redirect}. > > > > If I understand the code correctly, you update both {A, C} and {B, C}. That's > > correct, but possibly excessive. The intermediate redirects may not be used > much > > in practice; if they are navigated to directly then they'll add their own > cache > > entry. So I propose just updating {A, C}; in other words, LearnRedirects just > > takes a const string& key and you strip off the outer loop, below. > > > > Indeed, with the current method of recording all steps in the chain as keys we > > kind of inflate things: if a user navigates once to A and another time to B, > the > > redirect B -> C will get counted twice. I think that may make it harder to > > reason about counts if we have to consider this double-counting. > > Yes, you understand the code correctly. > But in the case when we ignore {B, C}, we'll probably lose useful information. > For example, consider following chain > A = http://fb.com > B = http://facebook.com > C = https://facebook.com > As I think, user could use both of A,B in practice. However, I'm not very > familiar with users behavior patterns. > > I don't see problems with double-counting because this will reflect reality. In > your example we actually have B -> C redirect encountered twice and we can > assume this redirect is more "permanent". > > I agree that ignoring intermediate redirects will make the code more clear, so > for V1 we could use simpler model and figure out later if it's worth adding this > functionality. Yeah, I can't disagree with anything you said. +1 to being simpler for V1, simpler is always better. BenoƮt or Egor will probably have comments too when they get back :)
Hi Matt! I've finished with collecting and storing information about redirects. I'm not happy with the resulting resource_prefetch_predictor_tables class, because it is responsible for 9(!) sqlite tables and contains too much duplicated code (the same operations over different types of data). I would prefer to have 1 table - 1 object matching. And also I would like to hide table cache inside storage class. I'll be glad to hear any suggestions!
On 2016/09/14 12:14:28, alexilin wrote: > Hi Matt! > > I've finished with collecting and storing information about redirects. > I'm not happy with the resulting resource_prefetch_predictor_tables class, > because it is responsible for 9(!) sqlite tables and contains too much > duplicated code (the same operations over different types of data). I would > prefer to have 1 table - 1 object matching. And also I would like to hide table > cache inside storage class. > I'll be glad to hear any suggestions! Cool, I'll take a look.
> I'm not happy with the resulting resource_prefetch_predictor_tables class, > because it is responsible for 9(!) sqlite tables and contains too much > duplicated code (the same operations over different types of data). I would > prefer to have 1 table - 1 object matching. And also I would like to hide table > cache inside storage class. Yeah, I agree, the tables class does conflate data cache manipulation with db and table management. One issue is that we want to avoid round-trips to the DB thread, so somewhere there's going to have to be, eg, a GetAllData method. But I agree that it could be refactored a bit. I think that the nested Redirect* structures have gotten too complex as simple nested structs. I actually wonder how much we need them if we have a proto now---do we gain anything by marshelling things into and out of the proto, or should we just pass around the protos instead of structures? (The addition of protos to this code is quite new and it's possible we didn't get it right the first time). That also might help to regularlize the tables so that sql handling will be a little easier to deal with.
https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... File chrome/browser/predictors/resource_prefetch_predictor.cc (right): https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor.cc:480: if (nav_it != inflight_navigations_.end()) Run git cl format; the brace should be on the same line as the if statement. https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... File chrome/browser/predictors/resource_prefetch_predictor.h (right): https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor.h:108: std::unique_ptr<std::vector<URLRequestSummary>> requests; Why not just a vector, avoiding the extra redirection (the vector's internal storage will be heap-allocated anyway, no?). If it's to manage assignment/copying, then probably Navigation should be a class and not a structure (as our convention for structures assumes POD semantics, which is already a little tenuous here). https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... File chrome/browser/predictors/resource_prefetch_predictor_tables.cc (right): https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor_tables.cc:58: "final_redirect TEXT, " I'm not sure why final_redirect is part of the key rather than just being in the proto as a list of all redirects. This also seems to contradict how we represent the data once it's extracted to the map (eg, it's keyed by the main URL and we have to combine the final urls together). https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor_tables.cc:484: DeleteRedirectDataHelper(key_type, std::vector<std::string>(1, key)); wouldn't std::vector<std::string>{key} be a little easier to read of an initializer? https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor_tables.cc:558: PrefetchKeyType key_type, I think I'd prefer to rename the function rather than just relying on overloading, since the behavior between the two GetAllDataHelpers is so different.
Description was changed from ========== Starting work on redirect handling in resource prefetch predictor BUG= ========== to ========== Redirect handling in resource prefetch predictor We are going to save information about redirects in resource prefetch predictor to use it later and to prefetch resources associated with the end of redirect chain immediately. BUG= ==========
On 2016/09/14 15:37:00, mattcary wrote: > > I'm not happy with the resulting resource_prefetch_predictor_tables class, > > because it is responsible for 9(!) sqlite tables and contains too much > > duplicated code (the same operations over different types of data). I would > > prefer to have 1 table - 1 object matching. And also I would like to hide > table > > cache inside storage class. > > Yeah, I agree, the tables class does conflate data cache manipulation with db > and table management. > > One issue is that we want to avoid round-trips to the DB thread, so somewhere > there's going to have to be, eg, a GetAllData method. But I agree that it could > be refactored a bit. > > I think that the nested Redirect* structures have gotten too complex as simple > nested structs. I actually wonder how much we need them if we have a proto > now---do we gain anything by marshelling things into and out of the proto, or > should we just pass around the protos instead of structures? (The addition of > protos to this code is quite new and it's possible we didn't get it right the > first time). > > That also might help to regularlize the tables so that sql handling will be a > little easier to deal with. I agree that the tables class was designed that way to reduce amount of round-trips to the DB thread. It also has UpdateData method that allows update all tables in a single call but this possibility wasn't used. And because of this, hack with data with empty key is required (end of LearnNavigation/LearnRedirect method). I think that it's possible to combine all UpdateData calls into one. Passing protos around could simplify sql-related code. I should try!
Hello Matt, I worked on your comments and also changed redirects table schema. Plz take a look, when you'll have time. https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... File chrome/browser/predictors/resource_prefetch_predictor.cc (right): https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor.cc:480: if (nav_it != inflight_navigations_.end()) On 2016/09/14 15:37:08, mattcary wrote: > Run git cl format; the brace should be on the same line as the if statement. Oops... Done. https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... File chrome/browser/predictors/resource_prefetch_predictor.h (right): https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor.h:108: std::unique_ptr<std::vector<URLRequestSummary>> requests; On 2016/09/14 15:37:08, mattcary wrote: > Why not just a vector, avoiding the extra redirection (the vector's internal > storage will be heap-allocated anyway, no?). > > If it's to manage assignment/copying, then probably Navigation should be a class > and not a structure (as our convention for structures assumes POD semantics, > which is already a little tenuous here). Yep, my fault. I forgot to remove it after I had simplified code. Done. https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... File chrome/browser/predictors/resource_prefetch_predictor_tables.cc (right): https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor_tables.cc:58: "final_redirect TEXT, " On 2016/09/14 15:37:09, mattcary wrote: > I'm not sure why final_redirect is part of the key rather than just being in the > proto as a list of all redirects. This also seems to contradict how we represent > the data once it's extracted to the map (eg, it's keyed by the main URL and we > have to combine the final urls together). I just copied structure of resource table. It's a little bit tricky. map<main_frame_url, vector<resource_row>> (as in cache) transforms into several table rows: [{main_frame_url, resource_url1, resource_row1}, {main_frame_url, resource_url2, resource_row2}, ...]. So when we update or delete some resource data (containing a vector<resource_row>) it really affects many table rows. I don't know what's the point because it seems that all resources of specific main_frame_url is always modified together. https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor_tables.cc:484: DeleteRedirectDataHelper(key_type, std::vector<std::string>(1, key)); On 2016/09/14 15:37:08, mattcary wrote: > wouldn't std::vector<std::string>{key} be a little easier to read of an > initializer? Actually, we could use just {key}. Or do we need more explicitness? https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor_tables.cc:558: PrefetchKeyType key_type, On 2016/09/14 15:37:09, mattcary wrote: > I think I'd prefer to rename the function rather than just relying on > overloading, since the behavior between the two GetAllDataHelpers is so > different. Done.
Still going through your changes but I wanted to respond quickly to two comments. https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... File chrome/browser/predictors/resource_prefetch_predictor_tables.cc (right): https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor_tables.cc:58: "final_redirect TEXT, " On 2016/09/14 18:59:55, alexilin wrote: > On 2016/09/14 15:37:09, mattcary wrote: > > I'm not sure why final_redirect is part of the key rather than just being in > the > > proto as a list of all redirects. This also seems to contradict how we > represent > > the data once it's extracted to the map (eg, it's keyed by the main URL and we > > have to combine the final urls together). > > I just copied structure of resource table. It's a little bit tricky. > map<main_frame_url, vector<resource_row>> (as in cache) transforms into several > table rows: [{main_frame_url, resource_url1, resource_row1}, {main_frame_url, > resource_url2, resource_row2}, ...]. So when we update or delete some resource > data (containing a vector<resource_row>) it really affects many table rows. > I don't know what's the point because it seems that all resources of specific > main_frame_url is always modified together. Yeah, that's what I mean. If you always change multiple rows, you should just have it all in the same proto. That will make management a lot easier, with less shipping of stuff between the DB and internal data structures. We aren't really using the DB completely, ie, doing interesting queries. It's just a key-data store. https://codereview.chromium.org/2321343002/diff/80001/chrome/browser/predicto... chrome/browser/predictors/resource_prefetch_predictor_tables.cc:484: DeleteRedirectDataHelper(key_type, std::vector<std::string>(1, key)); On 2016/09/14 18:59:55, alexilin wrote: > On 2016/09/14 15:37:08, mattcary wrote: > > wouldn't std::vector<std::string>{key} be a little easier to read of an > > initializer? > > Actually, we could use just {key}. Or do we need more explicitness? {key} works even better. I'm still a little weak on C++11-isms!
https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... File chrome/browser/predictors/resource_prefetch_predictor_tables.h (right): https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor_tables.h:104: // Used in the UrlRedirectTable and HostRedirectTable to store redirects For example of how to simplify everything, the only difference between a RedirectRow and the proto seems to be the score? That hardly seems worth all the marshelling for. The score could be in the proto, and cleared before storing, or stored in a parallel map with the caches.
https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... File chrome/browser/predictors/resource_prefetch_predictor_tables.h (right): https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor_tables.h:104: // Used in the UrlRedirectTable and HostRedirectTable to store redirects On 2016/09/15 09:40:47, mattcary wrote: > For example of how to simplify everything, the only difference between a > RedirectRow and the proto seems to be the score? That hardly seems worth all the > marshelling for. The score could be in the proto, and cleared before storing, or > stored in a parallel map with the caches. I didn't consider that the use of proto-generated types is allowed in business code. However, I see a few pros and cons. Pros: - We get rid of one layer of abstraction. Less conversions, less code. Cons: - We don't have control over automatically generated type. We couldn't add own methods, fields which should not be stored directly. - The type could be used in some higher level classes which shouldn't know much storage structure and be dependent on protobuf library. - Proto-generated types could contain only restricted list of types (I don't talk about other user-defined proto messages, enums, etc.), for example base::Time should be converted into uint64. - We probably don't want to have a deal with proto-style containers. Of course, we have a simple structure for now, but I wrote all of this because we could expand RedirectRow structure in the future and face a some of these problems.
https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... File chrome/browser/predictors/resource_prefetch_predictor_tables.h (right): https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor_tables.h:104: // Used in the UrlRedirectTable and HostRedirectTable to store redirects > Pros: > - We get rid of one layer of abstraction. Less conversions, less code. > Yes, exactly. This is a pretty big pro. All the boilerplate conversion from the DB to an internal structure is annoying and error-prone. > Cons: > - We don't have control over automatically generated type. We couldn't add own > methods, fields which should not be stored directly. I don't think this is much of a problem. Generally one doesn't want to add much that isn't stored anyway. Not being able to extend the class also isn't usually a problem in practice. The idea is to use a proto instead of a struct---we wouldn't want to add a lot of machinery to a struct, either. > - The type could be used in some higher level classes which shouldn't know much > storage structure and be dependent on protobuf library. Being dependent on the protobuf library isn't a problem---it's already linked in---and anyway most use is going to be through the tables class. To put it a different way, RedirectRow is very implementation dependent so we wouldn't want it to leak too far out either. > - Proto-generated types could contain only restricted list of types (I don't > talk about other user-defined proto messages, enums, etc.), for example > base::Time should be converted into uint64. This also turns out not to be a big deal. In fact, it is something of an advantage, as anything will have to be serialized to store in the DB anyway, so we wouldn't want to use a complex type anyway. > - We probably don't want to have a deal with proto-style containers. I'm not sure what you mean here. Again, the issue is that we can't serialize complex data structures anyway, so we'd want to think carefully about relying on containers. We have to build an index to anything we pull out of the DB. > > Of course, we have a simple structure for now, but I wrote all of this because > we could expand RedirectRow structure in the future and face a some of these > problems. That's a good thing to think of generally, but in this case because RedirectRow is something that is going to be serialized into the DB, we have essential constraints on how we'd expand the type in the future. To put it a different way, this sort of situation (data that is serialized and also manipulated) is exactly what protos were created for, and so it's likely that long-term usage will work out okay as well.
https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... File chrome/browser/predictors/resource_prefetch_predictor_tables.h (right): https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor_tables.h:104: // Used in the UrlRedirectTable and HostRedirectTable to store redirects > > Cons: > > - We don't have control over automatically generated type. We couldn't add own > > methods, fields which should not be stored directly. > I don't think this is much of a problem. Generally one doesn't want to add much > that isn't stored anyway. Not being able to extend the class also isn't usually > a problem in practice. The idea is to use a proto instead of a struct---we > wouldn't want to add a lot of machinery to a struct, either. Yeah, now I see that this is even good if we treat the struct as POD. You know everything about this struct after you looked at its proto declaration. Keeps it simple and clear for readers. > > - We probably don't want to have a deal with proto-style containers. > I'm not sure what you mean here. Again, the issue is that we can't serialize > complex data structures anyway, so we'd want to think carefully about relying on > containers. We have to build an index to anything we pull out of the DB. I've meant "repeated" fields in proto. For example, if we get rid of RedirectData struct (this is wrapper over RedirectProto message) then users of this class will have to use proto-style accessors to manipulate repeated data instead of std::vector. > > Of course, we have a simple structure for now, but I wrote all of this because > > we could expand RedirectRow structure in the future and face a some of these > > problems. > > That's a good thing to think of generally, but in this case because RedirectRow > is something that is going to be serialized into the DB, we have essential > constraints on how we'd expand the type in the future. > > To put it a different way, this sort of situation (data that is serialized and > also manipulated) is exactly what protos were created for, and so it's likely > that long-term usage will work out okay as well. Great, thanks for the explanation! So, should I get rid of all similar classes? (ResourceRow, ResourceData, RedirectData)
> > > - We probably don't want to have a deal with proto-style containers. > > I'm not sure what you mean here. Again, the issue is that we can't serialize > > complex data structures anyway, so we'd want to think carefully about relying > on > > containers. We have to build an index to anything we pull out of the DB. > > I've meant "repeated" fields in proto. For example, if we get rid of > RedirectData struct (this is wrapper over RedirectProto message) then users of > this class will have to use proto-style accessors to manipulate repeated data > instead of std::vector. > That's true, although protos have gotten a lot better and do C11 range iteration and such (at least in google3, hopefully the version we use in chrome is recent enough...) > Great, thanks for the explanation! So, should I get rid of all similar classes? > (ResourceRow, ResourceData, RedirectData) Yeah, that's my opinion.
alexilin@chromium.org changed reviewers: + lizeb@chromium.org
Hi Benoit, Here is a CL that I was working for last two weeks. I haven't made changes concerning "move everything in proto" yet. I'm still working on that.
On 2016/09/19 08:01:54, alexilin wrote: > Hi Benoit, > > Here is a CL that I was working for last two weeks. I haven't made changes > concerning "move everything in proto" yet. I'm still working on that. FYI, this is the CL that removed the intermediate objects. https://codereview.chromium.org/2268283005/ Per offline discussion with pasko, chose to (temporarily) shelve it.
On 2016/09/19 12:13:47, Benoit L wrote: > On 2016/09/19 08:01:54, alexilin wrote: > > Hi Benoit, > > > > Here is a CL that I was working for last two weeks. I haven't made changes > > concerning "move everything in proto" yet. I'm still working on that. > > FYI, this is the CL that removed the intermediate objects. > https://codereview.chromium.org/2268283005/ > > Per offline discussion with pasko, chose to (temporarily) shelve it. I already have a lot of uncommited code concerning removal of all intermediate structures. In comparison with your CL, I also removed PrefetchData structure as well and keep all ResourceRows in one sqlite row. Anyway, probably it worth to finish with this CL and come back to database refactoring later. Could you please review current changes?
Per offline discussion, a refactor CL will land first, then an evolution of this one. A few comments, most of them really minor. Code looks fine, and I understand why you feel there is too much duplication here :-) https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... File chrome/browser/predictors/resource_prefetch_common.h (right): https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_common.h:107: // The maximum number of redirects to store per entry. I'm not 100% sure that we really need a setting for this. Generally speaking, less knobs is better. The settings above were used to tune parameters for experiments. Unless we have a good reason to do otherwise, maybe we can start with keeping a small fixed number of entries. My suspicion is that the redirect chains are going to be either "monomorphic" or "megamorphic". The second case we don't care about. https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... File chrome/browser/predictors/resource_prefetch_predictor.cc (right): https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor.cc:448: std::unique_ptr<Navigation>(new Navigation(initial_url)))); nit: MakeUnique is preferred to this. (it's like C++-14's std::make_unique). https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor.cc:488: navigation.reset(new Navigation(response.navigation_id.main_frame_url)); nit: no braces when it fits on a single line (per the style guide). https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor.cc:645: base::Bind(&ResourcePrefetchPredictorTables::GetAllData, tables_, nit: Why not just url_data.get() here (and so on for the other ones)? Seems like the raw pointers are only used here. https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor.cc:787: for (RedirectDataMap::iterator it = data_map->begin(); it != data_map->end(); nit: Can you use a range for loop here? https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor.cc:960: for (std::vector<ResourceRow>::iterator it = resources.begin(); tiny nit: this is one of the trivial cases where auto makes things easier. https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor.cc:1009: RemoveOldestEntryInRedirectDataMap(key_type, redirect_map); nit: no braces. https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor.cc:1038: row_to_add.url = final_redirect; The key is the final redirect url? https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... File chrome/browser/predictors/resource_prefetch_predictor.h (right): https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor.h:103: struct Navigation { Navigation and requests are vague, overloaded words. I guess that Navigation is fine here, but "requests" doesn't tell me what this relates to. I suspect it's the vector of main resource requests due to redirects. What about main_resource_requests? Also, it feels a bit weird for Navigation and NavigationID to be two entirely unrelated objects. https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor.h:108: std::vector<URLRequestSummary> requests; Is there some invariant here, such as: - requests is not empty - requests[0].url == initial_url? If so, is it DCHECK()-d somewhere? https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... File chrome/browser/predictors/resource_prefetch_predictor.proto (right): https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor.proto:66: // Represents single redirect chain endpoint nit: Represents a.
On 2016/09/19 13:53:48, Benoit L wrote: > Per offline discussion, a refactor CL will land first, then an evolution of this > one. > > > A few comments, most of them really minor. Code looks fine, and I understand why > you feel there is too much duplication here :-) > > https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... > File chrome/browser/predictors/resource_prefetch_common.h (right): > > https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... > chrome/browser/predictors/resource_prefetch_common.h:107: // The maximum number > of redirects to store per entry. > I'm not 100% sure that we really need a setting for this. Generally speaking, > less knobs is better. > > The settings above were used to tune parameters for experiments. Unless we have > a good reason to do otherwise, maybe we can start with keeping a small fixed > number of entries. > > My suspicion is that the redirect chains are going to be either "monomorphic" or > "megamorphic". The second case we don't care about. > > https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... > File chrome/browser/predictors/resource_prefetch_predictor.cc (right): > > https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... > chrome/browser/predictors/resource_prefetch_predictor.cc:448: > std::unique_ptr<Navigation>(new Navigation(initial_url)))); > nit: MakeUnique is preferred to this. (it's like C++-14's std::make_unique). > > https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... > chrome/browser/predictors/resource_prefetch_predictor.cc:488: > navigation.reset(new Navigation(response.navigation_id.main_frame_url)); > nit: no braces when it fits on a single line (per the style guide). > > https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... > chrome/browser/predictors/resource_prefetch_predictor.cc:645: > base::Bind(&ResourcePrefetchPredictorTables::GetAllData, tables_, > nit: Why not just > url_data.get() here (and so on for the other ones)? > > Seems like the raw pointers are only used here. > > https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... > chrome/browser/predictors/resource_prefetch_predictor.cc:787: for > (RedirectDataMap::iterator it = data_map->begin(); it != data_map->end(); > nit: Can you use a range for loop here? > > https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... > chrome/browser/predictors/resource_prefetch_predictor.cc:960: for > (std::vector<ResourceRow>::iterator it = resources.begin(); > tiny nit: this is one of the trivial cases where auto makes things easier. > > https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... > chrome/browser/predictors/resource_prefetch_predictor.cc:1009: > RemoveOldestEntryInRedirectDataMap(key_type, redirect_map); > nit: no braces. > > https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... > chrome/browser/predictors/resource_prefetch_predictor.cc:1038: row_to_add.url = > final_redirect; > The key is the final redirect url? > > https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... > File chrome/browser/predictors/resource_prefetch_predictor.h (right): > > https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... > chrome/browser/predictors/resource_prefetch_predictor.h:103: struct Navigation { > Navigation and requests are vague, overloaded words. I guess that Navigation is > fine here, but "requests" doesn't tell me what this relates to. > > I suspect it's the vector of main resource requests due to redirects. What about > main_resource_requests? > > Also, it feels a bit weird for Navigation and NavigationID to be two entirely > unrelated objects. > > https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... > chrome/browser/predictors/resource_prefetch_predictor.h:108: > std::vector<URLRequestSummary> requests; > Is there some invariant here, such as: > - requests is not empty > - requests[0].url == initial_url? > > If so, is it DCHECK()-d somewhere? > > https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... > File chrome/browser/predictors/resource_prefetch_predictor.proto (right): > > https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... > chrome/browser/predictors/resource_prefetch_predictor.proto:66: // Represents > single redirect chain endpoint > nit: Represents a. Also, I haven't reviewed everything. I guess you can tell where I stopped.
https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... File chrome/browser/predictors/resource_prefetch_predictor.cc (right): https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor.cc:645: base::Bind(&ResourcePrefetchPredictorTables::GetAllData, tables_, On 2016/09/19 13:53:47, Benoit L wrote: > nit: Why not just > url_data.get() here (and so on for the other ones)? > > Seems like the raw pointers are only used here. I'm still not very confident with this "passing pointers between threads" thing, so here I just repeated existing code. https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor.cc:787: for (RedirectDataMap::iterator it = data_map->begin(); it != data_map->end(); On 2016/09/19 13:53:47, Benoit L wrote: > nit: Can you use a range for loop here? Of course! https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor.cc:1038: row_to_add.url = final_redirect; On 2016/09/19 13:53:47, Benoit L wrote: > The key is the final redirect url? No. RedirectRow doesn't have a key at all, it is stored in vector. I didn't split redirect rows belonging the same initial url into several db rows. https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... File chrome/browser/predictors/resource_prefetch_predictor.h (right): https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor.h:103: struct Navigation { > Navigation and requests are vague, overloaded words. I guess that Navigation is > fine here, but "requests" doesn't tell me what this relates to. > > I suspect it's the vector of main resource requests due to redirects. What about > main_resource_requests? This is actually an opposite thing :) This is a vector of subresource requests only within single navigation. So I think that I should call it subresource_requests. This is exactly the same vector that was stored in inflight_navigations_ map earlier. > Also, it feels a bit weird for Navigation and NavigationID to be two entirely > unrelated objects. Navigation is stored in NavigationMap by NavigationID key. I could add NavigationID field in Navigation structure but NavigationID is used only as a key for the map. What kind of relation do you expect? https://codereview.chromium.org/2321343002/diff/120001/chrome/browser/predict... chrome/browser/predictors/resource_prefetch_predictor.h:108: std::vector<URLRequestSummary> requests; On 2016/09/19 13:53:48, Benoit L wrote: > Is there some invariant here, such as: > - requests is not empty > - requests[0].url == initial_url? > > If so, is it DCHECK()-d somewhere? Related with misunderstanding in the comment above. Vector could be empty, it means that we didn't see any subresources within this navigation. |
