|
|
DescriptionUpdate ThreatDOMDetails to be able to collect non-resource HTML Elements based on their attributes, and make sure they can be rolled up into the DOM in ThreatDetails.
BUG=674618
Review-Url: https://codereview.chromium.org/2713233002
Cr-Commit-Position: refs/heads/master@{#453994}
Committed: https://chromium.googlesource.com/chromium/src/+/d9d61ec69bdeb53703521c67c4292e148201f084
Patch Set 1 #
Total comments: 24
Patch Set 2 : Address comments #
Total comments: 16
Patch Set 3 : Address comments (in particular change map to vector<struct>) #Patch Set 4 : Sync #
Messages
Total messages: 31 (19 generated)
The CQ bit was checked by lpz@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lpz@chromium.org changed reviewers: + jialiul@chromium.org, nparker@chromium.org
Description was changed from ========== Update ThreatDOMDetails to be able to collect non-resource HTML Elements based on their attributes. BUG=674618 ========== to ========== Update ThreatDOMDetails to be able to collect non-resource HTML Elements based on their attributes, and make sure they can be rolled up into the DOM in ThreatDetails. BUG=674618 ==========
vakh@chromium.org changed reviewers: + vakh@chromium.org
Didn't look at the tests in too much detail. LGTM overall. https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... File chrome/renderer/safe_browsing/threat_dom_details.cc (right): https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:51: std::map<std::string, std::unordered_set<std::string>>; You might want to consider using: std::map<std::string, std::vector<std::string>> Reason: When you read the map, you iterate over all the elements in the unordered_set anyway so the only advantage you using an unordered_set over a vector is to avoid having multiple elements with same values. But that's not really a concern because: 1. You, the Googler, control that value and therefore there's a very slim chance that you'd make that mistake. 2. Even if there were repeated values in the vector, it won't make any functional difference. The gains are: - std:vector has better cache locality - uses less memory See: http://lafstern.org/matt/col1.pdf More optional: In fact, I would encourage you to use: struct TagAndAttributes { std::string tag; std::vector<std::string> attributes; }; std::vector<TagAndAttributes> tag_and_attributes_list; And sort tag_and_attributes_list at the end of ParseTagAndAttributeParams https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:59: if (!tag_to_attributes_map) { Would this condition mask a real problem? It can only happen if the programmer fails to call the method correctly and I think, in that case, the code should fail loud and quick. https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:104: // Update summary node with the URL if this element has one. Code is clear enough so the comment is not particularly useful. See go/tott-465 https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:109: // Create the child node. Same here https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:147: const TagToAttributesMap& tag_to_attribute_map) { nit: s/tag_to_attribute_map/tag_to_attributes_map to be consistent with the other uses of this map.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
lgtm https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... File chrome/renderer/safe_browsing/threat_dom_details.cc (right): https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:35: // This Feature specifies which HTML Elements to collect based on their tag and In addition to "iframe", "frame", "embed" and "script"? https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:73: for (size_t i = 0; i < split.size(); i += 2) { Maybe do a DCHECK if split.size() is an even number? If it is no even, we don't need to bother parsing it. https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... File chrome/renderer/safe_browsing/threat_dom_details_browsertest.cc (right): https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details_browsertest.cc:43: } empty line before "} // namespace" https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details_browsertest.cc:46: typedef ChromeRenderViewTest ThreatDOMDetailsTest; Since you're using "using" in other places, maybe change "typedef" to "using" too?
https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... File chrome/renderer/safe_browsing/threat_dom_details.cc (right): https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:73: for (size_t i = 0; i < split.size(); i += 2) { On 2017/02/25 00:46:18, Jialiu Lin wrote: > Maybe do a DCHECK if split.size() is an even number? If it is no even, we don't > need to bother parsing it. Good idea, but if you do this, might as well do this for production builds also (not just DEBUG).
The CQ bit was checked by lpz@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Thank you for the thorough feedback! https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... File chrome/renderer/safe_browsing/threat_dom_details.cc (right): https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:35: // This Feature specifies which HTML Elements to collect based on their tag and On 2017/02/25 00:46:18, Jialiu Lin wrote: > In addition to "iframe", "frame", "embed" and "script"? clarified that it's only for "non-resource" elements. https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:51: std::map<std::string, std::unordered_set<std::string>>; On 2017/02/24 19:08:38, vakh (Varun Khaneja) wrote: > You might want to consider using: > std::map<std::string, std::vector<std::string>> > > Reason: When you read the map, you iterate over all the elements in the > unordered_set anyway so the only advantage you using an unordered_set over a > vector is to avoid having multiple elements with same values. But that's not > really a concern because: > 1. You, the Googler, control that value and therefore there's a very slim chance > that you'd make that mistake. > 2. Even if there were repeated values in the vector, it won't make any > functional difference. > > The gains are: > - std:vector has better cache locality > - uses less memory > > See: http://lafstern.org/matt/col1.pdf > > > More optional: > In fact, I would encourage you to use: > struct TagAndAttributes { > std::string tag; > std::vector<std::string> attributes; > }; > > std::vector<TagAndAttributes> tag_and_attributes_list; > > And sort tag_and_attributes_list at the end of ParseTagAndAttributeParams Thanks - good point about using a vector, I don't actually need the deduping of the set. As for the struct, what benefit do you see here? The map gives easy lookup of tag names at least. With tag_and_attribute_list, you'd need to sort and also std::find inside it I guess? How do those compare to you? https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:59: if (!tag_to_attributes_map) { On 2017/02/24 19:08:38, vakh (Varun Khaneja) wrote: > Would this condition mask a real problem? > It can only happen if the programmer fails to call the method correctly and I > think, in that case, the code should fail loud and quick. Yes, switched to a dcheck https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:73: for (size_t i = 0; i < split.size(); i += 2) { On 2017/02/25 00:48:07, vakh (Varun Khaneja) wrote: > On 2017/02/25 00:46:18, Jialiu Lin wrote: > > Maybe do a DCHECK if split.size() is an even number? If it is no even, we > don't > > need to bother parsing it. > > Good idea, but if you do this, might as well do this for production builds also > (not just DEBUG). Done. https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:104: // Update summary node with the URL if this element has one. On 2017/02/24 19:08:38, vakh (Varun Khaneja) wrote: > Code is clear enough so the comment is not particularly useful. See go/tott-465 Done. https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:109: // Create the child node. On 2017/02/24 19:08:38, vakh (Varun Khaneja) wrote: > Same here Done. https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:147: const TagToAttributesMap& tag_to_attribute_map) { On 2017/02/24 19:08:38, vakh (Varun Khaneja) wrote: > nit: s/tag_to_attribute_map/tag_to_attributes_map > > to be consistent with the other uses of this map. Done. https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... File chrome/renderer/safe_browsing/threat_dom_details_browsertest.cc (right): https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details_browsertest.cc:43: } On 2017/02/25 00:46:18, Jialiu Lin wrote: > empty line before "} // namespace" Done. https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details_browsertest.cc:46: typedef ChromeRenderViewTest ThreatDOMDetailsTest; On 2017/02/25 00:46:18, Jialiu Lin wrote: > Since you're using "using" in other places, maybe change "typedef" to "using" > too? Done.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
lgtm https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... File chrome/renderer/safe_browsing/threat_dom_details.cc (right): https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:51: std::map<std::string, std::unordered_set<std::string>>; On 2017/02/27 15:46:43, lpz wrote: > On 2017/02/24 19:08:38, vakh (Varun Khaneja) wrote: > > You might want to consider using: > > std::map<std::string, std::vector<std::string>> > > > > Reason: When you read the map, you iterate over all the elements in the > > unordered_set anyway so the only advantage you using an unordered_set over a > > vector is to avoid having multiple elements with same values. But that's not > > really a concern because: > > 1. You, the Googler, control that value and therefore there's a very slim > chance > > that you'd make that mistake. > > 2. Even if there were repeated values in the vector, it won't make any > > functional difference. > > > > The gains are: > > - std:vector has better cache locality > > - uses less memory > > > > See: http://lafstern.org/matt/col1.pdf > > > > > > More optional: > > In fact, I would encourage you to use: > > struct TagAndAttributes { > > std::string tag; > > std::vector<std::string> attributes; > > }; > > > > std::vector<TagAndAttributes> tag_and_attributes_list; > > > > And sort tag_and_attributes_list at the end of ParseTagAndAttributeParams > > Thanks - good point about using a vector, I don't actually need the deduping of > the set. > > As for the struct, what benefit do you see here? The map gives easy lookup of > tag names at least. With tag_and_attribute_list, you'd need to sort and also > std::find inside it I guess? How do those compare to you? The advantage is the same: cache locality (big win). Plus, with a map (red black tree), you always have the possibility of hash collisions leading to re-hashing or chaining (depending on how the map is implemented). Correct me if I'm wrong, but the sorting would need to be done only once -- at startup, right? If that is so, IMHO, sorting isn't a concern because I expect this list is going to be small (< 100) items in total so sorting it one time is going to be quick. find should also be quick since all find operations would hit the cache and therefore be super fast. There is no correct answer here since it really depends on size of list, how often you'd get/set elements in it, etc so, I'll let you decide what's best. I'm not sure vector would be clearly better here, but it may be. That's why I said more optional :) https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:219: ParseTagAndAttributeParams(&tag_to_attributes_map); Can't this be done just once, during init/start-up? https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... File chrome/renderer/safe_browsing/threat_dom_details.cc (right): https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/threat_dom_details.cc:55: DCHECK(tag_to_attributes_map); optional and nit: this will fail on line 59 spectacularly anyway so a DCHECK doesn't add much value.
lgtm mostly nits and some questions. https://codereview.chromium.org/2713233002/diff/20001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/threat_details.cc (right): https://codereview.chromium.org/2713233002/diff/20001/chrome/browser/safe_bro... chrome/browser/safe_browsing/threat_details.cc:503: if (!node.tag_name.empty()) { When would there a URL but no tag name? Maybe I'm misunderstanding tag name. https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... File chrome/renderer/safe_browsing/threat_dom_details.cc (right): https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/threat_dom_details.cc:37: // list of pairs. For example: "tag1,id,tag1,height,tag2,foo" - this will Should they be lowercased? https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/threat_dom_details.cc:55: DCHECK(tag_to_attributes_map); On 2017/02/27 17:24:35, vakh (Varun Khaneja) wrote: > optional and nit: this will fail on line 59 spectacularly anyway so a DCHECK > doesn't add much value. ...It does add a little value since it'd wouldn't fail w/o the feature enabled... gives more test coverage on debug builds. https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/threat_dom_details.cc:97: element.document().completeURL(element.getAttribute("src")); What does getAttritribute("src") do if there is no "src"? https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/threat_dom_details.cc:128: // It's possible that the direct parent of this node wasn't handled, so it What's the use case for this? Will this link a sub-sub-frame to the top-level doc if the mid-frame wasn't yet handled? https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/threat_dom_details.cc:143: if ((element.hasHTMLTagName("iframe") || element.hasHTMLTagName("frame") || An aside: Are there any other tags we might want by default? I found a list of url-having tags here: http://stackoverflow.com/questions/2725156/complete-list-of-html-tag-attribut... https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... File chrome/renderer/safe_browsing/threat_dom_details_browsertest.cc (right): https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/threat_dom_details_browsertest.cc:137: "<html><head><div foo=1><div bar=1><div baz=1></div>" This doesn't strictly verify that "div" is compared against this list. How about adding <img foo=1> to verify it doesn't get added?
The CQ bit was checked by lpz@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... File chrome/renderer/safe_browsing/threat_dom_details.cc (right): https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:51: std::map<std::string, std::unordered_set<std::string>>; On 2017/02/27 17:24:35, vakh (Varun Khaneja) wrote: > On 2017/02/27 15:46:43, lpz wrote: > > On 2017/02/24 19:08:38, vakh (Varun Khaneja) wrote: > > > You might want to consider using: > > > std::map<std::string, std::vector<std::string>> > > > > > > Reason: When you read the map, you iterate over all the elements in the > > > unordered_set anyway so the only advantage you using an unordered_set over a > > > vector is to avoid having multiple elements with same values. But that's not > > > really a concern because: > > > 1. You, the Googler, control that value and therefore there's a very slim > > chance > > > that you'd make that mistake. > > > 2. Even if there were repeated values in the vector, it won't make any > > > functional difference. > > > > > > The gains are: > > > - std:vector has better cache locality > > > - uses less memory > > > > > > See: http://lafstern.org/matt/col1.pdf > > > > > > > > > More optional: > > > In fact, I would encourage you to use: > > > struct TagAndAttributes { > > > std::string tag; > > > std::vector<std::string> attributes; > > > }; > > > > > > std::vector<TagAndAttributes> tag_and_attributes_list; > > > > > > And sort tag_and_attributes_list at the end of ParseTagAndAttributeParams > > > > Thanks - good point about using a vector, I don't actually need the deduping > of > > the set. > > > > > > As for the struct, what benefit do you see here? The map gives easy lookup of > > tag names at least. With tag_and_attribute_list, you'd need to sort and also > > std::find inside it I guess? How do those compare to you? > > The advantage is the same: cache locality (big win). Plus, with a map (red black > tree), you always have the possibility of hash collisions leading to re-hashing > or chaining (depending on how the map is implemented). > > Correct me if I'm wrong, but the sorting would need to be done only once -- at > startup, right? > If that is so, IMHO, sorting isn't a concern because I expect this list is going > to be small (< 100) items in total so sorting it one time is going to be quick. > find should also be quick since all find operations would hit the cache and > therefore be super fast. > > There is no correct answer here since it really depends on size of list, how > often you'd get/set elements in it, etc so, I'll let you decide what's best. I'm > not sure vector would be clearly better here, but it may be. That's why I said > more optional :) Thanks Varun, PTAL. Re: lifetime, yes I believe these objects live as long as each renderer, so parsing once up-front makes more sense. Re: data struct: Based on the doc you pointed at, the vector is probably a better fit Unfortunately it does introduce a bunch of additional boilerplate (struct is "complex" so it needs out-of-line constrcutors and destructors, and you need a predicates to support searching/sorting by tag). The list will live for much longer than it will be used, though, so the memory benefit alone is probably worth it. https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:219: ParseTagAndAttributeParams(&tag_to_attributes_map); On 2017/02/27 17:24:35, vakh (Varun Khaneja) wrote: > Can't this be done just once, during init/start-up? Done. https://codereview.chromium.org/2713233002/diff/20001/chrome/browser/safe_bro... File chrome/browser/safe_browsing/threat_details.cc (right): https://codereview.chromium.org/2713233002/diff/20001/chrome/browser/safe_bro... chrome/browser/safe_browsing/threat_details.cc:503: if (!node.tag_name.empty()) { On 2017/02/27 23:00:36, Nathan Parker wrote: > When would there a URL but no tag name? Maybe I'm misunderstanding tag name. The last Node in each IPC is a "summary" node whose URL is the render frame's document URL and its children are a flat list of all resources in the frame. It never has a tag name. Added a comment. https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... File chrome/renderer/safe_browsing/threat_dom_details.cc (right): https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/threat_dom_details.cc:37: // list of pairs. For example: "tag1,id,tag1,height,tag2,foo" - this will On 2017/02/27 23:00:36, Nathan Parker wrote: > Should they be lowercased? Yes - the code and the comment both mention that. Do you see a problem with this? https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/threat_dom_details.cc:55: DCHECK(tag_to_attributes_map); On 2017/02/27 23:00:36, Nathan Parker wrote: > On 2017/02/27 17:24:35, vakh (Varun Khaneja) wrote: > > optional and nit: this will fail on line 59 spectacularly anyway so a DCHECK > > doesn't add much value. > > ...It does add a little value since it'd wouldn't fail w/o the feature > enabled... gives more test coverage on debug builds. Ack, leaving the dcheck for doc'ing purposes https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/threat_dom_details.cc:97: element.document().completeURL(element.getAttribute("src")); On 2017/02/27 23:00:36, Nathan Parker wrote: > What does getAttritribute("src") do if there is no "src"? It behaves nicely - returns an empty object (url or some flavour of string depending where you look) https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/threat_dom_details.cc:128: // It's possible that the direct parent of this node wasn't handled, so it On 2017/02/27 23:00:36, Nathan Parker wrote: > What's the use case for this? Will this link a sub-sub-frame to the top-level > doc if the mid-frame wasn't yet handled? imagine something like: <div foo> <div style-stuff> <iframe src=url> We are configured to capture foo because it's interesting, but there's some uninteresting style-stuff that contains an iframe. Iframe's parent is style-stuff, but since we're skipping style-stuff it won't be in our map. But we still want to bubble iframe up to foo since that is its real container. https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/threat_dom_details.cc:143: if ((element.hasHTMLTagName("iframe") || element.hasHTMLTagName("frame") || On 2017/02/27 23:00:36, Nathan Parker wrote: > An aside: Are there any other tags we might want by default? I found a list of > url-having tags here: > http://stackoverflow.com/questions/2725156/complete-list-of-html-tag-attribut... Not sure of the answer but, at a glance, there are some candidates in that list. Good news is that we can use the new feature to expand the list of collected resources without code changes (eg: just include "audio,src" in our params). https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... File chrome/renderer/safe_browsing/threat_dom_details_browsertest.cc (right): https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/threat_dom_details_browsertest.cc:137: "<html><head><div foo=1><div bar=1><div baz=1></div>" On 2017/02/27 23:00:36, Nathan Parker wrote: > This doesn't strictly verify that "div" is compared against this list. How > about adding <img foo=1> to verify it doesn't get added? Good idea, done.
lgtm https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... File chrome/renderer/safe_browsing/threat_dom_details.cc (right): https://codereview.chromium.org/2713233002/diff/1/chrome/renderer/safe_browsi... chrome/renderer/safe_browsing/threat_dom_details.cc:51: std::map<std::string, std::unordered_set<std::string>>; On 2017/02/28 22:53:28, lpz wrote: > On 2017/02/27 17:24:35, vakh (Varun Khaneja) wrote: > > On 2017/02/27 15:46:43, lpz wrote: > > > On 2017/02/24 19:08:38, vakh (Varun Khaneja) wrote: > > > > You might want to consider using: > > > > std::map<std::string, std::vector<std::string>> > > > > > > > > Reason: When you read the map, you iterate over all the elements in the > > > > unordered_set anyway so the only advantage you using an unordered_set over > a > > > > vector is to avoid having multiple elements with same values. But that's > not > > > > really a concern because: > > > > 1. You, the Googler, control that value and therefore there's a very slim > > > chance > > > > that you'd make that mistake. > > > > 2. Even if there were repeated values in the vector, it won't make any > > > > functional difference. > > > > > > > > The gains are: > > > > - std:vector has better cache locality > > > > - uses less memory > > > > > > > > See: http://lafstern.org/matt/col1.pdf > > > > > > > > > > > > More optional: > > > > In fact, I would encourage you to use: > > > > struct TagAndAttributes { > > > > std::string tag; > > > > std::vector<std::string> attributes; > > > > }; > > > > > > > > std::vector<TagAndAttributes> tag_and_attributes_list; > > > > > > > > And sort tag_and_attributes_list at the end of ParseTagAndAttributeParams > > > > > > Thanks - good point about using a vector, I don't actually need the deduping > > of > > > the set. > > > > > > > > > > As for the struct, what benefit do you see here? The map gives easy lookup > of > > > tag names at least. With tag_and_attribute_list, you'd need to sort and also > > > std::find inside it I guess? How do those compare to you? > > > > The advantage is the same: cache locality (big win). Plus, with a map (red > black > > tree), you always have the possibility of hash collisions leading to > re-hashing > > or chaining (depending on how the map is implemented). > > > > Correct me if I'm wrong, but the sorting would need to be done only once -- at > > startup, right? > > If that is so, IMHO, sorting isn't a concern because I expect this list is > going > > to be small (< 100) items in total so sorting it one time is going to be > quick. > > find should also be quick since all find operations would hit the cache and > > therefore be super fast. > > > > There is no correct answer here since it really depends on size of list, how > > often you'd get/set elements in it, etc so, I'll let you decide what's best. > I'm > > not sure vector would be clearly better here, but it may be. That's why I said > > more optional :) > > Thanks Varun, PTAL. > Re: lifetime, yes I believe these objects live as long as each renderer, so > parsing once up-front makes more sense. > > Re: data struct: Based on the doc you pointed at, the vector is probably a > better fit Unfortunately it does introduce a bunch of additional boilerplate > (struct is "complex" so it needs out-of-line constrcutors and destructors, and > you need a predicates to support searching/sorting by tag). The list will live > for much longer than it will be used, though, so the memory benefit alone is > probably worth it. Acknowledged. Thanks for evaluating both approaches.
lgtm https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... File chrome/renderer/safe_browsing/threat_dom_details.cc (right): https://codereview.chromium.org/2713233002/diff/20001/chrome/renderer/safe_br... chrome/renderer/safe_browsing/threat_dom_details.cc:37: // list of pairs. For example: "tag1,id,tag1,height,tag2,foo" - this will On 2017/02/28 22:53:28, lpz wrote: > On 2017/02/27 23:00:36, Nathan Parker wrote: > > Should they be lowercased? > > Yes - the code and the comment both mention that. Do you see a problem with > this? SGTM
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by lpz@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from jialiul@chromium.org, nparker@chromium.org, vakh@chromium.org Link to the patchset: https://codereview.chromium.org/2713233002/#ps60001 (title: "Sync")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 60001, "attempt_start_ts": 1488390238328310, "parent_rev": "54620e7a10bd00609a7c428a3f734ac121abcacb", "commit_rev": "d9d61ec69bdeb53703521c67c4292e148201f084"}
Message was sent while issue was closed.
Description was changed from ========== Update ThreatDOMDetails to be able to collect non-resource HTML Elements based on their attributes, and make sure they can be rolled up into the DOM in ThreatDetails. BUG=674618 ========== to ========== Update ThreatDOMDetails to be able to collect non-resource HTML Elements based on their attributes, and make sure they can be rolled up into the DOM in ThreatDetails. BUG=674618 Review-Url: https://codereview.chromium.org/2713233002 Cr-Commit-Position: refs/heads/master@{#453994} Committed: https://chromium.googlesource.com/chromium/src/+/d9d61ec69bdeb53703521c67c429... ==========
Message was sent while issue was closed.
Committed patchset #4 (id:60001) as https://chromium.googlesource.com/chromium/src/+/d9d61ec69bdeb53703521c67c429... |