|
|
Created:
5 years, 2 months ago by Michael Hablich Modified:
5 years, 1 month ago Reviewers:
jochen (gone - plz use gerrit), Hannes Payer (out of office), Alexei Svitkine (slow), battre CC:
asvitkine+watch_chromium.org, chromium-reviews, darin-cc_chromium.org, jam, mkwst+moarreviews-renderer_chromium.org, mlamouri+watch-content_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionCollect separate V8 histograms for Top10 non-Google sites
BUG=
R=jochen@chromium.org,hpayer@chromium.org,asvitkine@chromium.org
Committed: https://crrev.com/d6a4f129ef76ddd685d4d9ae6f549cc1fdb9e126
Cr-Commit-Position: refs/heads/master@{#356530}
Patch Set 1 #Patch Set 2 : Only top10 sites #Patch Set 3 : Master changes included #
Total comments: 2
Patch Set 4 : Added Jochen's suggestions #Patch Set 5 : Reformatted code #
Total comments: 5
Patch Set 6 : Made requested changes #
Total comments: 2
Patch Set 7 : Added UnitTests + Reformatted code #
Total comments: 6
Patch Set 8 : Took care of nits #Patch Set 9 : Changed UnitTests so they don't need to convert bool #
Messages
Total messages: 32 (7 generated)
PTAL
asvitkine@chromium.org changed reviewers: + battre@chromium.org
+battre from Chrome Privacy team on whether this is OK
LGTM
On 2015/10/14 16:30:06, Hannes Payer wrote: > LGTM I had some fruitful discussions with battre on this. I am going to do the following thing: 1.) Add youtube individually 2.) Add an aggregation of 10 non-Google top sites taken from Alexa Topic 1.) is quite straight-forward. I am concerned regarding 2.) because the hostnames are highly dependent on the site opened e.g. www.amazon.com and www.amazon.de should both report into this bucket. In order to accomplish this I need to only check for the domain name. I am now a little bit worried if this is a too high performance burden during runtime or is this negligible?
Description was changed from ========== Collect separate V8 histograms for Youtube and Top10 sites BUG= R=jochen@chromium.org,hpayer@chromium.org,asvitkine@chromium.org ========== to ========== Collect separate V8 histograms for Top10 non-Google sites BUG= R=jochen@chromium.org,hpayer@chromium.org,asvitkine@chromium.org ==========
lgtm
On 2015/10/20 13:56:57, battre wrote: > lgtm Sorry, NOT LGTM. I clicked on the wrong link.
On 2015/10/20 13:58:05, battre wrote: > On 2015/10/20 13:56:57, battre wrote: > > lgtm > > Sorry, NOT LGTM. I clicked on the wrong link. jochen@ and battre@ PTAL PS#3.
https://codereview.chromium.org/1399853004/diff/40001/content/renderer/render... File content/renderer/render_thread_impl.cc (right): https://codereview.chromium.org/1399853004/diff/40001/content/renderer/render... content/renderer/render_thread_impl.cc:310: (host_tokens[1] == "linkedin")) according to http://www.alexa.com/topsites linkedin is #11 (excluding google sites), but wikipedia is on the list I'm not sure we should ignore the TLDs, I also don't understand why you require three components? what about .co.uk? if you really want to filter out the tld, use net::registry_controlled_domains::GetDomainAndRegistry() to extract this information. I would vote for just hardcoding the names. also, please add { } for a multi-line if
PTAL PS#5. battre@, is the list of pages ok from a privacy point of view? https://codereview.chromium.org/1399853004/diff/40001/content/renderer/render... File content/renderer/render_thread_impl.cc (right): https://codereview.chromium.org/1399853004/diff/40001/content/renderer/render... content/renderer/render_thread_impl.cc:310: (host_tokens[1] == "linkedin")) On 2015/10/21 13:17:57, jochen wrote: > according to http://www.alexa.com/topsites linkedin is #11 (excluding google > sites), but wikipedia is on the list > > I'm not sure we should ignore the TLDs, I also don't understand why you require > three components? > > what about .co.uk? if you really want to filter out the tld, use > net::registry_controlled_domains::GetDomainAndRegistry() to extract this > information. > > I would vote for just hardcoding the names. > > also, please add { } for a multi-line if You are right: Removed linkedin, added wikipedia Using GetDomainAndRegistry now, thanks I still against hardcoding the names because just have a look at wikipedia: de.wikipedia.org, jp.wikipedia.org and the reverse is also happening like in Amazon's case: amazon.com, amazon.de. I think we do not only want US-centric data.
Privacy wise, this LGTM as it covers huge sites into one bucket and does not disclose much about individual browsing.
https://codereview.chromium.org/1399853004/diff/80001/content/renderer/render... File content/renderer/render_thread_impl.cc (right): https://codereview.chromium.org/1399853004/diff/80001/content/renderer/render... content/renderer/render_thread_impl.cc:318: return ".top10"; Can you make a helper function IsAlexaTop10Site() and have the logic in this function just do: if (IsAlexaTop10Site(host)) return ".top10";
https://codereview.chromium.org/1399853004/diff/80001/content/renderer/render... File content/renderer/render_thread_impl.cc (right): https://codereview.chromium.org/1399853004/diff/80001/content/renderer/render... content/renderer/render_thread_impl.cc:308: if (host_tokens.size() >= 2) { why >= 2? what's wrong with e.g. amazon.com? given that none of those sites owns all TLDs, I still think it would be better to hardcode the set of URLs that are actually in the top10
https://codereview.chromium.org/1399853004/diff/80001/content/renderer/render... File content/renderer/render_thread_impl.cc (right): https://codereview.chromium.org/1399853004/diff/80001/content/renderer/render... content/renderer/render_thread_impl.cc:308: if (host_tokens.size() >= 2) { On 2015/10/23 12:16:21, jochen wrote: > why >= 2? > > what's wrong with e.g. amazon.com? > > given that none of those sites owns all TLDs, I still think it would be better > to hardcode the set of URLs that are actually in the top10 Do you expect that the traffic to sites not owned by the brands would be noticeable?
Did the requested changes. I created UnitTests during development. Unfortunately I don't think it is a good idea to add them because in that case I would need to expose the function etc. https://codereview.chromium.org/1399853004/diff/80001/content/renderer/render... File content/renderer/render_thread_impl.cc (right): https://codereview.chromium.org/1399853004/diff/80001/content/renderer/render... content/renderer/render_thread_impl.cc:308: if (host_tokens.size() >= 2) { On 2015/10/23 17:49:59, battre wrote: > On 2015/10/23 12:16:21, jochen wrote: > > why >= 2? > > > > what's wrong with e.g. amazon.com? > > > > given that none of those sites owns all TLDs, I still think it would be better > > to hardcode the set of URLs that are actually in the top10 > > Do you expect that the traffic to sites not owned by the brands would be > noticeable? on '.size() >= 2': Well, it will trigger for 'amazon.com' or 'amazon.co.uk'. Length != index or something has changed in the definition of size(). On the filtering of TLDs: -Redid it a little bit. Now the TLD is only ignored if this scheme applies to the particular site. https://codereview.chromium.org/1399853004/diff/80001/content/renderer/render... content/renderer/render_thread_impl.cc:318: return ".top10"; On 2015/10/22 16:43:06, Alexei Svitkine (slow) wrote: > Can you make a helper function IsAlexaTop10Site() and have the logic in this > function just do: > > if (IsAlexaTop10Site(host)) > return ".top10"; Done.
https://codereview.chromium.org/1399853004/diff/100001/content/renderer/rende... File content/renderer/render_thread_impl.cc (right): https://codereview.chromium.org/1399853004/diff/100001/content/renderer/rende... content/renderer/render_thread_impl.cc:282: bool IsAlexaTop10NonGoogleSite(const std::string& host) { just make this a method of HistogramCustomizer, then you can include your unit tests https://codereview.chromium.org/1399853004/diff/100001/content/renderer/rende... content/renderer/render_thread_impl.cc:289: net::registry_controlled_domains::GetDomainAndRegistry( please use "git cl format" on this cl
I think it's totally worth having a unit test, even if it means you have to expose a method. You can always expose it with a ForTesting suffix, or in an internal namespace, to make it clear that other code outside of tests shouldn't use it. On Mon, Oct 26, 2015 at 11:29 AM, <jochen@chromium.org> wrote: > > > https://codereview.chromium.org/1399853004/diff/100001/content/renderer/rende... > File content/renderer/render_thread_impl.cc (right): > > > https://codereview.chromium.org/1399853004/diff/100001/content/renderer/rende... > content/renderer/render_thread_impl.cc:282: bool > IsAlexaTop10NonGoogleSite(const std::string& host) { > just make this a method of HistogramCustomizer, then you can include > your unit tests > > > https://codereview.chromium.org/1399853004/diff/100001/content/renderer/rende... > content/renderer/render_thread_impl.cc:289: > net::registry_controlled_domains::GetDomainAndRegistry( > please use "git cl format" on this cl > > https://codereview.chromium.org/1399853004/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
PTAL. I also took the liberty to move HostToCustomHistogramSuffix into HistogramCustomizer.
lgtm
lgtm https://codereview.chromium.org/1399853004/diff/120001/content/renderer/rende... File content/renderer/render_thread_impl.cc (right): https://codereview.chromium.org/1399853004/diff/120001/content/renderer/rende... content/renderer/render_thread_impl.cc:506: if (this->IsAlexaTop10NonGoogleSite(host)) Nit: No need for this-> https://codereview.chromium.org/1399853004/diff/120001/content/renderer/rende... content/renderer/render_thread_impl.cc:514: // The Top10 sites have different TLD and/or subdomains Nit: Some of the words on the line below can fit here. https://codereview.chromium.org/1399853004/diff/120001/content/renderer/rende... content/renderer/render_thread_impl.cc:515: // depending on the localization Nit: Terminate with a .
https://codereview.chromium.org/1399853004/diff/120001/content/renderer/rende... File content/renderer/render_thread_impl.cc (right): https://codereview.chromium.org/1399853004/diff/120001/content/renderer/rende... content/renderer/render_thread_impl.cc:506: if (this->IsAlexaTop10NonGoogleSite(host)) On 2015/10/27 15:20:04, Alexei Svitkine (slow) wrote: > Nit: No need for this-> Done. https://codereview.chromium.org/1399853004/diff/120001/content/renderer/rende... content/renderer/render_thread_impl.cc:514: // The Top10 sites have different TLD and/or subdomains On 2015/10/27 15:20:04, Alexei Svitkine (slow) wrote: > Nit: Some of the words on the line below can fit here. Done. https://codereview.chromium.org/1399853004/diff/120001/content/renderer/rende... content/renderer/render_thread_impl.cc:515: // depending on the localization On 2015/10/27 15:20:04, Alexei Svitkine (slow) wrote: > Nit: Terminate with a . Done.
The CQ bit was checked by hablich@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from hpayer@chromium.org, battre@chromium.org, jochen@chromium.org, asvitkine@chromium.org Link to the patchset: https://codereview.chromium.org/1399853004/#ps140001 (title: "Took care of nits")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1399853004/140001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1399853004/140001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: android_chromium_gn_compile_rel on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/android_chromiu...)
The CQ bit was checked by hablich@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from jochen@chromium.org, hpayer@chromium.org, asvitkine@chromium.org, battre@chromium.org Link to the patchset: https://codereview.chromium.org/1399853004/#ps160001 (title: "Changed UnitTests so they don't need to convert bool")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1399853004/160001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1399853004/160001
Message was sent while issue was closed.
Committed patchset #9 (id:160001)
Message was sent while issue was closed.
Patchset 9 (id:??) landed as https://crrev.com/d6a4f129ef76ddd685d4d9ae6f549cc1fdb9e126 Cr-Commit-Position: refs/heads/master@{#356530} |