|
|
Created:
6 years, 8 months ago by bolian Modified:
6 years, 7 months ago Reviewers:
jar (doing other things), Ilya Sherman, Matt Welsh, bengr, Alexei Svitkine (slow), pauljensen, mef CC:
chromium-reviews, jar (doing other things), asvitkine+watch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@operator_id Visibility:
Public. |
DescriptionRecord Mobile Operator ID in UMA report. This is Android only for now. See the linked bug for more background info.
Adding this field has been approved by privacy team (see #5 in the linked bug).
This CL depends on
https://codereview.chromium.org/238793007/
BUG=355604
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=268731
Patch Set 1 #Patch Set 2 : . #
Total comments: 4
Patch Set 3 : addressed comments. #
Total comments: 7
Patch Set 4 : revert proto change and emit a new histogram #Patch Set 5 : . #
Total comments: 8
Patch Set 6 : addressed comments. #
Total comments: 1
Patch Set 7 : log on connection type change. #Patch Set 8 : . #
Total comments: 2
Patch Set 9 : addressed comments. #Patch Set 10 : . #Patch Set 11 : log non-mobile code as well. #
Total comments: 4
Patch Set 12 : addressed comments. #Messages
Total messages: 53 (0 generated)
Hello Jim, Alex and Ben, Please review this CL that adds operator ID into the UMA report. It depends on https://codereview.chromium.org/238793007/. Please see the linked bug for background info. Thanks, Bolian
Can you record this data as a histogram rather than including it in the systemprofile message?
The purpose is to associate this data with existing histograms, for example, histograms for data saving and page load times through the data reduction proxy, so that we can understand the impact of the carrier's network on those histograms. Reporting it as a separate histogram does not serve the purpose -- unless I misread your suggestion.
ping?
Some questions for you below. By the way, before this can land in Chromium, you should land the change server-side first. (Though up to you which review you want to have the conversation on.) Also, have you asked Chrome Privacy team about logging this? https://codereview.chromium.org/246553003/diff/20001/chrome/common/metrics/pr... File chrome/common/metrics/proto/system_profile.proto (right): https://codereview.chromium.org/246553003/diff/20001/chrome/common/metrics/pr... chrome/common/metrics/proto/system_profile.proto:294: // the current registered operator. What is the "current registered operator"? Can this change while Chrome is running? If so, what will be logged? (This should be documented.) Is there a reason this is a separate message rather than using the id below? https://codereview.chromium.org/246553003/diff/20001/chrome/common/metrics/pr... chrome/common/metrics/proto/system_profile.proto:295: optional string id = 1; Should this field be called mmc_mnc? I think that's the terminology used in other products' logs.
I should have mentioned this in the description. Yes, the privacy team has agreed to add this. See #5 in the bug. I will make the server side change before landing this. BTW, is the server side just a proto change? https://codereview.chromium.org/246553003/diff/20001/chrome/common/metrics/pr... File chrome/common/metrics/proto/system_profile.proto (right): https://codereview.chromium.org/246553003/diff/20001/chrome/common/metrics/pr... chrome/common/metrics/proto/system_profile.proto:294: // the current registered operator. I removed the message type since there is only one field for now. I was thinking more fields (like operator name) might be needed. But we can do that later. Yes, the value can change - there is no mobile network at the spot. Updated comment. https://codereview.chromium.org/246553003/diff/20001/chrome/common/metrics/pr... chrome/common/metrics/proto/system_profile.proto:295: optional string id = 1; Done. Changed id to mcc_mnc (you mean mcc_mnc, right?). MCC and MNC are the terms used the Android SDK API that provides this data.
At minimum, the server-side changes would be to update the protos. If you want the UMA dashboard to support filtering by this, then more changes would be required (though these can be done later). You can reach out to uma-team@ for the details about those after. https://codereview.chromium.org/246553003/diff/40001/chrome/common/metrics/pr... File chrome/common/metrics/proto/system_profile.proto (right): https://codereview.chromium.org/246553003/diff/40001/chrome/common/metrics/pr... chrome/common/metrics/proto/system_profile.proto:293: // If there is no mobile network, the value is the empty string. How about when it changes during the the course of an UMA log? I think in that case it would be best to also report an empty string. This would require a bit more involved changes to the metrics code - e.g. keeping a timestamp of when it changed and compare that to the start of the log. If you look at the code for synthetic trials, they do something similar.
We don't need a dashboard for this. Our cron job will pick up the proto change. https://codereview.chromium.org/246553003/diff/40001/chrome/common/metrics/pr... File chrome/common/metrics/proto/system_profile.proto (right): https://codereview.chromium.org/246553003/diff/40001/chrome/common/metrics/pr... chrome/common/metrics/proto/system_profile.proto:293: // If there is no mobile network, the value is the empty string. Is MetricsLog::RecordEnvironment eventually called when I call, for example, UMA_HISTOGRAM_COUNTS, so that the environment is recorded for that particular metric? Or, MetricsLog::RecordEnvironment is actually called when reporting the metrics through WiFi? If it is the first case, I think the chance to record a metric during network transition is low, and we can accept some errors. If it is the latter case, I have to reconsider.
I still think that it would be more appropriate to emit to a histogram for this metric, especially if you aren't interested in filtering by this field on the UMA dashboard. It should be possible to write a Dremel query to break down one histogram by the values reported in another. uma-team@ can help you write such a query, if you're unsure of how to proceed.
Do you mean log the value each time the network changes in a histogram say, OperatorID. Then, for the same user, the PLT.PT_BeginToFinish values can be categorized into groups based on the timestamps of the OperatorID metric values?
On 2014/04/24 04:39:31, bolian wrote: > Do you mean log the value each time the network changes in a histogram say, > OperatorID. Then, for the same user, the PLT.PT_BeginToFinish values can be > categorized into groups based on the timestamps of the OperatorID metric > values? Yes, that's the idea. Except, histograms don't have timestamps, but they're uploaded every 10 or 15 minutes on mobile, so most of the time the user won't have switched networks during a single upload, and hence you can associate all other histograms from that same upload with the mobile network the user was using. If you happen to see the user switch networks during a single recording window, then you'll of course have some ambiguity; but that's no worse than if you were to include the field in the system profile.
>> If you happen to see the user switch networks during a single recording >> window, then you'll of course have some ambiguity; but that's no worse than if >> you were to include the field in the system profile. So, the system profile is populated only when uploading the data not and associated with each upload, not each data point in a histogram? I heard that upload happens on WiFi (right?), then isn't the connection_type in the network message will always be CONNECTION_WIFI? I want to isolate the cases when 3G is used from when WiFi is used.
On 2014/04/24 05:53:24, bolian wrote: > >> If you happen to see the user switch networks during a single recording > >> window, then you'll of course have some ambiguity; but that's no worse than > if > >> you were to include the field in the system profile. > So, the system profile is populated only when uploading the data not and > associated with each upload, not each data point in a histogram? Correct, there is only a single system profile per upload. > I heard that upload happens on WiFi (right?), then isn't the connection_type in > the network message will always be CONNECTION_WIFI? Well, I think we might only *send* uploads when WiFi is available. However, I think we cut a log every 10 or 15 minutes. If WiFi isn't available, that log is saved to disk. Once WiFi is available, all the persisted logs should be uploaded. There's a maximum number of uploads that we're willing to save to disk, but other than that, everything should work fine for users who connect to WiFi at least with moderate frequency.
Got it. Thanks for explaining this. However, it is still not clear to me why you think it is more appropriate to emit a new histogram. Conceptually, the network operator ID is very similar to the current connection_type, etc fields in the network message in system_profile.proto. It seems also easier to analyze if the operator id is in the network message. For example, our analysis job would look at connection_type and operator ID together. Are you concerned about the bandwidth/storage impact? Anything else?
On 2014/04/24 20:34:47, bolian wrote: > However, it is still not clear to me why you think it is more appropriate to > emit a new histogram. Conceptually, the network operator ID is very similar to > the current connection_type, etc fields in the network message in > system_profile.proto. It seems also easier to analyze if the operator id is in > the network message. For example, our analysis job would look at connection_type > and operator ID together. Are you concerned about the bandwidth/storage impact? > Anything else? I'm concerned about all of the client-side code involved in adding this information to the system_profile.proto. In retrospect, I think that adding the connection_type to SystemProfile was also a mistake -- my bad for not coming to this realization sooner. Histograms have a few key advantages: (1) They don't require tight coupling with the general purpose metrics code under //chrome/browser/metrics. It's easy for client code to emit arbitrary histograms, without adding new dependencies into the general-purpose metrics code. (2) Histograms are naturally more expressive. For example, in the case that the network properties change during a single metrics gathering session, a histogram can show the full set of carriers that were used during that session. That's harder to represent in the SystemProfile. (3) At the same time, histograms make it hard to emit potentially PII data. For example, are we certain that mcc_mnc is never PII? I bet there are some networks that have very few users.
Thanks, Ilya. This sounds good. I can work on the change to emit a histogram. However, from what I see, a histogram should be a number or a count, what I want is a string. Of course, I can convert the string to a number and log that, but it is not intuitive. Using enum will be too much. Any better suggestion? Thanks for bearing with me for these questions.
On 2014/04/25 17:52:57, bolian wrote: > Thanks, Ilya. This sounds good. I can work on the change to emit a histogram. > However, from what I see, a histogram should be a number or a count, what I want > is a string. Of course, I can convert the string to a number and log that, but > it is not intuitive. Using enum will be too much. Any better suggestion? > Thanks for bearing with me for these questions. I was expecting that you could emit to an enumerated histogram. Why would using an enum be too much? Note that one of the advantages that I listed to using histograms over writing a string to the system profile is that histograms make it hard to log PII. The converse of this statement is that strings make it easy to accidentally log PII.
The MCC and MNC list is large (http://en.wikipedia.org/wiki/Mobile_country_code) We can log MCC and MNC separately, but it is still pretty large. We can start with a few large carriers I guess and grow the list as needed, what do you think?
On 2014/04/25 18:46:18, bolian wrote: > The MCC and MNC list is large (http://en.wikipedia.org/wiki/Mobile_country_code) > We can log MCC and MNC separately, but it is still pretty large. > We can start with a few large carriers I guess and grow the list as needed, what > do you think? Yes, let's start with the carriers that we're interested in, and expand as needed. Alternately, you can use a sparse histogram. Since both MCC and MNC are numeric, that would allow you to record data for all the carriers. You could then add human-readable labels just for major carriers.
https://codereview.chromium.org/246553003/diff/40001/chrome/browser/metrics/m... File chrome/browser/metrics/metrics_network_observer.cc (right): https://codereview.chromium.org/246553003/diff/40001/chrome/browser/metrics/m... chrome/browser/metrics/metrics_network_observer.cc:21: return device_info.GetNetworkOperator(); Sorry to have not caught this in your last CL, but why is DeviceTelephonyInfo a class? It holds no state. https://codereview.chromium.org/246553003/diff/40001/chrome/browser/metrics/m... chrome/browser/metrics/metrics_network_observer.cc:134: void MetricsNetworkObserver::OnNetworkOperatorID(std::string id) { const std::string& https://codereview.chromium.org/246553003/diff/40001/chrome/browser/metrics/m... File chrome/browser/metrics/metrics_network_observer.h (right): https://codereview.chromium.org/246553003/diff/40001/chrome/browser/metrics/m... chrome/browser/metrics/metrics_network_observer.h:57: void OnNetworkOperatorID(std::string); Can this be OnNetworkOperatorID(const std::string& id) ?
Hi Ilya, Alexei, and Ben, I reverted the proto change and added the ID as a new sparse UMA under net. This will require moving the telephony API from content/ to net/, which I made a separate CL: https://codereview.chromium.org/254823005 If we can agree on the right place to add the UMA, I will try to push the dependent CL and add reviewers for net/. Thanks, Bolian https://codereview.chromium.org/246553003/diff/40001/chrome/browser/metrics/m... File chrome/browser/metrics/metrics_network_observer.cc (right): https://codereview.chromium.org/246553003/diff/40001/chrome/browser/metrics/m... chrome/browser/metrics/metrics_network_observer.cc:21: return device_info.GetNetworkOperator(); It is existing code. I added a method. Anyway, it is reverted in the new CL. https://codereview.chromium.org/246553003/diff/40001/chrome/browser/metrics/m... chrome/browser/metrics/metrics_network_observer.cc:134: void MetricsNetworkObserver::OnNetworkOperatorID(std::string id) { reverted in the new CL.
https://codereview.chromium.org/246553003/diff/80001/net/base/network_change_... File net/base/network_change_notifier.cc (right): https://codereview.chromium.org/246553003/diff/80001/net/base/network_change_... net/base/network_change_notifier.cc:263: // MCC and MNC codes are each 3 digits. From scanning the Wikipedia article, it looked like MCC was always 3 digits, but MNC could be anywhere between 1 and 3 digits. Is the length being normalized internally to android::GetTelephonyNetworkOperator()? https://codereview.chromium.org/246553003/diff/80001/tools/metrics/histograms... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/246553003/diff/80001/tools/metrics/histograms... tools/metrics/histograms/histograms.xml:10582: +<histogram name="NCN.NetworkOperatorMCCMNC" units="count"> nit: No need for such a generic unit. Units are nice to add when they clarify semantics, but not required otherwise. https://codereview.chromium.org/246553003/diff/80001/tools/metrics/histograms... tools/metrics/histograms/histograms.xml:10589: + less than six digits, read it as prefix padding it with zeros. nit: There are several doubled-up spaces that should be single spaces, e.g. between "it" and "with" on this line. https://codereview.chromium.org/246553003/diff/80001/tools/metrics/histograms... tools/metrics/histograms/histograms.xml:10589: + less than six digits, read it as prefix padding it with zeros. I don't see how the decimal integer could have less than six digits, unless it's the pair (0, 0) representing an unknown carrier (which, btw, you're not currently logging in the C++ code -- is that intentional)? All of the MCCs seem to be three-digit numbers that do not start with zero.
https://codereview.chromium.org/246553003/diff/80001/net/base/network_change_... File net/base/network_change_notifier.cc (right): https://codereview.chromium.org/246553003/diff/80001/net/base/network_change_... net/base/network_change_notifier.cc:263: // MCC and MNC codes are each 3 digits. My bad. Now, I think I should only check the return is not empty. https://codereview.chromium.org/246553003/diff/80001/tools/metrics/histograms... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/246553003/diff/80001/tools/metrics/histograms... tools/metrics/histograms/histograms.xml:10582: +<histogram name="NCN.NetworkOperatorMCCMNC" units="count"> On 2014/04/26 02:29:12, Ilya Sherman wrote: > nit: No need for such a generic unit. Units are nice to add when they clarify > semantics, but not required otherwise. Done. https://codereview.chromium.org/246553003/diff/80001/tools/metrics/histograms... tools/metrics/histograms/histograms.xml:10589: + less than six digits, read it as prefix padding it with zeros. On 2014/04/26 02:29:12, Ilya Sherman wrote: > nit: There are several doubled-up spaces that should be single spaces, e.g. > between "it" and "with" on this line. Done. https://codereview.chromium.org/246553003/diff/80001/tools/metrics/histograms... tools/metrics/histograms/histograms.xml:10589: + less than six digits, read it as prefix padding it with zeros. I was thinking the case when MCC is 001, so when converting it to integer the prefix "00" will be lost. But given that is only the test network, I should ignore that. Yes, I intent to not logging unknown or no operator case. I think that does not provide useful information.
Histograms LGTM with the inline comment addressed. Thanks! https://codereview.chromium.org/246553003/diff/100001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/246553003/diff/100001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:10588: + operator. Please mention that the first three digits correspond to the MCC, and the rest to the MNC. Please also provide a link to some document that helps decode these values. You might also want to associate an enum that provides human-readable labels for some of the most common network operators.
Thanks. I updated the CL to log the UMA on connection type change instead of network change and updated the metric name. As chatted with Ilya through email. I will have a separate CL to log operator code for each upload right before cut.
Hi Misha, Could you take a look at the net/ part of the change? Also, this change requires moving the GetNetworkOperator API from content/ to net/, which I sent you a separate CL: https://codereview.chromium.org/254823005/ Thanks, Bolian
lgtm
+pauljensen, FYI as possibly relevant to your project. -m
https://codereview.chromium.org/246553003/diff/140001/net/base/network_change... File net/base/network_change_notifier.cc (right): https://codereview.chromium.org/246553003/diff/140001/net/base/network_change... net/base/network_change_notifier.cc:240: int value = atoi(mcc_mnc.c_str()); Nit: Use StringToUint() from base/strings/string_number_conversions.h instead. If you use that and check its return value, you don't need the empty() check either.
https://codereview.chromium.org/246553003/diff/140001/net/base/network_change... File net/base/network_change_notifier.cc (right): https://codereview.chromium.org/246553003/diff/140001/net/base/network_change... net/base/network_change_notifier.cc:240: int value = atoi(mcc_mnc.c_str()); On 2014/04/29 16:35:44, Alexei Svitkine wrote: > Nit: Use StringToUint() from base/strings/string_number_conversions.h instead. > > If you use that and check its return value, you don't need the empty() check > either. Done.
lgtm
As the primary user of this data, I'm not too happy about this being made into a separate histogram which only enumerates only certain carriers. This seems like the wrong design to me, for several reasons. Primarily we need to be able to write queries and slice up UMA metrics (whatever they are) by different properties of the device: - Device type - Chrome version - Network connection type - Carrier We currently have the first three, and are missing the last one. All of our UMA processing pipeline code pulls these values out of system_profile and having to special case the carrier really doesn't make sense to me. It is going to be a lot more work for us to deal with joining against a separate histogram than it is to pluck the information we need right out of the proto field where it belongs. You guys are not the only ones writing code that processes UMA protos! Second, I disagree with the assertion in #11 that "most of the time the user will not have changed networks during an upload". There are several problems here. UMA metrics aren't uploaded on 3G networks at all, so I'm not quite clear on how you would expect us to tease apart UMA reports from the same user based on which network they were on *at the time the metric was collected* (as opposed to when it is uploaded), since by definition the user's network would be "WiFi" when they are in the process of uploading. This seems very brittle to me. Third, if the concern is that adding this field will somehow break other tools that ingest UMA protos (and it shouldn't), I am willing to take on whatever work is needed to plumb this information though those tools. Thanks.
On 2014/04/29 22:55:12, Matt Welsh wrote: > As the primary user of this data, I'm not too happy about this being made into a > separate histogram which only enumerates only certain carriers. This seems like > the wrong design to me, for several reasons. > > Primarily we need to be able to write queries and slice up UMA metrics (whatever > they are) by different properties of the device: > - Device type > - Chrome version > - Network connection type > - Carrier > > We currently have the first three, and are missing the last one. All of our UMA > processing pipeline code pulls these values out of system_profile and having to > special case the carrier really doesn't make sense to me. It is going to be a > lot more work for us to deal with joining against a separate histogram than it > is to pluck the information we need right out of the proto field where it > belongs. You guys are not the only ones writing code that processes UMA protos! I don't quite follow why it's much harder to join against a separate histogram than to read from the SystemProfile. Is it just the difference between repeated and non-repeated fields? FWIW, plenty of other people have been successful at joining over multiple histograms. > Second, I disagree with the assertion in #11 that "most of the time the user > will not have changed networks during an upload". There are several problems > here. UMA metrics aren't uploaded on 3G networks at all, so I'm not quite clear > on how you would expect us to tease apart UMA reports from the same user based > on which network they were on *at the time the metric was collected* (as opposed > to when it is uploaded), since by definition the user's network would be "WiFi" > when they are in the process of uploading. This seems very brittle to me. This is independent of whether or not the field is included in the SystemProfile. UMA metrics are only uploaded on WiFi, but I believe that logs are still cut and serialized once every 5 minutes. The SystemProfile or histogram value would be captured at the time the log is cut, not at the time of the upload. Hence, my assertion is really that during most 5-minute windows of use, users are not going to be changing networks. > Third, if the concern is that adding this field will somehow break other tools > that ingest UMA protos (and it shouldn't), I am willing to take on whatever work > is needed to plumb this information though those tools. The concern is not that, but rather that the SystemProfile is the wrong tool for the job. Many fields were added to it before we had support for sparse histograms. Now that sparse histograms are available, it ought to be relatively rare to need to make changes to the SystemProfile message.
I may be unclear on what you have in mind in terms of how I would effectively join these records together. Our MapReduce code which processes UMA data currently ingests each individual ChromeUserMetricsExtension proto and processes it to extract values from specific histograms we care about. We also extract values from the system_profile message to determine the Chrome version, network connection type, and set of field trials the user is in. This can be done trivially within the mapper for each individual record and no joining across records is needed. Your proposal seems to be to scan the records in time series order, pull some data out of the proposed "carrier histogram", remember that for the user, and then apply that carrier information to subsequent records coming from that same user at the time of upload. Is that correct? If so, this seems really unnecessarily complicated. In particular, how do we determine that records are coming from the same user? (Is there a unique ID? IP address is not good enough given that many users can share the same IP behind a NAT.) Also, how do we determine the correspondence between the "carrier histogram" and other histograms of interest (e.g., PLT.PT_BeginToFinish, etc.)? Is there a guarantee of the order in which they are uploaded? Or some way to detect the segmentation? Do you have code for this already? Finally, how do you propose to ensure that all carriers are included in this proposed histogram? We need data from all carriers, not just a few. Thanks.
It should be possible to make a histogram that's logged on every UMA upload. That way, you won't need to look at any other UMA records to get at that data. You'd just need to find that histogram and extract the value from it (similar to what you would do from the system profile except with a bit of iteration through histograms to find it). The histogram could still log all the carriers. A sparse histogram will log all values that it gets, regardless of whether all the enums are in histograms.xml. You just won't get pretty names for the ones that aren't in the XML file on the UMA dashboard. (But you can expand the XML list over time as you see popular carriers in the data). Does that make sense?
On 2014/05/01 18:32:43, Alexei Svitkine wrote: > It should be possible to make a histogram that's logged on every UMA upload. > That way, you won't need to look at any other UMA records to get at that data. > You'd just need to find that histogram and extract the value from it (similar to > what you would do from the system profile except with a bit of iteration through > histograms to find it). > > The histogram could still log all the carriers. A sparse histogram will log all > values that it gets, regardless of whether all the enums are in histograms.xml. > You just won't get pretty names for the ones that aren't in the XML file on the > UMA dashboard. (But you can expand the XML list over time as you see popular > carriers in the data). > > Does that make sense? OK, it wasn't clear to me that one could assume that a given UMA report contained all of the histograms from a given upload. I don't really understand the upload process and how the data can be segmented. Since we are now talking about a data dependency *across histograms* in order to interpret things appropriately, is that a guarantee you're willing to commit to long term?
On 2014/05/01 18:34:46, Matt Welsh wrote: > On 2014/05/01 18:32:43, Alexei Svitkine wrote: > > It should be possible to make a histogram that's logged on every UMA upload. > > That way, you won't need to look at any other UMA records to get at that data. > > You'd just need to find that histogram and extract the value from it (similar > to > > what you would do from the system profile except with a bit of iteration > through > > histograms to find it). > > > > The histogram could still log all the carriers. A sparse histogram will log > all > > values that it gets, regardless of whether all the enums are in > histograms.xml. > > You just won't get pretty names for the ones that aren't in the XML file on > the > > UMA dashboard. (But you can expand the XML list over time as you see popular > > carriers in the data). > > > > Does that make sense? > > OK, it wasn't clear to me that one could assume that a given UMA report > contained all of the histograms from a given upload. > I don't really understand the upload process and how the data can be segmented. > Since we are now talking about a data > dependency *across histograms* in order to interpret things appropriately, is > that a guarantee you're willing to commit to > long term? Yes, each upload/report is a single ChromeUserMetricsExtension message. This is indeed a guaranteed aspect of the UMA design. If we ever change this, it'll be a lot of work, and we'll be careful to work with stakeholders to migrate them over... but I really don't anticipate us wanting to change this anytime in the foreseeable future.
On 2014/05/01 18:39:55, Ilya Sherman wrote: > On 2014/05/01 18:34:46, Matt Welsh wrote: > > On 2014/05/01 18:32:43, Alexei Svitkine wrote: > > > It should be possible to make a histogram that's logged on every UMA upload. > > > That way, you won't need to look at any other UMA records to get at that > data. > > > You'd just need to find that histogram and extract the value from it > (similar > > to > > > what you would do from the system profile except with a bit of iteration > > through > > > histograms to find it). > > > > > > The histogram could still log all the carriers. A sparse histogram will log > > all > > > values that it gets, regardless of whether all the enums are in > > histograms.xml. > > > You just won't get pretty names for the ones that aren't in the XML file on > > the > > > UMA dashboard. (But you can expand the XML list over time as you see popular > > > carriers in the data). > > > > > > Does that make sense? > > > > OK, it wasn't clear to me that one could assume that a given UMA report > > contained all of the histograms from a given upload. > > I don't really understand the upload process and how the data can be > segmented. > > Since we are now talking about a data > > dependency *across histograms* in order to interpret things appropriately, is > > that a guarantee you're willing to commit to > > long term? > > Yes, each upload/report is a single ChromeUserMetricsExtension message. This is > indeed a guaranteed aspect of the UMA design. If we ever change this, it'll be > a lot of work, and we'll be careful to work with stakeholders to migrate them > over... but I really don't anticipate us wanting to change this anytime in the > foreseeable future. Is there a size limit on the individual ChromeUserMetricsExtension message? (Would it ever get split up?)
On 2014/05/01 19:01:10, Matt Welsh wrote: > On 2014/05/01 18:39:55, Ilya Sherman wrote: > > On 2014/05/01 18:34:46, Matt Welsh wrote: > > > On 2014/05/01 18:32:43, Alexei Svitkine wrote: > > > > It should be possible to make a histogram that's logged on every UMA > upload. > > > > That way, you won't need to look at any other UMA records to get at that > > data. > > > > You'd just need to find that histogram and extract the value from it > > (similar > > > to > > > > what you would do from the system profile except with a bit of iteration > > > through > > > > histograms to find it). > > > > > > > > The histogram could still log all the carriers. A sparse histogram will > log > > > all > > > > values that it gets, regardless of whether all the enums are in > > > histograms.xml. > > > > You just won't get pretty names for the ones that aren't in the XML file > on > > > the > > > > UMA dashboard. (But you can expand the XML list over time as you see > popular > > > > carriers in the data). > > > > > > > > Does that make sense? > > > > > > OK, it wasn't clear to me that one could assume that a given UMA report > > > contained all of the histograms from a given upload. > > > I don't really understand the upload process and how the data can be > > segmented. > > > Since we are now talking about a data > > > dependency *across histograms* in order to interpret things appropriately, > is > > > that a guarantee you're willing to commit to > > > long term? > > > > Yes, each upload/report is a single ChromeUserMetricsExtension message. This > is > > indeed a guaranteed aspect of the UMA design. If we ever change this, it'll > be > > a lot of work, and we'll be careful to work with stakeholders to migrate them > > over... but I really don't anticipate us wanting to change this anytime in the > > foreseeable future. > > Is there a size limit on the individual ChromeUserMetricsExtension message? > (Would it ever get split up?) There is a theoretical size limit, yes -- the server will reject unreasonably large uploads. In practice, this is not a concern for histograms. The best way to think about this change is as though we are adding a repeated histogram field to the SystemProfile message. If we were to ever split up uploads, we would duplicate the SystemProfile across all of the fragmented uploads, and we would do the same for any histograms that are included in each upload.
On 2014/05/01 19:08:42, Ilya Sherman wrote: > On 2014/05/01 19:01:10, Matt Welsh wrote: > > On 2014/05/01 18:39:55, Ilya Sherman wrote: > > > On 2014/05/01 18:34:46, Matt Welsh wrote: > > > > On 2014/05/01 18:32:43, Alexei Svitkine wrote: > > > > > It should be possible to make a histogram that's logged on every UMA > > upload. > > > > > That way, you won't need to look at any other UMA records to get at that > > > data. > > > > > You'd just need to find that histogram and extract the value from it > > > (similar > > > > to > > > > > what you would do from the system profile except with a bit of iteration > > > > through > > > > > histograms to find it). > > > > > > > > > > The histogram could still log all the carriers. A sparse histogram will > > log > > > > all > > > > > values that it gets, regardless of whether all the enums are in > > > > histograms.xml. > > > > > You just won't get pretty names for the ones that aren't in the XML file > > on > > > > the > > > > > UMA dashboard. (But you can expand the XML list over time as you see > > popular > > > > > carriers in the data). > > > > > > > > > > Does that make sense? > > > > > > > > OK, it wasn't clear to me that one could assume that a given UMA report > > > > contained all of the histograms from a given upload. > > > > I don't really understand the upload process and how the data can be > > > segmented. > > > > Since we are now talking about a data > > > > dependency *across histograms* in order to interpret things appropriately, > > is > > > > that a guarantee you're willing to commit to > > > > long term? > > > > > > Yes, each upload/report is a single ChromeUserMetricsExtension message. > This > > is > > > indeed a guaranteed aspect of the UMA design. If we ever change this, it'll > > be > > > a lot of work, and we'll be careful to work with stakeholders to migrate > them > > > over... but I really don't anticipate us wanting to change this anytime in > the > > > foreseeable future. > > > > Is there a size limit on the individual ChromeUserMetricsExtension message? > > (Would it ever get split up?) > > There is a theoretical size limit, yes -- the server will reject unreasonably > large uploads. In practice, this is not a concern for histograms. > > The best way to think about this change is as though we are adding a repeated > histogram field to the SystemProfile message. If we were to ever split up > uploads, we would duplicate the SystemProfile across all of the fragmented > uploads, and we would do the same for any histograms that are included in each > upload. I'm still unclear how we will encode the carrier name in this histogram - can you explain?
On 2014/05/01 19:09:50, Matt Welsh wrote: > On 2014/05/01 19:08:42, Ilya Sherman wrote: > > On 2014/05/01 19:01:10, Matt Welsh wrote: > > > On 2014/05/01 18:39:55, Ilya Sherman wrote: > > > > On 2014/05/01 18:34:46, Matt Welsh wrote: > > > > > On 2014/05/01 18:32:43, Alexei Svitkine wrote: > > > > > > It should be possible to make a histogram that's logged on every UMA > > > upload. > > > > > > That way, you won't need to look at any other UMA records to get at > that > > > > data. > > > > > > You'd just need to find that histogram and extract the value from it > > > > (similar > > > > > to > > > > > > what you would do from the system profile except with a bit of > iteration > > > > > through > > > > > > histograms to find it). > > > > > > > > > > > > The histogram could still log all the carriers. A sparse histogram > will > > > log > > > > > all > > > > > > values that it gets, regardless of whether all the enums are in > > > > > histograms.xml. > > > > > > You just won't get pretty names for the ones that aren't in the XML > file > > > on > > > > > the > > > > > > UMA dashboard. (But you can expand the XML list over time as you see > > > popular > > > > > > carriers in the data). > > > > > > > > > > > > Does that make sense? > > > > > > > > > > OK, it wasn't clear to me that one could assume that a given UMA report > > > > > contained all of the histograms from a given upload. > > > > > I don't really understand the upload process and how the data can be > > > > segmented. > > > > > Since we are now talking about a data > > > > > dependency *across histograms* in order to interpret things > appropriately, > > > is > > > > > that a guarantee you're willing to commit to > > > > > long term? > > > > > > > > Yes, each upload/report is a single ChromeUserMetricsExtension message. > > This > > > is > > > > indeed a guaranteed aspect of the UMA design. If we ever change this, > it'll > > > be > > > > a lot of work, and we'll be careful to work with stakeholders to migrate > > them > > > > over... but I really don't anticipate us wanting to change this anytime in > > the > > > > foreseeable future. > > > > > > Is there a size limit on the individual ChromeUserMetricsExtension message? > > > (Would it ever get split up?) > > > > There is a theoretical size limit, yes -- the server will reject unreasonably > > large uploads. In practice, this is not a concern for histograms. > > > > The best way to think about this change is as though we are adding a repeated > > histogram field to the SystemProfile message. If we were to ever split up > > uploads, we would duplicate the SystemProfile across all of the fragmented > > uploads, and we would do the same for any histograms that are included in each > > upload. > > I'm still unclear how we will encode the carrier name in this histogram - can > you explain? The way that is already implemented in this CL: leading three digits for MCC, remaining digits for MNC.
OK, I thought there was a comment above about only including certain carriers and not others. If we can include all carriers using a sparse histogram listing MCC/MNC then I don't have any objections to this approach -- it is a bit more work to process on the MR end but not that bad. Thanks. On Thu May 01 2014 at 12:22:01 PM, <isherman@chromium.org> wrote: > On 2014/05/01 19:09:50, Matt Welsh wrote: > > On 2014/05/01 19:08:42, Ilya Sherman wrote: > > > On 2014/05/01 19:01:10, Matt Welsh wrote: > > > > On 2014/05/01 18:39:55, Ilya Sherman wrote: > > > > > On 2014/05/01 18:34:46, Matt Welsh wrote: > > > > > > On 2014/05/01 18:32:43, Alexei Svitkine wrote: > > > > > > > It should be possible to make a histogram that's logged on > > every UMA > > > > upload. > > > > > > > That way, you won't need to look at any other UMA records to > > get at > > that > > > > > data. > > > > > > > You'd just need to find that histogram and extract the value > > from it > > > > > (similar > > > > > > to > > > > > > > what you would do from the system profile except with a bit of > > iteration > > > > > > through > > > > > > > histograms to find it). > > > > > > > > > > > > > > The histogram could still log all the carriers. A sparse > > histogram > > will > > > > log > > > > > > all > > > > > > > values that it gets, regardless of whether all the enums are in > > > > > > histograms.xml. > > > > > > > You just won't get pretty names for the ones that aren't in the > > XML > > file > > > > on > > > > > > the > > > > > > > UMA dashboard. (But you can expand the XML list over time as > > you see > > > > popular > > > > > > > carriers in the data). > > > > > > > > > > > > > > Does that make sense? > > > > > > > > > > > > OK, it wasn't clear to me that one could assume that a given UMA > report > > > > > > contained all of the histograms from a given upload. > > > > > > I don't really understand the upload process and how the data can > > be > > > > > segmented. > > > > > > Since we are now talking about a data > > > > > > dependency *across histograms* in order to interpret things > > appropriately, > > > > is > > > > > > that a guarantee you're willing to commit to > > > > > > long term? > > > > > > > > > > Yes, each upload/report is a single ChromeUserMetricsExtension > > message. > > > This > > > > is > > > > > indeed a guaranteed aspect of the UMA design. If we ever change > > this, > > it'll > > > > be > > > > > a lot of work, and we'll be careful to work with stakeholders to > > migrate > > > them > > > > > over... but I really don't anticipate us wanting to change this > > anytime > in > > > the > > > > > foreseeable future. > > > > > > > > Is there a size limit on the individual ChromeUserMetricsExtension > message? > > > > (Would it ever get split up?) > > > > > > There is a theoretical size limit, yes -- the server will reject > unreasonably > > > large uploads. In practice, this is not a concern for histograms. > > > > > > The best way to think about this change is as though we are adding a > repeated > > > histogram field to the SystemProfile message. If we were to ever split > > up > > > uploads, we would duplicate the SystemProfile across all of the > > fragmented > > > uploads, and we would do the same for any histograms that are included > > in > each > > > upload. > > > I'm still unclear how we will encode the carrier name in this histogram - > > can > > you explain? > > The way that is already implemented in this CL: leading three digits for > MCC, > remaining digits for MNC. > > https://codereview.chromium.org/246553003/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
I want to pointed out that this CL by itself is not complete, it is only log the sparse metric on connection type change. No metric logged if no connection change. As Ilya and I have chatted, we need associated each upload with another metric. CL https://codereview.chromium.org/253203002/ starts the conversion for that. I am synthetic trial, but we can discuss that.
As discussed in the meeting, I now also log the operator code when it is 0 for unknown, non-mobile, etc.
lgtm
https://codereview.chromium.org/246553003/diff/190001/net/base/network_change... File net/base/network_change_notifier.cc (right): https://codereview.chromium.org/246553003/diff/190001/net/base/network_change... net/base/network_change_notifier.cc:8: #include "base/metrics/sparse_histogram.h" Move these to the #if defined(OS_ANDROID) below. https://codereview.chromium.org/246553003/diff/190001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/246553003/diff/190001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:10688: + zero means the connection is changed to a non-mobile network or the operator It would be nice to have a code for Wi-Fi that's separate from, say, a code for ethernet.
https://codereview.chromium.org/246553003/diff/190001/net/base/network_change... File net/base/network_change_notifier.cc (right): https://codereview.chromium.org/246553003/diff/190001/net/base/network_change... net/base/network_change_notifier.cc:8: #include "base/metrics/sparse_histogram.h" On 2014/05/05 21:57:58, bengr1 wrote: > Move these to the #if defined(OS_ANDROID) below. Done. https://codereview.chromium.org/246553003/diff/190001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/246553003/diff/190001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:10688: + zero means the connection is changed to a non-mobile network or the operator You will have NCN.CM.TimeOnEthernet, NCN.CM.TimeOnWifi, etc to know that has happened.
lgtm
The CQ bit was checked by bolian@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/bolian@chromium.org/246553003/210001
FYI, CQ is re-trying this CL (attempt #1). Please consider checking whether the failures are real, and report flakes to chrome-troopers@google.com. The failing builders are: win_chromium_compile_dbg on tryserver.chromium
Message was sent while issue was closed.
Change committed as 268731 |