chrome/common/metrics/entropy_provider_unittest.cc - Issue 10830318: Use a different algorithm with the low entropy source for field trials.

Side by Side Diff: chrome/common/metrics/entropy_provider_unittest.cc

Issue 10830318: Use a different algorithm with the low entropy source for field trials. (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/src/

Patch Set: Created 8 years, 4 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View unified diff | Download patch | Annotate | Revision Log

« chrome/common/metrics/entropy_provider.cc ('K') | « chrome/common/metrics/entropy_provider.cc ('k') | chrome/common/metrics/variations/variations_util_unittest.cc » ('j') | no next file with comments »
Toggle Intra-line Diffs ('i') | Expand Comments ('e') | Collapse Comments ('c') | Hide Comments ('s')

OLD	NEW
(Empty)
	1 // Copyright (c) 2012 The Chromium Authors. All rights reserved.

	2 // Use of this source code is governed by a BSD-style license that can be

	3 // found in the LICENSE file.

	4

	5 #include <limits>

	6

	7 #include "base/basictypes.h"

	8 #include "base/memory/scoped_ptr.h"

	9 #include "base/rand_util.h"

	10 #include "base/string_number_conversions.h"

	11 #include "chrome/common/metrics/entropy_provider.h"

	12 #include "testing/gtest/include/gtest/gtest.h"

	13

	14 class EntropyProviderTest : public testing::Test {

	15 public:

	16 // Computes SHA1-based entropy for the given \|trial_name\| based on

	17 // \|entropy_source\|

	18 double GenerateSHA1Entropy(const std::string& entropy_source,

	19 const std::string& trial_name) {

	20 SHA1EntropyProvider sha1_provider(entropy_source);

	21 return sha1_provider.GetEntropyForTrial(trial_name);

	22 }

	23

	24 // Generates permutation-based entropy for the given \|trial_name\| based on

	25 // \|entropy_source\| which must be in the range [0, entropy_max).

	26 double GeneratePermutedEntropy(uint16 entropy_source,

	27 size_t entropy_max,

	28 const std::string& trial_name) {

	29 PermutedEntropyProvider permuted_provider(entropy_source, entropy_max);

	30 return permuted_provider.GetEntropyForTrial(trial_name);

	31 }

	32 };

	33

	34

	35 TEST_F(EntropyProviderTest, UseOneTimeRandomizationSHA1) {

	36 // Simply asserts that two trials using one-time randomization

	37 // that have different names, normally generate different results.

	38 //

	39 // Note that depending on the one-time random initialization, they

	40 // _might_ actually give the same result, but we know that given

	41 // the particular client_id we use for unit tests they won't.

	42 base::FieldTrialList field_trial_list(new SHA1EntropyProvider("client_id"));

	43 scoped_refptr<base::FieldTrial> trials[] = {

	44 base::FieldTrialList::FactoryGetFieldTrial("one", 100, "default",
	jar (doing other things) 2012/08/17 19:06:19 nit: 4 character indent (I think) for this line ov nit: 4 character indent (I think) for this line overflow. Alexei Svitkine (slow) 2012/08/20 15:57:40 Done. Show quoted text On 2012/08/17 19:06:19, jar wrote: > nit: 4 character indent (I think) for this line overflow. Done.
	45 base::FieldTrialList::kExpirationYearInFuture, 1, 1, NULL),

	46 base::FieldTrialList::FactoryGetFieldTrial("two", 100, "default",

	47 base::FieldTrialList::kExpirationYearInFuture, 1, 1, NULL),

	48 };
	jar (doing other things) 2012/08/17 19:06:19 nit: This curly is not a block closure, and is mor nit: This curly is not a block closure, and is more like a paren, which should appear at the end of line 48. Alexei Svitkine (slow) 2012/08/20 15:57:40 That suggestion doesn't seem consistent with the s Show quoted text On 2012/08/17 19:06:19, jar wrote: > nit: This curly is not a block closure, and is more like a paren, which should > appear at the end of line 48. On 2012/08/17 19:06:19, jar wrote: > nit: This curly is not a block closure, and is more like a paren. (the same > later in the file). That suggestion doesn't seem consistent with the style used elsewhere in Chromium codebase where arrays are initialized like this. For example, see some of the results of this code search: http://code.google.com/searchframe#search/&exact_package=chromium&q=test_case... So I'm leaving it as is. jar (doing other things) 2012/08/23 01:38:30 I checked with BrettW, and WillChan (Google readab I checked with BrettW, and WillChan (Google readability reviewers). Brett indicated "usually indent these things 4 spaces since it's a wrapped statement rather than a block." Will was airborne, and not sure, but thought the trailing curly could go on its own line (but presumably indented 4 also?) Other code in other files is probably not a good justification (though I do readily admit it is reasonable... I just don't think it is Chrome style). On 2012/08/20 15:57:40, Alexei Svitkine wrote: Show quoted text > On 2012/08/17 19:06:19, jar wrote: > > nit: This curly is not a block closure, and is more like a paren, which should > > appear at the end of line 48. > > On 2012/08/17 19:06:19, jar wrote: > > nit: This curly is not a block closure, and is more like a paren. (the same > > later in the file). > > That suggestion doesn't seem consistent with the style used elsewhere in > Chromium codebase where arrays are initialized like this. > > For example, see some of the results of this code search: > > http://code.google.com/searchframe#search/&exact_package=chromium&q=test_case... > > So I'm leaving it as is. Alexei Svitkine (slow) 2012/08/23 15:10:43 Done. Show quoted text On 2012/08/23 01:38:30, jar wrote: > I checked with BrettW, and WillChan (Google readability reviewers). Brett > indicated > "usually indent these things 4 spaces since it's a wrapped statement rather than > a block." > > Will was airborne, and not sure, but thought the trailing curly could go on its > own line (but presumably indented 4 also?) > > Other code in other files is probably not a good justification (though I do > readily admit it is reasonable... I just don't think it is Chrome style). Done.
	49

	50 for (size_t i = 0; i < arraysize(trials); ++i) {

	51 trials[i]->UseOneTimeRandomization();

	52

	53 for (int j = 0; j < 100; ++j)

	54 trials[i]->AppendGroup("", 1);

	55 }

	56

	57 // The trials are most likely to give different results since they have

	58 // different names.

	59 ASSERT_NE(trials[0]->group(), trials[1]->group());

	60 ASSERT_NE(trials[0]->group_name(), trials[1]->group_name());

	61 }

	62

	63 TEST_F(EntropyProviderTest, UseOneTimeRandomizationPermuted) {

	64 // Simply asserts that two trials using one-time randomization

	65 // that have different names, normally generate different results.

	66 //

	67 // Note that depending on the one-time random initialization, they

	68 // _might_ actually give the same result, but we know that given

	69 // the particular client_id we use for unit tests they won't.

	70 const size_t kMaxEntropySize = (1 << 13);

	71 base::FieldTrialList field_trial_list(

	72 new PermutedEntropyProvider(1234, kMaxEntropySize));

	73 scoped_refptr<base::FieldTrial> trials[] = {

	74 base::FieldTrialList::FactoryGetFieldTrial("one", 100, "default",
	jar (doing other things) 2012/08/17 19:06:19 nit: 4 character indent nit: 4 character indent Alexei Svitkine (slow) 2012/08/20 15:57:40 Done. Show quoted text On 2012/08/17 19:06:19, jar wrote: > nit: 4 character indent Done.
	75 base::FieldTrialList::kExpirationYearInFuture, 1, 1, NULL),

	76 base::FieldTrialList::FactoryGetFieldTrial("two", 100, "default",

	77 base::FieldTrialList::kExpirationYearInFuture, 1, 1, NULL),

	78 };
	jar (doing other things) 2012/08/17 19:06:19 nit: This curly is not a block closure, and is mor nit: This curly is not a block closure, and is more like a paren. (the same later in the file). Alexei Svitkine (slow) 2012/08/20 15:57:40 That suggestion doesn't seem consistent with the s Show quoted text On 2012/08/17 19:06:19, jar wrote: > nit: This curly is not a block closure, and is more like a paren. (the same > later in the file). That suggestion doesn't seem consistent with the style used elsewhere in Chromium codebase where arrays are initialized like this. For example, see some of the results of this code search: http://code.google.com/searchframe#search/&exact_package=chromium&q=test_case... So I'm leaving it as is.
	79

	80 for (size_t i = 0; i < arraysize(trials); ++i) {

	81 trials[i]->UseOneTimeRandomization();

	82

	83 for (int j = 0; j < 100; ++j)

	84 trials[i]->AppendGroup("", 1);

	85 }

	86

	87 // The trials are most likely to give different results since they have

	88 // different names.

	89 ASSERT_NE(trials[0]->group(), trials[1]->group());

	90 ASSERT_NE(trials[0]->group_name(), trials[1]->group_name());

	91 }

	92

	93 TEST_F(EntropyProviderTest, SHA1Entropy) {

	94 double results[] = {

	95 GenerateSHA1Entropy("hi", "1"),
	jar (doing other things) 2012/08/17 19:06:19 nit: indent 4. nit: indent 4. Alexei Svitkine (slow) 2012/08/20 15:57:40 Done. Show quoted text On 2012/08/17 19:06:19, jar wrote: > nit: indent 4. Done.
	96 GenerateSHA1Entropy("there", "1"),

	97 };

	98 ASSERT_NE(results[0], results[1]);

	99 for (size_t i = 0; i < arraysize(results); ++i) {

	100 ASSERT_LE(0.0, results[i]);

	101 ASSERT_GT(1.0, results[i]);

	102 }

	103

	104 ASSERT_EQ(GenerateSHA1Entropy("yo", "1"),

	105 GenerateSHA1Entropy("yo", "1"));

	106 ASSERT_NE(GenerateSHA1Entropy("yo", "something"),

	107 GenerateSHA1Entropy("yo", "else"));

	108 }

	109

	110 TEST_F(EntropyProviderTest, PermutedEntropy) {

	111 const size_t kMaxEntropySize = (1 << 13);

	112 double results[] = {

	113 GeneratePermutedEntropy(1234, kMaxEntropySize, "1"),

	114 GeneratePermutedEntropy(4321, kMaxEntropySize, "1"),

	115 };

	116 ASSERT_NE(results[0], results[1]);

	117 for (size_t i = 0; i < arraysize(results); ++i) {

	118 ASSERT_LE(0.0, results[i]);

	119 ASSERT_GT(1.0, results[i]);

	120 }

	121

	122 ASSERT_EQ(GeneratePermutedEntropy(1234, kMaxEntropySize, "1"),

	123 GeneratePermutedEntropy(1234, kMaxEntropySize, "1"));

	124 ASSERT_NE(GeneratePermutedEntropy(1234, kMaxEntropySize, "something"),

	125 GeneratePermutedEntropy(1234, kMaxEntropySize, "else"));

	126 }

	127

	128 TEST_F(EntropyProviderTest, SHA1EntropyIsUniform) {

	129 // Choose a random start number but go sequentially from there, so

	130 // that each test tries a different range but we never provide uniformly

	131 // distributed input data.

	132 int current_number = base::RandInt(0, std::numeric_limits<int>::max());

	133

	134 // The expected value of a random distribution is the average over all

	135 // samples as the number of samples approaches infinity. For a uniform

	136 // distribution from [0.0, 1.0) this would be 0.5.

	137 //

	138 // We do kSamplesBetweenChecks at a time and check if the value has converged

	139 // to a narrow interval around 0.5. A non-uniform distribution would likely

	140 // converge at something different, or not converge consistently within this

	141 // range (i.e. the test would start timing out occasionally).

	142 int kSamplesBetweenChecks = 300;

	143 int num_samples = 0;

	144 double total_value = 0.0;

	145 while (true) {

	146 for (int i = 0; i < kSamplesBetweenChecks; ++i) {

	147 total_value += GenerateSHA1Entropy(

	148 base::IntToString(current_number++), "salt");

	149 num_samples++;

	150 }

	151

	152 double average = total_value / num_samples;

	153 double kExpectedMin = 0.48;

	154 double kExpectedMax = 0.52;

	155

	156 if (num_samples > 1000 &&

	157 (average < kExpectedMin \|\| average > kExpectedMax)) {

	158 // Only printed once we have enough samples that it's very unlikely

	159 // things haven't converged.

	160 printf("After %d samples, the average was %f, outside the expected\n"

	161 "range (%f, %f). We will add more samples and check after every\n"

	162 "%d samples. If the average does not converge, something\n"

	163 "is broken. If it does converge, the test will pass.\n",

	164 num_samples, average,

	165 kExpectedMin, kExpectedMax, kSamplesBetweenChecks);

	166 } else {

	167 // Success.

	168 break;

	169 }

	170 }

	171 }

	172

	173 TEST_F(EntropyProviderTest, PermutedEntropyIsUniform) {

	174 // Choose a random start number but go sequentially from there, so

	175 // that each test tries a different range but we never provide uniformly

	176 // distributed input data.

	177 const size_t kMaxEntropySize = (1 << 13);

	178 int current_number = base::RandInt(0, kMaxEntropySize - 1);

	179

	180 // The expected value of a random distribution is the average over all

	181 // samples as the number of samples approaches infinity. For a uniform

	182 // distribution from [0.0, 1.0) this would be 0.5.
	hfung 2012/08/17 17:35:40 So you aren't testing the distribution among diffe So you aren't testing the distribution among different buckets (which is what the actual problem we had) such as the min/max number of samples that fall within 100 different buckets? Would that be a stricter test? Alexei Svitkine (slow) 2012/08/17 17:57:40 Ilya had the same concern / suggestion. (Note: Thi Show quoted text On 2012/08/17 17:35:40, hfung wrote: > So you aren't testing the distribution among different buckets (which is what > the actual problem we had) such as the min/max number of samples that fall > within 100 different buckets? Would that be a stricter test? Ilya had the same concern / suggestion. (Note: This test is actually an existing test that was moved over from field_trial_unittest.cc and adjusted to also test the permuted entropy source.) I'll write a new test that better verifies that the actual issue we saw.
	183 //

	184 // We do kSamplesBetweenChecks at a time and check if the value has converged

	185 // to a narrow interval around 0.5. A non-uniform distribution would likely

	186 // converge at something different, or not converge consistently within this

	187 // range (i.e. the test would start timing out occasionally).

	188 int kSamplesBetweenChecks = 300;

	189 int num_samples = 0;

	190 double total_value = 0.0;

	191 while (true) {

	192 for (int i = 0; i < kSamplesBetweenChecks; ++i) {

	193 total_value += GeneratePermutedEntropy(current_number++ % kMaxEntropySize,

	194 kMaxEntropySize, "salt");

	195 num_samples++;

	196 }

	197

	198 double average = total_value / num_samples;

	199 double kExpectedMin = 0.48;

	200 double kExpectedMax = 0.52;

	201

	202 if (num_samples > 1000 &&

	203 (average < kExpectedMin \|\| average > kExpectedMax)) {

	204 // Only printed once we have enough samples that it's very unlikely

	205 // things haven't converged.

	206 printf("After %d samples, the average was %f, outside the expected\n"

	207 "range (%f, %f). We will add more samples and check after every\n"

	208 "%d samples. If the average does not converge, something\n"

	209 "is broken. If it does converge, the test will pass.\n",

	210 num_samples, average,

	211 kExpectedMin, kExpectedMax, kSamplesBetweenChecks);

	212 } else {

	213 // Success.

	214 break;

	215 }

	216 }

	217 }

	218

	219 TEST_F(EntropyProviderTest, SeededRandGeneratorIsUniform) {

	220 // Verifies. that SeededRandGenerator has a uniform distribution.
	hfung 2012/08/17 17:35:40 extra period? extra period? Alexei Svitkine (slow) 2012/08/17 17:57:40 Done. Show quoted text On 2012/08/17 17:35:40, hfung wrote: > extra period? Done.
	221 //

	222 // Mirrors RandUtilTest.RandGeneratorIsUniform in base/rand_util_unittest.cc.

	223

	224 const uint32 kTopOfRange = (std::numeric_limits<uint32>::max() / 4ULL) * 3ULL;

	225 const uint32 kExpectedAverage = kTopOfRange / 2ULL;

	226 const uint32 kAllowedVariance = kExpectedAverage / 50ULL; // +/- 2%

	227 const int kMinAttempts = 1000;

	228 const int kMaxAttempts = 1000000;

	229

	230 const std::string trial_names[] = {

	231 "TestTrial",

	232 "AnotherTestTrial",

	233 "NewTabButton",

	234 };

	235

	236 for (size_t i = 0; i < arraysize(trial_names); ++i) {

	237 const uint32 seed = internal::HashName(trial_names[i]);

	238 internal::SeededRandGenerator rand_generator(seed);

	239

	240 double cumulative_average = 0.0;

	241 int count = 0;

	242 while (count < kMaxAttempts) {

	243 uint32 value = rand_generator(kTopOfRange);

	244 cumulative_average = (count * cumulative_average + value) / (count + 1);

	245

	246 // Don't quit too quickly for things to start converging, or we may have

	247 // a false positive.

	248 if (count > kMinAttempts &&

	249 kExpectedAverage - kAllowedVariance < cumulative_average &&

	250 cumulative_average < kExpectedAverage + kAllowedVariance) {

	251 break;

	252 }

	253

	254 ++count;

	255 }

	256

	257 ASSERT_LT(count, kMaxAttempts) << "Expected average was " <<

	258 kExpectedAverage << ", average ended at " << cumulative_average;

	259 }

	260 }

	261

OLD	NEW