Created: 4 years, 4 months ago by Charlie Harrison
Modified: 4 years, 4 months ago
CC: chromium-reviews, Dirk Pranke
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public
Description:
Add testing configs for ParseHTMLOnMainThread experiment
BUG=623165
Committed: https://crrev.com/46be1b831ffec878df1b258a4f26872451d7795e
Cr-Commit-Position: refs/heads/master@{#411880}
Patch Set 1
Patch Set 2: fix bug in sync case
Patch Set 3: Fix test
Total comments: 4
Patch Set 4: Edit browsertest
Total comments: 3
Patch Set 5: add virtual test suites and make field trial configs for Enabled group only
Total comments: 6
Patch Set 6: Devlin review
Total comments: 4
Patch Set 7: nits

Messages
Total messages: 51 (29 generated)
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_asan_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: chromeos_amd64-generic_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromeos_amd64-...) chromeos_x86-generic_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromeos_x86-ge...)
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_linux on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linu...)
csharrison@chromium.org changed reviewers: + dpranke@chromium.org, rdevlin.cronin@chromium.org
rdevlin, do the extension test changes seem reasonable? I think these are just race conditions surfaced by the experiment. +dpranke for test configs. Note that this experiment has 4 groups, so I gave each group its own platform. This surfaced one test error in extension tests.
https://codereview.chromium.org/2221193002/diff/40001/chrome/test/data/extens...
File chrome/test/data/extensions/api_test/extension_resource_request_policy/web_accessible/accessible_redirect_resource.html (right):

https://codereview.chromium.org/2221193002/diff/40001/chrome/test/data/extens...
chrome/test/data/extensions/api_test/extension_resource_request_policy/web_accessible/accessible_redirect_resource.html:2: window.onload = function() {
Why does this no longer work? It seems like something web pages or extensions would potentially do. (Admittedly, not quite so trivially, but the idea of modifying the location before the DOM content is fully loaded is not in and of itself insane.)
https://codereview.chromium.org/2221193002/diff/40001/chrome/test/data/extens...
File chrome/test/data/extensions/api_test/extension_resource_request_policy/web_accessible/accessible_redirect_resource.html (right):

https://codereview.chromium.org/2221193002/diff/40001/chrome/test/data/extens...
chrome/test/data/extensions/api_test/extension_resource_request_policy/web_accessible/accessible_redirect_resource.html:2: window.onload = function() {
On 2016/08/09 20:23:38, Devlin wrote:
> Why does this no longer work? It seems like something web pages or extensions
> would potentially do. (Admittedly, not quite so trivially, but the idea of
> modifying the location before the DOM content is fully loaded is not in and of
> itself insane.)

It still works, but there's a race in your browsertest fixture, where I believe you expect two load events, so we hang waiting for the second one because this page load is aborted before reaching that point.

At least, that was my analysis.
https://codereview.chromium.org/2221193002/diff/40001/chrome/test/data/extens...
File chrome/test/data/extensions/api_test/extension_resource_request_policy/web_accessible/accessible_redirect_resource.html (right):

https://codereview.chromium.org/2221193002/diff/40001/chrome/test/data/extens...
chrome/test/data/extensions/api_test/extension_resource_request_policy/web_accessible/accessible_redirect_resource.html:2: window.onload = function() {
On 2016/08/09 20:27:41, Charlie Harrison wrote:
> On 2016/08/09 20:23:38, Devlin wrote:
> > Why does this no longer work? It seems like something web pages or extensions
> > would potentially do. (Admittedly, not quite so trivially, but the idea of
> > modifying the location before the DOM content is fully loaded is not in and of
> > itself insane.)
>
> It still works, but there's a race in your browsertest fixture, where I believe
> you expect two load events, so we hang waiting for the second one because this
> page load is aborted before reaching that point.
>
> At least, that was my analysis.

Ah, okay, I assume you're talking about this line:
https://chromium.googlesource.com/chromium/src/+/5cf9d45c437b7b2d899e46f2f324...

Rather than change this file, I'd prefer to change the browsertest. Can we update it to wait for the expected URL to load rather than a set number of navigations (e.g., use UrlLoadObserver from ui_test_utils)? That way, we keep this behavior and make sure it doesn't break, still wait for the proper url to load, and aren't relying on whether or not a load finished.

WDYT?
https://codereview.chromium.org/2221193002/diff/40001/chrome/test/data/extens...
File chrome/test/data/extensions/api_test/extension_resource_request_policy/web_accessible/accessible_redirect_resource.html (right):

https://codereview.chromium.org/2221193002/diff/40001/chrome/test/data/extens...
chrome/test/data/extensions/api_test/extension_resource_request_policy/web_accessible/accessible_redirect_resource.html:2: window.onload = function() {
On 2016/08/09 20:47:45, Devlin wrote:
> On 2016/08/09 20:27:41, Charlie Harrison wrote:
> > On 2016/08/09 20:23:38, Devlin wrote:
> > > Why does this no longer work? It seems like something web pages or extensions
> > > would potentially do. (Admittedly, not quite so trivially, but the idea of
> > > modifying the location before the DOM content is fully loaded is not in and of
> > > itself insane.)
> >
> > It still works, but there's a race in your browsertest fixture, where I believe
> > you expect two load events, so we hang waiting for the second one because this
> > page load is aborted before reaching that point.
> >
> > At least, that was my analysis.
>
> Ah, okay, I assume you're talking about this line:
> https://chromium.googlesource.com/chromium/src/+/5cf9d45c437b7b2d899e46f2f324...
>
> Rather than change this file, I'd prefer to change the browsertest. Can we
> update it to wait for the expected URL to load rather than a set number of
> navigations (e.g., use UrlLoadObserver from ui_test_utils)? That way, we keep
> this behavior and make sure it doesn't break, still wait for the proper url to
> load, and aren't relying on whether or not a load finished.
>
> WDYT?

Yep, that solution sounds good to me! Will upload a new PS shortly.
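[Editor's note: for readers following the suggestion, the pattern looks roughly like this inside the browsertest fixture. This is a sketch only, not the exact patch; page_that_redirects is a hypothetical stand-in for whatever navigation the test performs, and expected_url stands for the extension resource the redirect should end up on.]

    // Wait for a specific URL to finish loading rather than counting load
    // stops. UrlLoadObserver comes from chrome/test/base/ui_test_utils.h.
    GURL expected_url = extension->GetURL("test.png");
    ui_test_utils::UrlLoadObserver observer(
        expected_url, content::NotificationService::AllSources());
    ui_test_utils::NavigateToURL(browser(), page_that_redirects);
    observer.Wait();  // Returns once expected_url loads, no matter how many
                      // intermediate navigations the redirect produces.

This removes the dependency on the number of load events, which is exactly what races when the first page load aborts early.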
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
dpranke@chromium.org changed reviewers: + isherman@chromium.org
It's probably better to have a //testing/variations/ OWNER approve this, as I'm not all that familiar with the experimenting stuff. Maybe isherman@?
https://codereview.chromium.org/2221193002/diff/60001/testing/variations/fiel...
File testing/variations/fieldtrial_testing_config_linux.json (right):

https://codereview.chromium.org/2221193002/diff/60001/testing/variations/fiel...
testing/variations/fieldtrial_testing_config_linux.json:167: }
I think the best practice is to test the configuration that you are most likely to ship, and to test that configuration on each platform, rather than testing one configuration per platform. This allows us to catch, e.g. performance regressions, even if they happen only on a single platform.
https://codereview.chromium.org/2221193002/diff/60001/testing/variations/fiel...
File testing/variations/fieldtrial_testing_config_linux.json (right):

https://codereview.chromium.org/2221193002/diff/60001/testing/variations/fiel...
testing/variations/fieldtrial_testing_config_linux.json:167: }
On 2016/08/10 05:53:13, Ilya Sherman wrote:
> I think the best practice is to test the configuration that you are most likely
> to ship, and to test that configuration on each platform, rather than testing
> one configuration per platform. This allows us to catch, e.g. performance
> regressions, even if they happen only on a single platform.

That makes sense, but I'd argue that shipping experiment groups that are not tested (or tested on a small subset of tests) is worse than missing platform specific behavior of an experiment group.

In this case I think we can reach a compromise, as most of the code this touches is tested by layout tests which can have virtual test suites added. However as discovered by this CL this doesn't catch everything :)

WDYT?
isherman@chromium.org changed reviewers: + asvitkine@chromium.org, rkaplow@chromium.org
+Rob and Alexei for their thoughts on field trial configs with multiple experimental groups (PTAL at the CL and past discussion)

https://codereview.chromium.org/2221193002/diff/60001/testing/variations/fiel...
File testing/variations/fieldtrial_testing_config_linux.json (right):

https://codereview.chromium.org/2221193002/diff/60001/testing/variations/fiel...
testing/variations/fieldtrial_testing_config_linux.json:167: }
On 2016/08/10 12:05:31, Charlie Harrison wrote:
> On 2016/08/10 05:53:13, Ilya Sherman wrote:
> > I think the best practice is to test the configuration that you are most likely
> > to ship, and to test that configuration on each platform, rather than testing
> > one configuration per platform. This allows us to catch, e.g. performance
> > regressions, even if they happen only on a single platform.
>
> That makes sense, but I'd argue that shipping experiment groups that are not
> tested (or tested on a small subset of tests) is worse than missing platform
> specific behavior of an experiment group.
>
> In this case I think we can reach a compromise, as most of the code this touches
> is tested by layout tests which can have virtual test suites added. However as
> discovered by this CL this doesn't catch everything :)
>
> WDYT?

In general, each feature's tests should test all the configurations for that feature. These config files are for running the *entire* test suite with the most likely field trial configuration for shipping, to see if there are any unexpected regressions detected by more general tests. Thus, we prefer to test a single configuration, the one that is actually going to be launched to users assuming that nothing looks wrong. We only require/recommend adding these configs once the field trial is reaching the Beta channel, on the assumption that by that point, it's usually pretty clear which configuration is the most promising one.

Now, there's a lot of assumptions in the above, which aren't necessarily correct -- most notably, that there's enough data prior to testing on Beta to determine which configuration is most promising. For this particular experiment, which configuration are you *expecting* to ship? Is there a single candidate, or multiple ones?
On 2016/08/10 18:53:21, Ilya Sherman wrote:
> +Rob and Alexei for their thoughts on field trial configs with multiple
> experimental groups (PTAL at the CL and past discussion)
>
> https://codereview.chromium.org/2221193002/diff/60001/testing/variations/fiel...
> File testing/variations/fieldtrial_testing_config_linux.json (right):
>
> https://codereview.chromium.org/2221193002/diff/60001/testing/variations/fiel...
> testing/variations/fieldtrial_testing_config_linux.json:167: }
> On 2016/08/10 12:05:31, Charlie Harrison wrote:
> > On 2016/08/10 05:53:13, Ilya Sherman wrote:
> > > I think the best practice is to test the configuration that you are most likely
> > > to ship, and to test that configuration on each platform, rather than testing
> > > one configuration per platform. This allows us to catch, e.g. performance
> > > regressions, even if they happen only on a single platform.
> >
> > That makes sense, but I'd argue that shipping experiment groups that are not
> > tested (or tested on a small subset of tests) is worse than missing platform
> > specific behavior of an experiment group.
> >
> > In this case I think we can reach a compromise, as most of the code this touches
> > is tested by layout tests which can have virtual test suites added. However as
> > discovered by this CL this doesn't catch everything :)
> >
> > WDYT?
>
> In general, each feature's tests should test all the configurations for that
> feature. These config files are for running the *entire* test suite with the
> most likely field trial configuration for shipping, to see if there are any
> unexpected regressions detected by more general tests. Thus, we prefer to test
> a single configuration, the one that is actually going to be launched to users
> assuming that nothing looks wrong. We only require/recommend adding these
> configs once the field trial is reaching the Beta channel, on the assumption
> that by that point, it's usually pretty clear which configuration is the most
> promising one.
>
> Now, there's a lot of assumptions in the above, which aren't necessarily correct
> -- most notably, that there's enough data prior to testing on Beta to determine
> which configuration is most promising. For this particular experiment, which
> configuration are you *expecting* to ship? Is there a single candidate, or
> multiple ones?

Agree with everything Ilya has said. I don't like the idea of using platforms to fake testing multiple experiment groups - I would prefer correctly testing the most likely experiment. It would be nice if we had the option to test multiple groups, but since we don't have that option yet, I'd prefer using the model of testing the best candidate. Usually at the time that this part gets setup, there has already been experimentation on canary/dev, so generally at this point the groups are only slightly different - maybe differences in parameters which are less risky.

If the group that we expect to launch ends up changing, it will then change here before shipping, of course, so any bugs will get shaken out here in either case.
On 2016/08/10 at 19:14:54, rkaplow wrote:
> On 2016/08/10 18:53:21, Ilya Sherman wrote:
> > +Rob and Alexei for their thoughts on field trial configs with multiple
> > experimental groups (PTAL at the CL and past discussion)
> >
> > https://codereview.chromium.org/2221193002/diff/60001/testing/variations/fiel...
> > File testing/variations/fieldtrial_testing_config_linux.json (right):
> >
> > https://codereview.chromium.org/2221193002/diff/60001/testing/variations/fiel...
> > testing/variations/fieldtrial_testing_config_linux.json:167: }
> > On 2016/08/10 12:05:31, Charlie Harrison wrote:
> > > On 2016/08/10 05:53:13, Ilya Sherman wrote:
> > > > I think the best practice is to test the configuration that you are most likely
> > > > to ship, and to test that configuration on each platform, rather than testing
> > > > one configuration per platform. This allows us to catch, e.g. performance
> > > > regressions, even if they happen only on a single platform.
> > >
> > > That makes sense, but I'd argue that shipping experiment groups that are not
> > > tested (or tested on a small subset of tests) is worse than missing platform
> > > specific behavior of an experiment group.
> > >
> > > In this case I think we can reach a compromise, as most of the code this touches
> > > is tested by layout tests which can have virtual test suites added. However as
> > > discovered by this CL this doesn't catch everything :)
> > >
> > > WDYT?
> >
> > In general, each feature's tests should test all the configurations for that
> > feature. These config files are for running the *entire* test suite with the
> > most likely field trial configuration for shipping, to see if there are any
> > unexpected regressions detected by more general tests. Thus, we prefer to test
> > a single configuration, the one that is actually going to be launched to users
> > assuming that nothing looks wrong. We only require/recommend adding these
> > configs once the field trial is reaching the Beta channel, on the assumption
> > that by that point, it's usually pretty clear which configuration is the most
> > promising one.
> >
> > Now, there's a lot of assumptions in the above, which aren't necessarily correct
> > -- most notably, that there's enough data prior to testing on Beta to determine
> > which configuration is most promising. For this particular experiment, which
> > configuration are you *expecting* to ship? Is there a single candidate, or
> > multiple ones?
>
> Agree with everything Ilya has said. I don't like the idea of using platforms to
> fake testing multiple experiment groups - I would prefer correctly testing the
> most likely experiment. It would be nice if we had the option to test multiple
> groups, but since we don't have that option yet, I'd prefer using the model of
> testing the best candidate. Usually at the time that this part gets setup, there
> has already been experimentation on canary/dev, so generally at this point the
> groups are only slightly different - maybe differences in parameters which are
> less risky.
>
> If the group that we expect to launch ends up changing, it will then change here
> before shipping, of course, so any bugs will get shaken out here in either case.

Thank you both for the excellent explanations. A few points:

- These experiments are in Dev; I only wanted this to increase coverage due to my own nervousness after finding a pretty serious bug in a group that was not tested.
- There is a way to increase coverage without abusing multiple platforms: virtual test suites for LayoutTests pointing at a subset of webkit_tests. I will do that instead.
- I think we will be able to use Dev data to inform us of the proper Beta groups (and the most likely candidate to ship).

All in all, I think what you are saying makes sense for this experiment, and I was being too aggressive with this CL. I'll update the CL.
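[Editor's note: for concreteness, the single-candidate model Ilya and Rob describe corresponds to an entry of roughly this shape in testing/variations/fieldtrial_testing_config_linux.json. This is a sketch following the general format of the per-platform files at the time, using the study name from this CL's title and the group name from the "Enabled group only" patch set; the authoritative version is the file in the patch.]

    {
        "ParseHTMLOnMainThread": [
            {
                "group_name": "Enabled"
            }
        ]
    }

With one such entry per per-platform file, the bots that honor these configs run the full test suite under the one configuration that is actually expected to ship.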
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
OK, I have made the changes to the configs and added virtual test suites for two experiment groups covering the fast/parser tests. Thanks again for taking a look.
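[Editor's note: a virtual test suite entry in third_party/WebKit/LayoutTests/VirtualTestSuites has roughly this shape. This is a hypothetical sketch: the prefix and the exact flag below are assumptions, not values taken from the patch.]

    {
        "prefix": "parse_html_on_main_thread",
        "base": "fast/parser",
        "args": ["--enable-features=ParseHTMLOnMainThread"]
    }

Each entry reruns the base directory (here fast/parser) with the listed flags, which is how an experiment group can get layout-test coverage without touching the field trial testing configs.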
extensions lgtm with nit

https://codereview.chromium.org/2221193002/diff/80001/chrome/browser/extensio...
File chrome/browser/extensions/extension_resource_request_policy_apitest.cc (left):

https://codereview.chromium.org/2221193002/diff/80001/chrome/browser/extensio...
chrome/browser/extensions/extension_resource_request_policy_apitest.cc:269: ASSERT_TRUE(content::ExecuteScriptAndExtractString(
Using JS to get the url was probably overkill, but I wonder if we should have a WebContents::GetLastCommittedURL() check here. It's pretty cheap and would double-check that the tab in question is actually the one that loaded the url (though I doubt any other ever would).

https://codereview.chromium.org/2221193002/diff/80001/chrome/browser/extensio...
chrome/browser/extensions/extension_resource_request_policy_apitest.cc:280: ASSERT_TRUE(content::ExecuteScriptAndExtractString(
ditto

https://codereview.chromium.org/2221193002/diff/80001/chrome/browser/extensio...
File chrome/browser/extensions/extension_resource_request_policy_apitest.cc (right):

https://codereview.chromium.org/2221193002/diff/80001/chrome/browser/extensio...
chrome/browser/extensions/extension_resource_request_policy_apitest.cc:268: GURL("chrome-extension://ggmldgjhdenlnjjjmehkomheglpmijnf/test.png"),
nit: instead, use a reference to the extension (LoadExtension should return an Extension*) and use extension->GetURL("test.png").
Thanks Devlin

https://codereview.chromium.org/2221193002/diff/80001/chrome/browser/extensio...
File chrome/browser/extensions/extension_resource_request_policy_apitest.cc (left):

https://codereview.chromium.org/2221193002/diff/80001/chrome/browser/extensio...
chrome/browser/extensions/extension_resource_request_policy_apitest.cc:269: ASSERT_TRUE(content::ExecuteScriptAndExtractString(
On 2016/08/10 20:20:08, Devlin wrote:
> Using JS to get the url was probably overkill, but I wonder if we should have a
> WebContents::GetLastCommittedURL() check here. It's pretty cheap and would
> double-check that the tab in question is actually the one that loaded the url
> (though I doubt any other ever would).

Done.

https://codereview.chromium.org/2221193002/diff/80001/chrome/browser/extensio...
chrome/browser/extensions/extension_resource_request_policy_apitest.cc:280: ASSERT_TRUE(content::ExecuteScriptAndExtractString(
On 2016/08/10 20:20:08, Devlin wrote:
> ditto

Done.

https://codereview.chromium.org/2221193002/diff/80001/chrome/browser/extensio...
File chrome/browser/extensions/extension_resource_request_policy_apitest.cc (right):

https://codereview.chromium.org/2221193002/diff/80001/chrome/browser/extensio...
chrome/browser/extensions/extension_resource_request_policy_apitest.cc:268: GURL("chrome-extension://ggmldgjhdenlnjjjmehkomheglpmijnf/test.png"),
On 2016/08/10 20:20:08, Devlin wrote:
> nit: instead, use a reference to the extension (LoadExtension should return an
> Extension*) and use extension->GetURL("test.png").

Done.
(s lgtm)

https://codereview.chromium.org/2221193002/diff/100001/chrome/browser/extensi...
File chrome/browser/extensions/extension_resource_request_policy_apitest.cc (right):

https://codereview.chromium.org/2221193002/diff/100001/chrome/browser/extensi...
chrome/browser/extensions/extension_resource_request_policy_apitest.cc:241: .AppendASCII("web_accessible"));
nitty nit: leave in an ASSERT_TRUE(extension) so that this still gives us a nice error if the extension fails to load.

https://codereview.chromium.org/2221193002/diff/100001/chrome/browser/extensi...
chrome/browser/extensions/extension_resource_request_policy_apitest.cc:276: EXPECT_EQ(browser()
nitty nit: expected, actual
so
EXPECT_EQ(accessible_url,
          <web_contents_url>)
https://codereview.chromium.org/2221193002/diff/100001/chrome/browser/extensi...
File chrome/browser/extensions/extension_resource_request_policy_apitest.cc (right):

https://codereview.chromium.org/2221193002/diff/100001/chrome/browser/extensi...
chrome/browser/extensions/extension_resource_request_policy_apitest.cc:241: .AppendASCII("web_accessible"));
On 2016/08/10 20:42:35, Devlin wrote:
> nitty nit: leave in an ASSERT_TRUE(extension) so that this still gives us a nice
> error if the extension fails to load.

Done. Sorry about that.

https://codereview.chromium.org/2221193002/diff/100001/chrome/browser/extensi...
chrome/browser/extensions/extension_resource_request_policy_apitest.cc:276: EXPECT_EQ(browser()
On 2016/08/10 20:42:35, Devlin wrote:
> nitty nit: expected, actual
> so
> EXPECT_EQ(accessible_url,
>           <web_contents_url>)

Done.
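[Editor's note: put together, the check discussed in these nits ends up looking roughly like this. This is assembled from the review comments rather than copied from the patch; extension_dir is a placeholder for the real path expression in the test.]

    // ASSERT_TRUE gives a clear failure message if the extension fails to load.
    base::FilePath extension_dir;  // path to the web_accessible test extension
    const extensions::Extension* extension = LoadExtension(extension_dir);
    ASSERT_TRUE(extension);
    GURL accessible_url = extension->GetURL("test.png");
    // ... navigate and wait for accessible_url to finish loading ...
    // EXPECT_EQ takes (expected, actual), so the expected URL comes first.
    EXPECT_EQ(accessible_url,
              browser()->tab_strip_model()->GetActiveWebContents()
                  ->GetLastCommittedURL());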
Thanks. Field trial testing configs lgtm.
csharrison@chromium.org changed reviewers: + kouhei@chromium.org
+kouhei, would you take a look at the virtual tests?
lgtm
The CQ bit was checked by csharrison@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from rdevlin.cronin@chromium.org
Link to the patchset: https://codereview.chromium.org/2221193002/#ps120001 (title: "nits")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Committed patchset #7 (id:120001)
Message was sent while issue was closed.
Description was changed from
==========
Add testing configs for ParseHTMLOnMainThread experiment

BUG=623165
==========
to
==========
Add testing configs for ParseHTMLOnMainThread experiment

BUG=623165

Committed: https://crrev.com/46be1b831ffec878df1b258a4f26872451d7795e
Cr-Commit-Position: refs/heads/master@{#411880}
==========
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/46be1b831ffec878df1b258a4f26872451d7795e
Cr-Commit-Position: refs/heads/master@{#411880}