|
|
Chromium Code Reviews|
Created:
4 years ago by rnephew (Reviews Here) Modified:
4 years ago CC:
catapult-reviews_chromium.org, telemetry-reviews_chromium.org, perezju, hjd Target Ref:
refs/heads/master Project:
catapult Visibility:
Public. |
Description[Telemetry][Android] Wait for device under test to cool between pages.
If the platform supports waiting for the device temperature to drop, wait.
Currently this only implements andorid.
BUG=chromium:669923
Review-Url: https://codereview.chromium.org/2541843007
Committed: https://chromium.googlesource.com/external/github.com/catapult-project/catapult/+/70f42a7c55ca69cdeb9aa6ec7e40ff3f155040b9
Patch Set 1 #
Total comments: 2
Patch Set 2 : add as cmdline flag #
Total comments: 2
Patch Set 3 : convert to tenths at android level #
Total comments: 2
Patch Set 4 : [Telemetry][Android] Wait for device under test to cool between pages. #Patch Set 5 : Get rid of reference to deleted cmdline arg #Patch Set 6 : Fix tests #Patch Set 7 : rebase #Patch Set 8 : fix more tests #
Messages
Total messages: 42 (18 generated)
rnephew@chromium.org changed reviewers: + nednguyen@google.com, sullivan@chromium.org
https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/interna... File telemetry/telemetry/internal/platform/platform_backend.py (right): https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/interna... telemetry/telemetry/internal/platform/platform_backend.py:304: pass Debating if I should add a CanWaitForTemperature method that defaults to false, having each platform that implements it override it to true, and having this raise NotImplementedError, but that also seems like unnecessary boilerplate code when we can just make in a NOP on platforms that do not support it yet (or platforms that have no need to support it). https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/page/sh... File telemetry/telemetry/page/shared_page_state.py (right): https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/page/sh... telemetry/telemetry/page/shared_page_state.py:189: # Make sure device under test is at a suitable temperature. Decided to put it here, because doing it here makes it wait between pages. This will make sure that on long running tests, such as system health, that we do not start overheating partway through. I do expect this to increase the amount of time single benchmarks take since it will cool between pages, but I do not expect this to increase the amount of time a test run takes in total though, since the android test runner already waits between benchmarks for the temp to drop.
On 2016/12/01 18:38:00, rnephew (Reviews Here) wrote: > https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/interna... > File telemetry/telemetry/internal/platform/platform_backend.py (right): > > https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/interna... > telemetry/telemetry/internal/platform/platform_backend.py:304: pass > Debating if I should add a CanWaitForTemperature method that defaults to false, > having each platform that implements it override it to true, and having this > raise NotImplementedError, but that also seems like unnecessary boilerplate code > when we can just make in a NOP on platforms that do not support it yet (or > platforms that have no need to support it). > > https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/page/sh... > File telemetry/telemetry/page/shared_page_state.py (right): > > https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/page/sh... > telemetry/telemetry/page/shared_page_state.py:189: # Make sure device under test > is at a suitable temperature. > Decided to put it here, because doing it here makes it wait between pages. This > will make sure that on long running tests, such as system health, that we do not > start overheating partway through. I do expect this to increase the amount of > time single benchmarks take since it will cool between pages, but I do not > expect this to increase the amount of time a test run takes in total though, > since the android test runner already waits between benchmarks for the temp to > drop. I think this really should be a commandline flag (--wait-to-cool-down-between-runs) or a benchmark property (wait_to_cool_down_between_runs). We don't want to enable this by default for all benchmark since it can be expensive, and benchmarks like power probably don't care much about the temperature.
On 2016/12/01 19:01:46, nednguyen wrote: > On 2016/12/01 18:38:00, rnephew (Reviews Here) wrote: > > > https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/interna... > > File telemetry/telemetry/internal/platform/platform_backend.py (right): > > > > > https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/interna... > > telemetry/telemetry/internal/platform/platform_backend.py:304: pass > > Debating if I should add a CanWaitForTemperature method that defaults to > false, > > having each platform that implements it override it to true, and having this > > raise NotImplementedError, but that also seems like unnecessary boilerplate > code > > when we can just make in a NOP on platforms that do not support it yet (or > > platforms that have no need to support it). > > > > > https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/page/sh... > > File telemetry/telemetry/page/shared_page_state.py (right): > > > > > https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/page/sh... > > telemetry/telemetry/page/shared_page_state.py:189: # Make sure device under > test > > is at a suitable temperature. > > Decided to put it here, because doing it here makes it wait between pages. > This > > will make sure that on long running tests, such as system health, that we do > not > > start overheating partway through. I do expect this to increase the amount of > > time single benchmarks take since it will cool between pages, but I do not > > expect this to increase the amount of time a test run takes in total though, > > since the android test runner already waits between benchmarks for the temp to > > drop. > > I think this really should be a commandline flag > (--wait-to-cool-down-between-runs) or a benchmark property > (wait_to_cool_down_between_runs). We don't want to enable this by default for > all benchmark since it can be expensive, and benchmarks like power probably > don't care much about the temperature. It would help decrease noise though if we do it between each page always. And as stated above, in theory it shouldn't increase run times very much since its just moving where it waits for temperatures to drop. I'll look into adding it as a flag though.
On 2016/12/01 19:14:02, rnephew (Reviews Here) wrote: > On 2016/12/01 19:01:46, nednguyen wrote: > > On 2016/12/01 18:38:00, rnephew (Reviews Here) wrote: > > > > > > https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/interna... > > > File telemetry/telemetry/internal/platform/platform_backend.py (right): > > > > > > > > > https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/interna... > > > telemetry/telemetry/internal/platform/platform_backend.py:304: pass > > > Debating if I should add a CanWaitForTemperature method that defaults to > > false, > > > having each platform that implements it override it to true, and having this > > > raise NotImplementedError, but that also seems like unnecessary boilerplate > > code > > > when we can just make in a NOP on platforms that do not support it yet (or > > > platforms that have no need to support it). > > > > > > > > > https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/page/sh... > > > File telemetry/telemetry/page/shared_page_state.py (right): > > > > > > > > > https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/page/sh... > > > telemetry/telemetry/page/shared_page_state.py:189: # Make sure device under > > test > > > is at a suitable temperature. > > > Decided to put it here, because doing it here makes it wait between pages. > > This > > > will make sure that on long running tests, such as system health, that we do > > not > > > start overheating partway through. I do expect this to increase the amount > of > > > time single benchmarks take since it will cool between pages, but I do not > > > expect this to increase the amount of time a test run takes in total though, > > > since the android test runner already waits between benchmarks for the temp > to > > > drop. > > > > I think this really should be a commandline flag > > (--wait-to-cool-down-between-runs) or a benchmark property > > (wait_to_cool_down_between_runs). We don't want to enable this by default for > > all benchmark since it can be expensive, and benchmarks like power probably > > don't care much about the temperature. > > It would help decrease noise though if we do it between each page always. And as > stated above, in theory it shouldn't increase run times very much since its just > moving where it waits for temperatures to drop. I'll look into adding it as a > flag though. We currently also have _WaitForThermalThrottling if needed in https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/.... If you work on this, let take this slow & do it right. A few questions I have are: 1) How is the method you're implementing different from _WaitForThermalThrottling? Can we merge the two to into 1 thing? 2) On a typical long running benchmark like system_health.mobile_memory, how much time does WaitingForDeviceToCool between story add to the overall benchmark run? 3) Can we also implement this on desktop. IIRC, Charlie was having some problem with the power increasing across runs on Mac. Can you make a design doc about this work & answering these questions?
Description was changed from ========== [Telemetry][Android] Wait for device under test to cool between pages. If the platform supports waiting for the device temperature to drop, wait. Currently this only implements andorid. BUG=chromium:669923 ========== to ========== [Telemetry][Android] Wait for device under test to cool between pages. If the platform supports waiting for the device temperature to drop, wait. Currently this only implements andorid. BUG=chromium:669923 ==========
nednguyen@google.com changed reviewers: + charliea@chromium.org
On 2016/12/01 19:21:41, nednguyen wrote: > On 2016/12/01 19:14:02, rnephew (Reviews Here) wrote: > > On 2016/12/01 19:01:46, nednguyen wrote: > > > On 2016/12/01 18:38:00, rnephew (Reviews Here) wrote: > > > > > > > > > > https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/interna... > > > > File telemetry/telemetry/internal/platform/platform_backend.py (right): > > > > > > > > > > > > > > https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/interna... > > > > telemetry/telemetry/internal/platform/platform_backend.py:304: pass > > > > Debating if I should add a CanWaitForTemperature method that defaults to > > > false, > > > > having each platform that implements it override it to true, and having > this > > > > raise NotImplementedError, but that also seems like unnecessary > boilerplate > > > code > > > > when we can just make in a NOP on platforms that do not support it yet (or > > > > platforms that have no need to support it). > > > > > > > > > > > > > > https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/page/sh... > > > > File telemetry/telemetry/page/shared_page_state.py (right): > > > > > > > > > > > > > > https://codereview.chromium.org/2541843007/diff/1/telemetry/telemetry/page/sh... > > > > telemetry/telemetry/page/shared_page_state.py:189: # Make sure device > under > > > test > > > > is at a suitable temperature. > > > > Decided to put it here, because doing it here makes it wait between pages. > > > This > > > > will make sure that on long running tests, such as system health, that we > do > > > not > > > > start overheating partway through. I do expect this to increase the amount > > of > > > > time single benchmarks take since it will cool between pages, but I do not > > > > expect this to increase the amount of time a test run takes in total > though, > > > > since the android test runner already waits between benchmarks for the > temp > > to > > > > drop. > > > > > > I think this really should be a commandline flag > > > (--wait-to-cool-down-between-runs) or a benchmark property > > > (wait_to_cool_down_between_runs). We don't want to enable this by default > for > > > all benchmark since it can be expensive, and benchmarks like power probably > > > don't care much about the temperature. > > > > It would help decrease noise though if we do it between each page always. And > as > > stated above, in theory it shouldn't increase run times very much since its > just > > moving where it waits for temperatures to drop. I'll look into adding it as a > > flag though. > > We currently also have _WaitForThermalThrottling if needed in > https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/.... > If you work on this, let take this slow & do it right. > > A few questions I have are: > 1) How is the method you're implementing different from > _WaitForThermalThrottling? Can we merge the two to into 1 thing? > 2) On a typical long running benchmark like system_health.mobile_memory, how > much time does WaitingForDeviceToCool between story add to the overall benchmark > run? > 3) Can we also implement this on desktop. IIRC, Charlie was having some problem > with the power increasing across runs on Mac. > > Can you make a design doc about this work & answering these questions? If we already have _WaitForThermalThrottling, then maybe thermal throttling isn't the issue in crbug.com/669923 making this unnecessary? I dont know enough about mac hardware to implement it there atm. Since this (if this is even what is causing it) is causing problems on bisects for android, I'd prefer to roll it out on android first, then work on the other platforms so we can get unstuck there. I dont think this requires a full DD, but I'll make a 1-pager for it.
I added a command line flag and moved where it calls LetTemperatureCool to reflect where the thermal throttling already is called.
https://codereview.chromium.org/2541843007/diff/20001/telemetry/telemetry/int... File telemetry/telemetry/internal/story_runner.py (right): https://codereview.chromium.org/2541843007/diff/20001/telemetry/telemetry/int... telemetry/telemetry/internal/story_runner.py:64: help='Temperature to wait for between pages. In tenths of' Can we accept and pass around the temperature value as actual degrees celsius, and only convert it at the last minute? We can mention here that the precision is limited to 1/10 of a degree C, but it seems like this would be a lot more intuitive to users.
https://codereview.chromium.org/2541843007/diff/20001/telemetry/telemetry/int... File telemetry/telemetry/internal/story_runner.py (right): https://codereview.chromium.org/2541843007/diff/20001/telemetry/telemetry/int... telemetry/telemetry/internal/story_runner.py:64: help='Temperature to wait for between pages. In tenths of' On 2016/12/02 16:03:12, charliea wrote: > Can we accept and pass around the temperature value as actual degrees celsius, > and only convert it at the last minute? We can mention here that the precision > is limited to 1/10 of a degree C, but it seems like this would be a lot more > intuitive to users. Done.
Ping.
https://codereview.chromium.org/2541843007/diff/40001/telemetry/telemetry/int... File telemetry/telemetry/internal/story_runner.py (right): https://codereview.chromium.org/2541843007/diff/40001/telemetry/telemetry/int... telemetry/telemetry/internal/story_runner.py:64: help='Temperature to wait for between pages. In degrees C.') I think we should not expose this. Instead, can you do a local run of system health benchmark on a test phone & see whether this has big effect on runtime? If not, we should just go ahead & replace _WaitForThermalThrottlingIfNeeded with this new mechanism
On 2016/12/15 15:38:11, nednguyen wrote: > https://codereview.chromium.org/2541843007/diff/40001/telemetry/telemetry/int... > File telemetry/telemetry/internal/story_runner.py (right): > > https://codereview.chromium.org/2541843007/diff/40001/telemetry/telemetry/int... > telemetry/telemetry/internal/story_runner.py:64: help='Temperature to wait for > between pages. In degrees C.') > I think we should not expose this. Instead, can you do a local run of system > health benchmark on a test phone & see whether this has big effect on runtime? > > If not, we should just go ahead & replace _WaitForThermalThrottlingIfNeeded with > this new mechanism I dont think any amount of local testing will adequately warm a device like it would be in the labs. We typically do not see temperature issues out side of lab conditions, which are nearly impossible to replicate in the lab. I also expect a single benchmark run to take longer, its the overall telemetry run that should not take longer. Old: New: [Test 1 start] [Test 1 start] [page 1] [page 1] [page 2] [Cool device] [page 3] [page 2] [Cool device] [Cool device] [Cool device] [Page 3] [Cool device] [Cool device] That is about how I expect it to change. Moving all the cooling at the end of the benchmark to the between pages. Its likely that we will see a small increase because of newtons law of cooling, but probably not very high. I just dont think we can adequately test it outside of the lab.
On 2016/12/15 15:47:05, rnephew (Reviews Here) wrote: > On 2016/12/15 15:38:11, nednguyen wrote: > > > https://codereview.chromium.org/2541843007/diff/40001/telemetry/telemetry/int... > > File telemetry/telemetry/internal/story_runner.py (right): > > > > > https://codereview.chromium.org/2541843007/diff/40001/telemetry/telemetry/int... > > telemetry/telemetry/internal/story_runner.py:64: help='Temperature to wait for > > between pages. In degrees C.') > > I think we should not expose this. Instead, can you do a local run of system > > health benchmark on a test phone & see whether this has big effect on runtime? > > > > If not, we should just go ahead & replace _WaitForThermalThrottlingIfNeeded > with > > this new mechanism > > I dont think any amount of local testing will adequately warm a device like it > would be in the labs. We typically do not see temperature issues out side of lab > conditions, which are nearly impossible to replicate in the lab. I also expect a > single benchmark run to take longer, its the overall telemetry run that should > not take longer. > > Old: New: > [Test 1 start] [Test 1 start] > [page 1] [page 1] > [page 2] [Cool device] > [page 3] [page 2] > [Cool device] [Cool device] > [Cool device] [Page 3] > [Cool device] [Cool device] > > That is about how I expect it to change. Moving all the cooling at the end of > the benchmark to the between pages. Its likely that we will see a small increase > because of newtons law of cooling, but probably not very high. I just dont think > we can adequately test it outside of the lab. I see. I think the next step then could be: 1) Remove the commandline flag. We should be the one who know best about which temperature to use. In general, less public APIs will give you less headache in the long run. 2) Go ahead with landing this patch, but taking it as an experiment. Add logging to know whether the wait for temperature is actually triggered & how long it takes. 3) Collect the data: i) Whether this helps fixing crbug.com/669923 ii) How much runtime increase does this contribute to the benchmark. iii) Whether this makes other benchmarks more stable or less stable. --> Gives this about 3 weeks to see if this CL got blamed by any bisect. 4) Assuming data in 3) is good, go ahead with migrate _WaitForThermalThrottlingIfNeeded(state.platform) over to this new method of waiting for device to cool If the data in 3) is not good, we may need to revert this CL & decide what to do next.
https://codereview.chromium.org/2541843007/diff/40001/telemetry/telemetry/int... File telemetry/telemetry/internal/story_runner.py (right): https://codereview.chromium.org/2541843007/diff/40001/telemetry/telemetry/int... telemetry/telemetry/internal/story_runner.py:64: help='Temperature to wait for between pages. In degrees C.') On 2016/12/15 15:38:10, nednguyen wrote: > I think we should not expose this. Instead, can you do a local run of system > health benchmark on a test phone & see whether this has big effect on runtime? > > If not, we should just go ahead & replace _WaitForThermalThrottlingIfNeeded with > this new mechanism Got rid of the cmd line option.
The CQ bit was checked by rnephew@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2016/12/15 16:29:01, rnephew (Reviews Here) wrote: > https://codereview.chromium.org/2541843007/diff/40001/telemetry/telemetry/int... > File telemetry/telemetry/internal/story_runner.py (right): > > https://codereview.chromium.org/2541843007/diff/40001/telemetry/telemetry/int... > telemetry/telemetry/internal/story_runner.py:64: help='Temperature to wait for > between pages. In degrees C.') > On 2016/12/15 15:38:10, nednguyen wrote: > > I think we should not expose this. Instead, can you do a local run of system > > health benchmark on a test phone & see whether this has big effect on runtime? > > > > If not, we should just go ahead & replace _WaitForThermalThrottlingIfNeeded > with > > this new mechanism > > Got rid of the cmd line option. The devil method this uses already logs the temperature, so no extra logging needed. We can tell by how many times the log message appears how long it waited.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Catapult Mac Tryserver on master.tryserver.client.catapult (JOB_FAILED, https://build.chromium.org/p/tryserver.client.catapult/builders/Catapult%20Ma...)
The CQ bit was checked by rnephew@chromium.org to run a CQ dry run
lgtm, but defer to Ned's lgtm ultimately
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Catapult Mac Tryserver on master.tryserver.client.catapult (JOB_FAILED, https://build.chromium.org/p/tryserver.client.catapult/builders/Catapult%20Ma...)
lgtm
On 2016/12/15 17:48:55, nednguyen wrote: > lgtm +Juan & Hector since this may affect crbug.com/671156
On 2016/12/15 17:49:39, nednguyen wrote: > On 2016/12/15 17:48:55, nednguyen wrote: > > lgtm > > +Juan & Hector since this may affect crbug.com/671156 Fixed unit tests and re. Landing.
On 2016/12/19 21:31:10, rnephew (Reviews Here) wrote: > On 2016/12/15 17:49:39, nednguyen wrote: > > On 2016/12/15 17:48:55, nednguyen wrote: > > > lgtm > > > > +Juan & Hector since this may affect crbug.com/671156 > > Fixed unit tests and re. Landing. rebased*
On 2016/12/19 21:31:10, rnephew (Reviews Here) wrote: > On 2016/12/15 17:49:39, nednguyen wrote: > > On 2016/12/15 17:48:55, nednguyen wrote: > > > lgtm > > > > +Juan & Hector since this may affect crbug.com/671156 > > Fixed unit tests and re. Landing. rebased*
The CQ bit was checked by rnephew@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from nednguyen@google.com, charliea@chromium.org Link to the patchset: https://codereview.chromium.org/2541843007/#ps120001 (title: "rebase")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: Catapult Mac Tryserver on master.tryserver.client.catapult (JOB_FAILED, https://build.chromium.org/p/tryserver.client.catapult/builders/Catapult%20Ma...)
The CQ bit was checked by rnephew@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from nednguyen@google.com, charliea@chromium.org Link to the patchset: https://codereview.chromium.org/2541843007/#ps140001 (title: "fix more tests")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 140001, "attempt_start_ts": 1482187380743900,
"parent_rev": "d77eaf7f69e860436f0245c10ea2c186937ab666", "commit_rev":
"70f42a7c55ca69cdeb9aa6ec7e40ff3f155040b9"}
Message was sent while issue was closed.
Description was changed from ========== [Telemetry][Android] Wait for device under test to cool between pages. If the platform supports waiting for the device temperature to drop, wait. Currently this only implements andorid. BUG=chromium:669923 ========== to ========== [Telemetry][Android] Wait for device under test to cool between pages. If the platform supports waiting for the device temperature to drop, wait. Currently this only implements andorid. BUG=chromium:669923 Review-Url: https://codereview.chromium.org/2541843007 Committed: https://chromium.googlesource.com/external/github.com/catapult-project/catapu... ==========
Message was sent while issue was closed.
Committed patchset #8 (id:140001) as https://chromium.googlesource.com/external/github.com/catapult-project/catapu... |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
