Issue 851503005: Add histograms for temperatures and thermal states.

mka

mka@chromium.org changed reviewers: + isherman@chromium.org, kemp@chromium.org, vadimb@chromium.org

5 years, 11 months ago (2015-01-13 02:03:02 UTC) #1

vadimb1

https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/histograms.xml File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/histograms.xml#newcode24626 tools/metrics/histograms/histograms.xml:24626: + <summary>Temperature reading at sensor 0 of the CPU.</summary> ...

5 years, 11 months ago (2015-01-21 18:48:39 UTC) #3

mka

https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/histograms.xml File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/histograms.xml#newcode24626 tools/metrics/histograms/histograms.xml:24626: + <summary>Temperature reading at sensor 0 of the CPU.</summary> ...

5 years, 11 months ago (2015-01-21 19:19:33 UTC) #4

https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/his...
File tools/metrics/histograms/histograms.xml (right):

https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/his...
tools/metrics/histograms/histograms.xml:24626: +  <summary>Temperature reading
at sensor 0 of the CPU.</summary>
On 2015/01/21 18:48:39, vadimb1 wrote:
> Please add information about how often this is logged, otherwise it's unclear
> how to interpret that (i.e. logged when the temperature is changed, or every
> second, or once a day, or when the system is restarted - it's all different).
> Same for other Temperature histograms.

my idea is to log it every second, however it could happen that the actual
sensor reading is performed at a slightly lower rate (every few seconds). the
component reporting the metrics doesn't actually read the sensor but subscribes
to the sensor to be notified on changes. the polling rate of the sensor might
vary between different sensors and devices.

not sure whether to express this here in detail or say "every few seconds"

https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/his...
tools/metrics/histograms/histograms.xml:24647: +    Distribution of time spent
in the states of the thermal zone of the CPU.
On 2015/01/21 18:48:39, vadimb1 wrote:
> 1. Same question about how often this is logged.

every second

> 2. What are the states? Since it's not enum it will be an integer, how to
> interpret that integer?

I left it vague on purpose as the interpretation of the values might vary across
different devices. What is true for all devices is that a higher state
corresponds to a higher temperature

> 3. So what is logged here? Time or state? E.g., say I have the histogram:
> 0-1: 50
> 1-2: 60
> How one would understand that? 50ms spent in state 0? Or 50s? Or we've seen
the
> system 50 times in state 0?

the state is logged, so it would be 50s spent in state 0

> Same for the other states.

vadimb1

https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/histograms.xml File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/histograms.xml#newcode24626 tools/metrics/histograms/histograms.xml:24626: + <summary>Temperature reading at sensor 0 of the CPU.</summary> ...

5 years, 11 months ago (2015-01-21 22:32:57 UTC) #5

https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/his...
File tools/metrics/histograms/histograms.xml (right):

https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/his...
tools/metrics/histograms/histograms.xml:24626: +  <summary>Temperature reading
at sensor 0 of the CPU.</summary>
On 2015/01/21 19:19:33, mka wrote:
> On 2015/01/21 18:48:39, vadimb1 wrote:
> > Please add information about how often this is logged, otherwise it's
unclear
> > how to interpret that (i.e. logged when the temperature is changed, or every
> > second, or once a day, or when the system is restarted - it's all
different).
> > Same for other Temperature histograms.
> 
> my idea is to log it every second, however it could happen that the actual
> sensor reading is performed at a slightly lower rate (every few seconds). the
> component reporting the metrics doesn't actually read the sensor but
subscribes
> to the sensor to be notified on changes. the polling rate of the sensor might
> vary between different sensors and devices.
> 
> not sure whether to express this here in detail or say "every few seconds"

Well the idea of the summary (which is shown on the dashboards) is that anyone
looking at it should get as much information as reasonably possible to fit in a
few lines of text so they don't need to dig into the sources or e-mail you every
time. Not every person looking at this is familiar with your project; we had
lots of scenarios when people find a histogram (in many cases created a few
years ago) by text search and want to quickly check if it is close to what they
were looking for.

In this case I would say "collected every few seconds (may vary between
devices)".

See for example "Platform.Temperature.Sensor01" that says "Temperature reading
at sensor 1 (I2C_CPU-Object) taken every 30s."

BTW what is the difference, can you use Platform.Temperature.* or is your
histogram completely unrelated to that group?

Also on this subject, most of the other Platform.* histograms start description
with "Chrome OS" - now sure if that is your case but you may follow the example
for consistency if it is (but up to you).

https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/his...
tools/metrics/histograms/histograms.xml:24647: +    Distribution of time spent
in the states of the thermal zone of the CPU.
On 2015/01/21 19:19:33, mka wrote:
> On 2015/01/21 18:48:39, vadimb1 wrote:
> > 1. Same question about how often this is logged.
> 
> every second
> 
> > 2. What are the states? Since it's not enum it will be an integer, how to
> > interpret that integer?
> 
> I left it vague on purpose as the interpretation of the values might vary
across
> different devices. What is true for all devices is that a higher state
> corresponds to a higher temperature
> 
> > 3. So what is logged here? Time or state? E.g., say I have the histogram:
> > 0-1: 50
> > 1-2: 60
> > How one would understand that? 50ms spent in state 0? Or 50s? Or we've seen
> the
> > system 50 times in state 0?
> 
> the state is logged, so it would be 50s spent in state 0
>  
> > Same for the other states.
> 

Well, make it clearer from the summary. It's very similar to the previous
histograms group but the summary reads differently. I would say something like
"State of the termal zone, collected every second. State depends on device,
higher state corresponds to a higher temperature" or "Time spent in the states
of termal zone, one count collected every second...".

Also you probably could use units="State" here. Then the histograms table will
be 
  State Quantity PDF CFD
while without units it will be 
  Range Quantity PDF CFD

mka

Vadim, thanks a lot for your feedback https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/histograms.xml File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/histograms.xml#newcode24626 tools/metrics/histograms/histograms.xml:24626: + <summary>Temperature ...

5 years, 11 months ago (2015-01-21 23:25:19 UTC) #6

Vadim, thanks a lot for your feedback

https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/his...
File tools/metrics/histograms/histograms.xml (right):

https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/his...
tools/metrics/histograms/histograms.xml:24626: +  <summary>Temperature reading
at sensor 0 of the CPU.</summary>
On 2015/01/21 22:32:56, vadimb1 wrote:
> On 2015/01/21 19:19:33, mka wrote:
> > On 2015/01/21 18:48:39, vadimb1 wrote:
> > > Please add information about how often this is logged, otherwise it's
> unclear
> > > how to interpret that (i.e. logged when the temperature is changed, or
every
> > > second, or once a day, or when the system is restarted - it's all
> different).
> > > Same for other Temperature histograms.
> > 
> > my idea is to log it every second, however it could happen that the actual
> > sensor reading is performed at a slightly lower rate (every few seconds).
the
> > component reporting the metrics doesn't actually read the sensor but
> subscribes
> > to the sensor to be notified on changes. the polling rate of the sensor
might
> > vary between different sensors and devices.
> > 
> > not sure whether to express this here in detail or say "every few seconds"
> 
> Well the idea of the summary (which is shown on the dashboards) is that anyone
> looking at it should get as much information as reasonably possible to fit in
a
> few lines of text so they don't need to dig into the sources or e-mail you
every
> time. Not every person looking at this is familiar with your project; we had
> lots of scenarios when people find a histogram (in many cases created a few
> years ago) by text search and want to quickly check if it is close to what
they
> were looking for.
> 
> In this case I would say "collected every few seconds (may vary between
> devices)".
> 
> See for example "Platform.Temperature.Sensor01" that says "Temperature reading
> at sensor 1 (I2C_CPU-Object) taken every 30s."

ack, I'll update the description

> BTW what is the difference, can you use Platform.Temperature.* or is your
> histogram completely unrelated to that group?

yes, my histogram is completely unrelated. I saw the Platform.Temperature.*
histograms and considered to add the new histograms to this group, but as there
are also the thermal zone histograms it seemed more appropriate to group them
together under thermal

> Also on this subject, most of the other Platform.* histograms start
description
> with "Chrome OS" - now sure if that is your case but you may follow the
example
> for consistency if it is (but up to you).

Jetstream runs Chrome OS (Core), I'll add it to the description

https://codereview.chromium.org/851503005/diff/1/tools/metrics/histograms/his...
tools/metrics/histograms/histograms.xml:24647: +    Distribution of time spent
in the states of the thermal zone of the CPU.
On 2015/01/21 22:32:56, vadimb1 wrote:
> On 2015/01/21 19:19:33, mka wrote:
> > On 2015/01/21 18:48:39, vadimb1 wrote:
> > > 1. Same question about how often this is logged.
> > 
> > every second
> > 
> > > 2. What are the states? Since it's not enum it will be an integer, how to
> > > interpret that integer?
> > 
> > I left it vague on purpose as the interpretation of the values might vary
> across
> > different devices. What is true for all devices is that a higher state
> > corresponds to a higher temperature
> > 
> > > 3. So what is logged here? Time or state? E.g., say I have the histogram:
> > > 0-1: 50
> > > 1-2: 60
> > > How one would understand that? 50ms spent in state 0? Or 50s? Or we've
seen
> > the
> > > system 50 times in state 0?
> > 
> > the state is logged, so it would be 50s spent in state 0
> >  
> > > Same for the other states.
> > 
> 
> Well, make it clearer from the summary. It's very similar to the previous
> histograms group but the summary reads differently. I would say something like
> "State of the termal zone, collected every second. State depends on device,
> higher state corresponds to a higher temperature" or "Time spent in the states
> of termal zone, one count collected every second...".

ok

> Also you probably could use units="State" here. Then the histograms table will
> be 
>   State Quantity PDF CFD
> while without units it will be 
>   Range Quantity PDF CFD

I'll update the histograms to use units

Ilya Sherman

https://codereview.chromium.org/851503005/diff/20001/tools/metrics/histograms/histograms.xml File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/851503005/diff/20001/tools/metrics/histograms/histograms.xml#newcode24634 tools/metrics/histograms/histograms.xml:24634: +<histogram name="Platform.Thermal.Temperature.Wifi0" units="Celsius"> nit: You use ".0" for CPU ...

5 years, 11 months ago (2015-01-22 02:53:47 UTC) #8

mka

https://codereview.chromium.org/851503005/diff/20001/tools/metrics/histograms/histograms.xml File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/851503005/diff/20001/tools/metrics/histograms/histograms.xml#newcode24634 tools/metrics/histograms/histograms.xml:24634: +<histogram name="Platform.Thermal.Temperature.Wifi0" units="Celsius"> On 2015/01/22 02:53:47, Ilya Sherman wrote: ...

5 years, 11 months ago (2015-01-23 19:03:32 UTC) #9

https://codereview.chromium.org/851503005/diff/20001/tools/metrics/histograms...
File tools/metrics/histograms/histograms.xml (right):

https://codereview.chromium.org/851503005/diff/20001/tools/metrics/histograms...
tools/metrics/histograms/histograms.xml:24634: +<histogram
name="Platform.Thermal.Temperature.Wifi0" units="Celsius">
On 2015/01/22 02:53:47, Ilya Sherman wrote:
> nit: You use ".0" for CPU readings, but just "0" for Wifi.  I suggest keeping
> those consistent.

In the case of the Wifi interface the 0 is the index of the interface, for the
CPU it's the sensor number. Said that, for now a single sensor is used and I
can't predict if that will ever change. Putting the index is an attempt of being
prepared for more than one sensor. I'm not sure this is actually the right thing
to do or if we should rather have a temperature Cpu and adapt the histograms
when necessary, possibly deprecating the one without indexes.

https://codereview.chromium.org/851503005/diff/20001/tools/metrics/histograms...
tools/metrics/histograms/histograms.xml:24657: +    Temperature reading at
wireless interface 2 collected every few seconds (may
On 2015/01/22 02:53:47, Ilya Sherman wrote:
> What defines wireless interface "1" vs. "2", etc.?  It might be clearer to use
> more semantically contentful names.

It's the interface number/name assigned by the OS. In our team we discussed more
semantic schemes but didn't encounter a consistent and future proof one. Also
for the engineers who will most often look at these histograms the
hardware/system related names are probably more meaningful.

https://codereview.chromium.org/851503005/diff/20001/tools/metrics/histograms...
tools/metrics/histograms/histograms.xml:24664: +<histogram
name="Platform.Thermal.Zone.Cpu.States" units="Thermal state">
On 2015/01/22 02:53:47, Ilya Sherman wrote:
> Would it make sense for the state to be an enum?  You could use a sparse
> histogram and hash a string, for example, if you wanted to be able to assign
> flexible labels to the enum.

I considered initially to use labels like "warm", "hot", "critically hot", ...
but discarded the idea because the number of states would be limited by the
number of reasonable classifications of the thermal states in English language.

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/851503005/20001

5 years, 10 months ago (2015-01-26 19:14:51 UTC) #12

commit-bot: I haz the power

5 years, 10 months ago (2015-01-26 20:53:58 UTC) #14

Message was sent while issue was closed.

Patchset 2 (id:??) landed as
https://crrev.com/7fbd516e6641f9f02446cb2ae956a6454e81d0c9
Cr-Commit-Position: refs/heads/master@{#313133}

Issue 851503005: Add histograms for temperatures and thermal states. (Closed)

Description

Patch Set 1 #

Patch Set 2 : Add thermald histograms #

Messages