Created: 9 years, 4 months ago by dennis_jeffrey
Modified: 9 years, 4 months ago
CC: chromium-reviews, Nirnimesh, John Grabowski, anantha, dyu1, Paweł Hajdan Jr., dennis_jeffrey
Base URL: svn://svn.chromium.org/chrome/trunk/src
Visibility: Public
Description: PyAuto performance tests now output data to be graphed using autotest.
BUG=chromium-os:18185
TEST=None
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=98517
Patch Set 1 #
Total comments: 10
Patch Set 2 : Addressed review comments. #
Total comments: 4
Patch Set 3 : Addressed second round of review comments. #
Total comments: 17
Patch Set 4 : Addressed latest review comments. #
Patch Set 5 : Now computing sample standard deviation. #
Messages
Total messages: 11 (0 generated)
It'd be nicer if you could get rid of the temp file and directly interpret the output from the child process (perf.py).

http://codereview.chromium.org/7745007/diff/1/chrome/test/chromeos/autotest/f...
File chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py (right):

http://codereview.chromium.org/7745007/diff/1/chrome/test/chromeos/autotest/f...
chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py:19: _PERF_DATA_FILE = '/usr/local/tmp/perf_data.txt'
Use tempfile.gettempdir().

http://codereview.chromium.org/7745007/diff/1/chrome/test/chromeos/autotest/f...
chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py:77: try:
Do not try.

http://codereview.chromium.org/7745007/diff/1/chrome/test/chromeos/autotest/f...
chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py:92: try:
Do not try. Just do it. If something bad happens it'll print it, which is what you're doing in the exception handler anyway.

http://codereview.chromium.org/7745007/diff/1/chrome/test/functional/perf.py
File chrome/test/functional/perf.py (right):

http://codereview.chromium.org/7745007/diff/1/chrome/test/functional/perf.py#...
chrome/test/functional/perf.py:29: _CHROMEOS_OUTPUT_FILE = '/usr/local/tmp/perf_data.txt'
Do not hardcode the path. Use tempfile.gettempdir().

http://codereview.chromium.org/7745007/diff/1/chrome/test/functional/perf.py#...
chrome/test/functional/perf.py:86: try:
Why the try block?
I got rid of the temp file as suggested.

http://codereview.chromium.org/7745007/diff/1/chrome/test/chromeos/autotest/f...
File chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py (right):

http://codereview.chromium.org/7745007/diff/1/chrome/test/chromeos/autotest/f...
chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py:19: _PERF_DATA_FILE = '/usr/local/tmp/perf_data.txt'
On 2011/08/25 00:23:36, Nirnimesh wrote:
> Use tempfile.gettempdir().
I no longer use a temp file at all.

http://codereview.chromium.org/7745007/diff/1/chrome/test/chromeos/autotest/f...
chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py:77: try:
On 2011/08/25 00:23:36, Nirnimesh wrote:
> Do not try.
This code has been removed.

http://codereview.chromium.org/7745007/diff/1/chrome/test/chromeos/autotest/f...
chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py:92: try:
On 2011/08/25 00:23:36, Nirnimesh wrote:
> Do not try. Just do it. If something bad happens it'll print it, which is what you're doing in the exception handler anyway.
The try has since been removed.

http://codereview.chromium.org/7745007/diff/1/chrome/test/functional/perf.py
File chrome/test/functional/perf.py (right):

http://codereview.chromium.org/7745007/diff/1/chrome/test/functional/perf.py#...
chrome/test/functional/perf.py:29: _CHROMEOS_OUTPUT_FILE = '/usr/local/tmp/perf_data.txt'
On 2011/08/25 00:23:36, Nirnimesh wrote:
> Do not hardcode the path. Use tempfile.gettempdir().
I no longer use a temp file at all.

http://codereview.chromium.org/7745007/diff/1/chrome/test/functional/perf.py#...
chrome/test/functional/perf.py:86: try:
On 2011/08/25 00:23:36, Nirnimesh wrote:
> Why the try block?
The try has since been removed.
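For reference, a minimal sketch of the no-temp-file pattern adopted here: run the child test process and read its combined output straight from the pipe, so nothing ever touches the filesystem. The command line is a hypothetical stand-in, not the exact invocation in this CL.

```python
import subprocess

def run_child(cmd):
    """Runs cmd and returns (returncode, combined stdout/stderr)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    output = proc.communicate()[0]  # read the pipe directly; no file on disk
    return proc.returncode, output

# Hypothetical invocation of the pyauto test runner:
returncode, output = run_child(['python', 'pyauto_functional.py'])
```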
http://codereview.chromium.org/7745007/diff/4/chrome/test/chromeos/autotest/f...
File chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py (right):

http://codereview.chromium.org/7745007/diff/4/chrome/test/chromeos/autotest/f...
chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py:82: stderr=subprocess.STDOUT).communicate()[0]
Check the process's return code and raise an error.TestFail() if perf.py failed.

http://codereview.chromium.org/7745007/diff/4/chrome/test/chromeos/autotest/f...
chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py:88: map(lambda x: x.split(self._PERF_MARKER_POST)[0].split(
Do a regex match. No split.
http://codereview.chromium.org/7745007/diff/4/chrome/test/chromeos/autotest/f...
File chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py (right):

http://codereview.chromium.org/7745007/diff/4/chrome/test/chromeos/autotest/f...
chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py:82: stderr=subprocess.STDOUT).communicate()[0]
On 2011/08/25 01:39:43, Nirnimesh wrote:
> Check the process's return code and raise an error.TestFail() if perf.py failed.
Done.

http://codereview.chromium.org/7745007/diff/4/chrome/test/chromeos/autotest/f...
chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py:88: map(lambda x: x.split(self._PERF_MARKER_POST)[0].split(
On 2011/08/25 01:39:43, Nirnimesh wrote:
> Do a regex match. No split.
Done.
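For reference, a hedged sketch of the two fixes marked "Done" above: check the child's return code and raise error.TestFail() on failure, then pull perf values out of the output with a compiled regex instead of str.split(). The marker format shown is an assumed illustration; the CL's actual _PERF_MARKER_* strings are not reproduced here.

```python
import re
from autotest_lib.client.common_lib import error  # autotest's error module

# Assumed marker format for illustration: "<perf>name: value</perf>"
_PERF_LINE_RE = re.compile(r'<perf>(\w+): ([-\d.]+)</perf>')

def parse_perf_output(returncode, output):
    """Fails the test on a bad exit code, else extracts perf name/value pairs."""
    if returncode != 0:
        raise error.TestFail(
            'pyauto_functional.py failed with code %d' % returncode)
    results = {}
    for line in output.splitlines():
        match = _PERF_LINE_RE.match(line)  # match(): marker must start the line
        if match:
            results[match.group(1)] = float(match.group(2))
    return results
```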
This is neat to see. One other thing: it's typical in Autotest to include a vanilla control file to demonstrate vanilla invocation.

http://codereview.chromium.org/7745007/diff/3002/chrome/test/chromeos/autotes...
File chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py (right):

http://codereview.chromium.org/7745007/diff/3002/chrome/test/chromeos/autotes...
chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py:88: 'pyauto_functional.py: %d' % proc.returncode)
You should consider whether you want to tag this TestFail with a CHROMEOS_PERF string to disambiguate it from other instances of pyauto_functional.py failures.

http://codereview.chromium.org/7745007/diff/3002/chrome/test/chromeos/autotes...
chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py:92: if re_compiled.search(line)]
Based on the way you wrote the perf value line creation, I'd use match(), not search().

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf.py
File chrome/test/functional/perf.py (right):

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf....
chrome/test/functional/perf.py:45: def _MeasureElapsedTime(self, python_command, num_invocations):
Have you considered that it would be more reusable to separate the common utility functions here into a base PerfTest class, and then specialize that class with, say, a TabPerfTest class containing the actual test code below?

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf....
chrome/test/functional/perf.py:61: return (stop_time - start_time) * 1000  # Convert to milliseconds.
Just wondering: why not use timeit? There are issues, such as GC, that can affect pure Python timings, as discussed in the timeit documentation.

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf....
chrome/test/functional/perf.py:74: std_dev = math.sqrt(sum(temp_vals) / len(temp_vals))
This illustrates why I use a Python stats lib instead of writing these. I think you want to float() the len(temp_vals); otherwise this could end up being int division, which would incur unexpected rounding. Also, you're calculating the population stddev as opposed to the sample stddev, and I think sample is the more common approach.

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf....
chrome/test/functional/perf.py:109: self._OutputPerfGraphValue('%s_%s' % (units, description), avg)
FYI, it's typical for current ChromeOS perf tests to write all the keyvals to the Autotest db. That way the data is preserved in case a stat function is later proven improper.

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf....
chrome/test/functional/perf.py:118: will open up a single tab.
I think you should explain this pattern of separating the initial sample from the remaining iterations. It's ambiguous what 1 iteration would mean, since iterations=1 would actually return 2 separate samples.
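For reference, a minimal sketch of the timeit suggestion above. timeit picks the highest-resolution clock available on the platform (timeit.default_timer) and disables garbage collection while timing, which addresses the pure-Python timing issues mentioned; the do_work() function here is a placeholder, not code from this CL.

```python
import timeit

def do_work():
    # Placeholder for the operation being measured (e.g., opening a tab).
    sum(range(100000))

# One-shot timing with the platform's best clock:
start = timeit.default_timer()
do_work()
elapsed_ms = (timeit.default_timer() - start) * 1000  # convert to milliseconds

# Or let timeit repeat the statement; GC is disabled during the run:
total_seconds = timeit.timeit('do_work()',
                              setup='from __main__ import do_work',
                              number=10)
print(elapsed_ms, total_seconds / 10)
```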
Code I commented on LGTM
Thank you both for looking over this code. Mike - I addressed all of your comments, but added a few TODOs for a couple of them - please let me know if you'd still prefer changes in the current CL. Also, there is indeed already a basic autotest control file in the same directory as the file desktopui_PyAutoPerfTests.py; it just wasn't modified in the current CL, so it's not part of it. Is that what you meant by a "vanilla control file" for autotest?

http://codereview.chromium.org/7745007/diff/3002/chrome/test/chromeos/autotes...
File chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py (right):

http://codereview.chromium.org/7745007/diff/3002/chrome/test/chromeos/autotes...
chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py:88: 'pyauto_functional.py: %d' % proc.returncode)
On 2011/08/25 04:25:01, truty wrote:
> You should consider whether you want to tag this TestFail with a CHROMEOS_PERF string to disambiguate it from other instances of pyauto_functional.py failures.
Good idea - I added CHROMEOS_PERF to the failure message to indicate that the failure here happened when running pyauto_functional with that particular suite.

http://codereview.chromium.org/7745007/diff/3002/chrome/test/chromeos/autotes...
chrome/test/chromeos/autotest/files/client/site_tests/desktopui_PyAutoPerfTests/desktopui_PyAutoPerfTests.py:92: if re_compiled.search(line)]
On 2011/08/25 04:25:01, truty wrote:
> Based on the way you wrote the perf value line creation, I'd use match(), not search().
That thought did cross my mind, but I decided to be paranoid :-). I changed it to use "match" instead. However, to appease my paranoia, I also added a newline right before the associated print statement in perf.py to ensure the perf value starts on a new line and is not appended at the end of some existing line (that probably wouldn't happen anyway, but I wanted to be sure).

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf.py
File chrome/test/functional/perf.py (right):

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf....
chrome/test/functional/perf.py:45: def _MeasureElapsedTime(self, python_command, num_invocations):
On 2011/08/25 04:25:01, truty wrote:
> Have you considered that it would be more reusable to separate the common utility functions here into a base PerfTest class, and then specialize that class with, say, a TabPerfTest class containing the actual test code below?
I think this is a great idea and would like to do it. But I'd like to address it in a subsequent CL if you don't mind, since it involves restructuring the code here and strays a bit from the point of the current CL. I added a TODO for that to make sure I don't forget about it; I'd like to take care of it in a later "cleanup" CL that involves reorganizing this code.

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf....
chrome/test/functional/perf.py:61: return (stop_time - start_time) * 1000  # Convert to milliseconds.
On 2011/08/25 04:25:01, truty wrote:
> Just wondering: why not use timeit? There are issues, such as GC, that can affect pure Python timings, as discussed in the timeit documentation.
I didn't use it because I didn't know about it - that's awesome!

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf....
chrome/test/functional/perf.py:74: std_dev = math.sqrt(sum(temp_vals) / len(temp_vals))
On 2011/08/25 04:25:01, truty wrote:
> This illustrates why I use a Python stats lib instead of writing these. I think you want to float() the len(temp_vals); otherwise this could end up being int division, which would incur unexpected rounding. Also, you're calculating the population stddev as opposed to the sample stddev, and I think sample is the more common approach.
Originally Nirnimesh had recommended I use the "numpy" module for computing avg/std dev, but we found out that it's a third-party library that we could not always count on to be present on all the platforms (e.g., it seemed to exist on Linux but not Windows). Since we'd like these perf tests to also run for Chrome on Windows, we decided to leave it like this.

That is a great point about being careful of integer division unexpectedly cropping up. However, I think it shouldn't happen here, unless I'm missing something (in which case I'd like to hear where I'm wrong!). The reason is that the division at line 74 above depends upon "sum(temp_vals)", which itself is the sum of a set of list elements computed at line 73. Each list element is computed by squaring a value 'x - avg', with 'avg' being computed at line 72. 'avg' should be a float because it is computed using "float(sum(values))". This means the expression 'x - avg' is also a float, making its square a float, making each element of the "temp_vals" list a float, making "sum(temp_vals)" at line 74 a float, preventing integer division at that line. I could be wrong here, so please let me know if I am - I just want to make sure I understand this properly.

You're right about the population vs. sample std dev point you bring up. I was reading a bit about the different types of std dev, and I see that "sample" is more commonly used (according to Wikipedia). The reason I originally went with "population" was that I figured I just want the std dev of the given set of measurements, which seems to form the "complete population" (but I am not sure whether that reasoning is correct). A benefit of using "sample" is that the standard deviation value would be larger, which seems like a more conservative value to report. Do you recommend that I change it to "sample"?

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf....
chrome/test/functional/perf.py:109: self._OutputPerfGraphValue('%s_%s' % (units, description), avg)
On 2011/08/25 04:25:01, truty wrote:
> FYI, it's typical for current ChromeOS perf tests to write all the keyvals to the Autotest db. That way the data is preserved in case a stat function is later proven improper.
Is this something that happens automatically when values are written to the 'keyval' file using write_perf_keyval()? I saw a comment in platform_BootPerf.py saying that "Keyval attribute names go into a database that truncates after 30 characters", implying that those results are going into the database, but I don't see any code there that explicitly does that.

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf....
chrome/test/functional/perf.py:118: will open up a single tab.
On 2011/08/25 04:25:01, truty wrote:
> I think you should explain this pattern of separating the initial sample from the remaining iterations. It's ambiguous what 1 iteration would mean, since iterations=1 would actually return 2 separate samples.
This is indeed something that was bugging me a little - actually, I think I'd like to remove the code that does the initial sample, since I'm not using it for anything right now. That way, the number of iterations specified would exactly match the number of samples actually taken. I added a TODO to take care of this when I clean up the organization of this file.
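For reference, a worked sketch of the statistics under discussion, in pure Python so it runs wherever these tests do (no numpy). It illustrates both points from the thread: the float() on the sum keeps the whole chain in float arithmetic, and the only difference between the population and sample standard deviation is the n vs. n - 1 denominator. The helper name and example data are illustrative only.

```python
import math

def mean_and_stddevs(values):
    # float() forces float division here; every later term stays a float.
    avg = float(sum(values)) / len(values)
    squared_diffs = [(x - avg) ** 2 for x in values]  # each element is a float
    population_sd = math.sqrt(sum(squared_diffs) / len(values))       # divide by n
    sample_sd = math.sqrt(sum(squared_diffs) / (len(values) - 1))     # divide by n - 1
    return avg, population_sd, sample_sd

# The sample stddev is always the larger, more conservative value:
print(mean_and_stddevs([10, 12, 23, 23, 16, 23, 21, 16]))
```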
All in all, I only have TODOs left, so LGTM. What throws me is that I expected to review a patch with the updates, and I don't see it. I'll LGTM this here.

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf.py
File chrome/test/functional/perf.py (right):

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf....
chrome/test/functional/perf.py:74: std_dev = math.sqrt(sum(temp_vals) / len(temp_vals))
On 2011/08/25 22:33:05, dennis_jeffrey wrote:
> [snip: quoted discussion of numpy availability, float division, and population vs. sample std dev, reproduced in full above]

- I use a Python stats library that is open source - we should discuss it this week.
- You're right about the float. I'm just paranoid, I guess - sorry.
- I prefer the sample stddev - please check with jrbarnette for a final preference, though.

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf....
chrome/test/functional/perf.py:109: self._OutputPerfGraphValue('%s_%s' % (units, description), avg)
On 2011/08/25 22:33:05, dennis_jeffrey wrote:
> On 2011/08/25 04:25:01, truty wrote:
> > FYI, it's typical for current ChromeOS perf tests to write all the keyvals to the Autotest db. That way the data is preserved in case a stat function is later proven improper.
>
> Is this something that happens automatically when values are written to the 'keyval' file using write_perf_keyval()? I saw a comment in platform_BootPerf.py saying that "Keyval attribute names go into a database that truncates after 30 characters", implying that those results are going into the database, but I don't see any code there that explicitly does that.

Yes. Using write_perf_keyval() causes the entire dict to get written to the db.
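For reference, a minimal sketch of the keyval pattern confirmed above, assuming the standard autotest client-test layout; the class name and key names here are hypothetical, not the ones this CL emits.

```python
from autotest_lib.client.bin import test

class desktopui_ExamplePerf(test.test):
    # Hypothetical autotest client test illustrating write_perf_keyval().
    version = 1

    def run_once(self):
        # Keyval attribute names truncate at 30 characters in the results db.
        perf_results = {
            'milliseconds_NewTabPage_avg': 123.4,  # illustrative values
            'milliseconds_NewTabPage_sd': 5.6,
        }
        # Writing the dict persists every key/value to the Autotest db.
        self.write_perf_keyval(perf_results)
```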
OK, I see the updates: LGTM.
Submitting this now - thank you!

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf.py
File chrome/test/functional/perf.py (right):

http://codereview.chromium.org/7745007/diff/3002/chrome/test/functional/perf....
chrome/test/functional/perf.py:74: std_dev = math.sqrt(sum(temp_vals) / len(temp_vals))
On 2011/08/26 20:33:00, truty wrote:
> [snip: quoted discussion of numpy availability, float division, and population vs. sample std dev, reproduced in full above]
>
> - I use a Python stats library that is open source - we should discuss it this week.
> - You're right about the float. I'm just paranoid, I guess - sorry.
> - I prefer the sample stddev - please check with jrbarnette for a final preference, though.

I changed the computation to the "sample" std dev, based on the suggestion from jrbarnette.