|
|
Created:
4 years ago by Rick Byers Modified:
3 years, 12 months ago CC:
benjhayden, catapult-reviews_chromium.org, telemetry-reviews_chromium.org Target Ref:
refs/heads/master Project:
catapult Visibility:
Public. |
DescriptionSupport value lists in CSV outputter
Patch Set 1 #
Messages
Total messages: 15 (3 generated)
rbyers@chromium.org changed reviewers: + eakuefner@chromium.org
Hey Ethan, Do you think it would be reasonable to land something like this? I can add unit tests if you think this is landable. It's to enable the analysis I'm doing in https://codereview.chromium.org/1847833003/ which we agreed I wouldn't try to land for now (since it's sort of an abuse of telemetry - not being performance focused). Thanks, Rick
eakuefner@chromium.org changed reviewers: + nednguyen@google.com
+Ned for buy-in on the plan I describe below I think that if we can make CT emit results2, which will be possible for all benchmarks, not just TBMv2, once https://codereview.chromium.org/2474573002 is landed (it's ready, but got reverted because of the need for a small fix which Ben has uploaded already), that's maybe the best possible solution here. With https://codereview.chromium.org/2508643002 results2 now has a button that allows downloading CSV, and I think the HistogramSet CSV format is a lot more saner and more maintainable than this Telemetry format. My ideal plan here is as follows: 0. Deprecate CsvPivotTableOutputFormatter and have it emit a warning that users should use --output-format=html instead and either download the CSV from within the UI or use tracing/bin/histograms2csv 1. Make CT use HtmlOutputFormatter instead of this CSV one 2. Delete CsvPivotTableOutputFormatter If that will take too long, I'd be okay with a workaround like the one you write here. As a bonus, my proposed plan solves the mess that Ben/Ned were speaking about in https://github.com/catapult-project/catapult/issues/3020.
rbyers@chromium.org changed reviewers: + rmistry@chromium.org
On 2016/11/28 18:53:15, eakuefner wrote: > +Ned for buy-in on the plan I describe below > > I think that if we can make CT emit results2, which will be possible for all > benchmarks, not just TBMv2, once https://codereview.chromium.org/2474573002 is > landed (it's ready, but got reverted because of the need for a small fix which > Ben has uploaded already), that's maybe the best possible solution here. > > With https://codereview.chromium.org/2508643002 results2 now has a button that > allows downloading CSV, and I think the HistogramSet CSV format is a lot more > saner and more maintainable than this Telemetry format. > > My ideal plan here is as follows: > > 0. Deprecate CsvPivotTableOutputFormatter and have it emit a warning that users > should use --output-format=html instead and either download the CSV from within > the UI or use tracing/bin/histograms2csv > 1. Make CT use HtmlOutputFormatter instead of this CSV one > 2. Delete CsvPivotTableOutputFormatter > > If that will take too long, I'd be okay with a workaround like the one you write > here. > > As a bonus, my proposed plan solves the mess that Ben/Ned were speaking about in > https://github.com/catapult-project/catapult/issues/3020. I'm not sure that plan would work for cluster telemetry. +rmistry. Doesn't cluster telemetry need an output format that can be easily sharded / concatenated? For really big runs (eg. 1 million sites) the compactness of the results is important - is the html output really appropriate for that? Presumably we can't even load a 1GB HTML file in any browser. Also it seems unfortunate to add an additional manual step to the process of analyzing bulk data (having to load up an html report just to export CSV from it). Instead might it be reasonable to have some standard compact intermediate results format, and then richer visualizations like the HTML output could be generated from that raw data?
On 2016/11/28 at 18:59:10, rbyers wrote: > On 2016/11/28 18:53:15, eakuefner wrote: > > +Ned for buy-in on the plan I describe below > > > > I think that if we can make CT emit results2, which will be possible for all > > benchmarks, not just TBMv2, once https://codereview.chromium.org/2474573002 is > > landed (it's ready, but got reverted because of the need for a small fix which > > Ben has uploaded already), that's maybe the best possible solution here. > > > > With https://codereview.chromium.org/2508643002 results2 now has a button that > > allows downloading CSV, and I think the HistogramSet CSV format is a lot more > > saner and more maintainable than this Telemetry format. > > > > My ideal plan here is as follows: > > > > 0. Deprecate CsvPivotTableOutputFormatter and have it emit a warning that users > > should use --output-format=html instead and either download the CSV from within > > the UI or use tracing/bin/histograms2csv > > 1. Make CT use HtmlOutputFormatter instead of this CSV one > > 2. Delete CsvPivotTableOutputFormatter > > > > If that will take too long, I'd be okay with a workaround like the one you write > > here. > > > > As a bonus, my proposed plan solves the mess that Ben/Ned were speaking about in > > https://github.com/catapult-project/catapult/issues/3020. > > I'm not sure that plan would work for cluster telemetry. +rmistry. > > Doesn't cluster telemetry need an output format that can be easily sharded / concatenated? For really big runs (eg. 1 million sites) the compactness of the results is important - is the html output really appropriate for that? Presumably we can't even load a 1GB HTML file in any browser. Also it seems unfortunate to add an additional manual step to the process of analyzing bulk data (having to load up an html report just to export CSV from it). > > Instead might it be reasonable to have some standard compact intermediate results format, and then richer visualizations like the HTML output could be generated from that raw data? We've spoken off-thread to Ravi about supporting this functionality; it's been a month or so but I think the consensus there was that this would be feasible to do. There was a complaint about slow loading speed at the time on the results2 that Ben had generated by hand (with 10k histograms) but loading of results2 has gotten even faster since then. Results2 instances are in fact mergeable using tracing/bin/valueset2html, and the histograms can be extracted using results2json.
On 2016/11/28 18:59:10, Rick Byers wrote: > On 2016/11/28 18:53:15, eakuefner wrote: > > +Ned for buy-in on the plan I describe below > > > > I think that if we can make CT emit results2, which will be possible for all > > benchmarks, not just TBMv2, once https://codereview.chromium.org/2474573002 is > > landed (it's ready, but got reverted because of the need for a small fix which > > Ben has uploaded already), that's maybe the best possible solution here. > > > > With https://codereview.chromium.org/2508643002 results2 now has a button that > > allows downloading CSV, and I think the HistogramSet CSV format is a lot more > > saner and more maintainable than this Telemetry format. > > > > My ideal plan here is as follows: > > > > 0. Deprecate CsvPivotTableOutputFormatter and have it emit a warning that > users > > should use --output-format=html instead and either download the CSV from > within > > the UI or use tracing/bin/histograms2csv > > 1. Make CT use HtmlOutputFormatter instead of this CSV one > > 2. Delete CsvPivotTableOutputFormatter > > > > If that will take too long, I'd be okay with a workaround like the one you > write > > here. > > > > As a bonus, my proposed plan solves the mess that Ben/Ned were speaking about > in > > https://github.com/catapult-project/catapult/issues/3020. > > I'm not sure that plan would work for cluster telemetry. +rmistry. > > Doesn't cluster telemetry need an output format that can be easily sharded / > concatenated? For really big runs (eg. 1 million sites) the compactness of the > results is important - is the html output really appropriate for that? > Presumably we can't even load a 1GB HTML file in any browser. Also it seems > unfortunate to add an additional manual step to the process of analyzing bulk > data (having to load up an html report just to export CSV from it). > > Instead might it be reasonable to have some standard compact intermediate > results format, and then richer visualizations like the HTML output could be > generated from that raw data? Ethan can correct me if I am wrong, but I think the plan here is to deprecate CsvPivotTableOutputFormatter & only keep CsvOutputFormatter. IIRC, Cluster Telemetry currently uses CsvOutputFormatter, so you change should be done there. If you want more fancy UI at a scale that can be UI-ified, the result2 is for you. If you are not, --output-format=csv is the way to go for understanding medium data.
On 2016/11/28 at 19:08:19, nednguyen wrote: > On 2016/11/28 18:59:10, Rick Byers wrote: > > On 2016/11/28 18:53:15, eakuefner wrote: > > > +Ned for buy-in on the plan I describe below > > > > > > I think that if we can make CT emit results2, which will be possible for all > > > benchmarks, not just TBMv2, once https://codereview.chromium.org/2474573002 is > > > landed (it's ready, but got reverted because of the need for a small fix which > > > Ben has uploaded already), that's maybe the best possible solution here. > > > > > > With https://codereview.chromium.org/2508643002 results2 now has a button that > > > allows downloading CSV, and I think the HistogramSet CSV format is a lot more > > > saner and more maintainable than this Telemetry format. > > > > > > My ideal plan here is as follows: > > > > > > 0. Deprecate CsvPivotTableOutputFormatter and have it emit a warning that > > users > > > should use --output-format=html instead and either download the CSV from > > within > > > the UI or use tracing/bin/histograms2csv > > > 1. Make CT use HtmlOutputFormatter instead of this CSV one > > > 2. Delete CsvPivotTableOutputFormatter > > > > > > If that will take too long, I'd be okay with a workaround like the one you > > write > > > here. > > > > > > As a bonus, my proposed plan solves the mess that Ben/Ned were speaking about > > in > > > https://github.com/catapult-project/catapult/issues/3020. > > > > I'm not sure that plan would work for cluster telemetry. +rmistry. > > > > Doesn't cluster telemetry need an output format that can be easily sharded / > > concatenated? For really big runs (eg. 1 million sites) the compactness of the > > results is important - is the html output really appropriate for that? > > Presumably we can't even load a 1GB HTML file in any browser. Also it seems > > unfortunate to add an additional manual step to the process of analyzing bulk > > data (having to load up an html report just to export CSV from it). > > > > Instead might it be reasonable to have some standard compact intermediate > > results format, and then richer visualizations like the HTML output could be > > generated from that raw data? > > Ethan can correct me if I am wrong, but I think the plan here is to deprecate CsvPivotTableOutputFormatter & only keep CsvOutputFormatter. IIRC, Cluster Telemetry currently uses CsvOutputFormatter, so you change should be done there. > > If you want more fancy UI at a scale that can be UI-ified, the result2 is for you. If you are not, --output-format=csv is the way to go for understanding medium data. I thought CT uses CsvPivotTableOutputFormatter, and CsvOutputFormatter was unused. My proposal would be to get rid of both and use TBMv2's CSV format instead; it has columns for all summary statistics and works for histograms/lists/scalars. It is easy to get TBMv2's CSV format from an existing HistogramSet using histograms2csv.
On 2016/11/28 19:08:19, nednguyen wrote: > On 2016/11/28 18:59:10, Rick Byers wrote: > > On 2016/11/28 18:53:15, eakuefner wrote: > > > +Ned for buy-in on the plan I describe below > > > > > > I think that if we can make CT emit results2, which will be possible for all > > > benchmarks, not just TBMv2, once https://codereview.chromium.org/2474573002 > is > > > landed (it's ready, but got reverted because of the need for a small fix > which > > > Ben has uploaded already), that's maybe the best possible solution here. > > > > > > With https://codereview.chromium.org/2508643002 results2 now has a button > that > > > allows downloading CSV, and I think the HistogramSet CSV format is a lot > more > > > saner and more maintainable than this Telemetry format. > > > > > > My ideal plan here is as follows: > > > > > > 0. Deprecate CsvPivotTableOutputFormatter and have it emit a warning that > > users > > > should use --output-format=html instead and either download the CSV from > > within > > > the UI or use tracing/bin/histograms2csv > > > 1. Make CT use HtmlOutputFormatter instead of this CSV one > > > 2. Delete CsvPivotTableOutputFormatter > > > > > > If that will take too long, I'd be okay with a workaround like the one you > > write > > > here. > > > > > > As a bonus, my proposed plan solves the mess that Ben/Ned were speaking > about > > in > > > https://github.com/catapult-project/catapult/issues/3020. > > > > I'm not sure that plan would work for cluster telemetry. +rmistry. > > > > Doesn't cluster telemetry need an output format that can be easily sharded / > > concatenated? For really big runs (eg. 1 million sites) the compactness of > the > > results is important - is the html output really appropriate for that? > > Presumably we can't even load a 1GB HTML file in any browser. Also it seems > > unfortunate to add an additional manual step to the process of analyzing bulk > > data (having to load up an html report just to export CSV from it). > > > > Instead might it be reasonable to have some standard compact intermediate > > results format, and then richer visualizations like the HTML output could be > > generated from that raw data? > > Ethan can correct me if I am wrong, but I think the plan here is to deprecate > CsvPivotTableOutputFormatter & only keep CsvOutputFormatter. IIRC, Cluster > Telemetry currently uses CsvOutputFormatter, so you change should be done there. > > If you want more fancy UI at a scale that can be UI-ified, the result2 is for > you. If you are not, --output-format=csv is the way to go for understanding > medium data. CT uses "--output-format=csv-pivot-table". It used to use "--output-format=csv" but that changed with https://bugs.chromium.org/p/skia/issues/detail?id=3920.
Okay, so Ned/Ben/I just spoke offline about this. I think we've reached some conclusions: 1. Rick, you can probably use this hack to power your experiment without landing it, in the way that we discussed (just make it part of your Catapult CL to patch for the CT job). 2. There is value both in reducing the number of ways to get CSV from Telemetry, and in keeping a way to get CSV directly from Telemetry, so we want to do both (reduce 3 ways of getting CSV to one, and make that what --output-format=csv does), but that is an API change that will require going through our normal deprecation/announcement route. 3. No changes should be required on the CT side except eventually changing --output-format=csv-pivot-table back to --output-format=csv when the time comes; Ravi, we'll be in touch.
On 2016/11/28 19:10:52, eakuefner wrote: > On 2016/11/28 at 19:08:19, nednguyen wrote: > > On 2016/11/28 18:59:10, Rick Byers wrote: > > > On 2016/11/28 18:53:15, eakuefner wrote: > > > > +Ned for buy-in on the plan I describe below > > > > > > > > I think that if we can make CT emit results2, which will be possible for > all > > > > benchmarks, not just TBMv2, once > https://codereview.chromium.org/2474573002 is > > > > landed (it's ready, but got reverted because of the need for a small fix > which > > > > Ben has uploaded already), that's maybe the best possible solution here. > > > > > > > > With https://codereview.chromium.org/2508643002 results2 now has a button > that > > > > allows downloading CSV, and I think the HistogramSet CSV format is a lot > more > > > > saner and more maintainable than this Telemetry format. > > > > > > > > My ideal plan here is as follows: > > > > > > > > 0. Deprecate CsvPivotTableOutputFormatter and have it emit a warning that > > > users > > > > should use --output-format=html instead and either download the CSV from > > > within > > > > the UI or use tracing/bin/histograms2csv > > > > 1. Make CT use HtmlOutputFormatter instead of this CSV one > > > > 2. Delete CsvPivotTableOutputFormatter > > > > > > > > If that will take too long, I'd be okay with a workaround like the one you > > > write > > > > here. > > > > > > > > As a bonus, my proposed plan solves the mess that Ben/Ned were speaking > about > > > in > > > > https://github.com/catapult-project/catapult/issues/3020. > > > > > > I'm not sure that plan would work for cluster telemetry. +rmistry. > > > > > > Doesn't cluster telemetry need an output format that can be easily sharded / > > > concatenated? For really big runs (eg. 1 million sites) the compactness of > the > > > results is important - is the html output really appropriate for that? > > > Presumably we can't even load a 1GB HTML file in any browser. Also it seems > > > unfortunate to add an additional manual step to the process of analyzing > bulk > > > data (having to load up an html report just to export CSV from it). > > > > > > Instead might it be reasonable to have some standard compact intermediate > > > results format, and then richer visualizations like the HTML output could be > > > generated from that raw data? > > > > Ethan can correct me if I am wrong, but I think the plan here is to deprecate > CsvPivotTableOutputFormatter & only keep CsvOutputFormatter. IIRC, Cluster > Telemetry currently uses CsvOutputFormatter, so you change should be done there. > > > > If you want more fancy UI at a scale that can be UI-ified, the result2 is for > you. If you are not, --output-format=csv is the way to go for understanding > medium data. > > I thought CT uses CsvPivotTableOutputFormatter, and CsvOutputFormatter was > unused. My proposal would be to get rid of both and use TBMv2's CSV format > instead; it has columns for all summary statistics and works for > histograms/lists/scalars. It is easy to get TBMv2's CSV format from an existing > HistogramSet using histograms2csv. Thanks. Since (as we discussed offline) there's no good way to officially support the non-perf use case with TBMv2 today, I ended up giving up trying to convert my metric to TBMv2 (seemed like a lot of work to no benefit, since I still couldn't land the CL and I was able to solve the issues I was having with renderer histograms). I could try again to convert to TBMv2, but before I invest the time perhaps we should try to flesh out the longer term plan a bit that would allow me to actually land this stuff in an officially supported way? See the open question for you in https://docs.google.com/document/d/1FSzJm2L2ow6pZTM_CuyHNJecXuX7Mx3XmBzL4SFHy...
On 2016/11/28 19:24:21, eakuefner wrote: > Okay, so Ned/Ben/I just spoke offline about this. > > I think we've reached some conclusions: > > 1. Rick, you can probably use this hack to power your experiment without landing > it, in the way that we discussed (just make it part of your Catapult CL to patch > for the CT job). > 2. There is value both in reducing the number of ways to get CSV from Telemetry, > and in keeping a way to get CSV directly from Telemetry, so we want to do both > (reduce 3 ways of getting CSV to one, and make that what --output-format=csv > does), but that is an API change that will require going through our normal > deprecation/announcement route. > 3. No changes should be required on the CT side except eventually changing > --output-format=csv-pivot-table back to --output-format=csv when the time comes; > Ravi, we'll be in touch. SGTM!
On 2016/11/28 19:26:33, rmistry wrote: > On 2016/11/28 19:24:21, eakuefner wrote: > > Okay, so Ned/Ben/I just spoke offline about this. > > > > I think we've reached some conclusions: > > > > 1. Rick, you can probably use this hack to power your experiment without > landing > > it, in the way that we discussed (just make it part of your Catapult CL to > patch > > for the CT job). Yep, that's what I've been doing for now. Just trying to keep the number of hacks I have to maintain to the minimum :-) > > 2. There is value both in reducing the number of ways to get CSV from > Telemetry, > > and in keeping a way to get CSV directly from Telemetry, so we want to do both > > (reduce 3 ways of getting CSV to one, and make that what --output-format=csv > > does), but that is an API change that will require going through our normal > > deprecation/announcement route. > > 3. No changes should be required on the CT side except eventually changing > > --output-format=csv-pivot-table back to --output-format=csv when the time > comes; > > Ravi, we'll be in touch. > > SGTM! Sounds good to me too - thanks. Is there a bug to track for #2?
On 2016/11/28 at 19:28:18, rbyers wrote: > On 2016/11/28 19:26:33, rmistry wrote: > > On 2016/11/28 19:24:21, eakuefner wrote: > > > Okay, so Ned/Ben/I just spoke offline about this. > > > > > > I think we've reached some conclusions: > > > > > > 1. Rick, you can probably use this hack to power your experiment without > > landing > > > it, in the way that we discussed (just make it part of your Catapult CL to > > patch > > > for the CT job). > > Yep, that's what I've been doing for now. Just trying to keep the number of hacks I have to maintain to the minimum :-) > > > > 2. There is value both in reducing the number of ways to get CSV from > > Telemetry, > > > and in keeping a way to get CSV directly from Telemetry, so we want to do both > > > (reduce 3 ways of getting CSV to one, and make that what --output-format=csv > > > does), but that is an API change that will require going through our normal > > > deprecation/announcement route. > > > 3. No changes should be required on the CT side except eventually changing > > > --output-format=csv-pivot-table back to --output-format=csv when the time > > comes; > > > Ravi, we'll be in touch. > > > > SGTM! > > Sounds good to me too - thanks. Is there a bug to track for #2? Yep: https://github.com/catapult-project/catapult/issues/3020 |