Created: 4 years, 2 months ago by benjhayden
Modified: 4 years, 2 months ago
CC: catapult-reviews@chromium.org, tracing-review@chromium.org
Target Ref: refs/heads/master
Project: catapult
Visibility: Public.
Description
Serialize Histograms more efficiently.
In a results2.html that Juan emailed recently, 16700 Histograms took 95646761
bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were
IterationInfo. This CL reduces the average size of each Histogram in the
memory.top_10_mobile benchmark to 843 bytes, an improvement of 86%.
Avoid serializing unnecessary information.
Serialize sparse centralBins arrays as dictionaries.
Serialize HistogramBinBoundaries builder recipes instead of the boundaries themselves.
Add tests to monitor serialization sizes.
Trimming floats is left for another CL.
BUG=catapult:#2794
Committed: https://chromium.googlesource.com/external/github.com/catapult-project/catapult/+/03fbd54d8fdb0defa1a677b71d67a9f46544fa86
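The sparse-centralBins idea in the description above can be sketched like this. This is an illustrative sketch in plain JavaScript, not Catapult's actual implementation; the function names and the size heuristic are assumptions:

```javascript
// Sketch: serialize a mostly-empty centralBins array of counts as an
// {index: count} dictionary when most bins are empty, since the dict form
// omits the zeros. Illustration only; not Catapult's real code.
function serializeCentralBins(bins) {
  const sparse = {};
  let nonEmpty = 0;
  bins.forEach((count, i) => {
    if (count !== 0) {
      sparse[i] = count;
      ++nonEmpty;
    }
  });
  // Heuristic (an assumption): the dict wins when over half the bins are empty.
  if (nonEmpty * 2 < bins.length) return sparse;
  return bins;
}

function deserializeCentralBins(dictOrArray, numBins) {
  if (Array.isArray(dictOrArray)) return dictOrArray;
  const bins = new Array(numBins).fill(0);
  for (const [i, count] of Object.entries(dictOrArray)) {
    bins[Number(i)] = count;
  }
  return bins;
}
```

For a 100-bin histogram with samples in a handful of bins, the dictionary form collapses 100 comma-separated zeros down to a few key/value pairs.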
Patch Set 1 #
Patch Set 2 : . #
Patch Set 3 : make HistogramBin.asDict() even more efficient #
Patch Set 4 : HistogramBinBoundaries builder recipes #
Patch Set 5 : back out DICT_TAGS #
Total comments: 8
Patch Set 6 : binBoundariesBuilderDict_ #
Patch Set 7 : *binRanges(), cache HistogramBinBoundaries #
Patch Set 8 : RunningStatistics.asDict undefined #
Total comments: 22
Patch Set 9 : uniformlySampleArray #
Patch Set 10 : . #
Patch Set 11 : . #
Messages
Total messages: 44 (16 generated)
Description was changed from ========== Serialize Histograms more efficiently. Avoid serializing unnecessary information. Use small numbers instead of verbose strings for dict keys. Add tests to monitor serialization sizes. BUG=catapult:#2794 ========== to ========== Serialize Histograms more efficiently. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Use small numbers instead of verbose strings for dict keys. Add tests to monitor serialization sizes. BUG=catapult:#2794 ==========
benjhayden@chromium.org changed reviewers: + eakuefner@chromium.org, nednguyen@google.com
PTAL :-)
Description was changed from ========== Serialize Histograms more efficiently. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Use small numbers instead of verbose strings for dict keys. Add tests to monitor serialization sizes. BUG=catapult:#2794 ========== to ========== Serialize Histograms more efficiently. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Use small numbers instead of verbose strings for dict keys. Add tests to monitor serialization sizes. Trimming floats is left for another CL. BUG=catapult:#2794 ==========
On 2016/09/23 23:09:14, benjhayden wrote: > PTAL :-) Can you run Juan's benchmark to see how much saving we get?
Description was changed from ========== Serialize Histograms more efficiently. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Use small numbers instead of verbose strings for dict keys. Add tests to monitor serialization sizes. Trimming floats is left for another CL. BUG=catapult:#2794 ========== to ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Use small numbers instead of verbose strings for dict keys. Add tests to monitor serialization sizes. Trimming floats is left for another CL. Serializing BinBoundaries recipes instead of boundaries numbers is left for another CL, though it will likely have a larger impact than these changes. BUG=catapult:#2794 ==========
Description was changed from ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Use small numbers instead of verbose strings for dict keys. Add tests to monitor serialization sizes. Trimming floats is left for another CL. Serializing BinBoundaries recipes instead of boundaries numbers is left for another CL, though it will likely have a larger impact than these changes. BUG=catapult:#2794 ========== to ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL should reduce the average size of Histograms by nearly half. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Use small numbers instead of verbose strings for dict keys. Add tests to monitor serialization sizes. Trimming floats is left for another CL. Serializing BinBoundaries recipes instead of boundaries numbers is left for another CL, though it will likely have a larger impact than these changes. Changing Diagnostic dicts to use tags instead of string keys is left for another CL. BUG=catapult:#2794 ==========
Description was changed from ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL should reduce the average size of Histograms by nearly half. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Use small numbers instead of verbose strings for dict keys. Add tests to monitor serialization sizes. Trimming floats is left for another CL. Serializing BinBoundaries recipes instead of boundaries numbers is left for another CL, though it will likely have a larger impact than these changes. Changing Diagnostic dicts to use tags instead of string keys is left for another CL. BUG=catapult:#2794 ========== to ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL should reduce the average size of Histograms by nearly half. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Use small numbers instead of verbose strings for dict keys. Add tests to monitor serialization sizes. Trimming floats is left for another CL. Serializing BinBoundaries recipes instead of boundaries numbers is left for another CL, though it will likely have a significant impact. Changing Diagnostic dicts to use tags instead of string keys is left for another CL. BUG=catapult:#2794 ==========
On 2016/09/23 at 23:11:09, nednguyen wrote: > On 2016/09/23 23:09:14, benjhayden wrote: > > PTAL :-) > > Can you run Juan's benchmark to see how much saving do we get? This command: tools/perf/run_benchmark memory.top_10_mobile --browser android-chrome --output-format html2 --results-label out-416763-8d5 --reset-results produced a results2.html that is 36741609 bytes, which is 39% of the size of the 92MB results2.html that Juan emailed recently.
Description was changed from ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL should reduce the average size of Histograms by nearly half. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Use small numbers instead of verbose strings for dict keys. Add tests to monitor serialization sizes. Trimming floats is left for another CL. Serializing BinBoundaries recipes instead of boundaries numbers is left for another CL, though it will likely have a significant impact. Changing Diagnostic dicts to use tags instead of string keys is left for another CL. BUG=catapult:#2794 ========== to ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL should reduce the average size of Histograms by about half. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Use small numbers instead of verbose strings for dict keys. Add tests to monitor serialization sizes. Trimming floats is left for another CL. Serializing BinBoundaries recipes instead of boundaries numbers is left for another CL, though it will likely have a significant impact. Changing Diagnostic dicts to use tags instead of string keys is left for another CL. BUG=catapult:#2794 ==========
On 2016/09/26 05:30:38, benjhayden wrote: > On 2016/09/23 at 23:11:09, nednguyen wrote: > > On 2016/09/23 23:09:14, benjhayden wrote: > > > PTAL :-) > > > > Can you run Juan's benchmark to see how much saving do we get? > > This command: > tools/perf/run_benchmark memory.top_10_mobile --browser android-chrome > --output-format html2 --results-label out-416763-8d5 --reset-results > produced a results2.html that is 36741609 bytes, which is 39% of the size of the > 92MB results2.html that Juan emailed recently. Nice win! I will look at this later today, thanks Ben!
The optimization of serializing sparse centralBins arrays as dictionaries makes a lot of sense to me. However, I don't really buy the idea of mapping dictionary strings to numbers. It's basically a compression scheme, which would be better achieved by gzip. If unpacking the zipped data creates large strings, let's use https://github.com/catapult-project/catapult/issues/2826 to address it.
Description was changed from ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL should reduce the average size of Histograms by about half. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Use small numbers instead of verbose strings for dict keys. Add tests to monitor serialization sizes. Trimming floats is left for another CL. Serializing BinBoundaries recipes instead of boundaries numbers is left for another CL, though it will likely have a significant impact. Changing Diagnostic dicts to use tags instead of string keys is left for another CL. BUG=catapult:#2794 ========== to ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL should reduce the average size of Histograms by about half. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Serialize HistogramBinBoundaries builder recipes instead of the boundaries themselves. Use short strings for dict keys. Add tests to monitor serialization sizes. Trimming floats is left for another CL. Changing Diagnostic dicts to use tags instead of string keys is left for another CL. Histograms now look like this: {"b":[0,[0,1000,10]],"N":"","u":"unitless","g":"1197bc31-0bbb-44aa-80f1-066121e20dc9","m":10,"v":[230,250,270,300,510,580,620,680,850,880],"r":[93,990,6.1115859726001505,532.483870967742,100,49521,6530807.225806451],"c":[0,12,[11,[{"foo":{"type":"Generic","value":"bar"}}]],10,10,10,10,10,10,10]} BUG=catapult:#2794 ==========
Description was changed from ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL should reduce the average size of Histograms by about half. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Serialize HistogramBinBoundaries builder recipes instead of the boundaries themselves. Use short strings for dict keys. Add tests to monitor serialization sizes. Trimming floats is left for another CL. Changing Diagnostic dicts to use tags instead of string keys is left for another CL. Histograms now look like this: {"b":[0,[0,1000,10]],"N":"","u":"unitless","g":"1197bc31-0bbb-44aa-80f1-066121e20dc9","m":10,"v":[230,250,270,300,510,580,620,680,850,880],"r":[93,990,6.1115859726001505,532.483870967742,100,49521,6530807.225806451],"c":[0,12,[11,[{"foo":{"type":"Generic","value":"bar"}}]],10,10,10,10,10,10,10]} BUG=catapult:#2794 ========== to ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL should reduce the average size of Histograms by about half. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Serialize HistogramBinBoundaries builder recipes instead of the boundaries themselves. Use short strings for dict keys. Add tests to monitor serialization sizes. Trimming floats is left for another CL. Changing Diagnostic dicts to use short string keys is left for another CL. 
Histograms now look like this: {"b":[0,[0,1000,10]],"N":"","u":"unitless","g":"1197bc31-0bbb-44aa-80f1-066121e20dc9","m":10,"v":[230,250,270,300,510,580,620,680,850,880],"r":[93,990,6.1115859726001505,532.483870967742,100,49521,6530807.225806451],"c":[0,12,[11,[{"foo":{"type":"Generic","value":"bar"}}]],10,10,10,10,10,10,10]} BUG=catapult:#2794 ==========
Description was changed from ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL should reduce the average size of Histograms by about half. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Serialize HistogramBinBoundaries builder recipes instead of the boundaries themselves. Use short strings for dict keys. Add tests to monitor serialization sizes. Trimming floats is left for another CL. Changing Diagnostic dicts to use short string keys is left for another CL. Histograms now look like this: {"b":[0,[0,1000,10]],"N":"","u":"unitless","g":"1197bc31-0bbb-44aa-80f1-066121e20dc9","m":10,"v":[230,250,270,300,510,580,620,680,850,880],"r":[93,990,6.1115859726001505,532.483870967742,100,49521,6530807.225806451],"c":[0,12,[11,[{"foo":{"type":"Generic","value":"bar"}}]],10,10,10,10,10,10,10]} BUG=catapult:#2794 ========== to ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL should reduce the average size of Histograms by about half to two-thirds. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Serialize HistogramBinBoundaries builder recipes instead of the boundaries themselves. Use short strings for dict keys. Add tests to monitor serialization sizes. Trimming floats is left for another CL. Changing Diagnostic dicts to use short string keys is left for another CL. 
Histograms now look like this: {"b":[0,[0,1000,10]],"N":"","u":"unitless","g":"1197bc31-0bbb-44aa-80f1-066121e20dc9","m":10,"v":[230,250,270,300,510,580,620,680,850,880],"r":[93,990,6.1115859726001505,532.483870967742,100,49521,6530807.225806451],"c":[0,12,[11,[{"foo":{"type":"Generic","value":"bar"}}]],10,10,10,10,10,10,10]} BUG=catapult:#2794 ==========
On 2016/09/26 at 18:45:57, nednguyen wrote: > The optimization of serializing sparse centralBins arrays as dictionaries makes a lot of sense to me. > > However, I don't really buy the idea of mapping dictionary strings to number. It's basically a compression scheme, which should better be achieved by gzip. If unpacking the zipped data create large string, let's use https://github.com/catapult-project/catapult/issues/2826 to address it. Actually, JSON keys must be strings, so I changed the number tags to short strings. I tried to choose meaningful characters, but I think that shorter keys do have a significant impact here. I could understand an argument that JSON should theoretically be human-readable, but I'm not sure if that's a valid use case here. value-set-json is already serialized to a single line, and none of our tools (devtools, vim, less) are capable of dealing with a 92MB-long line without line-breaks. If you're going to have to dip into python/javascript repl anyway in order to debug Histogram.asDict/fromDict, then you might as well write a couple lines of code to help you parse the results. Similarly, protocol buffer provides tools to convert between binary and pretty-printed text. Sorry, I don't understand the objection to "compression schemes". Compression is just using fewer bits to encode the same information. Serializing sparse centralBins is a compression scheme. I agree that JSONStream will solve the v8-string-too-large problem once and for all, but efficiency (or rather, I'd call JSONStream an inefficient work-around) in one part of the pipeline does not absolve inefficiency in another part. We're trying to optimize the entire workflow of generating results, storing them, serving them, and loading them. JSONStream will make loading obscenely large JSONs possible, and gzip will help store and serve them at the cost of loading latency, but the entire workflow would always be faster if those obscenely large JSONs were smaller. 
I also implemented the bin boundary builder recipes, which had a significant impact. LMK if you want to VC. Thanks :-)
On 2016/09/26 at 19:41:00, benjhayden wrote: > On 2016/09/26 at 18:45:57, nednguyen wrote: > > The optimization of serializing sparse centralBins arrays as dictionaries makes a lot of sense to me. > > > > However, I don't really buy the idea of mapping dictionary strings to number. It's basically a compression scheme, which should better be achieved by gzip. If unpacking the zipped data create large string, let's use https://github.com/catapult-project/catapult/issues/2826 to address it. > > Actually, JSON keys must be strings, so I changed the number tags to short strings. I tried to choose meaningful characters, but I think that shorter keys do have a significant impact here. > I could understand an argument that JSON should theoretically be human-readable, but I'm not sure if that's a valid use case here. value-set-json is already serialized to a single line, and none of our tools (devtools, vim, less) are capable of dealing with a 92MB-long line without line-breaks. If you're going to have to dip into python/javascript repl anyway in order to debug Histogram.asDict/fromDict, then you might as well write a couple lines of code to help you parse the results. Similarly, protocol buffer provides tools to convert between binary and pretty-printed text. > > Sorry, I don't understand the objection to "compression schemes". Compression is just using fewer bits to encode the same information. Serializing sparse centralBins is a compression scheme. > > I agree that JSONStream will solve the v8-string-too-large problem once and for all, but efficiency (or rather, I'd call JSONStream an inefficient work-around) in one part of the pipeline does not absolve inefficiency in another part. We're trying to optimize the entire workflow of generating results, storing them, serving them, and loading them. 
JSONStream will make loading obscenely large JSONs possible, and gzip will help store and serve them at the cost of loading latency, but the entire workflow would always be faster if those obscenely large JSONs were smaller. > > I also implemented the bin boundary builder recipes, which had a significant impact. > > LMK if you want to VC. > > Thanks :-) Discussed offline. I'll back out the mapping to short key strings and check that the builder recipes fix the binBoundaries heisenbug. Thanks :-)
On 2016/09/26 at 20:39:56, benjhayden wrote: > On 2016/09/26 at 19:41:00, benjhayden wrote: > > On 2016/09/26 at 18:45:57, nednguyen wrote: > > > The optimization of serializing sparse centralBins arrays as dictionaries makes a lot of sense to me. > > > > > > However, I don't really buy the idea of mapping dictionary strings to number. It's basically a compression scheme, which should better be achieved by gzip. If unpacking the zipped data create large string, let's use https://github.com/catapult-project/catapult/issues/2826 to address it. > > > > Actually, JSON keys must be strings, so I changed the number tags to short strings. I tried to choose meaningful characters, but I think that shorter keys do have a significant impact here. > > I could understand an argument that JSON should theoretically be human-readable, but I'm not sure if that's a valid use case here. value-set-json is already serialized to a single line, and none of our tools (devtools, vim, less) are capable of dealing with a 92MB-long line without line-breaks. If you're going to have to dip into python/javascript repl anyway in order to debug Histogram.asDict/fromDict, then you might as well write a couple lines of code to help you parse the results. Similarly, protocol buffer provides tools to convert between binary and pretty-printed text. > > > > Sorry, I don't understand the objection to "compression schemes". Compression is just using fewer bits to encode the same information. Serializing sparse centralBins is a compression scheme. > > > > I agree that JSONStream will solve the v8-string-too-large problem once and for all, but efficiency (or rather, I'd call JSONStream an inefficient work-around) in one part of the pipeline does not absolve inefficiency in another part. We're trying to optimize the entire workflow of generating results, storing them, serving them, and loading them. 
JSONStream will make loading obscenely large JSONs possible, and gzip will help store and serve them at the cost of loading latency, but the entire workflow would always be faster if those obscenely large JSONs were smaller. > > > > I also implemented the bin boundary builder recipes, which had a significant impact. > > > > LMK if you want to VC. > > > > Thanks :-) > > Discussed offline. I'll back out the mapping to short key strings and check that the builder recipes fix the binBoundaries heisenbug. > Thanks :-) For posterity, the argument that convinced me is that long key strings won't be a problem when the entire pipeline is streaming all the way from Histogram objects to JSONStream to gzip'd files on disk and back again, and the mapping between short string keys and Histogram fields could be fragile, and we want to try to avoid changing the serialization format too many times.
Description was changed from ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL should reduce the average size of Histograms by about half to two-thirds. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Serialize HistogramBinBoundaries builder recipes instead of the boundaries themselves. Use short strings for dict keys. Add tests to monitor serialization sizes. Trimming floats is left for another CL. Changing Diagnostic dicts to use short string keys is left for another CL. Histograms now look like this: {"b":[0,[0,1000,10]],"N":"","u":"unitless","g":"1197bc31-0bbb-44aa-80f1-066121e20dc9","m":10,"v":[230,250,270,300,510,580,620,680,850,880],"r":[93,990,6.1115859726001505,532.483870967742,100,49521,6530807.225806451],"c":[0,12,[11,[{"foo":{"type":"Generic","value":"bar"}}]],10,10,10,10,10,10,10]} BUG=catapult:#2794 ========== to ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL should reduce the average size of Histograms by about half to two-thirds. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Serialize HistogramBinBoundaries builder recipes instead of the boundaries themselves. Add tests to monitor serialization sizes. Trimming floats is left for another CL. Changing Diagnostic dicts to use short string keys is left for another CL. 
Histograms now look like this: {"b":[0,[0,1000,10]],"N":"","u":"unitless","g":"1197bc31-0bbb-44aa-80f1-066121e20dc9","m":10,"v":[230,250,270,300,510,580,620,680,850,880],"r":[93,990,6.1115859726001505,532.483870967742,100,49521,6530807.225806451],"c":[0,12,[11,[{"foo":{"type":"Generic","value":"bar"}}]],10,10,10,10,10,10,10]} BUG=catapult:#2794 ==========
Description was changed from ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL should reduce the average size of Histograms by about half to two-thirds. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Serialize HistogramBinBoundaries builder recipes instead of the boundaries themselves. Add tests to monitor serialization sizes. Trimming floats is left for another CL. Changing Diagnostic dicts to use short string keys is left for another CL. Histograms now look like this: {"b":[0,[0,1000,10]],"N":"","u":"unitless","g":"1197bc31-0bbb-44aa-80f1-066121e20dc9","m":10,"v":[230,250,270,300,510,580,620,680,850,880],"r":[93,990,6.1115859726001505,532.483870967742,100,49521,6530807.225806451],"c":[0,12,[11,[{"foo":{"type":"Generic","value":"bar"}}]],10,10,10,10,10,10,10]} BUG=catapult:#2794 ========== to ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL should reduce the average size of Histograms by about half to two-thirds. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Serialize HistogramBinBoundaries builder recipes instead of the boundaries themselves. Add tests to monitor serialization sizes. Trimming floats is left for another CL. BUG=catapult:#2794 ==========
Serializing bin boundary builder recipes does seem to fix the mysterious boundary deletions. :-)
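A builder recipe can be replayed to reconstruct the boundaries on deserialization. The sketch below infers the recipe layout from the example dict's "b" field, [minBoundary, [sliceType, maxBoundary, numBins], ...]; that layout and the LINEAR tag value are assumptions inferred from the example, not the actual Catapult format:

```javascript
// Sketch: rebuild bin boundaries from a builder "recipe" instead of
// serializing every boundary number. Layout is an assumption based on the
// example "b":[0,[0,1000,10]] in the CL description.
const LINEAR = 0;  // assumed tag for a linearly-spaced slice of bins

function buildBoundaries(recipe) {
  const boundaries = [recipe[0]];  // first element: the minimum boundary
  for (const slice of recipe.slice(1)) {
    const [type, max, numBins] = slice;
    const prev = boundaries[boundaries.length - 1];
    if (type === LINEAR) {
      const width = (max - prev) / numBins;
      for (let i = 1; i <= numBins; ++i) boundaries.push(prev + width * i);
    }
    // Exponentially-spaced slices would be handled analogously.
  }
  return boundaries;
}
```

So `[0, [0, 1000, 10]]` expands to the eleven boundaries 0, 100, ..., 1000, and the serialized form stays the same size no matter how many bins the recipe produces.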
https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/base/ru... File tracing/tracing/base/running_statistics.html (right):
https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/base/ru... tracing/tracing/base/running_statistics.html:143: return 0; nit: can you change this to return "null"?
https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... File tracing/tracing/value/histogram.html (right):
https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... tracing/tracing/value/histogram.html:72: return this.count; I would keep this an Object.freeze([this.count]) for simplicity. I don't expect the size of "[5]" to be much different from the size of "5".
https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... tracing/tracing/value/histogram.html:162: while (this.maxNumSampleValues < this.sampleValues_.length) { seems like you just want to randomly remove (this.sampleValues_.length - n) elements. Can you add a method like getRandomSamples(populationArray, sampleSize)?
https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... tracing/tracing/value/histogram.html:550: dict.binBoundaries = this.binBoundaries_; I thought you meant to serialize the bin boundary builder instead of the boundaries themselves?
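The getRandomSamples suggestion above (which a later patch set names uniformlySampleArray) can be sketched with reservoir sampling, which draws a fixed-size uniform sample in one pass instead of repeatedly removing random elements. This body is an illustration, not the actual Catapult code:

```javascript
// Sketch: keep a uniform random sample of sampleSize elements from
// population, one pass, each element equally likely to survive
// (reservoir sampling, Algorithm R). Illustration only.
function uniformlySampleArray(population, sampleSize) {
  if (population.length <= sampleSize) return population.slice();
  const sample = population.slice(0, sampleSize);
  for (let i = sampleSize; i < population.length; ++i) {
    const j = Math.floor(Math.random() * (i + 1));
    if (j < sampleSize) sample[j] = population[i];
  }
  return sample;
}
```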
On 2016/09/27 15:04:49, nednguyen wrote: > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/base/ru... > File tracing/tracing/base/running_statistics.html (right): > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/base/ru... > tracing/tracing/base/running_statistics.html:143: return 0; > nits: can you change to return "null"? > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > File tracing/tracing/value/histogram.html (right): > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > tracing/tracing/value/histogram.html:72: return this.count; > I would keep this a Object.freeze([this.count]) for simplicity. I don't expect > size of "[5]" is that much different from size of "5" > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > tracing/tracing/value/histogram.html:162: while (this.maxNumSampleValues < > this.sampleValues_.length) { > seems like you just want to randomly remove (this.sampleValues_.length - n) > element. Can you add a method like getRandomSamples(populationArray, > sampleSize)? > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > tracing/tracing/value/histogram.html:550: dict.binBoundaries = > this.binBoundaries_; > I thought you mean to serialize the bin boundary builder instead of the > boundaries themselves? Let me know if my idea is silly: instead of serializing the bins directly, why can't we just serialize the histogram's configs & all the collected samples (a dictionary + an array of numbers)? Aren't those enough to reconstruct the whole histogram?
On 2016/09/27 at 15:07:52, nednguyen wrote: > On 2016/09/27 15:04:49, nednguyen wrote: > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/base/ru... > > File tracing/tracing/base/running_statistics.html (right): > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/base/ru... > > tracing/tracing/base/running_statistics.html:143: return 0; > > nits: can you change to return "null"? > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > > File tracing/tracing/value/histogram.html (right): > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > > tracing/tracing/value/histogram.html:72: return this.count; > > I would keep this a Object.freeze([this.count]) for simplicity. I don't expect > > size of "[5]" is that much different from size of "5" > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > > tracing/tracing/value/histogram.html:162: while (this.maxNumSampleValues < > > this.sampleValues_.length) { > > seems like you just want to randomly remove (this.sampleValues_.length - n) > > element. Can you add a method like getRandomSamples(populationArray, > > sampleSize)? > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > > tracing/tracing/value/histogram.html:550: dict.binBoundaries = > > this.binBoundaries_; > > I thought you mean to serialize the bin boundary builder instead of the > > boundaries themselves? > > Let me know if my idea is silly: instead of serializing the bins directly, why can we just serialize the histogram's configs & all the collected samples (a dictionary + a array of numbers)? Aren't those enough to reconstruct the whole histogram? 
Histograms could contain millions of samples, so we do need machinery for serializing bins, but you're right: when there are fewer than maxNumSampleValues samples, it would be better to serialize only those and not the bins. I'll do that. Thanks!
On 2016/09/27 16:02:54, benjhayden wrote: > On 2016/09/27 at 15:07:52, nednguyen wrote: > > On 2016/09/27 15:04:49, nednguyen wrote: > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/base/ru... > > > File tracing/tracing/base/running_statistics.html (right): > > > > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/base/ru... > > > tracing/tracing/base/running_statistics.html:143: return 0; > > > nits: can you change to return "null"? > > > > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > > > File tracing/tracing/value/histogram.html (right): > > > > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > > > tracing/tracing/value/histogram.html:72: return this.count; > > > I would keep this a Object.freeze([this.count]) for simplicity. I don't > expect > > > size of "[5]" is that much different from size of "5" > > > > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > > > tracing/tracing/value/histogram.html:162: while (this.maxNumSampleValues < > > > this.sampleValues_.length) { > > > seems like you just want to randomly remove (this.sampleValues_.length - n) > > > element. Can you add a method like getRandomSamples(populationArray, > > > sampleSize)? > > > > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > > > tracing/tracing/value/histogram.html:550: dict.binBoundaries = > > > this.binBoundaries_; > > > I thought you mean to serialize the bin boundary builder instead of the > > > boundaries themselves? > > > > Let me know if my idea is silly: instead of serializing the bins directly, why > can we just serialize the histogram's configs & all the collected samples (a > dictionary + a array of numbers)? Aren't those enough to reconstruct the whole > histogram? 
> > Histograms could contain millions of samples, so we do need machinery for > serializing bins, but, you're right, when there are fewer than > maxNumSampleValues, then, yes, it would be better to only serialize those and > not the bins. I'll do that. Thanks! Arghh, I forgot the case where we need to keep track of the bin's sample count, which could be in the millions. I think we should stick with one strategy of serializing the histogram for now to simplify things, and serializing the bins (or bin builder) seems good to me.
On 2016/09/27 at 16:22:17, nednguyen wrote: > On 2016/09/27 16:02:54, benjhayden wrote: > > On 2016/09/27 at 15:07:52, nednguyen wrote: > > > On 2016/09/27 15:04:49, nednguyen wrote: > > > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/base/ru... > > > > File tracing/tracing/base/running_statistics.html (right): > > > > > > > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/base/ru... > > > > tracing/tracing/base/running_statistics.html:143: return 0; > > > > nits: can you change to return "null"? > > > > > > > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > > > > File tracing/tracing/value/histogram.html (right): > > > > > > > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > > > > tracing/tracing/value/histogram.html:72: return this.count; > > > > I would keep this a Object.freeze([this.count]) for simplicity. I don't > > expect > > > > size of "[5]" is that much different from size of "5" > > > > > > > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > > > > tracing/tracing/value/histogram.html:162: while (this.maxNumSampleValues < > > > > this.sampleValues_.length) { > > > > seems like you just want to randomly remove (this.sampleValues_.length - n) > > > > element. Can you add a method like getRandomSamples(populationArray, > > > > sampleSize)? > > > > > > > > > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > > > > tracing/tracing/value/histogram.html:550: dict.binBoundaries = > > > > this.binBoundaries_; > > > > I thought you mean to serialize the bin boundary builder instead of the > > > > boundaries themselves? > > > > > > Let me know if my idea is silly: instead of serializing the bins directly, why > > can we just serialize the histogram's configs & all the collected samples (a > > dictionary + a array of numbers)? 
Aren't those enough to reconstruct the whole > > histogram? > > > > Histograms could contain millions of samples, so we do need machinery for > > serializing bins, but, you're right, when there are fewer than > > maxNumSampleValues, then, yes, it would be better to only serialize those and > > not the bins. I'll do that. Thanks! > > Arghh, I forgot the case that we need to keep track of the bin's sample count which could be million. I think we should stick with one strategy of serializing the histogram now to simplify things, and serializing the bin (or bin builder) seems good to me. K, I'll just fix the binBoundaries_ field name. :-)
Description was changed from ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL should reduce the average size of Histograms by about half to two-thirds. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Serialize HistogramBinBoundaries builder recipes instead of the boundaries themselves. Add tests to monitor serialization sizes. Trimming floats is left for another CL. BUG=catapult:#2794 ========== to ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL reduces the average size of each Histogram in the memory.top_10_mobile benchmark to 843 bytes, an improvement of 86%. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Serialize HistogramBinBoundaries builder recipes instead of the boundaries themselves. Add tests to monitor serialization sizes. Trimming floats is left for another CL. BUG=catapult:#2794 ==========
I'm still occasionally seeing (unmergeable) in results2.html with memory.top_10_mobile. I found one histogram that was serialized (by vinn) without its leading 0 bin boundary, and one that created the wrong number of bins when it was deserialized (by chrome 53.0.2785.92) even though it used the exact same HistogramBinBoundaries object with the same boundaries_ array as other Histograms that created the correct number of bins. Reloading fixed the latter histogram but not the former. What do you do when for loops are broken? Anyway, this CL is getting old. PTAL? :-) https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/base/ru... File tracing/tracing/base/running_statistics.html (right): https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/base/ru... tracing/tracing/base/running_statistics.html:143: return 0; On 2016/09/27 at 15:04:49, nednguyen wrote: > nits: can you change to return "null"? We prefer undefined over null in tracing, but yes. https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... File tracing/tracing/value/histogram.html (right): https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... tracing/tracing/value/histogram.html:72: return this.count; On 2016/09/27 at 15:04:49, nednguyen wrote: > I would keep this a Object.freeze([this.count]) for simplicity. I don't expect size of "[5]" is that much different from size of "5" "[5]" is 300% larger than "5". Multiplied by dozens of bins per histogram times thousands of histograms. I think that's significant. https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... tracing/tracing/value/histogram.html:162: while (this.maxNumSampleValues < this.sampleValues_.length) { On 2016/09/27 at 15:04:49, nednguyen wrote: > seems like you just want to randomly remove (this.sampleValues_.length - n) element. Can you add a method like getRandomSamples(populationArray, sampleSize)? Statistics.uniformlySampleArray?
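The helper floated above (Statistics.uniformlySampleArray) could look roughly like this. This is an illustrative sketch, not the actual catapult API: it mirrors the while-loop at histogram.html:162, removing a uniformly-random element until the array fits under the sample limit.

```javascript
// Hypothetical sketch of the helper discussed above (names are illustrative):
// trim `samples` in place down to `sampleSize` by repeatedly removing an
// element chosen uniformly at random.
function uniformlySampleArray(samples, sampleSize) {
  while (samples.length > sampleSize) {
    const i = Math.floor(Math.random() * samples.length);
    samples.splice(i, 1);  // remove one random element in place
  }
  return samples;
}

const trimmed = uniformlySampleArray([1, 2, 3, 4, 5, 6], 3);
// trimmed has length 3, and every element came from the original array.
```

Repeated splice() makes this O(n^2) in the worst case; for the sample counts involved here that is fine, and it keeps the sketch close to the loop being refactored.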
https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/base/r... File tracing/tracing/base/running_statistics.html (right): https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/base/r... tracing/tracing/base/running_statistics.html:142: return undefined; JSON.stringify(undefined) --> undefined, so I think it's better to use null. https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... File tracing/tracing/value/histogram.html (right): https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:72: return this.count; ping on not special casing on this? https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:84: var DEFAULT_SUMMARY_OPTIONS = new Map([ nits: Object.freeze(..) this? https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:125: this.binBoundariesBuilderDict_ = binBoundaries.asDict(); Oh, now I see that the API is binBoundaries.asDict() and not binBoundaries.builder.asDict(). If you choose to hide the implementation details of binBoundaries's serialization, keep binBoundariesDict_ is fine. Though another question: why do we need to create this object right here? Isn't it only needed for asDict() method? https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:137: this.summaryOptions.set('percentile', []); Why not put 'percentile' in DEFAULT_SUMMARY_OPTIONS as well? https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:164: var i = parseInt(Math.random() * this.sampleValues_.length); Ping on refactor this to a helper method https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:592: // efficient. 
Can you factor the logic of serializing CENTRAL_BINS to a separate method? this asDict() method is getting too long.. https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:599: var centralBinsArrayOverhead = 0; I think this is a bit too complicated. Since the only extra overhead of the dictionary form is the index (should not exceeds 10k) plus the colon and comma which is very negligible, I think it's ok to just use the dict form over the array form. I would choose code simplicity, except when efficiency win is huge. In addition, a problem with "try both & pick the less overhead" strategy is that it increases peak heap memory usage. https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:961: // If boundaries_ had been built, then clear them. This seems like a big change to existing API of histogram? On the left, I don't find any logic that clear existing boundaries https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:966: this.range.addValue(nextMaxBinBoundary); why not this.addBinBoundary(nextMaxBinBoundary) here?
https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... File tracing/tracing/value/histogram.html (right): https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... tracing/tracing/value/histogram.html:72: return this.count; On 2016/09/27 18:46:34, benjhayden wrote: > On 2016/09/27 at 15:04:49, nednguyen wrote: > > I would keep this a Object.freeze([this.count]) for simplicity. I don't expect > size of "[5]" is that much different from size of "5" > > "[5]" is 300% larger than "5". > Multiplied by dozens of bins per histogram times thousands of histograms. > I think that's significant. I looked at how this saving contributes to the total result. In Juan's example, we have 16700 Histograms which took 95646761 bytes. As in https://github.com/catapult-project/catapult/blob/master/tracing/tracing/valu..., I see we typically have around 100 bins per histogram. So assuming all those bins are non-empty and only have 'count', you are saving 2 bytes * 16700 * 100 = 3340000 bytes, which is only 3.4% of that 95646761 bytes. But it's rarely the case that all histogram bins are filled, so the actual saving from this optimization is much smaller.
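The arithmetic above can be checked directly. The per-bin byte counts come from JSON.stringify, and the 16700-histograms, ~100-bins figures are the ones quoted in the thread:

```javascript
// "[5]" costs 3 characters where "5" costs 1, so each serialized bin saves
// 2 bytes by dropping the array wrapper.
const asArray = JSON.stringify([5]);   // '[5]'
const asNumber = JSON.stringify(5);    // '5'
const perBinSaving = asArray.length - asNumber.length;  // 2

// Upper bound on the total saving, using the figures quoted above:
// 16700 histograms, ~100 bins each, all assumed non-empty.
const totalSaving = perBinSaving * 16700 * 100;  // 3340000 bytes, ~3.4% of 95646761
```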
On 2016/09/27 19:02:58, nednguyen wrote: > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > File tracing/tracing/value/histogram.html (right): > > https://codereview.chromium.org/2364243002/diff/80001/tracing/tracing/value/h... > tracing/tracing/value/histogram.html:72: return this.count; > On 2016/09/27 18:46:34, benjhayden wrote: > > On 2016/09/27 at 15:04:49, nednguyen wrote: > > > I would keep this a Object.freeze([this.count]) for simplicity. I don't > expect > > size of "[5]" is that much different from size of "5" > > > > "[5]" is 300% larger than "5". > > Multiplied by dozens of bins per histogram times thousands of histograms. > > I think that's significant. > > I look at how this save contribute to the total result. In Juan's example, we > have 16700 Histograms which took 95646761 > bytes. > > As in > https://github.com/catapult-project/catapult/blob/master/tracing/tracing/valu..., > I see we typical have around 100 bins for a histogram. > > So assuming all those bins are non-empty and only has 'count', you are saving 2 > bytes * 16700 * 100 = 3340000 bytes, which is only 3.4% of that 95646761 bytes. > But it's rarely the case that all histogram bins are filled, so the actual > saving from this optimization is much smaller.. For the case where the for loop is broken, let's try to create a minimal repro and file a v8 bug.
https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... File tracing/tracing/value/histogram.html (right): https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:72: return this.count; On 2016/09/27 at 18:48:54, nednguyen wrote: > ping on not special casing on this? Done. https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:84: var DEFAULT_SUMMARY_OPTIONS = new Map([ On 2016/09/27 at 18:48:54, nednguyen wrote: > nits: Object.freeze(..) this? Hm, it looks like Object.freeze doesn't work on Maps? https://jsfiddle.net/11kuq8m3/ http://stackoverflow.com/questions/35747325/is-there-a-way-to-freeze-a-es6-map Also, Object.freeze() impacts performance and is not necessary for these asDict() methods. I only added them to see if they would reveal any hidden places that might modify the dicts on their way out of vinn. There don't seem to be any problems with removing them. https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:125: this.binBoundariesBuilderDict_ = binBoundaries.asDict(); On 2016/09/27 at 18:48:54, nednguyen wrote: > Oh, now I see that the API is binBoundaries.asDict() and not binBoundaries.builder.asDict(). If you choose to hide the implementation details of binBoundaries's serialization, keep binBoundariesDict_ is fine. > > Though another question: why do we need to create this object right here? Isn't it only needed for asDict() method? I added a comment. It's possible that binBoundaries can be modified after creating this histogram. https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:137: this.summaryOptions.set('percentile', []); On 2016/09/27 at 18:48:54, nednguyen wrote: > Why not put 'percentile' in DEFAULT_SUMMARY_OPTIONS as well? 
I added a comment to DEFAULT_SUMMARY_OPTIONS. Its default value is [], which is modifiable, so clients can push() to it, so each Histogram instance needs a separate Array instance. https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:164: var i = parseInt(Math.random() * this.sampleValues_.length); On 2016/09/27 at 18:48:54, nednguyen wrote: > Ping on refactor this to a helper method Done. https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:592: // efficient. On 2016/09/27 at 18:48:54, nednguyen wrote: > Can you factor the logic of serializing CENTRAL_BINS to a separate method? this asDict() method is getting too long.. Done. https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:599: var centralBinsArrayOverhead = 0; On 2016/09/27 at 18:48:54, nednguyen wrote: > I think this is a bit too complicated. Since the only extra overhead of the dictionary form is the index (should not exceeds 10k) plus the colon and comma which is very negligible, I think it's ok to just use the dict form over the array form. > > I would choose code simplicity, except when efficiency win is huge. > > In addition, a problem with "try both & pick the less overhead" strategy is that it increases peak heap memory usage. If a histogram has 1000 bins and they are all non-empty, dictOverhead=4890 and arrayOverhead=1000. https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:961: // If boundaries_ had been built, then clear them. On 2016/09/27 at 18:48:54, nednguyen wrote: > This seems like a big change to existing API of histogram? On the left, I don't find any logic that clear existing boundaries The existing boundaries are preserved in builder_. https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... 
tracing/tracing/value/histogram.html:966: this.range.addValue(nextMaxBinBoundary); On 2016/09/27 at 18:48:54, nednguyen wrote: > why not this.addBinBoundary(nextMaxBinBoundary) here? That's not how it works. Callers call addBinBoundary/addLinearBins/addExponentialBins. Those methods add elements to builder_, not boundaries_. When a Histogram calls binRanges(), HBB builds boundaries_ from builder_. There are several HBBs created statically in metrics and in this file. Many of them might never be used, so we can speed up loading trace-viewer by delaying computing boundaries_ until they are needed. Many Histograms can re-use the same HBB, so it is useful to cache boundaries_ in HBB. And Histograms need the builder_, so HBB must keep both its builder_ and its boundaries_. But if an HBB is modified after a Histogram uses its boundaries_, then the boundaries_ must be updated. The same logic applies for a never-used HBB as an HBB that was modified after it was used: its updated boundaries_ might never be needed. Now, if you're suggesting that, since an HBB was used once, then it will probably be used again after it was modified, then I can refactor build_ so that it can be called either all at once or iteratively as clients modify it, and it would behave differently based on whether it had ever been used, so that it wouldn't need to re-compute old boundaries for old builder slices. I can do that. But it would be complicated, and I feel like I might already be over-optimizing just a bit.
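The builder_/boundaries_ scheme described above can be sketched as follows. This is a simplified illustration, not the real HistogramBinBoundaries: the real build step expands linear/exponential recipe slices, whereas here the recipe already is the boundary list, so a copy stands in for "build".

```javascript
// Simplified illustration of the caching scheme described above: builder_ is
// the serializable recipe and the source of truth; boundaries_ is a lazily
// built cache that is dropped whenever the recipe grows after being built.
class LazyBinBoundaries {
  constructor(minBinBoundary) {
    this.builder_ = [minBinBoundary];  // serialized instead of boundaries_
    this.boundaries_ = undefined;      // built on demand, then cached
  }

  addBinBoundary(nextMaxBinBoundary) {
    this.builder_.push(nextMaxBinBoundary);
    this.boundaries_ = undefined;  // recipe changed; invalidate the cache
  }

  get boundaries() {
    if (this.boundaries_ === undefined) {
      // The real build step expands each recipe slice (linear, exponential);
      // here the recipe is already the boundary list, so a copy suffices.
      this.boundaries_ = this.builder_.slice();
    }
    return this.boundaries_;
  }
}

const hbb = new LazyBinBoundaries(0);
hbb.addBinBoundary(10);
const first = hbb.boundaries;   // built now: [0, 10]
hbb.addBinBoundary(20);         // cache invalidated, first is unaffected
const second = hbb.boundaries;  // rebuilt: [0, 10, 20]
```

Because boundaries are only computed on first access, HBBs created statically in metrics but never used by a Histogram cost nothing, which is the loading-speed argument made above.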
Thanks for your patience. lgtm https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... File tracing/tracing/value/histogram.html (right): https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:599: var centralBinsArrayOverhead = 0; On 2016/09/27 21:05:16, benjhayden wrote: > On 2016/09/27 at 18:48:54, nednguyen wrote: > > I think this is a bit too complicated. Since the only extra overhead of the > dictionary form is the index (should not exceeds 10k) plus the colon and comma > which is very negligible, I think it's ok to just use the dict form over the > array form. > > > > I would choose code simplicity, except when efficiency win is huge. > > > > In addition, a problem with "try both & pick the less overhead" strategy is > that it increases peak heap memory usage. > > If a histogram has 1000 bins and they are all non-empty, dictOverhead=4890 and > arrayOverhead=1000. You convinced me with your numbers that this is needed. Though can you find a simpler heuristic, e.g. if more than 50% of the bins are non-empty, then use the array form? I think the problem of heap space growing as we try both is still there.
https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/base/r... File tracing/tracing/base/running_statistics.html (right): https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/base/r... tracing/tracing/base/running_statistics.html:142: return undefined; On 2016/09/27 at 18:48:53, nednguyen wrote: > JSON.stringify(undefined) --> undefined, so I think it's better to use null. https://github.com/catapult-project/catapult/blob/master/docs/style-guide.md#... How about the empty array []?
On 2016/09/27 21:50:11, benjhayden wrote: > https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/base/r... > File tracing/tracing/base/running_statistics.html (right): > > https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/base/r... > tracing/tracing/base/running_statistics.html:142: return undefined; > On 2016/09/27 at 18:48:53, nednguyen wrote: > > JSON.stringify(undefined) --> undefined, so I think it's better to use null. > > https://github.com/catapult-project/catapult/blob/master/docs/style-guide.md#... > > How about the empty array []? empty array sgtm
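For reference, the JSON.stringify behavior driving this exchange: undefined has no JSON representation, so stringify returns the value undefined (not a string) at the top level and drops undefined-valued object properties, while null and [] round-trip cleanly.

```javascript
// undefined is not representable in JSON.
const top = JSON.stringify(undefined);            // undefined (the value, not a string)
const dropped = JSON.stringify({ a: undefined }); // '{}' -- the property is dropped
// null and the empty array both survive serialization.
const asNull = JSON.stringify(null);              // 'null'
const asEmpty = JSON.stringify([]);               // '[]'
```

This is why returning undefined from an asDict() helper is risky here, and why the thread settles on the empty array.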
https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... File tracing/tracing/value/histogram.html (right): https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... tracing/tracing/value/histogram.html:599: var centralBinsArrayOverhead = 0; On 2016/09/27 at 21:46:29, nednguyen wrote: > On 2016/09/27 21:05:16, benjhayden wrote: > > On 2016/09/27 at 18:48:54, nednguyen wrote: > > > I think this is a bit too complicated. Since the only extra overhead of the > > dictionary form is the index (should not exceeds 10k) plus the colon and comma > > which is very negligible, I think it's ok to just use the dict form over the > > array form. > > > > > > I would choose code simplicity, except when efficiency win is huge. > > > > > > In addition, a problem with "try both & pick the less overhead" strategy is > > that it increases peak heap memory usage. > > > > If a histogram has 1000 bins and they are all non-empty, dictOverhead=4890 and > > arrayOverhead=1000. > > You convinced me with your number that this is needed. Though can you find a simpler heuristic, s.t like if 50% bin are non empty, then use array form? I think the problem with heap space growing as we try both is still there. I didn't expect the tipping point to actually be at 50% since the overhead calculation is so complex, but I ran a simulation, and the tipping point drops exponentially from 100% at 2 bins to 55% at 20 bins to 50.5% at 200 bins to 50.1% at 1000 bins. So I'll just go with 50%. I still need to iterate over all centralBins twice, though, in order to avoid computing both the array and dictionary forms, since Histogram doesn't keep track of empty bins as it's being built.
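The overhead comparison behind those numbers can be reconstructed roughly as follows. This accounting is an assumption, not the CL's actual code: in dict form each non-empty bin pays its decimal index plus about two punctuation characters, while in array form every bin, empty or not, pays about one character.

```javascript
// Rough reconstruction (an assumption, not the CL's actual code) of the
// dict-vs-array serialization overhead estimate quoted above.
function dictOverhead(nonEmptyIndices) {
  // Each non-empty bin pays its index's digits plus ':' and ','.
  let total = 0;
  for (const i of nonEmptyIndices) total += String(i).length + 2;
  return total;
}

function arrayOverhead(numBins) {
  // Each bin, empty or not, pays ~1 character (placeholder or separator).
  return numBins;
}

// With 1000 bins all non-empty:
const allNonEmpty = [];
for (let i = 0; i < 1000; i++) allNonEmpty.push(i);
const d = dictOverhead(allNonEmpty);  // 4890, the figure quoted above
const a = arrayOverhead(1000);        // 1000, so the array form wins here
```

Under this model the dict form only wins when enough bins are empty, which is consistent with the simulated tipping point converging toward 50% as the bin count grows.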
On 2016/09/27 22:47:50, benjhayden wrote: > https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... > File tracing/tracing/value/histogram.html (right): > > https://codereview.chromium.org/2364243002/diff/140001/tracing/tracing/value/... > tracing/tracing/value/histogram.html:599: var centralBinsArrayOverhead = 0; > On 2016/09/27 at 21:46:29, nednguyen wrote: > > On 2016/09/27 21:05:16, benjhayden wrote: > > > On 2016/09/27 at 18:48:54, nednguyen wrote: > > > > I think this is a bit too complicated. Since the only extra overhead of > the > > > dictionary form is the index (should not exceeds 10k) plus the colon and > comma > > > which is very negligible, I think it's ok to just use the dict form over the > > > array form. > > > > > > > > I would choose code simplicity, except when efficiency win is huge. > > > > > > > > In addition, a problem with "try both & pick the less overhead" strategy > is > > > that it increases peak heap memory usage. > > > > > > If a histogram has 1000 bins and they are all non-empty, dictOverhead=4890 > and > > > arrayOverhead=1000. > > > > You convinced me with your number that this is needed. Though can you find a > simpler heuristic, s.t like if 50% bin are non empty, then use array form? I > think the problem with heap space growing as we try both is still there. > > I didn't expect the tipping point to actually be at 50% since the overhead > calculation is so complex, but I ran a simulation, and the tipping point drops > exponentially from 100% at 2 bins to 55% at 20 bins to 50.5% at 200 bins to > 50.1% at 1000 bins. So I'll just go with 50%. > > I still need to iterate over all centralBins twice, though, in order to avoid > computing both the array and dictionary forms, since Histogram doesn't keep > track of empty bins as it's being built. sgtm. Iterating over the centralBins is super fast, I believe.
The CQ bit was checked by benjhayden@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from nednguyen@google.com Link to the patchset: https://codereview.chromium.org/2364243002/#ps200001 (title: ".")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Thanks for the great review! Sorry there turned out to be more design decisions than I'd thought. I should have started with a doc.
Message was sent while issue was closed.
Description was changed from ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL reduces the average size of each Histogram in the memory.top_10_mobile benchmark to 843 bytes, an improvement of 86%. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Serialize HistogramBinBoundaries builder recipes instead of the boundaries themselves. Add tests to monitor serialization sizes. Trimming floats is left for another CL. BUG=catapult:#2794 ========== to ========== Serialize Histograms more efficiently. In a results2.html that Juan emailed recently, 16700 Histograms took 95646761 bytes, for an average of 5727 bytes per Histogram. The only Diagnostics were IterationInfo. This CL reduces the average size of each Histogram in the memory.top_10_mobile benchmark to 843 bytes, an improvement of 86%. Avoid serializing unnecessary information. Serialize sparse centralBins arrays as dictionaries. Serialize HistogramBinBoundaries builder recipes instead of the boundaries themselves. Add tests to monitor serialization sizes. Trimming floats is left for another CL. BUG=catapult:#2794 Committed: https://chromium.googlesource.com/external/github.com/catapult-project/catapu... ==========
Committed patchset #11 (id:200001) as https://chromium.googlesource.com/external/github.com/catapult-project/catapu...
On 2016/09/28 04:28:35, benjhayden wrote: > Thanks for the great review! > Sorry there turned out to be more design decisions than I'd thought. I should > have started with a doc. No problem. Another thing that would help is dividing this patch into multiple ones, each focused on a single optimization.