|
|
Created:
6 years, 3 months ago by Alexei Svitkine (slow) Modified:
6 years, 3 months ago CC:
chromium-reviews Base URL:
https://chromium.googlesource.com/chromium/src.git@master Project:
chromium Visibility:
Public. |
DescriptionMake Courgette as much as 5x faster.
Changes the Courgette vector implementation to slightly
over-reserve the size of the vector, which makes many future
reserve operations no-ops (as they're quite expensive otherwise,
since they copy the full contents of the previous buffer).
When applying the patch from 35.0.1916.114 to 37.0.2062.120,
before and after this change, the runtime goes from 1m10s to 21s
on my z620.
Slightly higher multipliers produce even better results (e.g. 19s),
but this seemed like a reasonable value to chose so that it doesn't
result in significant additional memory use by Courgette.
BUG=167622
TEST=Build courgette target in Release mode and run it as
courgette.exe -apply chrome.7z chrome_patch.diff out.7z
(where chrome.7z was from un7zipping a 37.0.2062.94 chrome
installer and chrome_patch.diff was from un7zipping a
37.0.2062.94 -> 37.0.2062.120_37 chrome_updater_3stage.exe).
With the patch, the operation should run ~5x faster.
Committed: https://crrev.com/53523269bac5a15fb94da76a879ca3445f096a42
Cr-Commit-Position: refs/heads/master@{#294592}
Patch Set 1 : #
Total comments: 2
Messages
Total messages: 26 (4 generated)
Patchset #1 (id:1) has been deleted
asvitkine@chromium.org changed reviewers: + gab@chromium.org, grt@chromium.org, tommi@chromium.org
PTAL! gab+grt: review tommi: OWNERS I've only tried this on the update mention in the description, but the results are excellent. Planning to try it on some more update patches just to sanity check the results and see if the gains are consistent.
Patchset #1 (id:20001) has been deleted
This is awesome. Lgtm.
Wow, this is amazing! LGTM*1000000.01! I wonder in fact how the STL implements reserve() since the only API guarantee is that the buffer will be *at least* |size|. I wonder if there is anything even smarter to do here (when thinking about page sizes, etc.)... Anyways this is awesome as a first attempt/finding regardless!! Thanks! Gab
Some thoughts below. https://codereview.chromium.org/565753002/diff/40001/courgette/memory_allocat... File courgette/memory_allocator.h (right): https://codereview.chromium.org/565753002/diff/40001/courgette/memory_allocat... courgette/memory_allocator.h:341: size *= 1.01; This will have no impact on reserves below 100, should we also increment |size| by a fixed amount before applying the multiplier? Or how about we round |size| up to the nearest multiple of 2? This would most likely provide nice memory alignment properties while also guaranteeing an amortized linear time buffer size increase if reserve is called with sequential indices (which is terrible usage of reserve() and should be fixed in itself).
https://codereview.chromium.org/565753002/diff/40001/courgette/memory_allocat... File courgette/memory_allocator.h (right): https://codereview.chromium.org/565753002/diff/40001/courgette/memory_allocat... courgette/memory_allocator.h:341: size *= 1.01; On 2014/09/12 04:12:19, gab wrote: > This will have no impact on reserves below 100, should we also increment |size| > by a fixed amount before applying the multiplier? > > Or how about we round |size| up to the nearest multiple of 2? Oops s/multiple of 2/power of 2/ of course. > This would most > likely provide nice memory alignment properties while also guaranteeing an > amortized linear time buffer size increase if reserve is called with sequential > indices (which is terrible usage of reserve() and should be fixed in itself).
On 2014/09/12 04:12:19, gab wrote: > Some thoughts below. > > https://codereview.chromium.org/565753002/diff/40001/courgette/memory_allocat... > File courgette/memory_allocator.h (right): > > https://codereview.chromium.org/565753002/diff/40001/courgette/memory_allocat... > courgette/memory_allocator.h:341: size *= 1.01; > This will have no impact on reserves below 100, should we also increment |size| > by a fixed amount before applying the multiplier? > > Or how about we round |size| up to the nearest multiple of 2? This would most > likely provide nice memory alignment properties while also guaranteeing an > amortized linear time buffer size increase if reserve is called with sequential > indices (which is terrible usage of reserve() and should be fixed in itself). I propose we let this go out as-is and measure the impact. In parallel, Alexei can experiment with your suggestions to see how they affect time and memory usage. SGTY?
On 2014/09/12 04:27:06, grt wrote: > On 2014/09/12 04:12:19, gab wrote: > > Some thoughts below. > > > > > https://codereview.chromium.org/565753002/diff/40001/courgette/memory_allocat... > > File courgette/memory_allocator.h (right): > > > > > https://codereview.chromium.org/565753002/diff/40001/courgette/memory_allocat... > > courgette/memory_allocator.h:341: size *= 1.01; > > This will have no impact on reserves below 100, should we also increment > |size| > > by a fixed amount before applying the multiplier? > > > > Or how about we round |size| up to the nearest multiple of 2? This would most > > likely provide nice memory alignment properties while also guaranteeing an > > amortized linear time buffer size increase if reserve is called with > sequential > > indices (which is terrible usage of reserve() and should be fixed in itself). > > I propose we let this go out as-is and measure the impact. In parallel, Alexei > can experiment with your suggestions to see how they affect time and memory > usage. SGTY? Yes yes of course, these are just thoughts for future improvements. Sorry if that wasn't clear.
The CQ bit was checked by tommi@chromium.org
lgtm
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patchset/565753002/40001
I double checked with a few other update combinations and the improvements seem to be consistent (even for small updates). 36.0.1985.143 -> 37.0.20262.120_37: (16MB patch file) orig time: 1m20s new time: 20s 37.0.2062.94 -> 37.0.20262.120_37: (3MB patch file) orig time: 1m8s new time: 19.5s
On 2014/09/12 14:42:00, Alexei Svitkine wrote: > I double checked with a few other update combinations and the improvements seem > to be consistent (even for small updates). > > 36.0.1985.143 -> 37.0.20262.120_37: > > (16MB patch file) > > orig time: 1m20s > new time: 20s > > 37.0.2062.94 -> 37.0.20262.120_37: > > (3MB patch file) > > orig time: 1m8s > new time: 19.5s Impressive, I'd say it's worth putting this data under TEST= in the CL description for future reference.
On 2014/09/12 15:04:21, gab wrote: > On 2014/09/12 14:42:00, Alexei Svitkine wrote: > > I double checked with a few other update combinations and the improvements > seem > > to be consistent (even for small updates). > > > > 36.0.1985.143 -> 37.0.20262.120_37: > > > > (16MB patch file) > > > > orig time: 1m20s > > new time: 20s > > > > 37.0.2062.94 -> 37.0.20262.120_37: > > > > (3MB patch file) > > > > orig time: 1m8s > > new time: 19.5s > > Impressive, I'd say it's worth putting this data under TEST= in the CL > description for future reference. You wrote that just so I'd say that's not the right use of TEST, didn't you? ;-)
Added, I agree it's useful to have for future reference. (Whether TEST= section is the right place to put this or not, I won't both debating. ;)
On 2014/09/12 15:10:46, grt wrote: > On 2014/09/12 15:04:21, gab wrote: > > On 2014/09/12 14:42:00, Alexei Svitkine wrote: > > > I double checked with a few other update combinations and the improvements > > seem > > > to be consistent (even for small updates). > > > > > > 36.0.1985.143 -> 37.0.20262.120_37: > > > > > > (16MB patch file) > > > > > > orig time: 1m20s > > > new time: 20s > > > > > > 37.0.2062.94 -> 37.0.20262.120_37: > > > > > > (3MB patch file) > > > > > > orig time: 1m8s > > > new time: 19.5s > > > > Impressive, I'd say it's worth putting this data under TEST= in the CL > > description for future reference. > > You wrote that just so I'd say that's not the right use of TEST, didn't you? ;-) Arg, no I did not because I forgot about our old debate. IMO TEST= is "what you tested which others could try too to confirm your claims".
Message was sent while issue was closed.
Committed patchset #1 (id:40001) as 7ccc48338a2eed3c830ae2253fe73414b30bc9a5
Message was sent while issue was closed.
Patchset 1 (id:??) landed as https://crrev.com/53523269bac5a15fb94da76a879ca3445f096a42 Cr-Commit-Position: refs/heads/master@{#294592}
Message was sent while issue was closed.
On 2014/09/12 15:32:24, I haz the power (commit-bot) wrote: > Patchset 1 (id:??) landed as > https://crrev.com/53523269bac5a15fb94da76a879ca3445f096a42 > Cr-Commit-Position: refs/heads/master@{#294592} What a great CL! :) There should be a LOC/Impact ratio on CLs. This one would go through the roof!
Message was sent while issue was closed.
On 2014/09/12 17:46:16, tommi wrote: > On 2014/09/12 15:32:24, I haz the power (commit-bot) wrote: > > Patchset 1 (id:??) landed as > > https://crrev.com/53523269bac5a15fb94da76a879ca3445f096a42 > > Cr-Commit-Position: refs/heads/master@{#294592} > > What a great CL! :) There should be a LOC/Impact ratio on CLs. This one would > go through the roof! Indeed! I can't wait to crunch some numbers after this has shipped for a few days. I think it's going to be major goodness.
As a follow-up experiment, I decided to measure how much overhead is still left as a result of re-sizing these vectors. The operation of courgette is deterministic. So I added some instrumentation to record all the final vector alloc_size's for each vector ordered by when that vector's constructed and then in a subsequent run to have each vector reserve() to its previous size (and verified that with this, the vectors are never grown). This yielded an additional ~3% improvement. (e.g. running time went from 19s to 18.5s) Given that the additional improvement is so minuscule, I don't think it's worth investing any more effort into optimizing this part of courgette (the buffer resizing) beyond this change, that already gives very close to optimal performance here. On Fri, Sep 12, 2014 at 1:55 PM, <grt@chromium.org> wrote: > On 2014/09/12 17:46:16, tommi wrote: > >> On 2014/09/12 15:32:24, I haz the power (commit-bot) wrote: >> > Patchset 1 (id:??) landed as >> > https://crrev.com/53523269bac5a15fb94da76a879ca3445f096a42 >> > Cr-Commit-Position: refs/heads/master@{#294592} >> > > What a great CL! :) There should be a LOC/Impact ratio on CLs. This one >> would >> go through the roof! >> > > Indeed! I can't wait to crunch some numbers after this has shipped for a > few > days. I think it's going to be major goodness. > > https://codereview.chromium.org/565753002/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Very nice analysis, your assessment SGTM, thanks Alexei! Can't wait to see field results from this! Gab On Mon, Sep 15, 2014 at 12:39 PM, Alexei Svitkine <asvitkine@chromium.org> wrote: > As a follow-up experiment, I decided to measure how much overhead is still > left as a result of re-sizing these vectors. > > The operation of courgette is deterministic. So I added some > instrumentation to record all the final vector alloc_size's for each vector > ordered by when that vector's constructed and then in a subsequent run to > have each vector reserve() to its previous size (and verified that with > this, the vectors are never grown). > > This yielded an additional ~3% improvement. (e.g. running time went from > 19s to 18.5s) > > Given that the additional improvement is so minuscule, I don't think it's > worth investing any more effort into optimizing this part of courgette (the > buffer resizing) beyond this change, that already gives very close to > optimal performance here. > > On Fri, Sep 12, 2014 at 1:55 PM, <grt@chromium.org> wrote: > >> On 2014/09/12 17:46:16, tommi wrote: >> >>> On 2014/09/12 15:32:24, I haz the power (commit-bot) wrote: >>> > Patchset 1 (id:??) landed as >>> > https://crrev.com/53523269bac5a15fb94da76a879ca3445f096a42 >>> > Cr-Commit-Position: refs/heads/master@{#294592} >>> >> >> What a great CL! :) There should be a LOC/Impact ratio on CLs. This one >>> would >>> go through the roof! >>> >> >> Indeed! I can't wait to crunch some numbers after this has shipped for a >> few >> days. I think it's going to be major goodness. >> >> https://codereview.chromium.org/565753002/ >> > > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On Mon, Sep 15, 2014 at 12:39 PM, Alexei Svitkine <asvitkine@chromium.org> wrote: > As a follow-up experiment, I decided to measure how much overhead is still > left as a result of re-sizing these vectors. > > The operation of courgette is deterministic. So I added some > instrumentation to record all the final vector alloc_size's for each vector > ordered by when that vector's constructed and then in a subsequent run to > have each vector reserve() to its previous size (and verified that with > this, the vectors are never grown). > > This yielded an additional ~3% improvement. (e.g. running time went from > 19s to 18.5s) > I imagine that the improvement could be more dramatic on a low-ram machine that would end up thrashing due to the resizes. Could the patch generation process be modified so that it somehow contains the required space needed for each of these vectors such that the patch application phase could always get exactly the right amount of memory on the first try? > > Given that the additional improvement is so minuscule, I don't think it's > worth investing any more effort into optimizing this part of courgette (the > buffer resizing) beyond this change, that already gives very close to > optimal performance here. > > On Fri, Sep 12, 2014 at 1:55 PM, <grt@chromium.org> wrote: > >> On 2014/09/12 17:46:16, tommi wrote: >> >>> On 2014/09/12 15:32:24, I haz the power (commit-bot) wrote: >>> > Patchset 1 (id:??) landed as >>> > https://crrev.com/53523269bac5a15fb94da76a879ca3445f096a42 >>> > Cr-Commit-Position: refs/heads/master@{#294592} >>> >> >> What a great CL! :) There should be a LOC/Impact ratio on CLs. This one >>> would >>> go through the roof! >>> >> >> Indeed! I can't wait to crunch some numbers after this has shipped for a >> few >> days. I think it's going to be major goodness. >> >> https://codereview.chromium.org/565753002/ >> > > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Message was sent while issue was closed.
On 2014/09/16 13:41:45, grt wrote: > On Mon, Sep 15, 2014 at 12:39 PM, Alexei Svitkine <mailto:asvitkine@chromium.org> > wrote: > > > As a follow-up experiment, I decided to measure how much overhead is still > > left as a result of re-sizing these vectors. > > > > The operation of courgette is deterministic. So I added some > > instrumentation to record all the final vector alloc_size's for each vector > > ordered by when that vector's constructed and then in a subsequent run to > > have each vector reserve() to its previous size (and verified that with > > this, the vectors are never grown). > > > > This yielded an additional ~3% improvement. (e.g. running time went from > > 19s to 18.5s) > > > > I imagine that the improvement could be more dramatic on a low-ram machine > that would end up thrashing due to the resizes. Could the patch generation > process be modified so that it somehow contains the required space needed > for each of these vectors such that the patch application phase could > always get exactly the right amount of memory on the first try? True, I guess it depends on the memory speed on the machine. Do we have any stats yet from the field on the improvements from the last change? Maybe we can use those metrics to estimate how much this further optimization will help in the field. It's certainly possible (and I was thinking of this) of expanding the patch format to store this extra info, though the implementation might be a bit ugly (e.g. either we specialize the vector class in Courgette more and add a global hook that will be called on construction/destruction - or we have to change all the places where these vectors are created and destroyed - neither is particularly elegant, but if the wins are big enough then maybe still worth doing.) > > > > > > Given that the additional improvement is so minuscule, I don't think it's > > worth investing any more effort into optimizing this part of courgette (the > > buffer resizing) beyond this change, that already gives very close to > > optimal performance here. > > > > On Fri, Sep 12, 2014 at 1:55 PM, <mailto:grt@chromium.org> wrote: > > > >> On 2014/09/12 17:46:16, tommi wrote: > >> > >>> On 2014/09/12 15:32:24, I haz the power (commit-bot) wrote: > >>> > Patchset 1 (id:??) landed as > >>> > https://crrev.com/53523269bac5a15fb94da76a879ca3445f096a42 > >>> > Cr-Commit-Position: refs/heads/master@{#294592} > >>> > >> > >> What a great CL! :) There should be a LOC/Impact ratio on CLs. This one > >>> would > >>> go through the roof! > >>> > >> > >> Indeed! I can't wait to crunch some numbers after this has shipped for a > >> few > >> days. I think it's going to be major goodness. > >> > >> https://codereview.chromium.org/565753002/ > >> > > > > > > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org. |