Issue 1777363002: Improve performance of the malloc shim layer

Issue 1777363002: Improve performance of the malloc shim layer (Closed)

Created:
4 years, 9 months ago by Primiano Tucci (use gerrit)

Modified:
4 years, 9 months ago

Reviewers:
Nico

CC:
chromium-reviews, wfh+watch_chromium.org, Dai Mikurube (NOT FULLTIME)

Base URL:
https://chromium.googlesource.com/chromium/src.git@master

Target Ref:
refs/pending/heads/master

Project:
chromium

Visibility:
Public.

More Reviews

Description

Improve performance of the malloc shim layer This fixes the perf regression introduced by crrev.com/1781573002 As speculated, crbug.com/593344 (NoBarrier_Load being double-fenced) causes a visible perf regression, as it causes the addition of two fences in the malloc fast path. This CL adds a workaraound that falls-back on a raw volatile ptr read on Linux+Clang, relying on the fact that on the architectures we care about a load of an aligned pointer is intrinsically atomic [1,2]. Perf regression: https://chromeperf.appspot.com/report?sid=1237900c90f9c5e5320a87af8d9e8828fbd1794af07f0a4a335f1fc2a45f120a&start_rev=379480&end_rev=380417 I verified manually that the regression goes away on cc_perftests with this patch. [1] Chapter 7 of Part 3A - System Programming Guide http://download.intel.com/design/processor/manuals/253668.pdf [2] A3.5.3 Atomicity in the ARM architecture, ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition http://liris.cnrs.fr/~mmrissa/lib/exe/fetch.php?media=armv7-a-r-manual.pdf BUG=550886, 593872 TEST=cc_perftestsfast --gtest_filter=*PrepareTiles* Committed: https://crrev.com/d76421171daa1327f8e1c10ee710ebf911a90d2f Cr-Commit-Position: refs/heads/master@{#380600}

Patch Set 1 #

Patch Set 2 : . #

Total comments: 5

Created: 4 years, 9 months ago

Download [raw] [tar.bz2]

		Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+9 lines, -1 line)			Patch
	M	base/allocator/allocator_shim.cc	View	1	1 chunk	+9 lines, -1 line	5 comments	Download

Messages

Total messages: 22 (9 generated)

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages

Primiano Tucci (use gerrit)

As partially expected, bummer! a barriered NoBarrier in malloc is pretty bad and our perf ...

4 years, 9 months ago (2016-03-10 19:25:38 UTC) #2

Primiano Tucci (use gerrit)

Description was changed from ========== Improve performances of the malloc shim layer As speculated, crbug.com/593344 ...

4 years, 9 months ago (2016-03-10 19:26:28 UTC) #3

Description was changed from

==========
Improve performances of the malloc shim layer

As speculated, crbug.com/593344 (NoBarrier_Load being double-fenced)
causes a visible perf regression, as it causes the addition of two
fences in the malloc fast path.
This CL adds a workaraound that falls-back on a raw volatile ptr read
on Linux+Clang, relying on the fact that on the architectures we care
about a load of an aligned pointer is intrinsically atomic [1,2].

Perf regression:
https://chromeperf.appspot.com/report?sid=1237900c90f9c5e5320a87af8d9e8828fbd...

I verified manually that the regression goes away on cc_perftests with
this patch.

[1] Chapter 7 of Part 3A - System Programming Guide
    http://download.intel.com/design/processor/manuals/253668.pdf

[2] A3.5.3 Atomicity in the ARM architecture, ARM Architecture Reference Manual,
ARMv7-A and ARMv7-R edition
    http://liris.cnrs.fr/~mmrissa/lib/exe/fetch.php?media=armv7-a-r-manual.pdf

BUG=550886
TEST=cc_perftestsfast --gtest_filter=*PrepareTiles*
==========

to

==========
Improve performances of the malloc shim layer

This fixes the perf regression introduced by crrev.com/1781573002
As speculated, crbug.com/593344 (NoBarrier_Load being double-fenced)
causes a visible perf regression, as it causes the addition of two
fences in the malloc fast path.
This CL adds a workaraound that falls-back on a raw volatile ptr read
on Linux+Clang, relying on the fact that on the architectures we care
about a load of an aligned pointer is intrinsically atomic [1,2].

Perf regression:
https://chromeperf.appspot.com/report?sid=1237900c90f9c5e5320a87af8d9e8828fbd...

I verified manually that the regression goes away on cc_perftests with
this patch.

[1] Chapter 7 of Part 3A - System Programming Guide
    http://download.intel.com/design/processor/manuals/253668.pdf

[2] A3.5.3 Atomicity in the ARM architecture, ARM Architecture Reference Manual,
ARMv7-A and ARMv7-R edition
    http://liris.cnrs.fr/~mmrissa/lib/exe/fetch.php?media=armv7-a-r-manual.pdf

BUG=550886
TEST=cc_perftestsfast --gtest_filter=*PrepareTiles*
==========

Primiano Tucci (use gerrit)

Description was changed from ========== Improve performances of the malloc shim layer This fixes the ...

4 years, 9 months ago (2016-03-10 19:37:05 UTC) #4

Description was changed from

==========
Improve performances of the malloc shim layer

This fixes the perf regression introduced by crrev.com/1781573002
As speculated, crbug.com/593344 (NoBarrier_Load being double-fenced)
causes a visible perf regression, as it causes the addition of two
fences in the malloc fast path.
This CL adds a workaraound that falls-back on a raw volatile ptr read
on Linux+Clang, relying on the fact that on the architectures we care
about a load of an aligned pointer is intrinsically atomic [1,2].

Perf regression:
https://chromeperf.appspot.com/report?sid=1237900c90f9c5e5320a87af8d9e8828fbd...

I verified manually that the regression goes away on cc_perftests with
this patch.

[1] Chapter 7 of Part 3A - System Programming Guide
    http://download.intel.com/design/processor/manuals/253668.pdf

[2] A3.5.3 Atomicity in the ARM architecture, ARM Architecture Reference Manual,
ARMv7-A and ARMv7-R edition
    http://liris.cnrs.fr/~mmrissa/lib/exe/fetch.php?media=armv7-a-r-manual.pdf

BUG=550886
TEST=cc_perftestsfast --gtest_filter=*PrepareTiles*
==========

to

==========
Improve performance of the malloc shim layer

This fixes the perf regression introduced by crrev.com/1781573002
As speculated, crbug.com/593344 (NoBarrier_Load being double-fenced)
causes a visible perf regression, as it causes the addition of two
fences in the malloc fast path.
This CL adds a workaraound that falls-back on a raw volatile ptr read
on Linux+Clang, relying on the fact that on the architectures we care
about a load of an aligned pointer is intrinsically atomic [1,2].

Perf regression:
https://chromeperf.appspot.com/report?sid=1237900c90f9c5e5320a87af8d9e8828fbd...

I verified manually that the regression goes away on cc_perftests with
this patch.

[1] Chapter 7 of Part 3A - System Programming Guide
    http://download.intel.com/design/processor/manuals/253668.pdf

[2] A3.5.3 Atomicity in the ARM architecture, ARM Architecture Reference Manual,
ARMv7-A and ARMv7-R edition
    http://liris.cnrs.fr/~mmrissa/lib/exe/fetch.php?media=armv7-a-r-manual.pdf

BUG=550886
TEST=cc_perftestsfast --gtest_filter=*PrepareTiles*
==========

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1777363002/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1777363002/20001

4 years, 9 months ago (2016-03-10 19:38:10 UTC) #6

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: linux_chromium_chromeos_rel_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_chromeos_rel_ng/builds/179548)

4 years, 9 months ago (2016-03-10 20:59:54 UTC) #8

Primiano Tucci (use gerrit)

And these are the results from the perf trybot for the pagecycler, showing the 3-9% ...

4 years, 9 months ago (2016-03-10 22:36:46 UTC) #9

Nico

lgtm, sounds like you checked that the assembly this produces makes sense. https://codereview.chromium.org/1777363002/diff/20001/base/allocator/allocator_shim.cc File base/allocator/allocator_shim.cc ...

4 years, 9 months ago (2016-03-10 22:50:00 UTC) #10

Primiano Tucci (use gerrit)

Description was changed from ========== Improve performance of the malloc shim layer This fixes the ...

4 years, 9 months ago (2016-03-11 00:10:22 UTC) #11

Description was changed from

==========
Improve performance of the malloc shim layer

This fixes the perf regression introduced by crrev.com/1781573002
As speculated, crbug.com/593344 (NoBarrier_Load being double-fenced)
causes a visible perf regression, as it causes the addition of two
fences in the malloc fast path.
This CL adds a workaraound that falls-back on a raw volatile ptr read
on Linux+Clang, relying on the fact that on the architectures we care
about a load of an aligned pointer is intrinsically atomic [1,2].

Perf regression:
https://chromeperf.appspot.com/report?sid=1237900c90f9c5e5320a87af8d9e8828fbd...

I verified manually that the regression goes away on cc_perftests with
this patch.

[1] Chapter 7 of Part 3A - System Programming Guide
    http://download.intel.com/design/processor/manuals/253668.pdf

[2] A3.5.3 Atomicity in the ARM architecture, ARM Architecture Reference Manual,
ARMv7-A and ARMv7-R edition
    http://liris.cnrs.fr/~mmrissa/lib/exe/fetch.php?media=armv7-a-r-manual.pdf

BUG=550886
TEST=cc_perftestsfast --gtest_filter=*PrepareTiles*
==========

to

==========
Improve performance of the malloc shim layer

This fixes the perf regression introduced by crrev.com/1781573002
As speculated, crbug.com/593344 (NoBarrier_Load being double-fenced)
causes a visible perf regression, as it causes the addition of two
fences in the malloc fast path.
This CL adds a workaraound that falls-back on a raw volatile ptr read
on Linux+Clang, relying on the fact that on the architectures we care
about a load of an aligned pointer is intrinsically atomic [1,2].

Perf regression:
https://chromeperf.appspot.com/report?sid=1237900c90f9c5e5320a87af8d9e8828fbd...

I verified manually that the regression goes away on cc_perftests with
this patch.

[1] Chapter 7 of Part 3A - System Programming Guide
    http://download.intel.com/design/processor/manuals/253668.pdf

[2] A3.5.3 Atomicity in the ARM architecture, ARM Architecture Reference Manual,
ARMv7-A and ARMv7-R edition
    http://liris.cnrs.fr/~mmrissa/lib/exe/fetch.php?media=armv7-a-r-manual.pdf

BUG=550886,593872
TEST=cc_perftestsfast --gtest_filter=*PrepareTiles*
==========

Primiano Tucci (use gerrit)

https://codereview.chromium.org/1777363002/diff/20001/base/allocator/allocator_shim.cc File base/allocator/allocator_shim.cc (right): https://codereview.chromium.org/1777363002/diff/20001/base/allocator/allocator_shim.cc#newcode75 base/allocator/allocator_shim.cc:75: // TODO(primiano): Just use NoBarrier_Load once crbug.com/593344 is fixed. ...

4 years, 9 months ago (2016-03-11 00:11:08 UTC) #12

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1777363002/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1777363002/20001

4 years, 9 months ago (2016-03-11 10:33:12 UTC) #14

Primiano Tucci (use gerrit)

On 2016/03/10 22:50:00, Nico wrote: > lgtm, sounds like you checked that the assembly this ...

4 years, 9 months ago (2016-03-11 10:34:39 UTC) #15

commit-bot: I haz the power

Description was changed from ========== Improve performance of the malloc shim layer This fixes the ...

4 years, 9 months ago (2016-03-11 11:21:57 UTC) #16

Message was sent while issue was closed.

Description was changed from

==========
Improve performance of the malloc shim layer

This fixes the perf regression introduced by crrev.com/1781573002
As speculated, crbug.com/593344 (NoBarrier_Load being double-fenced)
causes a visible perf regression, as it causes the addition of two
fences in the malloc fast path.
This CL adds a workaraound that falls-back on a raw volatile ptr read
on Linux+Clang, relying on the fact that on the architectures we care
about a load of an aligned pointer is intrinsically atomic [1,2].

Perf regression:
https://chromeperf.appspot.com/report?sid=1237900c90f9c5e5320a87af8d9e8828fbd...

I verified manually that the regression goes away on cc_perftests with
this patch.

[1] Chapter 7 of Part 3A - System Programming Guide
    http://download.intel.com/design/processor/manuals/253668.pdf

[2] A3.5.3 Atomicity in the ARM architecture, ARM Architecture Reference Manual,
ARMv7-A and ARMv7-R edition
    http://liris.cnrs.fr/~mmrissa/lib/exe/fetch.php?media=armv7-a-r-manual.pdf

BUG=550886,593872
TEST=cc_perftestsfast --gtest_filter=*PrepareTiles*
==========

to

==========
Improve performance of the malloc shim layer

This fixes the perf regression introduced by crrev.com/1781573002
As speculated, crbug.com/593344 (NoBarrier_Load being double-fenced)
causes a visible perf regression, as it causes the addition of two
fences in the malloc fast path.
This CL adds a workaraound that falls-back on a raw volatile ptr read
on Linux+Clang, relying on the fact that on the architectures we care
about a load of an aligned pointer is intrinsically atomic [1,2].

Perf regression:
https://chromeperf.appspot.com/report?sid=1237900c90f9c5e5320a87af8d9e8828fbd...

I verified manually that the regression goes away on cc_perftests with
this patch.

[1] Chapter 7 of Part 3A - System Programming Guide
    http://download.intel.com/design/processor/manuals/253668.pdf

[2] A3.5.3 Atomicity in the ARM architecture, ARM Architecture Reference Manual,
ARMv7-A and ARMv7-R edition
    http://liris.cnrs.fr/~mmrissa/lib/exe/fetch.php?media=armv7-a-r-manual.pdf

BUG=550886,593872
TEST=cc_perftestsfast --gtest_filter=*PrepareTiles*
==========

commit-bot: I haz the power

Description was changed from ========== Improve performance of the malloc shim layer This fixes the ...

4 years, 9 months ago (2016-03-11 11:23:06 UTC) #18

Message was sent while issue was closed.

Description was changed from

==========
Improve performance of the malloc shim layer

This fixes the perf regression introduced by crrev.com/1781573002
As speculated, crbug.com/593344 (NoBarrier_Load being double-fenced)
causes a visible perf regression, as it causes the addition of two
fences in the malloc fast path.
This CL adds a workaraound that falls-back on a raw volatile ptr read
on Linux+Clang, relying on the fact that on the architectures we care
about a load of an aligned pointer is intrinsically atomic [1,2].

Perf regression:
https://chromeperf.appspot.com/report?sid=1237900c90f9c5e5320a87af8d9e8828fbd...

I verified manually that the regression goes away on cc_perftests with
this patch.

[1] Chapter 7 of Part 3A - System Programming Guide
    http://download.intel.com/design/processor/manuals/253668.pdf

[2] A3.5.3 Atomicity in the ARM architecture, ARM Architecture Reference Manual,
ARMv7-A and ARMv7-R edition
    http://liris.cnrs.fr/~mmrissa/lib/exe/fetch.php?media=armv7-a-r-manual.pdf

BUG=550886,593872
TEST=cc_perftestsfast --gtest_filter=*PrepareTiles*
==========

to

==========
Improve performance of the malloc shim layer

This fixes the perf regression introduced by crrev.com/1781573002
As speculated, crbug.com/593344 (NoBarrier_Load being double-fenced)
causes a visible perf regression, as it causes the addition of two
fences in the malloc fast path.
This CL adds a workaraound that falls-back on a raw volatile ptr read
on Linux+Clang, relying on the fact that on the architectures we care
about a load of an aligned pointer is intrinsically atomic [1,2].

Perf regression:
https://chromeperf.appspot.com/report?sid=1237900c90f9c5e5320a87af8d9e8828fbd...

I verified manually that the regression goes away on cc_perftests with
this patch.

[1] Chapter 7 of Part 3A - System Programming Guide
    http://download.intel.com/design/processor/manuals/253668.pdf

[2] A3.5.3 Atomicity in the ARM architecture, ARM Architecture Reference Manual,
ARMv7-A and ARMv7-R edition
    http://liris.cnrs.fr/~mmrissa/lib/exe/fetch.php?media=armv7-a-r-manual.pdf

BUG=550886,593872
TEST=cc_perftestsfast --gtest_filter=*PrepareTiles*

Committed: https://crrev.com/d76421171daa1327f8e1c10ee710ebf911a90d2f
Cr-Commit-Position: refs/heads/master@{#380600}
==========

commit-bot: I haz the power

Patchset 2 (id:??) landed as https://crrev.com/d76421171daa1327f8e1c10ee710ebf911a90d2f Cr-Commit-Position: refs/heads/master@{#380600}

4 years, 9 months ago (2016-03-11 11:23:07 UTC) #19

Nico

https://codereview.chromium.org/1777363002/diff/20001/base/allocator/allocator_shim.cc File base/allocator/allocator_shim.cc (right): https://codereview.chromium.org/1777363002/diff/20001/base/allocator/allocator_shim.cc#newcode79 base/allocator/allocator_shim.cc:79: #if defined(OS_LINUX) && defined(__clang__) On 2016/03/11 00:11:08, Primiano wrote: ...

4 years, 9 months ago (2016-03-11 15:42:04 UTC) #20

Primiano Tucci (use gerrit)

On 2016/03/11 15:42:04, Nico wrote: > I mean on ARM fences are pretty different, so ...

4 years, 9 months ago (2016-03-11 16:22:06 UTC) #21

Nico

4 years, 9 months ago (2016-03-11 16:24:24 UTC) #22

Message was sent while issue was closed.

CrOS doesn't use clang yet, but they're looking at it. (Sounds like they won't
use Chromium's clang at the start though.)

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages