Chromium Code Reviews
Created: 4 years, 8 months ago by vignatti (out of this project)
Modified: 4 years, 8 months ago
CC: chromium-reviews, darin-cc_chromium.org, jam, jln+watch_chromium.org, kalyank, ozone-reviews_chromium.org, rickyz+watch_chromium.org, Robert Sesek
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public
Description
ui/ozone: Flush CPU caches when mapping/unmapping prime pixmap
The dma-buf API requires that a mapped buffer is always flushed before and
after it is accessed. This CL therefore fixes the way we use that API.
As a consequence, this CL adds the support needed for non-coherent hardware
when using a prime native pixmap: it forwards cache coherency management to
the driver, which will act according to the hardware's needs. This requires
kernel changes to implement the ioctl, so for now it is enabled only on
Chrome OS.
cache coherent hw, Core "IvyBridge" i5:
- noop
* Map/Unmap accounts for 2.13% of PlaybackToMemory. Calculation is based on
"CPU Self Time".
* PlaybackToMemory time takes on average 0.1840 ms to be executed.
Calculation based on "Wall Duration" against "Occurrences".
- sync (with this patch applied)
* Map/Unmap accounts for 6.70% of PlaybackToMemory. Calculation based on
"CPU Self Time".
* PlaybackToMemory time takes on average 0.1786 ms to be executed.
Calculation based on "Wall Duration" against "Occurrences".
cache incoherent hw, Celeron "Bay Trail-M" N2830:
- noop
* Map/Unmap accounts for 10.31% of PlaybackToMemory. Calculation is based
on "CPU Self Time".
* PlaybackToMemory time takes on average 0.3955 ms to be executed.
Calculation based on "Wall Duration" against "Occurrences".
- sync (with this patch applied)
* Map/Unmap accounts for 24.17% of PlaybackToMemory. Calculation based on
"CPU Self Time".
* PlaybackToMemory time takes on average 0.6593 ms to be executed.
Calculation based on "Wall Duration" against "Occurrences".
Conclusion:
Wall Duration average for PlaybackToMemory is better with the ioctls in
place on cache-coherent hardware, i.e. even though Map/Unmap adds its own
overhead, the overall graphics system speeds up when the caches are
synced. On non-coherent hardware, however, there is a considerable
performance regression.
TEST=chrome on Core platform, using the following options: --enable-native-gpu-memory-buffers --enable-zero-copy \
--trace-startup=drm,cc --trace-startup-file=mapunmap-sync.json \
--trace-startup-duration=5 ../../tools/perf/page_sets/tough_texture_upload_cases/background_color_animation.html
BUG=581151
Committed: https://crrev.com/5e5cf24e3825a63996779ff0e817b79a2faa1dc9
Cr-Commit-Position: refs/heads/master@{#385737}
Patch Set 1 #
Total comments: 7
Patch Set 2 : build fix pointed by dshwang #
Patch Set 3 : use base::ScopedFD #
Total comments: 10
Patch Set 4 : fix spang comments and use local definition for ioctl #
Total comments: 9
Patch Set 5 : #
Patch Set 6 : fix missing semicolon #
Total comments: 1
Patch Set 7 : ran git cl format #
Total comments: 4
Messages
Total messages: 38 (13 generated)
dongseong.hwang@intel.com changed reviewers: + dongseong.hwang@intel.com
Great. Few nits.

https://codereview.chromium.org/1841683003/diff/1/content/common/sandbox_linu...
File content/common/sandbox_linux/bpf_renderer_policy_linux.cc (right):
content/common/sandbox_linux/bpf_renderer_policy_linux.cc:35: Allow())
is it ok to build chromium on ubuntu?

https://codereview.chromium.org/1841683003/diff/1/ui/ozone/platform/drm/commo...
File ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc (right):
ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc:24: memset(&sync_start, 0, sizeof(sync_start));
we can change it to c++ style: struct dma_buf_sync sync_start = {0}; and then memset is not needed.

https://codereview.chromium.org/1841683003/diff/1/ui/ozone/platform/drm/commo...
File ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.h (right):
ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.h:33: int dmabuf_fd_;
how about using base::ScopedFD?
https://codereview.chromium.org/1841683003/diff/1/content/common/sandbox_linu...
File content/common/sandbox_linux/bpf_renderer_policy_linux.cc (right):
content/common/sandbox_linux/bpf_renderer_policy_linux.cc:35: Allow())
On 2016/03/29 13:23:40, dshwang wrote:
> is it ok to build chromium on ubuntu?
1. need #ifdef
2. s/DMA_BUF_IOCTL_SYNC)/DMA_BUF_IOCTL_SYNC))/
tiago.vignatti@intel.com changed reviewers: + marcheu@chromium.org, reveman@chromium.org, spang@chromium.org
PTAL. I'm abandoning 1609383003 in favour of this CL now.
I've now fixed dshwang's comments except the one about Linux support. A simple #ifdef to opt this CL out of Linux is not enough. As I described in the commit message, the changes introduced here fix the API to be used the way it was actually meant to be, i.e. the flush ioctls always need to happen. So my idea is to opt the native pixmap dma-buf client out of Linux entirely for now. wdyt?

https://codereview.chromium.org/1841683003/diff/1/content/common/sandbox_linu...
File content/common/sandbox_linux/bpf_renderer_policy_linux.cc (right):
content/common/sandbox_linux/bpf_renderer_policy_linux.cc:35: Allow())
On 2016/03/29 13:53:10, dshwang wrote:
> On 2016/03/29 13:23:40, dshwang wrote:
> > is it ok to build chromium on ubuntu?
>
> 1. need #ifdef
> 2. s/DMA_BUF_IOCTL_SYNC)/DMA_BUF_IOCTL_SYNC))/
2 is done.

https://codereview.chromium.org/1841683003/diff/1/ui/ozone/platform/drm/commo...
File ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc (right):
ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc:24: memset(&sync_start, 0, sizeof(sync_start));
On 2016/03/29 13:23:40, dshwang wrote:
> we can change it to c++ style.
> struct dma_buf_sync sync_start = {0};
> and then memset is not needed.
Done.

https://codereview.chromium.org/1841683003/diff/1/ui/ozone/platform/drm/commo...
File ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.h (right):
ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.h:33: int dmabuf_fd_;
On 2016/03/29 13:23:40, dshwang wrote:
> how about using base::ScopedFD?
Done.
https://codereview.chromium.org/1841683003/diff/40001/content/common/sandbox_...
File content/common/sandbox_linux/bpf_renderer_policy_linux.cc (right):
content/common/sandbox_linux/bpf_renderer_policy_linux.cc:35: Allow())
I'm fairly certain this is going to cause compile errors, and you'll probably need to scope it narrower than "all linux builds".

https://codereview.chromium.org/1841683003/diff/40001/ui/ozone/platform/drm/c...
File ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc (right):
ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc:26: LOG(ERROR) << "fail to sync start dma buf fd, errno: " << errno;
PLOG(ERROR)

ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc:34: LOG(ERROR) << "fail to sync end dma buf fd, errno " << errno;
PLOG(ERROR)

ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc:36: }
Add a newline between the braces and say "} // namespace" to close the anonymous namespace.

ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc:67: TRACE_EVENT0("nativepixmap", "DmaBuf:Map");
Please use "drm". This doesn't warrant its own tracing category.
https://codereview.chromium.org/1841683003/diff/40001/content/common/sandbox_...
File content/common/sandbox_linux/bpf_renderer_policy_linux.cc (right):
content/common/sandbox_linux/bpf_renderer_policy_linux.cc:35: Allow())
On 2016/03/30 00:19:00, spang wrote:
> I'm fairly certain this is going to cause compile errors, and you'll probably
> need to scope it narrower than "all linux builds".
yeah, I was thinking the same. Did you read what I wrote about that in my last reply above? I'm thinking to opt native dma-buf pixmap support out entirely for regular Linuxes...
On 2016/03/30 14:58:37, vignatti wrote:
> yeah, I was thinking the same. Did you read what I wrote about that in my
> last reply above? I'm thinking to opt native dma-buf pixmap support out
> entirely for regular Linuxes...

Sure, one scope that'll probably work is just

#if defined(OS_CHROMEOS)

I think it'd be fine to do that for starters. We do have to make sure that IsConfigurationSupported() matches. Even so, you might need to remove the #include and just #define the ioctl directly, because people do Chrome OS builds on Linux for testing and the header might be missing there altogether, since this is a very recent kernel feature.
spang@ check the latest patch please. I've used the local definitions now and that works for compiling cros on Linux, but I'm afraid that's not so legible/clear. What do you think?

https://codereview.chromium.org/1841683003/diff/40001/ui/ozone/platform/drm/c...
File ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc (right):
ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc:26: LOG(ERROR) << "fail to sync start dma buf fd, errno: " << errno;
On 2016/03/30 00:19:00, spang wrote:
> PLOG(ERROR)
Done.

ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc:34: LOG(ERROR) << "fail to sync end dma buf fd, errno " << errno;
On 2016/03/30 00:19:00, spang wrote:
> PLOG(ERROR)
Done.

ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc:36: }
On 2016/03/30 00:19:00, spang wrote:
> Add a newline between the braces and say "} // namespace" to close the
> anonymous namespace.
Done.

ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc:67: TRACE_EVENT0("nativepixmap", "DmaBuf:Map");
On 2016/03/30 00:19:00, spang wrote:
> Please, use "drm". This doesn't warrant its own tracing category.
Done.
https://codereview.chromium.org/1841683003/diff/60001/content/common/sandbox_...
File content/common/sandbox_linux/bpf_renderer_policy_linux.cc (right):
content/common/sandbox_linux/bpf_renderer_policy_linux.cc:11: * <linux/dma-buf.h> once kernel version 4.6 becomes widely used.
Use a C++ style comment. This whole block should go after all of the includes, not in the middle of them.

https://codereview.chromium.org/1841683003/diff/60001/ui/ozone/platform/drm/c...
File ui/ozone/platform/drm/client_native_pixmap_factory_gbm.cc (right):
ui/ozone/platform/drm/client_native_pixmap_factory_gbm.cc:73: return nullptr;
NOTREACHED()

https://codereview.chromium.org/1841683003/diff/60001/ui/ozone/platform/drm/c...
File ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc (right):
ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc:26: #endif
Use a C++ style comment and move it below the #includes.

ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc:43: PLOG(ERROR) << "fail to sync start dma buf fd";
"Failed to", or better yet "Failed DMA_BUF_IOCTL_SYNC".
On 2016/03/30 20:04:23, vignatti wrote:
> spang@ check the latest patch please. I've used the local definitions now
> and that works for compiling cros on Linux, but I'm afraid that's not so
> legible/clear. What do you think?

I think it's fine as long as you fix the rest of the style issues. It takes years for the uapi kernel headers to propagate widely, so we have to live with this (or alternatively we work on getting the cros linux build to stop using Ubuntu's headers).
PTAL now, spang.

https://codereview.chromium.org/1841683003/diff/60001/content/common/sandbox_...
File content/common/sandbox_linux/bpf_renderer_policy_linux.cc (right):
content/common/sandbox_linux/bpf_renderer_policy_linux.cc:11: * <linux/dma-buf.h> once kernel version 4.6 becomes widely used.
On 2016/03/30 21:03:24, spang wrote:
> Use C++ style comment.
> This whole block should go after all of the includes, not in the middle of them.
Done.

https://codereview.chromium.org/1841683003/diff/60001/ui/ozone/platform/drm/c...
File ui/ozone/platform/drm/client_native_pixmap_factory_gbm.cc (right):
ui/ozone/platform/drm/client_native_pixmap_factory_gbm.cc:73: return nullptr;
On 2016/03/30 21:03:24, spang wrote:
> NOTREACHED()
Done.

https://codereview.chromium.org/1841683003/diff/60001/ui/ozone/platform/drm/c...
File ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc (right):
ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc:26: #endif
On 2016/03/30 21:03:25, spang wrote:
> Use a C++ style comment and move it below the #includes
Done.

ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc:43: PLOG(ERROR) << "fail to sync start dma buf fd";
On 2016/03/30 21:03:25, spang wrote:
> "Failed to" or better yet "Failed DMA_BUF_IOCTL_SYNC"
Done. But I've used "Failed DMA_BUF_SYNC_START" and "Failed DMA_BUF_SYNC_END" respectively.
tiago.vignatti@intel.com changed reviewers: + mdempsky@chromium.org
lgtm
mdempsky@ PTAL the changes in content/common/sandbox_linux/
https://codereview.chromium.org/1841683003/diff/60001/ui/ozone/platform/drm/c...
File ui/ozone/platform/drm/client_native_pixmap_factory_gbm.cc (right):
ui/ozone/platform/drm/client_native_pixmap_factory_gbm.cc:73: return nullptr;
On 2016/03/30 21:13:56, vignatti wrote:
> On 2016/03/30 21:03:24, spang wrote:
> > NOTREACHED()
>
> Done.
You forgot the semicolon.
On 2016/03/30 21:20:01, spang wrote:
> ui/ozone/platform/drm/client_native_pixmap_factory_gbm.cc:73: return nullptr;
> On 2016/03/30 21:13:56, vignatti wrote:
> > On 2016/03/30 21:03:24, spang wrote:
> > > NOTREACHED()
> >
> > Done.
>
> You forgot the semicolon.
oops. Thanks.
Description was changed from
==========
ui/ozone: Flush CPU caches when mapping/unmapping prime pixmap
The dma-buf API requires that the mapped buffer gets always flushed before and
after accessing it. This CL fixes this API usage therefore.
As a consequence this CL gives the needed support for non-coherent hardware
when using prime native pixmap. It forwards cache coherency management to the
driver which will execute accordingly the hardware needs. It requires kernel
changes to implement the ioctl, so for now only Chrome OS is enabled.
cache coherent hw, Core "IvyBridge" i5:
- noop
* Map/Unmap accounts for 2.13% of PlaybackToMemory. Calculation is based on
"CPU Self Time".
* PlaybackToMemory time takes on average 0.1840 ms to be executed.
Calculation based on "Wall Duration" against "Occurrences".
- sync (with this patch applied)
* Map/Unmap accounts for 6.70% of PlaybackToMemory. Calculation based on
"CPU Self Time".
* PlaybackToMemory time takes on average 0.1786 ms to be executed.
Calculation based on "Wall Duration" against "Occurrences".
cache incoherent hw, Celeron "Bay Trail-M" N2830:
- TODO
Conclusion:
Wall Duration average for PlaybackToMemory is *better* when the ioctls are
in place. I.e. even though Map/Unmap brings an in-place overhead, for the
overall graphics system there's a speed-up when syncing the caches.
TEST=chrome on Core platform, using the following options:
--enable-native-gpu-memory-buffers --enable-zero-copy \
--trace-startup=drm,cc --trace-startup-file=mapunmap-sync.json \
--trace-startup-duration=5
../../tools/perf/page_sets/tough_texture_upload_cases/background_color_animation.html
BUG=581151
==========
to
==========
ui/ozone: Flush CPU caches when mapping/unmapping prime pixmap
The dma-buf API requires that the mapped buffer gets always flushed before and
after accessing it. This CL fixes this API usage therefore.
As a consequence this CL gives the needed support for non-coherent hardware
when using prime native pixmap. It forwards cache coherency management to the
driver which will execute accordingly the hardware needs. It requires kernel
changes to implement the ioctl, so for now only Chrome OS is enabled.
cache coherent hw, Core "IvyBridge" i5:
- noop
* Map/Unmap accounts for 2.13% of PlaybackToMemory. Calculation is based on
"CPU Self Time".
* PlaybackToMemory time takes on average 0.1840 ms to be executed.
Calculation based on "Wall Duration" against "Occurrences".
- sync (with this patch applied)
* Map/Unmap accounts for 6.70% of PlaybackToMemory. Calculation based on
"CPU Self Time".
* PlaybackToMemory time takes on average 0.1786 ms to be executed.
Calculation based on "Wall Duration" against "Occurrences".
cache incoherent hw, Celeron "Bay Trail-M" N2830:
- noop
* Map/Unmap accounts for 10.31% of PlaybackToMemory. Calculation is based
on CPU Self Time".
* PlaybackToMemory time takes on average 0.3955 ms to be executed.
Calculation based on "Wall Duration" against "Occurrences".
- sync (with this patch applied)
* Map/Unmap accounts for 24.17% of PlaybackToMemory. Calculation based on
"CPU Self Time".
* PlaybackToMemory time takes on average 0.6593 ms to be executed.
Calculation based on "Wall Duration" against "Occurrences".
Conclusion:
Wall Duration average for PlaybackToMemory is better when the ioctls are
in place on coherent hardware. That is, even though Map/Unmap adds overhead
of its own, the overall graphics system speeds up when the caches are
synced. On non-coherent hardware, there is a considerable performance
regression.
TEST=chrome on Core platform, using the following options:
--enable-native-gpu-memory-buffers --enable-zero-copy \
--trace-startup=drm,cc --trace-startup-file=mapunmap-sync.json \
--trace-startup-duration=5
../../tools/perf/page_sets/tough_texture_upload_cases/background_color_animation.html
BUG=581151
==========
I have now posted the tracing of the non-coherent hw. PTAL.
rsesek@ PTAL the changes in content/common/sandbox_linux/
rsesek@chromium.org changed reviewers: + rickyz@chromium.org, rsesek@chromium.org
On 2016/04/04 17:59:33, vignatti wrote:
> rsesek@ PTAL the changes in content/common/sandbox_linux/

rickyz@ would be a better choice for this ChromeOS change.

https://codereview.chromium.org/1841683003/diff/100001/content/common/sandbox...
File content/common/sandbox_linux/bpf_renderer_policy_linux.cc (right):

https://codereview.chromium.org/1841683003/diff/100001/content/common/sandbox...
content/common/sandbox_linux/bpf_renderer_policy_linux.cc:27: #define
LOCAL_DMA_BUF_IOCTL_SYNC _IOW(LOCAL_DMA_BUF_BASE, 0, struct local_dma_buf_sync)
Please wrap to 80 columns, and on line 46.
On 2016/04/04 18:12:17, Robert Sesek wrote:
> On 2016/04/04 17:59:33, vignatti wrote:
> > rsesek@ PTAL the changes in content/common/sandbox_linux/
>
> rickyz@ would be a better choice for this ChromeOS change.
>
> https://codereview.chromium.org/1841683003/diff/100001/content/common/sandbox...
> File content/common/sandbox_linux/bpf_renderer_policy_linux.cc (right):
>
> https://codereview.chromium.org/1841683003/diff/100001/content/common/sandbox...
> content/common/sandbox_linux/bpf_renderer_policy_linux.cc:27: #define
> LOCAL_DMA_BUF_IOCTL_SYNC _IOW(LOCAL_DMA_BUF_BASE, 0, struct
> local_dma_buf_sync)
> Please wrap to 80 columns, and on line 46.

Done. Thanks. rickyz@ PTAL.
https://codereview.chromium.org/1841683003/diff/120001/content/common/sandbox...
File content/common/sandbox_linux/bpf_renderer_policy_linux.cc (right):

https://codereview.chromium.org/1841683003/diff/120001/content/common/sandbox...
content/common/sandbox_linux/bpf_renderer_policy_linux.cc:26: #define
LOCAL_DMA_BUF_BASE 'b'
Not sure how problematic this is likely to be with ioctl req collisions, but it
looks like binder also uses 'b' :-/

https://codereview.chromium.org/1841683003/diff/120001/ui/ozone/platform/drm/...
File ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc (right):

https://codereview.chromium.org/1841683003/diff/120001/ui/ozone/platform/drm/...
ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc:68: :
dmabuf_fd_(dmabuf_fd), size_(size), stride_(stride) {
Out of curiosity, where/how does this fd get exposed to the renderer? I assume
it comes from an IPC somewhere?
https://codereview.chromium.org/1841683003/diff/120001/content/common/sandbox...
File content/common/sandbox_linux/bpf_renderer_policy_linux.cc (right):

https://codereview.chromium.org/1841683003/diff/120001/content/common/sandbox...
content/common/sandbox_linux/bpf_renderer_policy_linux.cc:26: #define
LOCAL_DMA_BUF_BASE 'b'
On 2016/04/06 02:04:48, rickyz wrote:
> Not sure how problematic this is likely to be with ioctl req collisions, but
> it looks like binder also uses 'b' :-/

https://github.com/torvalds/linux/blob/master/Documentation/ioctl/ioctl-numbe...
Yup, we discussed this in dri-devel as well and it seems alright. As noted in
that txt file: "Because of the large number of drivers, many drivers share a
partial letter with other drivers".

https://codereview.chromium.org/1841683003/diff/120001/ui/ozone/platform/drm/...
File ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc (right):

https://codereview.chromium.org/1841683003/diff/120001/ui/ozone/platform/drm/...
ui/ozone/platform/drm/common/client_native_pixmap_dmabuf.cc:68: :
dmabuf_fd_(dmabuf_fd), size_(size), stride_(stride) {
On 2016/04/06 02:04:48, rickyz wrote:
> Out of curiosity, where/how does this fd get exposed to the renderer? I
> assume it comes from an IPC somewhere?

Yes. The GPU process is the privileged process here: it talks directly to the
driver and allocates the buffer. Through the GpuMemoryBuffer implementation it
then IPCs the fd to the renderer (the unprivileged process), which uses the
mmap'ed fd for rasterization and also for syncing the caches:
https://code.google.com/p/chromium/codesearch#chromium/src/gpu/ipc/client/gpu...
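The collision concern can be made concrete: a full ioctl request number encodes the direction, the argument size, the type letter, and the command number, so two drivers sharing the 'b' letter still only clash if the other fields match too. A small sketch (not from the CL; the LOCAL_* names mirror its local uapi copies):

```cpp
#include <cstdint>
#include <linux/ioctl.h>

struct local_dma_buf_sync {
  uint64_t flags;  // 8 bytes; the argument size is baked into the request.
};
#define LOCAL_DMA_BUF_BASE 'b'
#define LOCAL_DMA_BUF_IOCTL_SYNC \
  _IOW(LOCAL_DMA_BUF_BASE, 0, struct local_dma_buf_sync)

// The request is dir | size | type | nr: sharing the 'b' type letter with,
// say, binder is survivable because nr and size still disambiguate.
constexpr unsigned long kSyncReq = LOCAL_DMA_BUF_IOCTL_SYNC;
static_assert(_IOC_TYPE(kSyncReq) == 'b', "type letter is 'b'");
static_assert(_IOC_SIZE(kSyncReq) == sizeof(local_dma_buf_sync),
              "argument size is part of the request number");
```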
Cool, thanks for the details - lgtm.
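The fd handoff described above rests on a standard Linux primitive: passing a descriptor over a Unix domain socket with SCM_RIGHTS ancillary data. A bare-bones sketch of that primitive (SendFd/RecvFd are illustrative names, not Chromium's actual GpuMemoryBuffer IPC path):

```cpp
#include <cstring>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

// Sends |fd| across connected Unix socket |sock|; returns true on success.
bool SendFd(int sock, int fd) {
  char data = 'x';  // At least one byte of real payload must accompany it.
  struct iovec iov = {&data, 1};
  alignas(struct cmsghdr) char ctrl[CMSG_SPACE(sizeof(int))] = {};
  struct msghdr msg = {};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = ctrl;
  msg.msg_controllen = sizeof(ctrl);
  struct cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));
  memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
  return sendmsg(sock, &msg, 0) == 1;
}

// Receives a descriptor from |sock|; returns -1 on failure. The kernel
// installs a fresh fd number in this process referring to the same file.
int RecvFd(int sock) {
  char data;
  struct iovec iov = {&data, 1};
  alignas(struct cmsghdr) char ctrl[CMSG_SPACE(sizeof(int))] = {};
  struct msghdr msg = {};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = ctrl;
  msg.msg_controllen = sizeof(ctrl);
  if (recvmsg(sock, &msg, 0) != 1)
    return -1;
  struct cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  if (!cmsg || cmsg->cmsg_type != SCM_RIGHTS)
    return -1;
  int fd;
  memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
  return fd;
}
```

This is also why the sandbox policy change matters: the renderer never opens the DRM device itself; it only receives an already-open dmabuf fd and must be allowed to issue the sync ioctl on it.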
The CQ bit was checked by tiago.vignatti@intel.com
The patchset sent to the CQ was uploaded after l-g-t-m from spang@chromium.org Link to the patchset: https://codereview.chromium.org/1841683003/#ps120001 (title: "ran git cl format")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1841683003/120001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1841683003/120001
Message was sent while issue was closed.
Committed patchset #7 (id:120001)
Message was sent while issue was closed.
Description was changed to append:
==========
Committed: https://crrev.com/5e5cf24e3825a63996779ff0e817b79a2faa1dc9
Cr-Commit-Position: refs/heads/master@{#385737}
==========
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/5e5cf24e3825a63996779ff0e817b79a2faa1dc9
Cr-Commit-Position: refs/heads/master@{#385737}