|
|
Created:
4 years, 3 months ago by jvanverth1 Modified:
4 years, 3 months ago CC:
reviews_skia.org Target Ref:
refs/heads/master Project:
skia Visibility:
Public. |
DescriptionSupport use of non-coherent memory allocations in Vulkan.
BUG=skia:5034
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2348523002
Committed: https://skia.googlesource.com/skia/+/9d54afc38b171c01a03b34e773d154fcf83d97dc
Patch Set 1 #Patch Set 2 : Fix viewer init #
Total comments: 12
Patch Set 3 : Some minor tweaks #Patch Set 4 : Some more clean up #
Total comments: 2
Messages
Total messages: 23 (7 generated)
Description was changed from ========== Support use of non-coherent memory allocations in Vulkan. BUG=skia:5034 ========== to ========== Support use of non-coherent memory allocations in Vulkan. BUG=skia:5034 GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2348523002 ==========
jvanverth@google.com changed reviewers: + egdaniel@google.com
I'm not seeing lot of difference with this, probably because I'm not getting non-coherent allocations even on Android. But here it is.
https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.cpp File src/gpu/vk/GrVkMemory.cpp (right): https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.c... src/gpu/vk/GrVkMemory.cpp:60: VkMemoryPropertyFlags desiredMemProps = dynamic ? VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | Is it possible that there is no speed up because we still happen to be getting memory that is coherent? Should we first check to see if we can find one without the bit instead of supporting with or without. https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.c... src/gpu/vk/GrVkMemory.cpp:260: mappedMemoryRange.size = alloc.fSize; should we not be tracking which memory gets changed since the last flush so that we only flush regions that changed. https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.c... src/gpu/vk/GrVkMemory.cpp:261: // TODO: batch these into a single call before command buffer submit? We actaully do a pretty good job of already batching this copies to only once per draw, but it is pretty easy to make it explicitly do that. https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.c... src/gpu/vk/GrVkMemory.cpp:262: // What does the spec mean by "the host writes have completed"? My guess is this is involved if different thread is writing the memory from the one calling flush. Our memcpys will just be finished since it is one thread.
https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.cpp File src/gpu/vk/GrVkMemory.cpp (right): https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.c... src/gpu/vk/GrVkMemory.cpp:60: VkMemoryPropertyFlags desiredMemProps = dynamic ? VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | On 2016/09/15 20:42:17, egdaniel wrote: > Is it possible that there is no speed up because we still happen to be getting > memory that is coherent? Should we first check to see if we can find one without > the bit instead of supporting with or without. If there's an option that is non-coherent but cached, it should be earlier in the list. That said, should we prioritize non-coherent over cached? With this code if there's a non-coherent but non-cached option we won't pick it if there's a cached but coherent option. https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.c... src/gpu/vk/GrVkMemory.cpp:260: mappedMemoryRange.size = alloc.fSize; On 2016/09/15 20:42:17, egdaniel wrote: > should we not be tracking which memory gets changed since the last flush so that > we only flush regions that changed. Currently we just map, the client does what they like with it, and then we unmap. We'd have to add something to GrVkBuffer so the client can specify the range they changed. I'd rather do that as a separate change, unless we don't see any significant speedup with this (once we get non-coherent buffers). https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.c... src/gpu/vk/GrVkMemory.cpp:261: // TODO: batch these into a single call before command buffer submit? On 2016/09/15 20:42:17, egdaniel wrote: > We actaully do a pretty good job of already batching this copies to only once > per draw, but it is pretty easy to make it explicitly do that. Acknowledged. https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.c... src/gpu/vk/GrVkMemory.cpp:262: // What does the spec mean by "the host writes have completed"? On 2016/09/15 20:42:17, egdaniel wrote: > My guess is this is involved if different thread is writing the memory from the > one calling flush. Our memcpys will just be finished since it is one thread. Acknowledged.
https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.cpp File src/gpu/vk/GrVkMemory.cpp (right): https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.c... src/gpu/vk/GrVkMemory.cpp:60: VkMemoryPropertyFlags desiredMemProps = dynamic ? VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | On 2016/09/19 19:40:46, jvanverth1 wrote: > On 2016/09/15 20:42:17, egdaniel wrote: > > Is it possible that there is no speed up because we still happen to be getting > > memory that is coherent? Should we first check to see if we can find one > without > > the bit instead of supporting with or without. > > If there's an option that is non-coherent but cached, it should be earlier in > the list. That said, should we prioritize non-coherent over cached? With this > code if there's a non-coherent but non-cached option we won't pick it if there's > a cached but coherent option. So I definitely thing we should prefer one of non-coherent or caches & coherent over the other. But really without being able to test its hard to really know. I'd lean towards preferring non-coherent. But if we implement it all, then once we can actually test a non-coherent version we can easily switch the code to prefer one or the other. https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.c... src/gpu/vk/GrVkMemory.cpp:260: mappedMemoryRange.size = alloc.fSize; On 2016/09/19 19:40:46, jvanverth1 wrote: > On 2016/09/15 20:42:17, egdaniel wrote: > > should we not be tracking which memory gets changed since the last flush so > that > > we only flush regions that changed. > > Currently we just map, the client does what they like with it, and then we > unmap. We'd have to add something to GrVkBuffer so the client can specify the > range they changed. I'd rather do that as a separate change, unless we don't see > any significant speedup with this (once we get non-coherent buffers). sgtm
https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.cpp File src/gpu/vk/GrVkMemory.cpp (right): https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.c... src/gpu/vk/GrVkMemory.cpp:60: VkMemoryPropertyFlags desiredMemProps = dynamic ? VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | On 2016/09/19 19:56:21, egdaniel wrote: > On 2016/09/19 19:40:46, jvanverth1 wrote: > > On 2016/09/15 20:42:17, egdaniel wrote: > > > Is it possible that there is no speed up because we still happen to be > getting > > > memory that is coherent? Should we first check to see if we can find one > > without > > > the bit instead of supporting with or without. > > > > If there's an option that is non-coherent but cached, it should be earlier in > > the list. That said, should we prioritize non-coherent over cached? With this > > code if there's a non-coherent but non-cached option we won't pick it if > there's > > a cached but coherent option. > > So I definitely thing we should prefer one of non-coherent or caches & coherent > over the other. But really without being able to test its hard to really know. > I'd lean towards preferring non-coherent. But if we implement it all, then once > we can actually test a non-coherent version we can easily switch the code to > prefer one or the other. Reading the spec, a host visible option that is neither cached nor coherent doesn't exist (if a host visible type isn't cached it's implicitly coherent). The code below will prioritize CACHED over CACHED|COHERENT, and then COHERENT if no cached option is available, so I think we're okay with this.
so in general l g t m, but rebase this with the latest change I made so make sure everything fits in nicely https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.cpp File src/gpu/vk/GrVkMemory.cpp (right): https://codereview.chromium.org/2348523002/diff/20001/src/gpu/vk/GrVkMemory.c... src/gpu/vk/GrVkMemory.cpp:60: VkMemoryPropertyFlags desiredMemProps = dynamic ? VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | On 2016/09/20 14:03:55, jvanverth1 wrote: > On 2016/09/19 19:56:21, egdaniel wrote: > > On 2016/09/19 19:40:46, jvanverth1 wrote: > > > On 2016/09/15 20:42:17, egdaniel wrote: > > > > Is it possible that there is no speed up because we still happen to be > > getting > > > > memory that is coherent? Should we first check to see if we can find one > > > without > > > > the bit instead of supporting with or without. > > > > > > If there's an option that is non-coherent but cached, it should be earlier > in > > > the list. That said, should we prioritize non-coherent over cached? With > this > > > code if there's a non-coherent but non-cached option we won't pick it if > > there's > > > a cached but coherent option. > > > > So I definitely thing we should prefer one of non-coherent or caches & > coherent > > over the other. But really without being able to test its hard to really know. > > I'd lean towards preferring non-coherent. But if we implement it all, then > once > > we can actually test a non-coherent version we can easily switch the code to > > prefer one or the other. > > Reading the spec, a host visible option that is neither cached nor coherent > doesn't exist (if a host visible type isn't cached it's implicitly coherent). > The code below will prioritize CACHED over CACHED|COHERENT, and then COHERENT if > no cached option is available, so I think we're okay with this. Ah right I forgot there was a required set of possible things it could be.
Rebased and a few more minor cleanups.
lgtm with the one comment on invalidate and map https://codereview.chromium.org/2348523002/diff/60001/src/gpu/vk/GrVkGpu.cpp File src/gpu/vk/GrVkGpu.cpp (right): https://codereview.chromium.org/2348523002/diff/60001/src/gpu/vk/GrVkGpu.cpp#... src/gpu/vk/GrVkGpu.cpp:1755: GrVkMemory::InvalidateMappedAlloc(this, transferBuffer->alloc()); Is this a call we can put into map similar to Flush and unmap?
https://codereview.chromium.org/2348523002/diff/60001/src/gpu/vk/GrVkGpu.cpp File src/gpu/vk/GrVkGpu.cpp (right): https://codereview.chromium.org/2348523002/diff/60001/src/gpu/vk/GrVkGpu.cpp#... src/gpu/vk/GrVkGpu.cpp:1755: GrVkMemory::InvalidateMappedAlloc(this, transferBuffer->alloc()); On 2016/09/20 15:23:22, egdaniel wrote: > Is this a call we can put into map similar to Flush and unmap? I left it outside because we often map to write, and only occasionally to read, and I didn't want to pay any perf cost for invalidating unless we need it.
The CQ bit was checked by jvanverth@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: skia_presubmit-Trybot on master.client.skia.fyi (JOB_FAILED, http://build.chromium.org/p/client.skia.fyi/builders/skia_presubmit-Trybot/bu...)
jvanverth@google.com changed reviewers: + bsalomon@google.com
bsalomon@, PTAL
api lgtm
The CQ bit was checked by jvanverth@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Support use of non-coherent memory allocations in Vulkan. BUG=skia:5034 GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2348523002 ========== to ========== Support use of non-coherent memory allocations in Vulkan. BUG=skia:5034 GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2348523002 Committed: https://skia.googlesource.com/skia/+/9d54afc38b171c01a03b34e773d154fcf83d97dc ==========
Message was sent while issue was closed.
Committed patchset #4 (id:60001) as https://skia.googlesource.com/skia/+/9d54afc38b171c01a03b34e773d154fcf83d97dc
Message was sent while issue was closed.
I think this introduced a few warnings-as-errors in the linux build: /usr/local/google/home/bsalomon/src/skia/tests/VkWrapTests.cpp:57:49: error: missing field 'fFlags' initializer [-Werror,-Wmissing-field-initializers] backendCopy.fAlloc = { VK_NULL_HANDLE, 0, 0 }; ^ /usr/local/google/home/bsalomon/src/skia/tests/VkWrapTests.cpp:103:49: error: missing field 'fFlags' initializer [-Werror,-Wmissing-field-initializers] backendCopy.fAlloc = { VK_NULL_HANDLE, 0, 0 }; ^ [ 88%] Building CXX object CMakeFiles/dm.dir/usr/local/google/home/bsalomon/src/skia/gm/filterbitmap.o [ 88%] Building CXX object CMakeFiles/dm.dir/usr/local/google/home/bsalomon/src/skia/gm/filterfastbounds.o [ 88%] Building CXX object CMakeFiles/dm.dir/usr/local/google/home/bsalomon/src/skia/gm/filterindiabox.o /usr/local/google/home/bsalomon/src/skia/tests/VkWrapTests.cpp:149:49: error: missing field 'fFlags' initializer [-Werror,-Wmissing-field-initializers] backendCopy.fAlloc = { VK_NULL_HANDLE, 0, 0 }; |