Description
Added initial implementation of Vulkan Render Passes.
An initial implementation for various Vulkan rendering objects
has been done here. I tried to do the minimal amount of work
which would still compile/run but that ended up being quite a bit
of work.
We now have Vulkan implementations of command buffers, swap
chains, render passes, and surfaces. A Vulkan test has also been
added which tests that all of this functionality runs and returns the
expected values.
BUG=582558
CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel
Committed: https://crrev.com/4ec04eccd55ebfabc916eae2f417e5fa483e1c3e
Cr-Commit-Position: refs/heads/master@{#384125}
Patch Set 1
Patch Set 2 : Added command buffer files
Patch Set 3 : format
Patch Set 4 : Ensure vulkan handles all initialized to null, check destruction in destructor
Total comments: 71
Patch Set 5 : Separated out command pool, implemented SwapBuffer()
Patch Set 6 : rebase
Total comments: 2
Patch Set 7 : Fix SwapBuffers() present layout, test in unittests instead of injections
Total comments: 16
Patch Set 8 : Layout transitions done through render passes
Total comments: 4
Patch Set 9 : Use vkResetCommandBuffer for command buffer resets
Patch Set 10 : removed unnecessary dependency
Total comments: 27
Patch Set 11 : applied fixes
Patch Set 12 : Fixed depth stencil reference
Patch Set 13 : Added submission fence
Patch Set 14 : Put back in present semaphore
Total comments: 7
Patch Set 15 : added fence to command buffer
Total comments: 7
Patch Set 16 : Moved reset fences to submit.
Patch Set 17 : Adding logging/macros headers
Messages
Total messages: 35 (7 generated)
Description was changed from ========== Added initial implementation of a VulkanSurface. An initial VulkanSurface implementation has been done here, the surface is now platform independent as most of the implementation should be shared as a VulkanWSISurface. The only exception is the creation which is separated out by #ifdefs. Additionally, each surface contains the swap chain so that it can swap buffers itself. Each buffer also contains a command buffer which will eventually be used to submit commands for each frame. BUG=582558 ========== to ========== Added initial implementation of a VulkanSurface. An initial VulkanSurface implementation has been done here, the surface is now platform independent as most of the implementation should be shared as a VulkanWSISurface. The only exception is the creation which is separated out by #ifdefs. Additionally, each surface contains the swap chain so that it can swap buffers itself. Each buffer also contains a command buffer which will eventually be used to submit commands for each frame. BUG=582558 CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel ==========
dyen@chromium.org changed reviewers: + piman@chromium.org
Sorry for the biggish change, I kind of had to put all of it together just to see that Vulkan would actually succeed in all the resource creation calls. I added a DCHECK in gpu_command_buffer_stub which just verifies the surface and swap chains all get created properly. This passes using a custom built loader which enables xlib. This should take care of most of the boilerplate code for Vulkan so we can start rendering something into the window soon. PTAL.
https://codereview.chromium.org/1776453003/diff/60001/content/common/gpu/gpu_... File content/common/gpu/gpu_command_buffer_stub.h (right): https://codereview.chromium.org/1776453003/diff/60001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.h:270: #endif I don't think we should be touching GpuCommandBufferStub. In the Vulkan world, we wouldn't be using GpuCommandBufferStubs, nor the GPU process in the way we do with GL - if at all. Instead, we should have the OutputSurface directly connect to the VulkanSurface. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... gpu/vulkan/vulkan_command_buffer.cc:16: DCHECK_EQ(reinterpret_cast<VkCommandBuffer>(VK_NULL_HANDLE), command_buffer_); nit: static_cast https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... gpu/vulkan/vulkan_command_buffer.cc:31: LOG(ERROR) << "vkAllocateCommandBuffers() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... gpu/vulkan/vulkan_command_buffer.cc:54: LOG(ERROR) << "vkQueueSubmit() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... File gpu/vulkan/vulkan_command_buffer.h (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... gpu/vulkan/vulkan_command_buffer.h:24: bool Submit(); I would prefer being explicit here and passing the queue. It's likely we'll want to take advantage of a separate transfer queue if available. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... File gpu/vulkan/vulkan_implementation.cc (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... 
gpu/vulkan/vulkan_implementation.cc:154: command_pool_create_info.queueFamilyIndex = vk_queue_index; If we reset command buffers individually, we need to set the VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT flag. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... gpu/vulkan/vulkan_implementation.cc:159: LOG(ERROR) << "vkCreateCommandPool() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... gpu/vulkan/vulkan_implementation.cc:172: VkCommandPool vk_command_pool; We'll definitely need more than one pool - typically one per thread, otherwise the mutual exclusion requirement means command buffer recording needs to be serialized... So I think the pool usage should not be implicit, and instead explicitly created and managed by the higher levels. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... gpu/vulkan/vulkan_implementation.cc:214: command_buffer->Initialize(); don't we need to handle the return value? At a high level we will need to handle errors one way or another, and I don't think this is something that we'll want to wait until the last minute; I think we want to develop patterns early on. There's a few errors that will be fairly common (as in, many calls may return them, even though they shouldn't be common in practice): - VK_ERROR_OUT_OF_HOST_MEMORY, we can probably ignore, because we should abort even before we get there, because malloc itself will abort. - VK_ERROR_OUT_OF_DEVICE_MEMORY, we will need to handle somehow. It's not acceptable to crash the browser in this condition, but we may be able to handle similarly to lost device as a first step, but we may need finer grain management (e.g. if it happens when allocating resource memory for buffers/images, maybe we can free speculative resources instead).
(those 2 are pretty much always allowed in any call that returns a VkResult) - VK_ERROR_DEVICE_LOST, we need to handle by aborting the current frame, destroying everything, and recreating everything (similar to GL), and if it fails then, fall back to software. We can punt the whole retry logic for now, but at least we need to surface the error so that we can handle this correctly. There's a few more topical errors: - VK_ERROR_MEMORY_MAP_FAILED, we can probably handle like VK_ERROR_OUT_OF_HOST_MEMORY, i.e. terminate, though we'll have to do that explicitly. - VK_ERROR_TOO_MANY_OBJECTS, depends on context (e.g. if on device creation, this is fatal and must fallback to software (or GL), if on device memory allocation we should handle like VK_ERROR_OUT_OF_DEVICE_MEMORY). There's some places where it could happen in theory (e.g. vkCreateSampler), but we should make sure we don't run into it in the first place (i.e. pooling samplers to stay below the limit). - VK_ERROR_INITIALIZATION_FAILED, on instance/device creation, this is fatal and we must fallback to software - VK_ERROR_FORMAT_NOT_SUPPORTED is only on vkGetPhysicalDeviceImageFormatProperties, which I don't know if we'll actually need - VK_ERROR_LAYER_NOT_PRESENT/VK_ERROR_EXTENSION_NOT_PRESENT/VK_ERROR_FEATURE_NOT_PRESENT we should avoid in the first place - VK_ERROR_INCOMPATIBLE_DISPLAY_KHR shouldn't happen (only for vkCreateSharedSwapchains which we shouldn't use) - VK_ERROR_NATIVE_WINDOW_IN_USE_KHR shouldn't happen - VK_ERROR_SURFACE_LOST_KHR and VK_ERROR_OUT_OF_DATE_KHR have to do with WSI presentation and will need special handling (depending on conditions, from ignore to recreate swapchain, or possibly lost device). https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... File gpu/vulkan/vulkan_surface.cc (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... 
gpu/vulkan/vulkan_surface.cc:63: VulkanWSISurface(gfx::AcceleratedWidget window) : window_(window) {} nit: explicit https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:87: LOG(ERROR) << "vkGetPhysicalDeviceSurfaceFormatsKHR() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:113: surface_format_.format = VK_FORMAT_B8G8R8A8_UNORM; There's no guarantee that this is supported. If we don't find any format that we can handle, we have to fail. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:123: LOG(ERROR) << "vkGetPhysicalDeviceSurfaceCapabilitiesKHR() failed: " nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:152: LOG(ERROR) << "Failed to submit command buffer commands."; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:157: // and keeping around the command buffer until it's done. We'll definitely want some facility to do that - delaying destruction of command buffers (and other objects) until a fence passes - that would be useful in many places where we need transient things (e.g. texture uploads). https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:160: LOG(ERROR) << "vkQueueWaitIdle() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:185: class VulkanBufferSurface : public VulkanSurface { I don't think this is useful - you can't use a VkBuffer as a render target. If we wanted an offscreen VulkanSurface, it'd be backed by one (or more) VkImage, but as mentioned above, I don't think this is necessary at this level of abstraction. 
https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... File gpu/vulkan/vulkan_surface.h (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.h:24: SURFACE_OSMESA_RGBA, I doubt OSMesa will ever support Vulkan. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.h:39: // Create a surface for offscreen rendering. I don't think we want this. It was necessary in GL because you need a surface to use a context at all (to make current), but in Vulkan this is not needed - we can use the device without any "current" surface https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:74: std::max(surface_caps.minImageCount + 1, surface_caps.maxImageCount); We shouldn't need +1, because we shouldn't be keeping ownership (on the app side) of more than 1 image at a time. We do need to make sure it's at least 2, but if minImageCount==2 we can use that, we don't need 3: Start: presentation owns 2 images, app owns 0 - acquire image -> image 0 (presentation owns 1, app owns 1) - render to image 0 - swap image 0 (presentation owns 2, app owns 0, image 0 is displayed) - acquire image -> image 1 (presentation owns 1, app owns 1) - render to image 1 - swap image 1 (presentation owns 2, app owns 0, image 1 is displayed, image 0 is available) - acquire image -> image 0 etc. We also definitely don't want to allocate maxImageCount if we don't need them. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... 
gpu/vulkan/vulkan_swap_chain.cc:83: swap_chain_create_info.presentMode = VK_PRESENT_MODE_IMMEDIATE_KHR; VK_PRESENT_MODE_FIFO_KHR 1- that's the only one that's guaranteed 2- that's what we want (vsync, no wasted work, similar to eglSwapBuffers with eglSwapInterval set to 1) https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:91: LOG(ERROR) << "vkCreateSwapchainKHR() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:96: // TODO(dyen): Look into how to do this safely while commands in flight. Right, at the very least we need to wait for previous work to complete. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:103: bool VulkanSwapChain::InitializeRenderPass( I don't feel like this belongs to the SwapChain. E.g. the compositor may want a stencil buffer for path rendering, or do multi-pass, or use multisampling. It will also want to control LOAD_OP_CLEAR or LOAD_OP_LOAD on a case-by-case basis. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:114: attachment_description.finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL; It would be useful to do the transition to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR as part of the render pass, it can be advantageous in case the driver can do it on a per-tile basis. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:135: LOG(ERROR) << "vkCreateRenderPass() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:153: LOG(ERROR) << "vkGetSwapchainImagesKHR(NULL) failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... 
gpu/vulkan/vulkan_swap_chain.cc:161: LOG(ERROR) << "vkGetSwapchainImagesKHR(images) failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:177: VK_ACCESS_HOST_WRITE_BIT | VK_ACCESS_TRANSFER_WRITE_BIT; I don't think either of these flags is useful: - we didn't write to the image from the CPU - we didn't write to the image with a "transfer" command (copy/blit). Really for the initial transition, srcAccessMask = 0 / srcStageMask = 0 is ok. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:189: image_view_create_info.components = view_mapping; For a render target view, the VkComponentMapping needs to be identity - VkComponentMapping only applies to sampling. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:192: // Frame buffer creation info. For the same reason as the RenderPass, I think we want the client to manage the Framebuffer(s), because it may need stencil and/or more images for multi-pass or multisampling https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:209: VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, 0, 0, nullptr, 0, If the intent is for this to happen before rendering, the destination stage should include VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT - and the destination access mask should include VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:217: LOG(ERROR) << "vkCreateImageView() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:226: LOG(ERROR) << "vkCreateFramebuffer() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... 
File gpu/vulkan/vulkan_swap_chain.h (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.h:52: std::vector<scoped_ptr<BufferData>> buffers_; nit: "buffer" can be a bit confusing, because in Vulkan it really means "linear memory used for vertex buffers or constants", not "images"/"render targets".
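The VkResult taxonomy laid out in the review message above (host OOM vs. device OOM/lost device vs. fatal initialization failure vs. WSI errors) could be captured as a small classification helper. This is a minimal sketch, not code from the CL; the `VK_ERROR_*` values are illustrative mirrors of the Vulkan header constants (real code would include `<vulkan/vulkan.h>` and switch on `VkResult` directly), and `ErrorAction`/`ClassifyVkResult` are hypothetical names:

```cpp
// Illustrative mirrors of a few VkResult values; real code would include
// <vulkan/vulkan.h> instead of redefining these.
constexpr int VK_SUCCESS = 0;
constexpr int VK_ERROR_OUT_OF_HOST_MEMORY = -1;
constexpr int VK_ERROR_OUT_OF_DEVICE_MEMORY = -2;
constexpr int VK_ERROR_INITIALIZATION_FAILED = -3;
constexpr int VK_ERROR_DEVICE_LOST = -4;
constexpr int VK_ERROR_SURFACE_LOST_KHR = -1000000000;
constexpr int VK_ERROR_OUT_OF_DATE_KHR = -1000001004;

// One way to bucket the errors discussed above into recovery strategies.
enum class ErrorAction {
  kNone,                // VK_SUCCESS
  kTerminate,           // host OOM: malloc would have aborted anyway
  kLoseContext,         // device OOM / lost device: destroy, retry, else software
  kFallbackToSoftware,  // instance/device init failed: fatal for Vulkan
  kRecreateSwapchain,   // WSI presentation errors
};

ErrorAction ClassifyVkResult(int result) {
  switch (result) {
    case VK_SUCCESS:
      return ErrorAction::kNone;
    case VK_ERROR_OUT_OF_HOST_MEMORY:
      return ErrorAction::kTerminate;
    case VK_ERROR_OUT_OF_DEVICE_MEMORY:
    case VK_ERROR_DEVICE_LOST:
      return ErrorAction::kLoseContext;
    case VK_ERROR_INITIALIZATION_FAILED:
      return ErrorAction::kFallbackToSoftware;
    case VK_ERROR_SURFACE_LOST_KHR:
    case VK_ERROR_OUT_OF_DATE_KHR:
      return ErrorAction::kRecreateSwapchain;
    default:
      // Conservative default: treat unknown errors like a lost device.
      return ErrorAction::kLoseContext;
  }
}
```

A caller would then do e.g. `if (ClassifyVkResult(result) == ErrorAction::kLoseContext) { /* tear down and recreate */ }` rather than crashing on any non-success code.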
Thank you for the review. I've changed everything to DLOG(ERROR) from LOG(ERROR), but I wish there were a flag similar to dcheck_always_on for DLOGs; do you know if there is a reason there isn't one? I would just always have that on locally. https://codereview.chromium.org/1776453003/diff/60001/content/common/gpu/gpu_... File content/common/gpu/gpu_command_buffer_stub.h (right): https://codereview.chromium.org/1776453003/diff/60001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.h:270: #endif On 2016/03/09 01:25:33, piman wrote: > I don't think we should be touching GpuCommandBufferStub. In the Vulkan world, > we wouldn't be using GpuCommandBufferStubs, nor the GPU process in the way we do > with GL - if at all. Instead, we should have the OutputSurface directly connect > to the VulkanSurface. Still working on this, just fixed everything else for you to review first. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... gpu/vulkan/vulkan_command_buffer.cc:16: DCHECK_EQ(reinterpret_cast<VkCommandBuffer>(VK_NULL_HANDLE), command_buffer_); On 2016/03/09 01:25:33, piman wrote: > nit: static_cast Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... gpu/vulkan/vulkan_command_buffer.cc:31: LOG(ERROR) << "vkAllocateCommandBuffers() failed: " << result; On 2016/03/09 01:25:33, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... gpu/vulkan/vulkan_command_buffer.cc:54: LOG(ERROR) << "vkQueueSubmit() failed: " << result; On 2016/03/09 01:25:33, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... File gpu/vulkan/vulkan_command_buffer.h (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma...
gpu/vulkan/vulkan_command_buffer.h:24: bool Submit(); On 2016/03/09 01:25:33, piman wrote: > I would prefer being explicit here and passing the queue. It's likely we'll want > to take advantage of a separate transfer queue if available. Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... File gpu/vulkan/vulkan_implementation.cc (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... gpu/vulkan/vulkan_implementation.cc:154: command_pool_create_info.queueFamilyIndex = vk_queue_index; On 2016/03/09 01:25:34, piman wrote: > If we reset command buffers individually, we need to set the > VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT flag. Interesting, I missed this. Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... gpu/vulkan/vulkan_implementation.cc:159: LOG(ERROR) << "vkCreateCommandPool() failed: " << result; On 2016/03/09 01:25:33, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... gpu/vulkan/vulkan_implementation.cc:172: VkCommandPool vk_command_pool; On 2016/03/09 01:25:33, piman wrote: > We'll definitely need more than one pool - typically one per thread, otherwise > the mutual exclusion requirement means command buffer recording needs to be > serialized... So I think the pool usage should not be implicit, and instead > explicitly created and managed by the higher levels. Done. Separated out into a separate class. I now have the SwapChain own the command pool for the swap chain images' command buffers. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... gpu/vulkan/vulkan_implementation.cc:214: command_buffer->Initialize(); On 2016/03/09 01:25:33, piman wrote: > don't we need to handle the return value?
> > At a high level we will need to handle errors one way or another, and I don't > think this is something that we'll want to wait until the last minute, I think > we want to develop patterns early on. > > There's a few errors that will be fairly common (as in, many calls may return > them, even though they shouldn't be common in practice): > > - VK_ERROR_OUT_OF_HOST_MEMORY, we can probably ignore, because we should abort > even before we get there, because malloc itself will abort. > - VK_ERROR_OUT_OF_DEVICE_MEMORY, we will need to handle somehow. It's not > acceptable to crash the browser in this condition, but we may be able to handle > similarly to lost device as a first step, but we may need finer grain management > (e.g. if it happens when allocating resource memory for buffers/images, maybe we > can free speculative resources instead). > (those 2 are pretty much always allowed in any call that returns a VkResult) > - VK_ERROR_DEVICE_LOST, we need to handle by aborting the current frame, > destroying everything, and recreating everything (similar to GL), and if it > fails then, fall back to software. We can punt the whole retry logic for now, > but at least we need to surface the error so that we can handle this correctly. > > > There's a few more topical errors: > - VK_ERROR_MEMORY_MAP_FAILED, we can probably handle like > VK_ERROR_OUT_OF_HOST_MEMORY, i.e. terminate, though we'll have to do that > explicitly. > - VK_ERROR_TOO_MANY_OBJECTS, depends on context (e.g. if on device creation, > this is fatal and must fallback to software (or GL), if on device memory > allocation we should handle like VK_ERROR_OUT_OF_DEVICE_MEMORY). There's some > places where it could happen in theory (e.g. vkCreateSampler), but we should > make sure we don't run into it in the first place (i.e. pooling samplers to stay > below the limit). 
> - VK_ERROR_INITIALIZATION_FAILED, on instance/device creation, this is fatal and > we must fallback to software > - VK_ERROR_FORMAT_NOT_SUPPORTED is only on > vkGetPhysicalDeviceImageFormatProperties, which I don't know if we'll actually > need > - > VK_ERROR_LAYER_NOT_PRESENT/VK_ERROR_EXTENSION_NOT_PRESENT/VK_ERROR_FEATURE_NOT_PRESENT > we should avoid in the first place > - VK_ERROR_INCOMPATIBLE_DISPLAY_KHR shouldn't happen (only for > vkCreateSharedSwapchains which we shouldn't use) > - VK_ERROR_NATIVE_WINDOW_IN_USE_KHR shouldn't happen > - VK_ERROR_SURFACE_LOST_KHR and VK_ERROR_OUT_OF_DATE_KHR have to do with WSI > presentation and will need special handling (depending on conditions, from > ignore to recreate swapchain, or possibly lost device). I'll have this function return nullptr on errors. Currently all these functions are just called on initialization so I'm planning on having them just fallback to GL in those cases, and GL will fallback to software. I'm not sure how to handle the whole reinitialization at this point; punt until we have drawing tasks? https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... File gpu/vulkan/vulkan_surface.cc (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:63: VulkanWSISurface(gfx::AcceleratedWidget window) : window_(window) {} On 2016/03/09 01:25:34, piman wrote: > nit: explicit Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:87: LOG(ERROR) << "vkGetPhysicalDeviceSurfaceFormatsKHR() failed: " << result; On 2016/03/09 01:25:34, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:113: surface_format_.format = VK_FORMAT_B8G8R8A8_UNORM; On 2016/03/09 01:25:34, piman wrote: > There's no guarantee that this is supported.
If we don't find any format that we > can handle, we have to fail. Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:123: LOG(ERROR) << "vkGetPhysicalDeviceSurfaceCapabilitiesKHR() failed: " On 2016/03/09 01:25:34, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:152: LOG(ERROR) << "Failed to submit command buffer commands."; On 2016/03/09 01:25:34, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:157: // and keeping around the command buffer until it's done. On 2016/03/09 01:25:34, piman wrote: > We'll definitely want some facility to do that - delaying destruction of command > buffers (and other objects) until a fence passes - that would be useful in many > places where we need transient things (e.g. texture uploads). After thinking through this more, I now have this command buffer owned by the swap chain class and it uses it for all image layout commands. I have it synchronized through semaphores. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:160: LOG(ERROR) << "vkQueueWaitIdle() failed: " << result; On 2016/03/09 01:25:34, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:185: class VulkanBufferSurface : public VulkanSurface { On 2016/03/09 01:25:34, piman wrote: > I don't think this is useful - you can't use a VkBuffer as a render target. If > we wanted an offscreen VulkanSurface, it'd be backed by one (or more) VkImage, > but as mentioned above, I don't think this is necessary at this level of > abstraction. Deleted. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... 
File gpu/vulkan/vulkan_surface.h (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.h:24: SURFACE_OSMESA_RGBA, On 2016/03/09 01:25:34, piman wrote: > I doubt OSMesa will ever support Vulkan. Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.h:39: // Create a surface for offscreen rendering. On 2016/03/09 01:25:34, piman wrote: > I don't think we want this. It was necessary in GL because you need a surface to > use a context at all (to make current), but in Vulkan this is not needed - we > can use the device without any "current" surface Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:74: std::max(surface_caps.minImageCount + 1, surface_caps.maxImageCount); On 2016/03/09 01:25:34, piman wrote: > We shouldn't need +1, because we shouldn't be keeping ownership (on the app > side) of more than 1 image at a time. We do need to make sure it's at least 2, > but if minImageCount==2 we can use that, we don't need 3: > > Start: presentation owns 2 images, app owns 0 > - acquire image -> image 0 (presentation owns 1, app owns 1) > - render to image 0 > - swap image 0 (presentation owns 2, app owns 0, image 0 is displayed) > - acquire image -> image 1 (presentation owns 1, app owns 1) > - render to image 1 > - swap image 1 (presentation owns 2, app owns 0, image 1 is displayed, image 0 > is available) > - acquire image -> image 0 etc. > > We also definitely don't want to allocate maxImageCount if we don't need them. Oops, I meant std::min here. I was assuming minImageCount was 1 here. Actually I am not sure what minImageCount being >1 would mean, would that mean the presentation must own minImageCount images? 
So I was thinking if it must own 2 then I would always want one to be able to render to. I'll just put std::min(2, max) here. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:83: swap_chain_create_info.presentMode = VK_PRESENT_MODE_IMMEDIATE_KHR; On 2016/03/09 01:25:35, piman wrote: > VK_PRESENT_MODE_FIFO_KHR > > 1- that's the only one that's guaranteed > 2- that's what we want (vsync, no wasted work, similar to eglSwapBuffers with > eglSwapInterval set to 1) Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:91: LOG(ERROR) << "vkCreateSwapchainKHR() failed: " << result; On 2016/03/09 01:25:34, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:96: // TODO(dyen): Look into how to do this safely while commands in flight. On 2016/03/09 01:25:34, piman wrote: > Right, at the very least we need to wait for previous work to complete. I will worry about this when I look into how to safely resize(). https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:103: bool VulkanSwapChain::InitializeRenderPass( On 2016/03/09 01:25:35, piman wrote: > I don't feel like this belongs to the SwapChain. E.g. the compositor may want a > stencil buffer for path rendering, or do multi-pass, or use multisampling. It > will also want to control LOAD_OP_CLEAR or LOAD_OP_LOAD on a case-by-case basis. Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:114: attachment_description.finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL; On 2016/03/09 01:25:34, piman wrote: > It would be useful to do the transition to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR as > part of the render pass, it can be advantageous in case the driver can do it on > a per-tile basis. 
RenderPass no longer in this CL, I'll make a note of this in the next one. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:135: LOG(ERROR) << "vkCreateRenderPass() failed: " << result; On 2016/03/09 01:25:34, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:153: LOG(ERROR) << "vkGetSwapchainImagesKHR(NULL) failed: " << result; On 2016/03/09 01:25:34, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:161: LOG(ERROR) << "vkGetSwapchainImagesKHR(images) failed: " << result; On 2016/03/09 01:25:34, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:177: VK_ACCESS_HOST_WRITE_BIT | VK_ACCESS_TRANSFER_WRITE_BIT; On 2016/03/09 01:25:35, piman wrote: > I don't think either of these flags is useful: > - we didn't write to the image from the CPU > - we didn't write to the image with a "transfer" command (copy/blit). > > Really for the initial transition, srcAccessMask = 0 / srcStageMask = 0 is ok. Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:189: image_view_create_info.components = view_mapping; On 2016/03/09 01:25:34, piman wrote: > For a render target view, the VkComponentMapping needs to be identity - > VkComponentMapping only applies to sampling. Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:192: // Frame buffer creation info. 
On 2016/03/09 01:25:34, piman wrote: > For the same reason as the RenderPass, I think we want the client to manage the > Framebuffer(s), because it may need stencil and/or more images for multi-pass or > multisampling Ok, I was originally thinking we could just have different attachments as "options" to the SwapBuffer. But it's probably easier to just let the client create this like you say. All that needs to be done is to expose the ImageView then. I'll remove this code for now and will probably create classes to encapsulate render passes and frame buffers later. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:209: VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, 0, 0, nullptr, 0, On 2016/03/09 01:25:34, piman wrote: > If the intent is for this to happen before rendering, the destination stage > should include VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT - and the > destination access mask should include VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT So I was envisioning every image to be in "PRESENT" mode when done, then upon begin rendering I modify PRESENT -> COLOR, and before present I modify COLOR -> PRESENT. So for that reason I was initializing them to PRESENT. Although after actually writing the SwapBuffer() code I now have it synchronized using 4 semaphores and my initial state is indeed COLOR mode now so I have changed this code. I was going to have the SwapBuffer() stuff in another CL but I'll just include it here since it changes the structure of this file quite a bit. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:217: LOG(ERROR) << "vkCreateImageView() failed: " << result; On 2016/03/09 01:25:35, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... 
gpu/vulkan/vulkan_swap_chain.cc:226: LOG(ERROR) << "vkCreateFramebuffer() failed: " << result; On 2016/03/09 01:25:35, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... File gpu/vulkan/vulkan_swap_chain.h (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.h:52: std::vector<scoped_ptr<BufferData>> buffers_; On 2016/03/09 01:25:35, piman wrote: > nit: "buffer" can be a bit confusing, because in Vulkan it really means "linear > memory used for vertex buffers or constants", not "images"/"render targets". Renamed to ImageData/images_. Done.
https://codereview.chromium.org/1776453003/diff/100001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/100001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:101: current_image_data->present_semaphore, Hmm... looking at this code again I'm unclear which image this present_semaphore should be associated with. Does it signal when the previous last vkQueuePresentKHR finishes and releases an image? or does it signal when the acquire succeeds because the next image is done presenting?
I've added "vulkan_tests" which is an end to end test similar to "gl_tests". Currently I am testing the initialization, surface creation, and swap buffer paths there. As discussed, after this lands I will work on getting proper VulkanRenderer/VulkanContextProvider plumbing in so I have also removed all the temporary hacks which injected vulkan calls into Chromium. These paths are all covered through the unit tests. PTAL. https://codereview.chromium.org/1776453003/diff/100001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/100001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:101: current_image_data->present_semaphore, On 2016/03/10 02:19:38, David Yen wrote: > Hmm... looking at this code again I'm unclear which image this present_semaphore > should be associated with. Does it signal when the previous last > vkQueuePresentKHR finishes and releases an image? or does it signal when the > acquire succeeds because the next image is done presenting? Chatted separately. This semaphore and the following layout transition should apply to the next image.
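To make the resolution above concrete, a rough sketch of the per-frame semaphore flow as discussed (the semaphore and function names are illustrative pseudocode, not the CL's actual API):

```
// Illustrative pseudocode, not the CL's actual API.
vkAcquireNextImageKHR(device, swap_chain, timeout,
                      acquire_semaphore, VK_NULL_HANDLE, &image_index);
// acquire_semaphore is signalled when image `image_index` - the image
// just acquired, i.e. the *next* image - is ready for rendering, so the
// semaphore and any following layout transition belong to that image.
Submit(queue, wait = acquire_semaphore, command_buffer,
       signal = present_semaphore);
vkQueuePresentKHR(queue, wait = present_semaphore, swap_chain, image_index);
```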
https://codereview.chromium.org/1776453003/diff/60001/content/common/gpu/gpu_... File content/common/gpu/gpu_command_buffer_stub.h (right): https://codereview.chromium.org/1776453003/diff/60001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.h:270: #endif On 2016/03/10 01:39:48, David Yen wrote: > On 2016/03/09 01:25:33, piman wrote: > > I don't think we should be touching GpuCommandBufferStub. In the Vulkan world, > > we wouldn't be using GpuCommandBufferStubs, nor the GPU process in the way we > do > > with GL - if at all. Instead, we should have the OutputSurface directly > connect > > to the VulkanSurface. > > Still working on this, just fixed everything else for you to review first. This has been reverted and tested through the vulkan_tests instead. This should eventually be done through OutputSurface with VulkanContextProvider.
https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_comm... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:60: if (num_wait_semaphores) { nit: the if here and below are unnecessary, in fact (the cost of the test is higher than the cost of setting the extra field on the stack). https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_surf... File gpu/vulkan/vulkan_surface.h (right): https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_surf... gpu/vulkan/vulkan_surface.h:22: SURFACE_GBRA8888, nit: GBRA->BGRA? BTW, we can start with this, but thinking more about it, I suspect we won't really have a choice, at least on desktop. The available formats are likely fixed based on the visual/format of the underlying window. It may make more sense to just create the window based on what we want (16 vs 32 bits), then create the VulkanSurface, and then query the format. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:51: &current_image_data->render_semaphore)) { You don't need a semaphore to order command buffers on the same queue. I think you should be able to get away with 2 semaphores per frame, one post-acquire, one pre-present. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:57: ScopedSingleUseCommandBufferRecorder recorder( 2 things: 1- you can't reuse the command buffer until you know the previous instance is done executing - you need a fence for that 2- on some drivers/platforms, submit is supposed to be expensive (hundreds of µs to ms), so creating a small command buffer has a lot of overhead. 
A possibility is to submit multiple command buffers at once, but I think I'd let the caller do the transition in the main command buffer instead. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:99: // do the synchronization for us. vkAcquireNextImageKHR needs to be able to block, regardless of semaphores. Basically, even with an infinitely fast GPU, the system may be asynchronous, e.g. SurfaceFlinger on Android, or X11, etc. What this means is that you probably want an infinite timeout, at least to start with, though maybe a polling loop can be useful at a higher level (can punt for now). Obviously, you'd want to do the vkAcquireNextImageKHR as late as possible, rather than right after the Present, to minimize waiting time. For example if the compositor is vsync driven, you wouldn't want to acquire until we start drawing the next frame. It may make sense to separate Acquire from Present into 2 functions. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:116: ScopedSingleUseCommandBufferRecorder recorder( Same points as above. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:138: // here. This should be ok (note that I don't think you need it per above, though). It's ok to wait on a semaphore before it's signalled (note that it's the queue waiting, not the CPU), that's more or less the point of semaphores. There are some constraints around reuse, though - if the semaphore was signalled and waited on a different queue, it would be invalid to queue another signal unless you can guarantee the second signal will happen after the wait (e.g. with another semaphore, or a fence). In your case, you're fine though (single queue). https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... 
gpu/vulkan/vulkan_swap_chain.cc:159: std::min(2u, surface_caps.maxImageCount); It does have to be more than surface_caps.minImageCount though (that could be 3+ in theory - probably not in practice for FIFO mode).
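The clamping being discussed can be sketched as a small helper (a hypothetical function, not code from the CL; note that Vulkan defines a `maxImageCount` of 0 to mean "no upper limit"):

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: pick a swap chain image count that requests double
// buffering but always respects the surface capabilities. maxImageCount == 0
// means the surface imposes no upper limit, so only clamp when it is nonzero.
uint32_t ChooseImageCount(uint32_t min_image_count, uint32_t max_image_count) {
  uint32_t desired = 2;  // double buffering
  if (max_image_count != 0)
    desired = std::min(desired, max_image_count);
  // Never go below what the surface requires (could be 3+ in theory).
  return std::max(desired, min_image_count);
}
```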
Since a lot of the comments were around fixing the layout transition code, I went ahead and just wrote the render pass code since that is what we will be using anyways. PTAL. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_comm... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:60: if (num_wait_semaphores) { On 2016/03/11 03:07:52, piman wrote: > nit: the if here and below are unnecessary, in fact (the cost of the test is > higher than the cost of setting the extra field on the stack). Good point. Done. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_surf... File gpu/vulkan/vulkan_surface.h (right): https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_surf... gpu/vulkan/vulkan_surface.h:22: SURFACE_GBRA8888, On 2016/03/11 03:07:52, piman wrote: > nit: GBRA->BGRA? > > BTW, we can start with this, but thinking more about it, I suspect we won't > really have a choice, at least on desktop. The available formats are likely > fixed based on the visual/format of the underlying window. It may make more > sense to just create the window based on what we want (16 vs 32 bits), then > create the VulkanSurface, and then query the format. Done. That's interesting about the format choice, I will keep that in mind as we progress. So some desktops won't support VK_FORMAT_B8G8R8A8_UNORM? I would assume that would be supported everywhere which is why I chose it to be the default format. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:51: &current_image_data->render_semaphore)) { On 2016/03/11 03:07:52, piman wrote: > You don't need a semaphore to order command buffers on the same queue. 
I think > you should be able to get away with 2 semaphores per frame, one post-acquire, > one pre-present. That would make things easier. I wasn't sure because according to Chapter 5 in the spec it says "Unless otherwise specified, and without explicit synchronization, the various commands submitted to a queue via command buffers may execute in arbitrary order relative to each other, and/or concurrently". I guess that is talking more about the execution and not queuing the command on the device? Done. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:57: ScopedSingleUseCommandBufferRecorder recorder( On 2016/03/11 03:07:53, piman wrote: > 2 things: > 1- you can't reuse the command buffer until you know the previous instance is > done executing - you need a fence for that > 2- on some drivers/platforms, submit is supposed to be expensive (hundreds of µs > to ms), so creating a small command buffer has a lot of overhead. A possibility > is to submit multiple command buffers at once, but I think I'd let the caller do > the transition in the main command buffer instead. Done. To make how this is supposed to be done clearer, I went ahead and implemented vulkan render pass to do this. Therefore, all of this code is now unnecessary. The tests also test the render pass functionality now. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:99: // do the synchronization for us. On 2016/03/11 03:07:53, piman wrote: > vkAcquireNextImageKHR needs to be able to block, regardless of semaphores. > Basically, even with an infinitely fast GPU, the system may be asynchronous, > e.g. SurfaceFlinger on Android, or X11, etc. > > What this means is that you probably want an infinite timeout, at least to start > with, though maybe a polling loop can be useful at a higher level (can punt for > now). 
> > Obviously, you'd want to do the vkAcquireNextImageKHR as late as possible, > rather than right after the Present, to minimize waiting time. For example if > the compositor is vsync driven, you wouldn't want to acquire until we start > drawing the next frame. It may make sense to separate Acquire from Present into > 2 functions. Done. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:116: ScopedSingleUseCommandBufferRecorder recorder( On 2016/03/11 03:07:52, piman wrote: > Same points as above. Done. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:138: // here. On 2016/03/11 03:07:53, piman wrote: > This should be ok (note that I don't think you need it per above, though). It's > ok to wait on a semaphore before it's signalled (note that it's the queue > waiting, not the CPU), that's more or less the point of semaphores. There are > some constraints around reuse, though - if the semaphore was signalled and > waited on a different queue, it would be invalid to queue another signal unless > you can guarantee the second signal will happen after the wait (e.g. with > another semaphore, or a fence). In your case, you're fine though (single queue). Acknowledged. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:159: std::min(2u, surface_caps.maxImageCount); On 2016/03/11 03:07:53, piman wrote: > It does have to be more than surface_caps.minImageCount though (that could be 3+ > in theory - probably not in practice for FIFO mode). Done. https://codereview.chromium.org/1776453003/diff/140001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/140001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:181: image_memory_barrier.newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR; I changed this back to PRESENT. 
The initial UNDEFINED->PRESENT transition is a special case. This makes it so the render pass can just always follow the transition PRESENT->COLOR->PRESENT. As an example, the render pass test does this. https://codereview.chromium.org/1776453003/diff/140001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:251: result = vkQueueWaitIdle(GetVulkanQueue()); I added this back in since we no longer do layout transitions during swap so this extra command buffer is only used for setup. Eventually we would probably have it destroy itself when it's finished through callbacks?
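The PRESENT->COLOR->PRESENT scheme described above can be sketched through the render pass attachment description. Field values here are assumed from the discussion, not copied from the CL, so treat this as an illustrative fragment only:

```
// Sketch only: the render pass itself performs the layout transitions,
// so swap chain images always begin and end a frame in PRESENT_SRC_KHR.
VkAttachmentDescription color_attachment = {};
color_attachment.format = VK_FORMAT_B8G8R8A8_UNORM;
color_attachment.samples = VK_SAMPLE_COUNT_1_BIT;
color_attachment.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
color_attachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
color_attachment.initialLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
color_attachment.finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
// Within the subpass, the VkAttachmentReference uses
// VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL, giving PRESENT->COLOR->PRESENT.
```

The advantage piman mentioned earlier applies here: on tiled GPUs the driver can fold these transitions into the per-tile load/store, rather than issuing a separate pipeline barrier.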
Description was changed from ========== Added initial implementation of a VulkanSurface. An initial VulkanSurface implementation has been done here, the surface is now platform independent as most of the implementation should be shared as a VulkanWSISurface. The only exception is the creation which is separated out by #ifdefs. Additionally, each surface contains the swap chain so that it can swap buffers itself. Each buffer also contains a command buffer which will eventually be used to submit commands for each frame. BUG=582558 CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel ========== to ========== Added initial implementation of Vulkan Render Passes. An initial implementation for various Vulkan rendering objects has been done here. I tried to do the minimal amount of work which would still compile/run but that ended up being quite a bit of work. We now have Vulkan implementations of command buffers, swap chains, render passes, and surfaces. A Vulkan test has also been added which tests that all these functionalities run and return expected return values. BUG=582558 CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel ==========
jie.a.chen@intel.com changed reviewers: + jie.a.chen@intel.com
https://codereview.chromium.org/1776453003/diff/140001/gpu/vulkan/vulkan_comm... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/140001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:93: vkResetCommandPool(GetVulkanDevice(), command_pool_->handle(), Would it be impertinent to ask whether vkResetCommandPool is a typo here? It should probably be vkResetCommandBuffer instead.
https://codereview.chromium.org/1776453003/diff/140001/gpu/vulkan/vulkan_comm... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/140001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:93: vkResetCommandPool(GetVulkanDevice(), command_pool_->handle(), On 2016/03/23 07:12:57, jchen10 wrote: > Would it be impertinent to ask whether vkResetCommandPool seems a typo here? > Probably it should be vkResetCommandBuffer instead. Good catch, not sure what happened here. Fixed.
https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/tests/vulka... File gpu/vulkan/tests/vulkan_test.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/tests/vulka... gpu/vulkan/tests/vulkan_test.cc:115: render_pass.Destroy(); You need to wait for the RenderPass to not be in use before destroying. Same with the surface/swapchain below. vkQueueWaitIdle is a big hammer, but it's simple enough for tests. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... File gpu/vulkan/vulkan_render_pass.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.cc:266: subpass_desc.pDepthStencilAttachment = &depth_stencil_ref; depth_stencil_ref is a local variable. Its pointer is invalid after the loop iteration (in practice, likely each depth_stencil_ref for each iteration is at the same address and you just overwrite data). Also, if you don't have a D/S attachment, you want NULL here, not a VkAttachmentReference that points to VK_ATTACHMENT_UNUSED https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... File gpu/vulkan/vulkan_render_pass.h (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:46: // The stencil ops are only sued for IMAGE_TYPE_STENCIL and nit: sued->used https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:52: uint32_t num_image_views; // If more than 1, must match swap chain count. I'm not exactly sure why you need this, or what it means really. You can only have 1 image view per attachment on a given framebuffer. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:53: VulkanImageView* image_views; // used for ATTACHMENT_TYPE_ATTACHMENT_VIEW. Why plural? 
https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:86: bool Initialize(const VulkanSwapChain* swap_chain, Do you have to pass a SwapChain here? What if I want to make a render pass for offscreen buffers? https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:95: bool exec_inline); This is ok as a first step, but I think eventually we'll want to separate the RenderPass from the RenderPass instance. After the RenderPass is created, it's immutable, and that's a useful property because it's trivial to share across threads. If it can be mutable (current_sub_pass_/executing_/execution_type_) then you need locking and you lose the benefits of threading. One example could be to have BeginRenderPass return a RenderPassInstance that contains that state, and either NextSubPass/EndRenderPass are methods on it, or they stay here but take the RenderPassInstance as a parameter. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:50: &current_image_data->render_semaphore)) { You need a fence here, to know when you can reuse the command buffer the next time around. One per image, and you probably want to wait on that fence after vkAcquireNextImageKHR returns the same image. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:73: // do the synchronization for us. We'll need a fence for throttling, though. Without a fence, this could queue an infinite number of frames, faster than the GPU or display can consume. The fence suggested above may be enough.
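The depth_stencil_ref lifetime bug flagged above boils down to storing a pointer to a loop-local variable. A minimal, Vulkan-free sketch of the pattern and the fix (all names hypothetical, modeled loosely on VkAttachmentReference/VkSubpassDescription):

```cpp
#include <vector>

// Stand-ins for the Vulkan structs in question (illustrative only).
struct AttachmentReference { unsigned attachment; };
struct SubpassDescription { const AttachmentReference* depth_stencil; };

// The fix: keep the referenced objects in storage that outlives the loop,
// instead of pointing each SubpassDescription at a loop-local variable
// whose address is reused (and dead) after every iteration.
std::vector<SubpassDescription> BuildSubpasses(
    const std::vector<unsigned>& attachments,
    std::vector<AttachmentReference>* refs /* must outlive the result */) {
  // Reserve up front so push_back never reallocates and invalidates
  // the pointers already handed out.
  refs->reserve(attachments.size());
  std::vector<SubpassDescription> subpasses;
  for (unsigned a : attachments) {
    refs->push_back({a});                // lives as long as *refs
    subpasses.push_back({&refs->back()});
  }
  return subpasses;
}
```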
https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/tests/vulka... File gpu/vulkan/tests/vulkan_test.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/tests/vulka... gpu/vulkan/tests/vulkan_test.cc:115: render_pass.Destroy(); On 2016/03/25 02:01:26, piman wrote: > You need to wait for the RenderPass to not be in use before destroying. > Same with the surface/swapchain below. > > vkQueueWaitIdle is a big hammer, but it's simple enough for tests. Done. I added a Finish() call to VulkanSurface which does this. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... File gpu/vulkan/vulkan_render_pass.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.cc:266: subpass_desc.pDepthStencilAttachment = &depth_stencil_ref; On 2016/03/25 02:01:26, piman wrote: > depth_stencil_ref is a local variable. Its pointer is invalid after the loop > iteration (in practice, likely each depth_stencil_ref for each iteration is at > the same address and you just overwrite data). > > Also, if you don't have a D/S attachment, you want NULL here, not a > VkAttachmentReference that points to VK_ATTACHMENT_UNUSED Fixed the reference. According to the docs both work, so just initializing it to VK_ATTACHMENT_UNUSED and always using it seems easier here: 7.1: Setting the attachment index to VK_ATTACHMENT_UNUSED or leaving this pointer as NULL indicates that no depth/stencil attachment will be used in the subpass. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... File gpu/vulkan/vulkan_render_pass.h (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:46: // The stencil ops are only sued for IMAGE_TYPE_STENCIL and On 2016/03/25 02:01:26, piman wrote: > nit: sued->used Done. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... 
gpu/vulkan/vulkan_render_pass.h:52: uint32_t num_image_views; // If more than 1, must match swap chain count. On 2016/03/25 02:01:26, piman wrote: > I'm not exactly sure why you need this, or what it means really. You can only > have 1 image view per attachment on a given framebuffer. So this attachment data is meant to be used to create all the framebuffers internally. A framebuffer gets created for every image in the swap chain. If someone wants to have a single image view for every frame buffer then they would use a single one; if someone wants to have an image view for every corresponding swap chain image then they would have multiple (the same amount as the swap chain image count). https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:86: bool Initialize(const VulkanSwapChain* swap_chain, On 2016/03/25 02:01:26, piman wrote: > Do you have to pass a SwapChain here? > What if I want to make a render pass for offscreen buffers? We will probably have to create another Initialize function for that and error if ATTACHMENT_TYPE_SWAP_IMAGE is used. Some other things will have to be thought about too. Here I am automatically creating a frame buffer for every swap chain image, would we only want 1 framebuffer for offscreen buffers? or have them be able to specify the number of frame buffers? https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:95: bool exec_inline); On 2016/03/25 02:01:26, piman wrote: > This is ok as a first step, but I think eventually we'll want to separate the > RenderPass from the RenderPass instance. After the RenderPass is created, it's > immutable, and that's a useful property because it's trivial to share across > threads. If it can be mutable (current_sub_pass_/executing_/execution_type_) > then you need locking and you lose the benefits of threading. 
> One example could be to have BeginRenderPass return a RenderPassInstance that > contains that state, and either NextSubPass/EndRenderPass are methods on it, or > they stay here but take the RenderPassInstance as a parameter. Hmm that is interesting. Acknowledged. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:50: &current_image_data->render_semaphore)) { On 2016/03/25 02:01:27, piman wrote: > You need a fence here, to know when you can reuse the command buffer. the next > time around. One per image, and you probably want to wait on that fence after > vkAcquireNextImageKHR returns the same image. Since vkAcquireNextImageKHR is blocking on the presentation engine, doesn't that mean when the presentation engine releases an image the command buffer is also done being used and can be reused? Or are they not synchronized that way? Otherwise, would just waiting on the present_semaphore be enough? I think previously I had this wait on the present_semaphore but after I added the UINT64_MAX timeout I removed it since the present_semaphore didn't seem necessary anymore. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:73: // do the synchronization for us. On 2016/03/25 02:01:26, piman wrote: > We'll need a fence for throttling, though. Without a fence, this could queue > an > infinite number of frames, faster than the GPU or display can consume. The > fence > suggested above may be enough. Wouldn't this call throttle by blocking if the next image is still being owned by the presentation engine?
https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... File gpu/vulkan/vulkan_render_pass.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.cc:266: subpass_desc.pDepthStencilAttachment = &depth_stencil_ref; On 2016/03/25 17:36:14, David Yen wrote: > On 2016/03/25 02:01:26, piman wrote: > > depth_stencil_ref is a local variable. Its pointer is invalid after the loop > > iteration (in practice, likely each depth_stencil_ref for each iteration is at > > the same address and you just overwrite data). > > > > Also, if you don't have a D/S attachment, you want NULL here, not a > > VkAttachmentReference that points to VK_ATTACHMENT_UNUSED > > Fixed the reference. According to the docs both work, so just initializing it to > VK_ATTACHMENT_UNUSED and always using it seems easier here: > > 7.1: Setting the attachment index to VK_ATTACHMENT_UNUSED or leaving this > pointer as NULL indicates that no depth/stencil attachment will be used in the > subpass. Ah, ok, you're right. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... File gpu/vulkan/vulkan_render_pass.h (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:52: uint32_t num_image_views; // If more than 1, must match swap chain count. On 2016/03/25 17:36:14, David Yen wrote: > On 2016/03/25 02:01:26, piman wrote: > > I'm not exactly sure why you need this, or what it means really. You can only > > have 1 image view per attachment on a given framebuffer. > > So this attachment data is meant to used to create all the framebuffers > internally. A framebuffer gets created for every image in the swap chain. 
If > someone wants to have a single image view for every frame buffer then they would > use a single one, if someone wants to have an image view for every corresponding > swap chain image then they would have multiple (the same amount as the swap > chain image count). I think I understand. It's hard to imagine a use case for that - presumably we never use different swap chain images concurrently in the same frame, so any temporary render target can just be reused across frames. We certainly don't have anything like that in the compositor today. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:86: bool Initialize(const VulkanSwapChain* swap_chain, On 2016/03/25 17:36:14, David Yen wrote: > On 2016/03/25 02:01:26, piman wrote: > > Do you have to pass a SwapChain here? > > What if I want to make a render pass for offscreen buffers? > > We will probably have to create another Initialize function for that and error > if ATTACHMENT_TYPE_SWAP_IMAGE is used. Some other things will have to be thought > about too. Here I am automatically creating a frame buffer for every swap chain > image, would we only want 1 framebuffer for offscreen buffers? or have them be > able to specify the number of frame buffers? I don't see a use case to do more than one. Well, we may want to decouple the framebuffer from the render pass. The render pass affects pipeline compilation, and we definitely want to keep the same pipelines for all compositor passes to avoid a combinatorial explosion. But that's orthogonal. We can do that as a follow up (similar to RP Instance). https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... 
gpu/vulkan/vulkan_swap_chain.cc:50: &current_image_data->render_semaphore)) { On 2016/03/25 17:36:14, David Yen wrote: > On 2016/03/25 02:01:27, piman wrote: > > You need a fence here, to know when you can reuse the command buffer. the next > > time around. One per image, and you probably want to wait on that fence after > > vkAcquireNextImageKHR returns the same image. > > Since vkAcquireNextImageKHR is blocking on the presentation engine, doesn't that > mean when the presentation engine releases an image the command buffer is also > done being used and can be reused? Or are they not synchronized that way? That's not enough. The blocking that vkAcquireNextImageKHR does is actually potentially unrelated to GPU<->CPU communication, and has to do with the app<->compositor communication. For example if the presentation engine uses a blit model (i.e. it copies the data from the presentation image), you can conceptually see vkQueuePresentKHR as just enqueuing a blit, and after that the image is available again. IOW, the following API sequence: Submit(queue, null /*sema*/, command_buffer1, sema1); QueuePresent(queue, sema1, swapchain, image_index); AcquireNextImage(device, swap_chain, sema2, &image_index); Submit(queue, sema2, command_buffer2, null /*sema*/); Is really equivalent, under the hood, to: Submit(queue, null /*sema*/, command_buffer1, sema1); Submit(queue, sema1, blit_command_buffer, sema2); Submit(queue, sema2, command_buffer2, null /*sema*/); No fence needs to be involved. That would mean that AcquireNextImage can return as soon as blit_command_buffer is submitted, with the same image, but that doesn't mean that command_buffer1 is done executing, or any previous command buffer, really. In fact the presentation engine may never even wait on the GPU. > Otherwise, would just waiting on the present_semaphore be enough? 
I think > previously I had this wait on the present_semaphore but after I added the > UINT64_MAX timeout I removed it since the present_semaphore didn't seem > necessary anymore. You need a GPU->CPU synchronization, that's a VkFence. You can't wait on a VkSemaphore from the CPU. A semaphore is more like an ordering primitive (similar to chrome sync tokens). https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:73: // do the synchronization for us. On 2016/03/25 17:36:14, David Yen wrote: > On 2016/03/25 02:01:26, piman wrote: > > We'll need a fence for throttling, though. Without a fence, this could queue > an > > infinite number of frames, faster than the GPU or display can consume. The > fence > > suggested above may be enough. > > Wouldn't this call throttle by blocking if the next image is still being owned > by the presentation engine? The presentation engine may very well never wait on the GPU, so it's not enough to throttle.
https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... File gpu/vulkan/vulkan_render_pass.h (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:52: uint32_t num_image_views; // If more than 1, must match swap chain count. On 2016/03/25 21:13:22, piman wrote: > On 2016/03/25 17:36:14, David Yen wrote: > > On 2016/03/25 02:01:26, piman wrote: > > > I'm not exactly sure why you need this, or what it means really. You can > only > > > have 1 image view per attachment on a given framebuffer. > > > > So this attachment data is meant to used to create all the framebuffers > > internally. A framebuffer gets created for every image in the swap chain. If > > someone wants to have a single image view for every frame buffer then they > would > > use a single one, if someone wants to have an image view for every > corresponding > > swap chain image then they would have multiple (the same amount as the swap > > chain image count). > > I think I understand. It's hard to imagine a use case for that - presumably we > never use different swap chain images concurrently in the same frame, so any > temporary render target can just be reused across frames. We certainly don't > have anything like that in the compositor today. Ok, removed. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:86: bool Initialize(const VulkanSwapChain* swap_chain, On 2016/03/25 21:13:22, piman wrote: > On 2016/03/25 17:36:14, David Yen wrote: > > On 2016/03/25 02:01:26, piman wrote: > > > Do you have to pass a SwapChain here? > > > What if I want to make a render pass for offscreen buffers? > > > > We will probably have to create another Initialize function for that and error > > if ATTACHMENT_TYPE_SWAP_IMAGE is used. Some other things will have to be > thought > > about too. 
Here I am automatically creating a frame buffer for every swap > chain > > image, would we only want 1 framebuffer for offscreen buffers? or have them be > > able to specify the number of frame buffers? > > I don't see a use case to do more than one. > > Well, we may want to decouple the framebuffer from the render pass. The render > pass affects pipeline compilation, and we definitely want to keep the same > pipelines for all compositor passes to avoid a combinatorial explosion. But > that's orthogonal. We can do that as a follow up (similar to RP Instance). Acknowledged. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:50: ¤t_image_data->render_semaphore)) { On 2016/03/25 21:13:22, piman wrote: > On 2016/03/25 17:36:14, David Yen wrote: > > On 2016/03/25 02:01:27, piman wrote: > > > You need a fence here, to know when you can reuse the command buffer. the > next > > > time around. One per image, and you probably want to wait on that fence > after > > > vkAcquireNextImageKHR returns the same image. > > > > Since vkAcquireNextImageKHR is blocking on the presentation engine, doesn't > that > > mean when the presentation engine releases an image the command buffer is also > > done being used and can be reused? Or are they not synchronized that way? > > That's not enough. The blocking that vkAcquireNextImageKHR is actually > potentially unrelated to GPU<->CPU communication, and has to do with the > app<->compositor communication. > > For example if the presentation engine uses a blit model (i.e. it copies the > data from the presentation image), you can conceptually see vkQueuePresentKHR as > just enqueuing a blit, and after that the image is available again. 
> > IOW, the following API sequence: > Submit(queue, null /*sema*/, command_buffer1, sema1); > QueuePresent(queue, sema1, swapchain, image_index); > AcquireNextImage(device, swap_chain, sema2, &image_index); > Submit(queue, sema2, command_buffer2, null /*sema*/); > > Is really equivalent, under the hood, to: > Submit(queue, null /*sema*/, command_buffer1, sema1); > Submit(queue, sema1, blit_command_buffer, sema2); > Submit(queue, sema2, command_buffer2, null /*sema*/); > > No fence needs to be involved. That would mean that AcquireNextImage can return > as soon as blit_command_buffer is submitted, with the same image, but that > doesn't mean that command_buffer1 is done executing, or any previous command > buffer, really. In fact the presentation engine may never even wait on the GPU. > > > Otherwise, would just waiting on the present_semaphore be enough? I think > > previously I had this wait on the present_semaphore but after I added the > > UINT64_MAX timeout I removed it since the present_semaphore didn't seem > > necessary anymore. > > You need a GPU->CPU synchronization, that's a VkFence. You can't wait on a > VkSemaphore from the CPU. A semaphore is more like an ordering primitive > (similar to chrome sync tokens). Ok I see. I added the submission fence. Since the AcquireNextImage call also enqueues commands, that means we still would need to wait on the present semaphore, right? Waiting on the submission fence is not enough to guarantee the present semaphore has signaled yet. I wonder if we should just make this a feature in VulkanCommandBuffer, where we can set a VkFence which always gets waited upon before submission. Seems like we would always need something like this. We can add a check for the fence status too for asynchronous usage patterns. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:73: // do the synchronization for us.
On 2016/03/25 21:13:22, piman wrote: > On 2016/03/25 17:36:14, David Yen wrote: > > On 2016/03/25 02:01:26, piman wrote: > > > We'll need a fence for throttling, though. Without a fence, this could queue > > an > > > infinite number of frames, faster than the GPU or display can consume. The > > fence > > > suggested above may be enough. > > > > Wouldn't this call throttle by blocking if the next image is still being owned > > by the presentation engine? > > The presentation engine may very well never wait on the GPU, so it's not enough > to throttle. Acknowledged.
https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:50: &current_image_data->render_semaphore)) { On 2016/03/26 00:33:30, David Yen wrote: > On 2016/03/25 21:13:22, piman wrote: > > On 2016/03/25 17:36:14, David Yen wrote: > > > On 2016/03/25 02:01:27, piman wrote: > > > > You need a fence here, to know when you can reuse the command buffer. the > > next > > > > time around. One per image, and you probably want to wait on that fence > > after > > > > vkAcquireNextImageKHR returns the same image. > > > > > > Since vkAcquireNextImageKHR is blocking on the presentation engine, doesn't > > that > > > mean when the presentation engine releases an image the command buffer is > also > > > done being used and can be reused? Or are they not synchronized that way? > > > > That's not enough. The blocking that vkAcquireNextImageKHR is actually > > potentially unrelated to GPU<->CPU communication, and has to do with the > > app<->compositor communication. > > > > For example if the presentation engine uses a blit model (i.e. it copies the > > data from the presentation image), you can conceptually see vkQueuePresentKHR > as > > just enqueuing a blit, and after that the image is available again. > > > > IOW, the following API sequence: > > Submit(queue, null /*sema*/, command_buffer1, sema1); > > QueuePresent(queue, sema1, swapchain, image_index); > > AcquireNextImage(device, swap_chain, sema2, &image_index); > > Submit(queue, sema2, command_buffer2, null /*sema*/); > > > > Is really equivalent, under the hood, to: > > Submit(queue, null /*sema*/, command_buffer1, sema1); > > Submit(queue, sema1, blit_command_buffer, sema2); > > Submit(queue, sema2, command_buffer2, null /*sema*/); > > > > No fence needs to be involved.
That would mean that AcquireNextImage can > return > > as soon as blit_command_buffer is submitted, with the same image, but that > > doesn't mean that command_buffer1 is done executing, or any previous command > > buffer, really. In fact the presentation engine may never even wait on the > GPU. > > > > > Otherwise, would just waiting on the present_semaphore be enough? I think > > > previously I had this wait on the present_semaphore but after I added the > > > UINT64_MAX timeout I removed it since the present_semaphore didn't seem > > > necessary anymore. > > > > You need a GPU->CPU synchronization, that's a VkFence. You can't wait on a > > VkSemaphore from the CPU. A semaphore is more like an ordering primitive > > (similar to chrome sync tokens). > > Ok I see. I added the submission fence. Since the AcquireNextImage call also > enqueues commands, that means we still would need to wait on the present > semaphore right? Waiting on the submission fence is not enough to guarantee the > present semaphore has signaled yet. Correct. > I wonder if we should just make this a feature in VulkanCommandBuffer, where we > can set a VkFence which always gets waited upon before submission. Seems like we > would always need something like this. We can add an check for the fence status > too for asynchronous usage patterns. I think associating the VkFence with the VkCommandBuffer is a good idea, but keep in mind that you want to wait for it before reusing the command buffer, not before submitting it. https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:51: // Make sure the current command buffer is not in use. You need to do that before resetting/recording the command buffer, though... That's why I suggested doing that after vkAcquireNextImageKHR. 
https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:231: &image_data->submission_fence); Maybe you want to create these ones with VK_FENCE_CREATE_SIGNALED_BIT for the first frame. Otherwise your first vkWaitForFences in SwapBuffers will block. (but obviously, not for layout_fence since you submit it before waiting for it).
I have just started to learn Vulkan. I roughly went through the spec and the samples in the SDK. I found this CL a few days ago and have been studying it since then. It's amazing and has benefited my Vulkan study a lot. As a newbie to Vulkan, my comments might be superficial or simply noisy. Please ignore them in such cases! BTW, if you have any relatively simple or plain coding task to be offloaded, like adding more unit tests, I am willing to take it! https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:191: image_memory_barrier.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT; Should we use VK_ACCESS_MEMORY_READ_BIT here for the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout below?
https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:191: image_memory_barrier.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT; On 2016/03/28 07:47:23, jchen10 wrote: > Should we use VK_ACCESS_MEMORY_READ_BIT here for the below > VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout? True, this is inconsistent. Actually, we're not allowed to operate on swapchain images that we haven't acquired, so we should do the transition as part of the commands of the first frame that operates on each image.
https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:51: // Make sure the current command buffer is not in use. On 2016/03/26 01:38:01, piman wrote: > You need to do that before resetting/recording the command buffer, though... > That's why I suggested doing that after vkAcquireNextImageKHR. Right, I was trying to delay it to be as late as possible. Putting it here doesn't make sense... I think placing it in VulkanCommandBuffer before we reset the command buffer is the best place. https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:191: image_memory_barrier.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT; On 2016/03/28 07:47:23, jchen10 wrote: > Should we use VK_ACCESS_MEMORY_READ_BIT here for the below > VK_IMAGE_LAYOUT_SRC_KHR layout? Done. I can just queue the command for each command buffer which will get submitted after the acquire commands. https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:231: &image_data->submission_fence); On 2016/03/26 01:38:01, piman wrote: > Maybe you want to create these ones with VK_FENCE_CREATE_SIGNALED_BIT for the > first frame. Otherwise your first vkWaitForFences in SwapBuffers will block. > > (but obviously, not for layout_fence since you submit it before waiting for it). Done inside VulkanCommandBuffer. The implementation didn't need it for this particular problem (VulkanCommandBuffer has state tracking for when a submission took place), but this just makes it so Wait() and SubmissionFinished() can be called before the first Submit() call.
On 2016/03/28 07:47:23, jchen10 wrote: > I have just started to learn Vulkan. I roughly went through the Spec and samples > in SDK. I found this CL a few days ago and have been studying it since then. > It's amazing and has benefited me a lot regarding my Vulkan study. As a newbie > to Vulkan, my comments might be superficial or simply noisy. Please ignore them > in such cases! > BTW, if you have any relatively simple or plain coding task like adding more > unit-tests to be offloaded, I am willing to take it! > Thank you for spotting this. I am also just reading the spec for the first time so am also learning :) We are definitely still prototyping and figuring out how best to utilize the Vulkan feature set. Since this is still very early in our efforts we haven't quite figured out how things will look yet. We are probably not yet at the point where we can parallelize the efforts. Definitely continue keeping an eye on this though!
Whoops, forgot to upload the patch. Patch #15 contains the changes.
One last thing and then I think it's good to go. https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:45: fence_create_info.flags = VK_FENCE_CREATE_SIGNALED_BIT; I don't think you want it to be created signaled, if you initialize in "RECORD_TYPE_EMPTY" state, because ResetIfDirty will skip waiting in that case - so you'll have a signalled fence when you submit (which is illegal, and incorrect anyway). https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:87: VkResult result = vkQueueSubmit(queue, 1, &submit_info, submission_fence_); Thought for later (nothing to do in this CL, it's big enough as it is): we may want at some point to submit several command buffers at once - the submit overhead is high on at least some platforms (Windows in particular). When we get to that, we'll need to share the fence across command buffers. It could be something along the lines of: the fence is owned by the image data in the swap chain, and it's given to the command buffer after the multi-submit.
https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:45: fence_create_info.flags = VK_FENCE_CREATE_SIGNALED_BIT; On 2016/03/29 19:09:21, piman wrote: > I don't think you want it to be created signaled, if you initialize in > "RECORD_TYPE_EMPTY" state, because ResetIfDirty will skip waiting in that case - > so you'll have a signalled fence when you submit (which is illegal, and > incorrect anyway). It's interesting that my tests passed. Maybe validation would have caught it? I will add validation layers on the next patch. Having it signaled makes some things very convenient though (for example being able to always call SubmissionFinished() such as my DCHECK in the destroy). Maybe we should reset the fence before submission instead? That would make the behavior of Wait() and SubmissionFinished() slightly different, but I can change the comments to reflect that. https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:87: VkResult result = vkQueueSubmit(queue, 1, &submit_info, submission_fence_); On 2016/03/29 19:09:21, piman wrote: > Thought for later (nothing to do in this CL, it's big enough as it is): we may > want at some point to submit several command buffers at once - the submit > overhead is high on at least some platforms (Windows in particular). When we get > to that, we'll need to share the fence across command buffers. It could be > something along the lines of: the fence is owned by the image data in the swap > chain, and it's given to the command buffer after the multi-submit. Would we want to submit multiple command buffers which set up render passes? Otherwise we can submit multiple secondary command buffers within a single primary command buffer, right?
https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:45: fence_create_info.flags = VK_FENCE_CREATE_SIGNALED_BIT; On 2016/03/29 19:27:31, David Yen wrote: > On 2016/03/29 19:09:21, piman wrote: > > I don't think you want it to be created signaled, if you initialize in > > "RECORD_TYPE_EMPTY" state, because ResetIfDirty will skip waiting in that case > - > > so you'll have a signalled fence when you submit (which is illegal, and > > incorrect anyway). > > That's interesting that my tests passed. Maybe validation would have caught it? > I will add validation layers on the next patch. I think they would have caught it. > Having it signaled makes some things very convenient though (for example being > able to always call SubmissionFinished() such as my DCHECK in the destroy). > Maybe we should reset the fence before submission instead? That would make the > behavior of Wait() and SubmissionFinished() slightly different, but I can change > the comments to reflect that. Another option is to add one more state, RECORD_TYPE_SUBMITTED. Wait could return immediately except in that state, and SubmissionFinished would return true. Either one would transition to RECORD_TYPE_DIRTY after the fence is passed, and reset the fence. ResetIfDirty could call Wait(infinite) and then reset the command buffer if dirty. https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... 
gpu/vulkan/vulkan_command_buffer.cc:87: VkResult result = vkQueueSubmit(queue, 1, &submit_info, submission_fence_); On 2016/03/29 19:27:31, David Yen wrote: > On 2016/03/29 19:09:21, piman wrote: > > Thought for later (nothing to do in this CL, it's big enough as it is): we may > > want at some point to submit several command buffers at once - the submit > > overhead is high on at least some platforms (Windows in particular). When we > get > > to that, we'll need to share the fence across command buffers. It could be > > something along the lines of: the fence is owned by the image data in the swap > > chain, and it's given to the command buffer after the multi-submit. > > Would we want to submit multiple command buffers which setup render passes? Possibly. For example, we'll need one render pass per tile we raster. It could make sense to create those on multiple threads, which will require separate command buffers. > Otherwise we can submit multiple secondary command buffers within a single > primary command buffer right? The other things are non-render things, such as texture upload, which don't go into render passes.
https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:45: fence_create_info.flags = VK_FENCE_CREATE_SIGNALED_BIT; On 2016/03/29 20:42:39, piman wrote: > On 2016/03/29 19:27:31, David Yen wrote: > > On 2016/03/29 19:09:21, piman wrote: > > > I don't think you want it to be created signaled, if you initialize in > > > "RECORD_TYPE_EMPTY" state, because ResetIfDirty will skip waiting in that > case > > - > > > so you'll have a signalled fence when you submit (which is illegal, and > > > incorrect anyway). > > > > That's interesting that my tests passed. Maybe validation would have caught > it? > > I will add validation layers on the next patch. > > I think they would have caught it. > > > Having it signaled makes some things very convenient though (for example being > > able to always call SubmissionFinished() such as my DCHECK in the destroy). > > Maybe we should reset the fence before submission instead? That would make the > > behavior of Wait() and SubmissionFinished() slightly different, but I can > change > > the comments to reflect that. > > Another option is to add one more state, RECORD_TYPE_SUBMITTED. Wait could > return immediately except in that state, and SubmissionFinished would return > true. Either one would transition to RECORD_TYPE_DIRTY after the fence is > passed, and reset the fence. ResetIfDirty could call Wait(infinite) and then > reset the command buffer if dirty. It's kind of already like that, but because of the state tracking we have two states for the submitted case: RECORD_TYPE_RECORDED for multi-use and DIRTY for single-use, although DIRTY is also used when the command buffer is cleared. I think keeping the current scheme is simplest to understand, and it only adds one extra reset for the very first submission.
The fence reset would just be a client-side function, right?
Ok, LGTM.
The CQ bit was checked by dyen@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1776453003/310001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1776453003/310001
Message was sent while issue was closed.
Description was changed from ========== Added initial implementation of Vulkan Render Passes. An initial implementation for various Vulkan rendering objects has been done here. I tried to do the minimal amount of work which would still compile/run but that ended up being quite a bit of work. We have now have Vulkan implementations of command buffers, swap chains, render passes, and surfaces. A Vulkan test has also been added which tests that all these functionalities run and return expected return values. BUG=582558 CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel ========== to ========== Added initial implementation of Vulkan Render Passes. An initial implementation for various Vulkan rendering objects has been done here. I tried to do the minimal amount of work which would still compile/run but that ended up being quite a bit of work. We have now have Vulkan implementations of command buffers, swap chains, render passes, and surfaces. A Vulkan test has also been added which tests that all these functionalities run and return expected return values. BUG=582558 CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel ==========
Message was sent while issue was closed.
Committed patchset #17 (id:310001)
Message was sent while issue was closed.
Description was changed from ========== Added initial implementation of Vulkan Render Passes. An initial implementation for various Vulkan rendering objects has been done here. I tried to do the minimal amount of work which would still compile/run but that ended up being quite a bit of work. We have now have Vulkan implementations of command buffers, swap chains, render passes, and surfaces. A Vulkan test has also been added which tests that all these functionalities run and return expected return values. BUG=582558 CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel ========== to ========== Added initial implementation of Vulkan Render Passes. An initial implementation for various Vulkan rendering objects has been done here. I tried to do the minimal amount of work which would still compile/run but that ended up being quite a bit of work. We have now have Vulkan implementations of command buffers, swap chains, render passes, and surfaces. A Vulkan test has also been added which tests that all these functionalities run and return expected return values. BUG=582558 CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel Committed: https://crrev.com/4ec04eccd55ebfabc916eae2f417e5fa483e1c3e Cr-Commit-Position: refs/heads/master@{#384125} ==========
Message was sent while issue was closed.
Patchset 17 (id:??) landed as https://crrev.com/4ec04eccd55ebfabc916eae2f417e5fa483e1c3e Cr-Commit-Position: refs/heads/master@{#384125} |