Description
Added initial implementation of Vulkan Render Passes.
An initial implementation for various Vulkan rendering objects
has been done here. I tried to do the minimal amount of work
which would still compile/run but that ended up being quite a bit
of work.
We now have Vulkan implementations of command buffers, swap
chains, render passes, and surfaces. A Vulkan test has also been
added which tests that all of this functionality runs and returns the
expected values.
BUG=582558
CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel
Committed: https://crrev.com/4ec04eccd55ebfabc916eae2f417e5fa483e1c3e
Cr-Commit-Position: refs/heads/master@{#384125}
Patch Set 1
Patch Set 2 : Added command buffer files
Patch Set 3 : format
Patch Set 4 : Ensure vulkan handles all initialized to null, check destruction in destructor
Total comments: 71
Patch Set 5 : Separated out command pool, implemented SwapBuffer()
Patch Set 6 : rebase
Total comments: 2
Patch Set 7 : Fix SwapBuffers() present layout, test in unittests instead of injections
Total comments: 16
Patch Set 8 : Layout transitions done through render passes
Total comments: 4
Patch Set 9 : Use vkResetCommandBuffer for command buffer resets
Patch Set 10 : removed unnecessary dependency
Total comments: 27
Patch Set 11 : applied fixes
Patch Set 12 : Fixed depth stencil reference
Patch Set 13 : Added submission fence
Patch Set 14 : Put back in present semaphore
Total comments: 7
Patch Set 15 : added fence to command buffer
Total comments: 7
Patch Set 16 : Moved reset fences to submit.
Patch Set 17 : Adding logging/macros headers
Messages
Total messages: 35 (7 generated)
Description was changed from ========== Added initial implementation of a VulkanSurface. An initial VulkanSurface implementation has been done here, the surface is now platform independent as most of the implementation should be shared as a VulkanWSISurface. The only exception is the creation which is separated out by #ifdefs. Additionally, each surface contains the swap chain so that it can swap buffers itself. Each buffer also contains a command buffer which will eventually be used to submit commands for each frame. BUG=582558 ========== to ========== Added initial implementation of a VulkanSurface. An initial VulkanSurface implementation has been done here, the surface is now platform independent as most of the implementation should be shared as a VulkanWSISurface. The only exception is the creation which is separated out by #ifdefs. Additionally, each surface contains the swap chain so that it can swap buffers itself. Each buffer also contains a command buffer which will eventually be used to submit commands for each frame. BUG=582558 CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel ==========
dyen@chromium.org changed reviewers: + piman@chromium.org
Sorry for the biggish change, I kind of had to put all of it together just to see that Vulkan would actually succeed in all the resource creation calls. I added a DCHECK in gpu_command_buffer_stub which just verifies the surface and swap chains all get created properly. This passes using a custom built loader which enables xlib. This should take care of most of the boilerplate code for Vulkan so we can start rendering something into the window soon. PTAL.
https://codereview.chromium.org/1776453003/diff/60001/content/common/gpu/gpu_... File content/common/gpu/gpu_command_buffer_stub.h (right): https://codereview.chromium.org/1776453003/diff/60001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.h:270: #endif I don't think we should be touching GpuCommandBufferStub. In the Vulkan world, we wouldn't be using GpuCommandBufferStubs, nor the GPU process in the way we do with GL - if at all. Instead, we should have the OutputSurface directly connect to the VulkanSurface. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... gpu/vulkan/vulkan_command_buffer.cc:16: DCHECK_EQ(reinterpret_cast<VkCommandBuffer>(VK_NULL_HANDLE), command_buffer_); nit: static_cast https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... gpu/vulkan/vulkan_command_buffer.cc:31: LOG(ERROR) << "vkAllocateCommandBuffers() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... gpu/vulkan/vulkan_command_buffer.cc:54: LOG(ERROR) << "vkQueueSubmit() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... File gpu/vulkan/vulkan_command_buffer.h (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... gpu/vulkan/vulkan_command_buffer.h:24: bool Submit(); I would prefer being explicit here and passing the queue. It's likely we'll want to take advantage of a separate transfer queue if available. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... File gpu/vulkan/vulkan_implementation.cc (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... 
gpu/vulkan/vulkan_implementation.cc:154: command_pool_create_info.queueFamilyIndex = vk_queue_index; If we reset command buffers individually, we need to set the VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT flag. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... gpu/vulkan/vulkan_implementation.cc:159: LOG(ERROR) << "vkCreateCommandPool() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... gpu/vulkan/vulkan_implementation.cc:172: VkCommandPool vk_command_pool; We'll definitely need more than one pool - typically one per thread, otherwise the mutual exclusion requirement means command buffer recording needs to be serialized... So I think the pool usage should not be implicit, and instead explicitly created and managed by the higher levels. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... gpu/vulkan/vulkan_implementation.cc:214: command_buffer->Initialize(); don't we need to handle the return value? At a high level we will need to handle errors one way or another, and I don't think this is something that we'll want to wait until the last minute; I think we want to develop patterns early on. There's a few errors that will be fairly common (as in, many calls may return them, even though they shouldn't be common in practice): - VK_ERROR_OUT_OF_HOST_MEMORY, we can probably ignore, because we should abort even before we get there, because malloc itself will abort. - VK_ERROR_OUT_OF_DEVICE_MEMORY, we will need to handle somehow. It's not acceptable to crash the browser in this condition, but we may be able to handle similarly to lost device as a first step, but we may need finer grain management (e.g. if it happens when allocating resource memory for buffers/images, maybe we can free speculative resources instead).
(those 2 are pretty much always allowed in any call that returns a VkResult) - VK_ERROR_DEVICE_LOST, we need to handle by aborting the current frame, destroying everything, and recreating everything (similar to GL), and if it fails then, fall back to software. We can punt the whole retry logic for now, but at least we need to surface the error so that we can handle this correctly. There's a few more topical errors: - VK_ERROR_MEMORY_MAP_FAILED, we can probably handle like VK_ERROR_OUT_OF_HOST_MEMORY, i.e. terminate, though we'll have to do that explicitly. - VK_ERROR_TOO_MANY_OBJECTS, depends on context (e.g. if on device creation, this is fatal and must fallback to software (or GL), if on device memory allocation we should handle like VK_ERROR_OUT_OF_DEVICE_MEMORY). There's some places where it could happen in theory (e.g. vkCreateSampler), but we should make sure we don't run into it in the first place (i.e. pooling samplers to stay below the limit). - VK_ERROR_INITIALIZATION_FAILED, on instance/device creation, this is fatal and we must fallback to software - VK_ERROR_FORMAT_NOT_SUPPORTED is only on vkGetPhysicalDeviceImageFormatProperties, which I don't know if we'll actually need - VK_ERROR_LAYER_NOT_PRESENT/VK_ERROR_EXTENSION_NOT_PRESENT/VK_ERROR_FEATURE_NOT_PRESENT we should avoid in the first place - VK_ERROR_INCOMPATIBLE_DISPLAY_KHR shouldn't happen (only for vkCreateSharedSwapchains which we shouldn't use) - VK_ERROR_NATIVE_WINDOW_IN_USE_KHR shouldn't happen - VK_ERROR_SURFACE_LOST_KHR and VK_ERROR_OUT_OF_DATE_KHR have to do with WSI presentation and will need special handling (depending on conditions, from ignore to recreate swapchain, or possibly lost device). https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... File gpu/vulkan/vulkan_surface.cc (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... 
gpu/vulkan/vulkan_surface.cc:63: VulkanWSISurface(gfx::AcceleratedWidget window) : window_(window) {} nit: explicit https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:87: LOG(ERROR) << "vkGetPhysicalDeviceSurfaceFormatsKHR() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:113: surface_format_.format = VK_FORMAT_B8G8R8A8_UNORM; There's no guarantee that this is supported. If we don't find any format that we can handle, we have to fail. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:123: LOG(ERROR) << "vkGetPhysicalDeviceSurfaceCapabilitiesKHR() failed: " nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:152: LOG(ERROR) << "Failed to submit command buffer commands."; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:157: // and keeping around the command buffer until it's done. We'll definitely want some facility to do that - delaying destruction of command buffers (and other objects) until a fence passes - that would be useful in many places where we need transient things (e.g. texture uploads). https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:160: LOG(ERROR) << "vkQueueWaitIdle() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:185: class VulkanBufferSurface : public VulkanSurface { I don't think this is useful - you can't use a VkBuffer as a render target. If we wanted an offscreen VulkanSurface, it'd be backed by one (or more) VkImage, but as mentioned above, I don't think this is necessary at this level of abstraction. 
https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... File gpu/vulkan/vulkan_surface.h (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.h:24: SURFACE_OSMESA_RGBA, I doubt OSMesa will ever support Vulkan. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.h:39: // Create a surface for offscreen rendering. I don't think we want this. It was necessary in GL because you need a surface to use a context at all (to make current), but in Vulkan this is not needed - we can use the device without any "current" surface https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:74: std::max(surface_caps.minImageCount + 1, surface_caps.maxImageCount); We shouldn't need +1, because we shouldn't be keeping ownership (on the app side) of more than 1 image at a time. We do need to make sure it's at least 2, but if minImageCount==2 we can use that, we don't need 3: Start: presentation owns 2 images, app owns 0 - acquire image -> image 0 (presentation owns 1, app owns 1) - render to image 0 - swap image 0 (presentation owns 2, app owns 0, image 0 is displayed) - acquire image -> image 1 (presentation owns 1, app owns 1) - render to image 1 - swap image 1 (presentation owns 2, app owns 0, image 1 is displayed, image 0 is available) - acquire image -> image 0 etc. We also definitely don't want to allocate maxImageCount if we don't need them. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... 
gpu/vulkan/vulkan_swap_chain.cc:83: swap_chain_create_info.presentMode = VK_PRESENT_MODE_IMMEDIATE_KHR; VK_PRESENT_MODE_FIFO_KHR 1- that's the only one that's guaranteed 2- that's what we want (vsync, no wasted work, similar to eglSwapBuffers with eglSwapInterval set to 1) https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:91: LOG(ERROR) << "vkCreateSwapchainKHR() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:96: // TODO(dyen): Look into how to do this safely while commands in flight. Right, at the very least we need to wait for previous work to complete. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:103: bool VulkanSwapChain::InitializeRenderPass( I don't feel like this belongs to the SwapChain. E.g. the compositor may want a stencil buffer for path rendering, or do multi-pass, or use multisampling. It will also want to control LOAD_OP_CLEAR or LOAD_OP_LOAD on a case-by-case basis. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:114: attachment_description.finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL; It would be useful to do the transition to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR as part of the render pass, it can be advantageous in case the driver can do it on a per-tile basis. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:135: LOG(ERROR) << "vkCreateRenderPass() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:153: LOG(ERROR) << "vkGetSwapchainImagesKHR(NULL) failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... 
gpu/vulkan/vulkan_swap_chain.cc:161: LOG(ERROR) << "vkGetSwapchainImagesKHR(images) failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:177: VK_ACCESS_HOST_WRITE_BIT | VK_ACCESS_TRANSFER_WRITE_BIT; I don't think either of these flags is useful: - we didn't write to the image from the CPU - we didn't write to the image with a "transfer" command (copy/blit). Really for the initial transition, srcAccessMask = 0 / srcStageMask = 0 is ok. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:189: image_view_create_info.components = view_mapping; For a render target view, the VkComponentMapping needs to be identity - VkComponentMapping only applies to sampling. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:192: // Frame buffer creation info. For the same reason as the RenderPass, I think we want the client to manage the Framebuffer(s), because it may need stencil and/or more images for multi-pass or multisampling https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:209: VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, 0, 0, nullptr, 0, If the intent is for this to happen before rendering, the destination stage should include VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT - and the destination access mask should include VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:217: LOG(ERROR) << "vkCreateImageView() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:226: LOG(ERROR) << "vkCreateFramebuffer() failed: " << result; nit: DLOG https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... 
File gpu/vulkan/vulkan_swap_chain.h (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.h:52: std::vector<scoped_ptr<BufferData>> buffers_; nit: "buffer" can be a bit confusing, because in Vulkan it really means "linear memory used for vertex buffers or constants", not "images"/"render targets".
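The VkResult taxonomy laid out in the review message above (host OOM vs. device OOM/lost device vs. fatal initialization failure vs. WSI errors) could be captured as a small classification helper. This is a minimal sketch, not code from the CL; the `VK_ERROR_*` values are illustrative mirrors of the Vulkan header constants (real code would include `<vulkan/vulkan.h>` and switch on `VkResult` directly), and `ErrorAction`/`ClassifyVkResult` are hypothetical names:

```cpp
// Illustrative mirrors of a few VkResult values; real code would include
// <vulkan/vulkan.h> instead of redefining these.
constexpr int VK_SUCCESS = 0;
constexpr int VK_ERROR_OUT_OF_HOST_MEMORY = -1;
constexpr int VK_ERROR_OUT_OF_DEVICE_MEMORY = -2;
constexpr int VK_ERROR_INITIALIZATION_FAILED = -3;
constexpr int VK_ERROR_DEVICE_LOST = -4;
constexpr int VK_ERROR_SURFACE_LOST_KHR = -1000000000;
constexpr int VK_ERROR_OUT_OF_DATE_KHR = -1000001004;

// One way to bucket the errors discussed above into recovery strategies.
enum class ErrorAction {
  kNone,                // VK_SUCCESS
  kTerminate,           // host OOM: malloc would have aborted anyway
  kLoseContext,         // device OOM / lost device: destroy, retry, else software
  kFallbackToSoftware,  // instance/device init failed: fatal for Vulkan
  kRecreateSwapchain,   // WSI presentation errors
};

ErrorAction ClassifyVkResult(int result) {
  switch (result) {
    case VK_SUCCESS:
      return ErrorAction::kNone;
    case VK_ERROR_OUT_OF_HOST_MEMORY:
      return ErrorAction::kTerminate;
    case VK_ERROR_OUT_OF_DEVICE_MEMORY:
    case VK_ERROR_DEVICE_LOST:
      return ErrorAction::kLoseContext;
    case VK_ERROR_INITIALIZATION_FAILED:
      return ErrorAction::kFallbackToSoftware;
    case VK_ERROR_SURFACE_LOST_KHR:
    case VK_ERROR_OUT_OF_DATE_KHR:
      return ErrorAction::kRecreateSwapchain;
    default:
      // Conservative default: treat unknown errors like a lost device.
      return ErrorAction::kLoseContext;
  }
}
```

A caller would then do e.g. `if (ClassifyVkResult(result) == ErrorAction::kLoseContext) { /* tear down and recreate */ }` rather than crashing on any non-success code.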
Thank you for the review. I've changed everything to DLOG(ERROR) from LOG(ERROR), but I wish there were a flag similar to dcheck_always_on for DLOGs; do you know if there is a reason there isn't one? I would just always have that on locally. https://codereview.chromium.org/1776453003/diff/60001/content/common/gpu/gpu_... File content/common/gpu/gpu_command_buffer_stub.h (right): https://codereview.chromium.org/1776453003/diff/60001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.h:270: #endif On 2016/03/09 01:25:33, piman wrote: > I don't think we should be touching GpuCommandBufferStub. In the Vulkan world, > we wouldn't be using GpuCommandBufferStubs, nor the GPU process in the way we do > with GL - if at all. Instead, we should have the OutputSurface directly connect > to the VulkanSurface. Still working on this, just fixed everything else for you to review first. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... gpu/vulkan/vulkan_command_buffer.cc:16: DCHECK_EQ(reinterpret_cast<VkCommandBuffer>(VK_NULL_HANDLE), command_buffer_); On 2016/03/09 01:25:33, piman wrote: > nit: static_cast Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... gpu/vulkan/vulkan_command_buffer.cc:31: LOG(ERROR) << "vkAllocateCommandBuffers() failed: " << result; On 2016/03/09 01:25:33, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... gpu/vulkan/vulkan_command_buffer.cc:54: LOG(ERROR) << "vkQueueSubmit() failed: " << result; On 2016/03/09 01:25:33, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma... File gpu/vulkan/vulkan_command_buffer.h (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_comma...
gpu/vulkan/vulkan_command_buffer.h:24: bool Submit(); On 2016/03/09 01:25:33, piman wrote: > I would prefer being explicit here and passing the queue. It's likely we'll want > to take advantage of a separate transfer queue if available. Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... File gpu/vulkan/vulkan_implementation.cc (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... gpu/vulkan/vulkan_implementation.cc:154: command_pool_create_info.queueFamilyIndex = vk_queue_index; On 2016/03/09 01:25:34, piman wrote: > If we reset command buffers individually, we need to set the > VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT flag. Interesting, I missed this. Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... gpu/vulkan/vulkan_implementation.cc:159: LOG(ERROR) << "vkCreateCommandPool() failed: " << result; On 2016/03/09 01:25:33, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... gpu/vulkan/vulkan_implementation.cc:172: VkCommandPool vk_command_pool; On 2016/03/09 01:25:33, piman wrote: > We'll definitely need more than one pool - typically one per thread, otherwise > the mutual exclusion requirement means command buffer recording needs to be > serialized... So I think the pool usage should not be implicit, and instead > explicitly created and managed by the higher levels. Done. Separated out into a separate class. I now have the SwapChain own the command pool for the swap chain images' command buffers. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_imple... gpu/vulkan/vulkan_implementation.cc:214: command_buffer->Initialize(); On 2016/03/09 01:25:33, piman wrote: > don't we need to handle the return value?
> > At a high level we will need to handle errors one way or another, and I don't > think this is something that we'll want to wait until the last minute, I think > we want to develop patterns early on. > > There's a few errors that will be fairly common (as in, many calls may return > them, even though they shouldn't be common in practice): > > - VK_ERROR_OUT_OF_HOST_MEMORY, we can probably ignore, because we should abort > even before we get there, because malloc itself will abort. > - VK_ERROR_OUT_OF_DEVICE_MEMORY, we will need to handle somehow. It's not > acceptable to crash the browser in this condition, but we may be able to handle > similarly to lost device as a first step, but we may need finer grain management > (e.g. if it happens when allocating resource memory for buffers/images, maybe we > can free speculative resources instead). > (those 2 are pretty much always allowed in any call that returns a VkResult) > - VK_ERROR_DEVICE_LOST, we need to handle by aborting the current frame, > destroying everything, and recreating everything (similar to GL), and if it > fails then, fall back to software. We can punt the whole retry logic for now, > but at least we need to surface the error so that we can handle this correctly. > > > There's a few more topical errors: > - VK_ERROR_MEMORY_MAP_FAILED, we can probably handle like > VK_ERROR_OUT_OF_HOST_MEMORY, i.e. terminate, though we'll have to do that > explicitly. > - VK_ERROR_TOO_MANY_OBJECTS, depends on context (e.g. if on device creation, > this is fatal and must fallback to software (or GL), if on device memory > allocation we should handle like VK_ERROR_OUT_OF_DEVICE_MEMORY). There's some > places where it could happen in theory (e.g. vkCreateSampler), but we should > make sure we don't run into it in the first place (i.e. pooling samplers to stay > below the limit). 
> - VK_ERROR_INITIALIZATION_FAILED, on instance/device creation, this is fatal and > we must fallback to software > - VK_ERROR_FORMAT_NOT_SUPPORTED is only on > vkGetPhysicalDeviceImageFormatProperties, which I don't know if we'll actually > need > - > VK_ERROR_LAYER_NOT_PRESENT/VK_ERROR_EXTENSION_NOT_PRESENT/VK_ERROR_FEATURE_NOT_PRESENT > we should avoid in the first place > - VK_ERROR_INCOMPATIBLE_DISPLAY_KHR shouldn't happen (only for > vkCreateSharedSwapchains which we shouldn't use) > - VK_ERROR_NATIVE_WINDOW_IN_USE_KHR shouldn't happen > - VK_ERROR_SURFACE_LOST_KHR and VK_ERROR_OUT_OF_DATE_KHR have to do with WSI > presentation and will need special handling (depending on conditions, from > ignore to recreate swapchain, or possibly lost device). I'll have this function return nullptr on errors. Currently all these functions are just called on initialization so I'm planning on having them just fallback to GL in those cases, and GL will fallback to software. I'm not sure how to handle the whole reinitialization at this point; punt until we have drawing tasks? https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... File gpu/vulkan/vulkan_surface.cc (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:63: VulkanWSISurface(gfx::AcceleratedWidget window) : window_(window) {} On 2016/03/09 01:25:34, piman wrote: > nit: explicit Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:87: LOG(ERROR) << "vkGetPhysicalDeviceSurfaceFormatsKHR() failed: " << result; On 2016/03/09 01:25:34, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:113: surface_format_.format = VK_FORMAT_B8G8R8A8_UNORM; On 2016/03/09 01:25:34, piman wrote: > There's no guarantee that this is supported.
If we don't find any format that we > can handle, we have to fail. Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:123: LOG(ERROR) << "vkGetPhysicalDeviceSurfaceCapabilitiesKHR() failed: " On 2016/03/09 01:25:34, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:152: LOG(ERROR) << "Failed to submit command buffer commands."; On 2016/03/09 01:25:34, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:157: // and keeping around the command buffer until it's done. On 2016/03/09 01:25:34, piman wrote: > We'll definitely want some facility to do that - delaying destruction of command > buffers (and other objects) until a fence passes - that would be useful in many > places where we need transient things (e.g. texture uploads). After thinking through this more, I now have this command buffer owned by the swap chain class and it uses it for all image layout commands. I have it synchronized through semaphores. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:160: LOG(ERROR) << "vkQueueWaitIdle() failed: " << result; On 2016/03/09 01:25:34, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.cc:185: class VulkanBufferSurface : public VulkanSurface { On 2016/03/09 01:25:34, piman wrote: > I don't think this is useful - you can't use a VkBuffer as a render target. If > we wanted an offscreen VulkanSurface, it'd be backed by one (or more) VkImage, > but as mentioned above, I don't think this is necessary at this level of > abstraction. Deleted. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... 
File gpu/vulkan/vulkan_surface.h (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.h:24: SURFACE_OSMESA_RGBA, On 2016/03/09 01:25:34, piman wrote: > I doubt OSMesa will ever support Vulkan. Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_surfa... gpu/vulkan/vulkan_surface.h:39: // Create a surface for offscreen rendering. On 2016/03/09 01:25:34, piman wrote: > I don't think we want this. It was necessary in GL because you need a surface to > use a context at all (to make current), but in Vulkan this is not needed - we > can use the device without any "current" surface Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:74: std::max(surface_caps.minImageCount + 1, surface_caps.maxImageCount); On 2016/03/09 01:25:34, piman wrote: > We shouldn't need +1, because we shouldn't be keeping ownership (on the app > side) of more than 1 image at a time. We do need to make sure it's at least 2, > but if minImageCount==2 we can use that, we don't need 3: > > Start: presentation owns 2 images, app owns 0 > - acquire image -> image 0 (presentation owns 1, app owns 1) > - render to image 0 > - swap image 0 (presentation owns 2, app owns 0, image 0 is displayed) > - acquire image -> image 1 (presentation owns 1, app owns 1) > - render to image 1 > - swap image 1 (presentation owns 2, app owns 0, image 1 is displayed, image 0 > is available) > - acquire image -> image 0 etc. > > We also definitely don't want to allocate maxImageCount if we don't need them. Oops, I meant std::min here. I was assuming minImageCount was 1 here. Actually I am not sure what minImageCount being >1 would mean, would that mean the presentation must own minImageCount images? 
So I was thinking if it must own 2 then I would always want one to be able to render to. I'll just put std::min(2, max) here. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:83: swap_chain_create_info.presentMode = VK_PRESENT_MODE_IMMEDIATE_KHR; On 2016/03/09 01:25:35, piman wrote: > VK_PRESENT_MODE_FIFO_KHR > > 1- that's the only one that's guaranteed > 2- that's what we want (vsync, no wasted work, similar to eglSwapBuffers with > eglSwapInterval set to 1) Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:91: LOG(ERROR) << "vkCreateSwapchainKHR() failed: " << result; On 2016/03/09 01:25:34, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:96: // TODO(dyen): Look into how to do this safely while commands in flight. On 2016/03/09 01:25:34, piman wrote: > Right, at the very least we need to wait for previous work to complete. I will worry about this when I look into how to safely resize(). https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:103: bool VulkanSwapChain::InitializeRenderPass( On 2016/03/09 01:25:35, piman wrote: > I don't feel like this belongs to the SwapChain. E.g. the compositor may want a > stencil buffer for path rendering, or do multi-pass, or use multisampling. It > will also want to control LOAD_OP_CLEAR or LOAD_OP_LOAD on a case-by-case basis. Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:114: attachment_description.finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL; On 2016/03/09 01:25:34, piman wrote: > It would be useful to do the transition to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR as > part of the render pass, it can be advantageous in case the driver can do it on > a per-tile basis. 
RenderPass no longer in this CL, I'll make a note of this in the next one. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:135: LOG(ERROR) << "vkCreateRenderPass() failed: " << result; On 2016/03/09 01:25:34, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:153: LOG(ERROR) << "vkGetSwapchainImagesKHR(NULL) failed: " << result; On 2016/03/09 01:25:34, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:161: LOG(ERROR) << "vkGetSwapchainImagesKHR(images) failed: " << result; On 2016/03/09 01:25:34, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:177: VK_ACCESS_HOST_WRITE_BIT | VK_ACCESS_TRANSFER_WRITE_BIT; On 2016/03/09 01:25:35, piman wrote: > I don't think either of these flags is useful: > - we didn't write to the image from the CPU > - we didn't write to the image with a "transfer" command (copy/blit). > > Really for the initial transition, srcAccessMask = 0 / srcStageMask = 0 is ok. Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:189: image_view_create_info.components = view_mapping; On 2016/03/09 01:25:34, piman wrote: > For a render target view, the VkComponentMapping needs to be identity - > VkComponentMapping only applies to sampling. Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:192: // Frame buffer creation info. 
On 2016/03/09 01:25:34, piman wrote: > For the same reason as the RenderPass, I think we want the client to manage the > Framebuffer(s), because it may need stencil and/or more images for multi-pass or > multisampling Ok, I was originally thinking we could just have different attachments as "options" to the SwapBuffer. But it's probably easier to just let the client create this like you say. All that needs to be done is to expose the ImageView then. I'll remove this code for now and will probably create classes to encapsulate render passes and frame buffers later. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:209: VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, 0, 0, nullptr, 0, On 2016/03/09 01:25:34, piman wrote: > If the intent is for this to happen before rendering, the destination stage > should include VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT - and the > destination access mask should include VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT So I was envisioning every image to be in "PRESENT" mode when done, then upon begin rendering I modify PRESENT -> COLOR, and before present I modify COLOR -> PRESENT. So for that reason I was initializing them to PRESENT. Although after actually writing the SwapBuffer() code I now have it synchronized using 4 semaphores and my initial state is indeed COLOR mode now so I have changed this code. I was going to have the SwapBuffer() stuff in another CL but I'll just include it here since it changes the structure of this file quite a bit. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.cc:217: LOG(ERROR) << "vkCreateImageView() failed: " << result; On 2016/03/09 01:25:35, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... 
gpu/vulkan/vulkan_swap_chain.cc:226: LOG(ERROR) << "vkCreateFramebuffer() failed: " << result; On 2016/03/09 01:25:35, piman wrote: > nit: DLOG Done. https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... File gpu/vulkan/vulkan_swap_chain.h (right): https://codereview.chromium.org/1776453003/diff/60001/gpu/vulkan/vulkan_swap_... gpu/vulkan/vulkan_swap_chain.h:52: std::vector<scoped_ptr<BufferData>> buffers_; On 2016/03/09 01:25:35, piman wrote: > nit: "buffer" can be a bit confusing, because in Vulkan it really means "linear > memory used for vertex buffers or constants", not "images"/"render targets". Renamed to ImageData/images_. Done.
https://codereview.chromium.org/1776453003/diff/100001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/100001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:101: current_image_data->present_semaphore, Hmm... looking at this code again I'm unclear which image this present_semaphore should be associated with. Does it signal when the previous last vkQueuePresentKHR finishes and releases an image? or does it signal when the acquire succeeds because the next image is done presenting?
I've added "vulkan_tests" which is an end to end test similar to "gl_tests". Currently I am testing the initialization, surface creation, and swap buffer paths there. As discussed, after this lands I will work on getting proper VulkanRenderer/VulkanContextProvider plumbing in so I have also removed all the temporary hacks which injected vulkan calls into Chromium. These paths are all covered through the unit tests. PTAL. https://codereview.chromium.org/1776453003/diff/100001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/100001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:101: current_image_data->present_semaphore, On 2016/03/10 02:19:38, David Yen wrote: > Hmm... looking at this code again I'm unclear which image this present_semaphore > should be associated with. Does it signal when the previous last > vkQueuePresentKHR finishes and releases an image? or does it signal when the > acquire succeeds because the next image is done presenting? Chatted separately. This semaphore and the following layout transition should apply to the next image.
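To make the resolution above concrete, a rough sketch of the per-frame semaphore flow as discussed (the semaphore and function names are illustrative pseudocode, not the CL's actual API):

```
// Illustrative pseudocode, not the CL's actual API.
vkAcquireNextImageKHR(device, swap_chain, timeout,
                      acquire_semaphore, VK_NULL_HANDLE, &image_index);
// acquire_semaphore is signalled when image `image_index` - the image
// just acquired, i.e. the *next* image - is ready for rendering, so the
// semaphore and any following layout transition belong to that image.
Submit(queue, wait = acquire_semaphore, command_buffer,
       signal = present_semaphore);
vkQueuePresentKHR(queue, wait = present_semaphore, swap_chain, image_index);
```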
https://codereview.chromium.org/1776453003/diff/60001/content/common/gpu/gpu_... File content/common/gpu/gpu_command_buffer_stub.h (right): https://codereview.chromium.org/1776453003/diff/60001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.h:270: #endif On 2016/03/10 01:39:48, David Yen wrote: > On 2016/03/09 01:25:33, piman wrote: > > I don't think we should be touching GpuCommandBufferStub. In the Vulkan world, > > we wouldn't be using GpuCommandBufferStubs, nor the GPU process in the way we > do > > with GL - if at all. Instead, we should have the OutputSurface directly > connect > > to the VulkanSurface. > > Still working on this, just fixed everything else for you to review first. This has been reverted and tested through the vulkan_tests instead. This should eventually be done through OutputSurface with VulkanContextProvider.
https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_comm... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:60: if (num_wait_semaphores) { nit: the if here and below are unnecessary, in fact (the cost of the test is higher than the cost of setting the extra field on the stack). https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_surf... File gpu/vulkan/vulkan_surface.h (right): https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_surf... gpu/vulkan/vulkan_surface.h:22: SURFACE_GBRA8888, nit: GBRA->BGRA? BTW, we can start with this, but thinking more about it, I suspect we won't really have a choice, at least on desktop. The available formats are likely fixed based on the visual/format of the underlying window. It may make more sense to just create the window based on what we want (16 vs 32 bits), then create the VulkanSurface, and then query the format. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:51: &current_image_data->render_semaphore)) { You don't need a semaphore to order command buffers on the same queue. I think you should be able to get away with 2 semaphores per frame, one post-acquire, one pre-present. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:57: ScopedSingleUseCommandBufferRecorder recorder( 2 things: 1- you can't reuse the command buffer until you know the previous instance is done executing - you need a fence for that 2- on some drivers/platforms, submit is supposed to be expensive (hundreds of µs to ms), so creating a small command buffer has a lot of overhead. 
A possibility is to submit multiple command buffers at once, but I think I'd let the caller do the transition in the main command buffer instead. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:99: // do the synchronization for us. vkAcquireNextImageKHR needs to be able to block, regardless of semaphores. Basically, even with an infinitely fast GPU, the system may be asynchronous, e.g. SurfaceFlinger on Android, or X11, etc. What this means is that you probably want an infinite timeout, at least to start with, though maybe a polling loop can be useful at a higher level (can punt for now). Obviously, you'd want to do the vkAcquireNextImageKHR as late as possible, rather than right after the Present, to minimize waiting time. For example if the compositor is vsync driven, you wouldn't want to acquire until we start drawing the next frame. It may make sense to separate Acquire from Present into 2 functions. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:116: ScopedSingleUseCommandBufferRecorder recorder( Same points as above. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:138: // here. This should be ok (note that I don't think you need it per above, though). It's ok to wait on a semaphore before it's signalled (note that it's the queue waiting, not the CPU), that's more or less the point of semaphores. There are some constraints around reuse, though - if the semaphore was signalled and waited on a different queue, it would be invalid to queue another signal unless you can guarantee the second signal will happen after the wait (e.g. with another semaphore, or a fence). In your case, you're fine though (single queue). https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... 
gpu/vulkan/vulkan_swap_chain.cc:159: std::min(2u, surface_caps.maxImageCount); It does have to be more than surface_caps.minImageCount though (that could be 3+ in theory - probably not in practice for FIFO mode).
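The clamping being discussed can be sketched as a small helper (a hypothetical function, not code from the CL; note that Vulkan defines a `maxImageCount` of 0 to mean "no upper limit"):

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: pick a swap chain image count that requests double
// buffering but always respects the surface capabilities. maxImageCount == 0
// means the surface imposes no upper limit, so only clamp when it is nonzero.
uint32_t ChooseImageCount(uint32_t min_image_count, uint32_t max_image_count) {
  uint32_t desired = 2;  // double buffering
  if (max_image_count != 0)
    desired = std::min(desired, max_image_count);
  // Never go below what the surface requires (could be 3+ in theory).
  return std::max(desired, min_image_count);
}
```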
Since a lot of the comments were around fixing the layout transition code, I went ahead and just wrote the render pass code since that is what we will be using anyways. PTAL. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_comm... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:60: if (num_wait_semaphores) { On 2016/03/11 03:07:52, piman wrote: > nit: the if here and below are unnecessary, in fact (the cost of the test is > higher than the cost of setting the extra field on the stack). Good point. Done. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_surf... File gpu/vulkan/vulkan_surface.h (right): https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_surf... gpu/vulkan/vulkan_surface.h:22: SURFACE_GBRA8888, On 2016/03/11 03:07:52, piman wrote: > nit: GBRA->BGRA? > > BTW, we can start with this, but thinking more about it, I suspect we won't > really have a choice, at least on desktop. The available formats are likely > fixed based on the visual/format of the underlying window. It may make more > sense to just create the window based on what we want (16 vs 32 bits), then > create the VulkanSurface, and then query the format. Done. That's interesting about the format choice, I will keep that in mind as we progress. So some desktops won't support VK_FORMAT_B8G8R8A8_UNORM? I would assume that would be supported everywhere which is why I chose it to be the default format. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:51: &current_image_data->render_semaphore)) { On 2016/03/11 03:07:52, piman wrote: > You don't need a semaphore to order command buffers on the same queue. 
I think > you should be able to get away with 2 semaphores per frame, one post-acquire, > one pre-present. That would make things easier. I wasn't sure because according to Chapter 5 in the spec it says "Unless otherwise specified, and without explicit synchronization, the various commands submitted to a queue via command buffers may execute in arbitrary order relative to each other, and/or concurrently". I guess that is talking more about the execution and not queuing the command on the device? Done. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:57: ScopedSingleUseCommandBufferRecorder recorder( On 2016/03/11 03:07:53, piman wrote: > 2 things: > 1- you can't reuse the command buffer until you know the previous instance is > done executing - you need a fence for that > 2- on some drivers/platforms, submit is supposed to be expensive (hundreds of µs > to ms), so creating a small command buffer has a lot of overhead. A possibility > is to submit multiple command buffers at once, but I think I'd let the caller do > the transition in the main command buffer instead. Done. To make how this is supposed to be done clearer, I went ahead and implemented vulkan render pass to do this. Therefore, all of this code is now unnecessary. The tests also test the render pass functionality now. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:99: // do the synchronization for us. On 2016/03/11 03:07:53, piman wrote: > vkAcquireNextImageKHR needs to be able to block, regardless of semaphores. > Basically, even with an infinitely fast GPU, the system may be asynchronous, > e.g. SurfaceFlinger on Android, or X11, etc. > > What this means is that you probably want an infinite timeout, at least to start > with, though maybe a polling loop can be useful at a higher level (can punt for > now). 
> > Obviously, you'd want to do the vkAcquireNextImageKHR as late as possible, > rather than right after the Present, to minimize waiting time. For example if > the compositor is vsync driven, you wouldn't want to acquire until we start > drawing the next frame. It may make sense to separate Acquire from Present into > 2 functions. Done. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:116: ScopedSingleUseCommandBufferRecorder recorder( On 2016/03/11 03:07:52, piman wrote: > Same points as above. Done. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:138: // here. On 2016/03/11 03:07:53, piman wrote: > This should be ok (note that I don't think you need it per above, though). It's > ok to wait on a semaphore before it's signalled (note that it's the queue > waiting, not the CPU), that's more or less the point of semaphores. There are > some constraints around reuse, though - if the semaphore was signalled and > waited on a different queue, it would be invalid to queue another signal unless > you can guarantee the second signal will happen after the wait (e.g. with > another semaphore, or a fence). In your case, you're fine though (single queue). Acknowledged. https://codereview.chromium.org/1776453003/diff/120001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:159: std::min(2u, surface_caps.maxImageCount); On 2016/03/11 03:07:53, piman wrote: > It does have to be more than surface_caps.minImageCount though (that could be 3+ > in theory - probably not in practice for FIFO mode). Done. https://codereview.chromium.org/1776453003/diff/140001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/140001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:181: image_memory_barrier.newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR; I changed this back to PRESENT. 
The initial UNDEFINED->PRESENT transition is a special case. This makes it so the render pass can just always follow the transition PRESENT->COLOR->PRESENT. As an example, the render pass test does this. https://codereview.chromium.org/1776453003/diff/140001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:251: result = vkQueueWaitIdle(GetVulkanQueue()); I added this back in since we no longer do layout transitions during swap so this extra command buffer is only used for setup. Eventually we would probably have it destroy itself when it's finished through callbacks?
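The PRESENT->COLOR->PRESENT scheme described above can be sketched through the render pass attachment description. Field values here are assumed from the discussion, not copied from the CL, so treat this as an illustrative fragment only:

```
// Sketch only: the render pass itself performs the layout transitions,
// so swap chain images always begin and end a frame in PRESENT_SRC_KHR.
VkAttachmentDescription color_attachment = {};
color_attachment.format = VK_FORMAT_B8G8R8A8_UNORM;
color_attachment.samples = VK_SAMPLE_COUNT_1_BIT;
color_attachment.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
color_attachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
color_attachment.initialLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
color_attachment.finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
// Within the subpass, the VkAttachmentReference uses
// VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL, giving PRESENT->COLOR->PRESENT.
```

The advantage piman mentioned earlier applies here: on tiled GPUs the driver can fold these transitions into the per-tile load/store, rather than issuing a separate pipeline barrier.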
Description was changed from ========== Added initial implementation of a VulkanSurface. An initial VulkanSurface implementation has been done here, the surface is now platform independent as most of the implementation should be shared as a VulkanWSISurface. The only exception is the creation which is separated out by #ifdefs. Additionally, each surface contains the swap chain so that it can swap buffers itself. Each buffer also contains a command buffer which will eventually be used to submit commands for each frame. BUG=582558 CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel ========== to ========== Added initial implementation of Vulkan Render Passes. An initial implementation for various Vulkan rendering objects has been done here. I tried to do the minimal amount of work which would still compile/run but that ended up being quite a bit of work. We now have Vulkan implementations of command buffers, swap chains, render passes, and surfaces. A Vulkan test has also been added which tests that all these functionalities run and return expected return values. BUG=582558 CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel ==========
jie.a.chen@intel.com changed reviewers: + jie.a.chen@intel.com
https://codereview.chromium.org/1776453003/diff/140001/gpu/vulkan/vulkan_comm... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/140001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:93: vkResetCommandPool(GetVulkanDevice(), command_pool_->handle(), Would it be impertinent to ask whether vkResetCommandPool is a typo here? It should probably be vkResetCommandBuffer instead.
https://codereview.chromium.org/1776453003/diff/140001/gpu/vulkan/vulkan_comm... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/140001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:93: vkResetCommandPool(GetVulkanDevice(), command_pool_->handle(), On 2016/03/23 07:12:57, jchen10 wrote: > Would it be impertinent to ask whether vkResetCommandPool seems a typo here? > Probably it should be vkResetCommandBuffer instead. Good catch, not sure what happened here. Fixed.
https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/tests/vulka... File gpu/vulkan/tests/vulkan_test.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/tests/vulka... gpu/vulkan/tests/vulkan_test.cc:115: render_pass.Destroy(); You need to wait for the RenderPass to not be in use before destroying. Same with the surface/swapchain below. vkQueueWaitIdle is a big hammer, but it's simple enough for tests. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... File gpu/vulkan/vulkan_render_pass.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.cc:266: subpass_desc.pDepthStencilAttachment = &depth_stencil_ref; depth_stencil_ref is a local variable. Its pointer is invalid after the loop iteration (in practice, likely each depth_stencil_ref for each iteration is at the same address and you just overwrite data). Also, if you don't have a D/S attachment, you want NULL here, not a VkAttachmentReference that points to VK_ATTACHMENT_UNUSED https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... File gpu/vulkan/vulkan_render_pass.h (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:46: // The stencil ops are only sued for IMAGE_TYPE_STENCIL and nit: sued->used https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:52: uint32_t num_image_views; // If more than 1, must match swap chain count. I'm not exactly sure why you need this, or what it means really. You can only have 1 image view per attachment on a given framebuffer. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:53: VulkanImageView* image_views; // used for ATTACHMENT_TYPE_ATTACHMENT_VIEW. Why plural? 
https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:86: bool Initialize(const VulkanSwapChain* swap_chain, Do you have to pass a SwapChain here? What if I want to make a render pass for offscreen buffers? https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:95: bool exec_inline); This is ok as a first step, but I think eventually we'll want to separate the RenderPass from the RenderPass instance. After the RenderPass is created, it's immutable, and that's a useful property because it's trivial to share across threads. If it can be mutable (current_sub_pass_/executing_/execution_type_) then you need locking and you lose the benefits of threading. One example could be to have BeginRenderPass return a RenderPassInstance that contains that state, and either NextSubPass/EndRenderPass are methods on it, or they stay here but take the RenderPassInstance as a parameter. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:50: &current_image_data->render_semaphore)) { You need a fence here, to know when you can reuse the command buffer the next time around. One per image, and you probably want to wait on that fence after vkAcquireNextImageKHR returns the same image. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:73: // do the synchronization for us. We'll need a fence for throttling, though. Without a fence, this could queue an infinite number of frames, faster than the GPU or display can consume. The fence suggested above may be enough.
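The depth_stencil_ref lifetime bug flagged above boils down to storing a pointer to a loop-local variable. A minimal, Vulkan-free sketch of the pattern and the fix (all names hypothetical, modeled loosely on VkAttachmentReference/VkSubpassDescription):

```cpp
#include <vector>

// Stand-ins for the Vulkan structs in question (illustrative only).
struct AttachmentReference { unsigned attachment; };
struct SubpassDescription { const AttachmentReference* depth_stencil; };

// The fix: keep the referenced objects in storage that outlives the loop,
// instead of pointing each SubpassDescription at a loop-local variable
// whose address is reused (and dead) after every iteration.
std::vector<SubpassDescription> BuildSubpasses(
    const std::vector<unsigned>& attachments,
    std::vector<AttachmentReference>* refs /* must outlive the result */) {
  // Reserve up front so push_back never reallocates and invalidates
  // the pointers already handed out.
  refs->reserve(attachments.size());
  std::vector<SubpassDescription> subpasses;
  for (unsigned a : attachments) {
    refs->push_back({a});                // lives as long as *refs
    subpasses.push_back({&refs->back()});
  }
  return subpasses;
}
```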
https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/tests/vulka... File gpu/vulkan/tests/vulkan_test.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/tests/vulka... gpu/vulkan/tests/vulkan_test.cc:115: render_pass.Destroy(); On 2016/03/25 02:01:26, piman wrote: > You need to wait for the RenderPass to not be in use before destroying. > Same with the surface/swapchain below. > > vkQueueWaitIdle is a big hammer, but it's simple enough for tests. Done. I added a Finish() call to VulkanSurface which does this. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... File gpu/vulkan/vulkan_render_pass.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.cc:266: subpass_desc.pDepthStencilAttachment = &depth_stencil_ref; On 2016/03/25 02:01:26, piman wrote: > depth_stencil_ref is a local variable. Its pointer is invalid after the loop > iteration (in practice, likely each depth_stencil_ref for each iteration is at > the same address and you just overwrite data). > > Also, if you don't have a D/S attachment, you want NULL here, not a > VkAttachmentReference that points to VK_ATTACHMENT_UNUSED Fixed the reference. According to the docs both work, so just initializing it to VK_ATTACHMENT_UNUSED and always using it seems easier here: 7.1: Setting the attachment index to VK_ATTACHMENT_UNUSED or leaving this pointer as NULL indicates that no depth/stencil attachment will be used in the subpass. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... File gpu/vulkan/vulkan_render_pass.h (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:46: // The stencil ops are only sued for IMAGE_TYPE_STENCIL and On 2016/03/25 02:01:26, piman wrote: > nit: sued->used Done. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... 
gpu/vulkan/vulkan_render_pass.h:52: uint32_t num_image_views; // If more than 1, must match swap chain count. On 2016/03/25 02:01:26, piman wrote: > I'm not exactly sure why you need this, or what it means really. You can only > have 1 image view per attachment on a given framebuffer. So this attachment data is meant to be used to create all the framebuffers internally. A framebuffer gets created for every image in the swap chain. If someone wants to have a single image view for every frame buffer then they would use a single one; if someone wants to have an image view for every corresponding swap chain image then they would have multiple (the same amount as the swap chain image count). https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:86: bool Initialize(const VulkanSwapChain* swap_chain, On 2016/03/25 02:01:26, piman wrote: > Do you have to pass a SwapChain here? > What if I want to make a render pass for offscreen buffers? We will probably have to create another Initialize function for that and error if ATTACHMENT_TYPE_SWAP_IMAGE is used. Some other things will have to be thought about too. Here I am automatically creating a frame buffer for every swap chain image, would we only want 1 framebuffer for offscreen buffers? or have them be able to specify the number of frame buffers? https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:95: bool exec_inline); On 2016/03/25 02:01:26, piman wrote: > This is ok as a first step, but I think eventually we'll want to separate the > RenderPass from the RenderPass instance. After the RenderPass is created, it's > immutable, and that's a useful property because it's trivial to share across > threads. If it can be mutable (current_sub_pass_/executing_/execution_type_) > then you need locking and you lose the benefits of threading. 
> One example could be to have BeginRenderPass return a RenderPassInstance that > contains that state, and either NextSubPass/EndRenderPass are methods on it, or > they stay here but take the RenderPassInstance as a parameter. Hmm that is interesting. Acknowledged. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:50: &current_image_data->render_semaphore)) { On 2016/03/25 02:01:27, piman wrote: > You need a fence here, to know when you can reuse the command buffer. the next > time around. One per image, and you probably want to wait on that fence after > vkAcquireNextImageKHR returns the same image. Since vkAcquireNextImageKHR is blocking on the presentation engine, doesn't that mean when the presentation engine releases an image the command buffer is also done being used and can be reused? Or are they not synchronized that way? Otherwise, would just waiting on the present_semaphore be enough? I think previously I had this wait on the present_semaphore but after I added the UINT64_MAX timeout I removed it since the present_semaphore didn't seem necessary anymore. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:73: // do the synchronization for us. On 2016/03/25 02:01:26, piman wrote: > We'll need a fence for throttling, though. Without a fence, this could queue > an > infinite number of frames, faster than the GPU or display can consume. The > fence > suggested above may be enough. Wouldn't this call throttle by blocking if the next image is still being owned by the presentation engine?
https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... File gpu/vulkan/vulkan_render_pass.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.cc:266: subpass_desc.pDepthStencilAttachment = &depth_stencil_ref; On 2016/03/25 17:36:14, David Yen wrote: > On 2016/03/25 02:01:26, piman wrote: > > depth_stencil_ref is a local variable. Its pointer is invalid after the loop > > iteration (in practice, likely each depth_stencil_ref for each iteration is at > > the same address and you just overwrite data). > > > > Also, if you don't have a D/S attachment, you want NULL here, not a > > VkAttachmentReference that points to VK_ATTACHMENT_UNUSED > > Fixed the reference. According to the docs both work, so just initializing it to > VK_ATTACHMENT_UNUSED and always using it seems easier here: > > 7.1: Setting the attachment index to VK_ATTACHMENT_UNUSED or leaving this > pointer as NULL indicates that no depth/stencil attachment will be used in the > subpass. Ah, ok, you're right. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... File gpu/vulkan/vulkan_render_pass.h (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:52: uint32_t num_image_views; // If more than 1, must match swap chain count. On 2016/03/25 17:36:14, David Yen wrote: > On 2016/03/25 02:01:26, piman wrote: > > I'm not exactly sure why you need this, or what it means really. You can only > > have 1 image view per attachment on a given framebuffer. > > So this attachment data is meant to used to create all the framebuffers > internally. A framebuffer gets created for every image in the swap chain. 
If > someone wants to have a single image view for every frame buffer then they would > use a single one, if someone wants to have an image view for every corresponding > swap chain image then they would have multiple (the same amount as the swap > chain image count). I think I understand. It's hard to imagine a use case for that - presumably we never use different swap chain images concurrently in the same frame, so any temporary render target can just be reused across frames. We certainly don't have anything like that in the compositor today. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:86: bool Initialize(const VulkanSwapChain* swap_chain, On 2016/03/25 17:36:14, David Yen wrote: > On 2016/03/25 02:01:26, piman wrote: > > Do you have to pass a SwapChain here? > > What if I want to make a render pass for offscreen buffers? > > We will probably have to create another Initialize function for that and error > if ATTACHMENT_TYPE_SWAP_IMAGE is used. Some other things will have to be thought > about too. Here I am automatically creating a frame buffer for every swap chain > image, would we only want 1 framebuffer for offscreen buffers? or have them be > able to specify the number of frame buffers? I don't see a use case to do more than one. Well, we may want to decouple the framebuffer from the render pass. The render pass affects pipeline compilation, and we definitely want to keep the same pipelines for all compositor passes to avoid a combinatorial explosion. But that's orthogonal. We can do that as a follow up (similar to RP Instance). https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... 
gpu/vulkan/vulkan_swap_chain.cc:50: &current_image_data->render_semaphore)) { On 2016/03/25 17:36:14, David Yen wrote: > On 2016/03/25 02:01:27, piman wrote: > > You need a fence here, to know when you can reuse the command buffer. the next > > time around. One per image, and you probably want to wait on that fence after > > vkAcquireNextImageKHR returns the same image. > > Since vkAcquireNextImageKHR is blocking on the presentation engine, doesn't that > mean when the presentation engine releases an image the command buffer is also > done being used and can be reused? Or are they not synchronized that way? That's not enough. The blocking that vkAcquireNextImageKHR does is actually potentially unrelated to GPU<->CPU communication, and has to do with the app<->compositor communication. For example if the presentation engine uses a blit model (i.e. it copies the data from the presentation image), you can conceptually see vkQueuePresentKHR as just enqueuing a blit, and after that the image is available again. IOW, the following API sequence: Submit(queue, null /*sema*/, command_buffer1, sema1); QueuePresent(queue, sema1, swapchain, image_index); AcquireNextImage(device, swap_chain, sema2, &image_index); Submit(queue, sema2, command_buffer2, null /*sema*/); Is really equivalent, under the hood, to: Submit(queue, null /*sema*/, command_buffer1, sema1); Submit(queue, sema1, blit_command_buffer, sema2); Submit(queue, sema2, command_buffer2, null /*sema*/); No fence needs to be involved. That would mean that AcquireNextImage can return as soon as blit_command_buffer is submitted, with the same image, but that doesn't mean that command_buffer1 is done executing, or any previous command buffer, really. In fact the presentation engine may never even wait on the GPU. > Otherwise, would just waiting on the present_semaphore be enough? 
I think > previously I had this wait on the present_semaphore but after I added the > UINT64_MAX timeout I removed it since the present_semaphore didn't seem > necessary anymore. You need a GPU->CPU synchronization, that's a VkFence. You can't wait on a VkSemaphore from the CPU. A semaphore is more like an ordering primitive (similar to chrome sync tokens). https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:73: // do the synchronization for us. On 2016/03/25 17:36:14, David Yen wrote: > On 2016/03/25 02:01:26, piman wrote: > > We'll need a fence for throttling, though. Without a fence, this could queue > an > > infinite number of frames, faster than the GPU or display can consume. The > fence > > suggested above may be enough. > > Wouldn't this call throttle by blocking if the next image is still being owned > by the presentation engine? The presentation engine may very well never wait on the GPU, so it's not enough to throttle.
https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... File gpu/vulkan/vulkan_render_pass.h (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:52: uint32_t num_image_views; // If more than 1, must match swap chain count. On 2016/03/25 21:13:22, piman wrote: > On 2016/03/25 17:36:14, David Yen wrote: > > On 2016/03/25 02:01:26, piman wrote: > > > I'm not exactly sure why you need this, or what it means really. You can > only > > > have 1 image view per attachment on a given framebuffer. > > > > So this attachment data is meant to used to create all the framebuffers > > internally. A framebuffer gets created for every image in the swap chain. If > > someone wants to have a single image view for every frame buffer then they > would > > use a single one, if someone wants to have an image view for every > corresponding > > swap chain image then they would have multiple (the same amount as the swap > > chain image count). > > I think I understand. It's hard to imagine a use case for that - presumably we > never use different swap chain images concurrently in the same frame, so any > temporary render target can just be reused across frames. We certainly don't > have anything like that in the compositor today. Ok, removed. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_rend... gpu/vulkan/vulkan_render_pass.h:86: bool Initialize(const VulkanSwapChain* swap_chain, On 2016/03/25 21:13:22, piman wrote: > On 2016/03/25 17:36:14, David Yen wrote: > > On 2016/03/25 02:01:26, piman wrote: > > > Do you have to pass a SwapChain here? > > > What if I want to make a render pass for offscreen buffers? > > > > We will probably have to create another Initialize function for that and error > > if ATTACHMENT_TYPE_SWAP_IMAGE is used. Some other things will have to be > thought > > about too. 
Here I am automatically creating a frame buffer for every swap > chain > > image, would we only want 1 framebuffer for offscreen buffers? or have them be > > able to specify the number of frame buffers? > > I don't see a use case to do more than one. > > Well, we may want to decouple the framebuffer from the render pass. The render > pass affects pipeline compilation, and we definitely want to keep the same > pipelines for all compositor passes to avoid a combinatorial explosion. But > that's orthogonal. We can do that as a follow up (similar to RP Instance). Acknowledged. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:50: ¤t_image_data->render_semaphore)) { On 2016/03/25 21:13:22, piman wrote: > On 2016/03/25 17:36:14, David Yen wrote: > > On 2016/03/25 02:01:27, piman wrote: > > > You need a fence here, to know when you can reuse the command buffer. the > next > > > time around. One per image, and you probably want to wait on that fence > after > > > vkAcquireNextImageKHR returns the same image. > > > > Since vkAcquireNextImageKHR is blocking on the presentation engine, doesn't > that > > mean when the presentation engine releases an image the command buffer is also > > done being used and can be reused? Or are they not synchronized that way? > > That's not enough. The blocking that vkAcquireNextImageKHR is actually > potentially unrelated to GPU<->CPU communication, and has to do with the > app<->compositor communication. > > For example if the presentation engine uses a blit model (i.e. it copies the > data from the presentation image), you can conceptually see vkQueuePresentKHR as > just enqueuing a blit, and after that the image is available again. 
> > IOW, the following API sequence: > Submit(queue, null /*sema*/, command_buffer1, sema1); > QueuePresent(queue, sema1, swapchain, image_index); > AcquireNextImage(device, swap_chain, sema2, &image_index); > Submit(queue, sema2, command_buffer2, null /*sema*/); > > Is really equivalent, under the hood, to: > Submit(queue, null /*sema*/, command_buffer1, sema1); > Submit(queue, sema1, blit_command_buffer, sema2); > Submit(queue, sema2, command_buffer2, null /*sema*/); > > No fence needs to be involved. That would mean that AcquireNextImage can return > as soon as blit_command_buffer is submitted, with the same image, but that > doesn't mean that command_buffer1 is done executing, or any previous command > buffer, really. In fact the presentation engine may never even wait on the GPU. > > > Otherwise, would just waiting on the present_semaphore be enough? I think > > previously I had this wait on the present_semaphore but after I added the > > UINT64_MAX timeout I removed it since the present_semaphore didn't seem > > necessary anymore. > > You need a GPU->CPU synchronization, that's a VkFence. You can't wait on a > VkSemaphore from the CPU. A semaphore is more like an ordering primitive > (similar to chrome sync tokens). Ok I see. I added the submission fence. Since the AcquireNextImage call also enqueues commands, that means we still would need to wait on the present semaphore, right? Waiting on the submission fence is not enough to guarantee the present semaphore has signaled yet. I wonder if we should just make this a feature in VulkanCommandBuffer, where we can set a VkFence which always gets waited upon before submission. Seems like we would always need something like this. We can add a check for the fence status too for asynchronous usage patterns. https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:73: // do the synchronization for us.
On 2016/03/25 21:13:22, piman wrote: > On 2016/03/25 17:36:14, David Yen wrote: > > On 2016/03/25 02:01:26, piman wrote: > > > We'll need a fence for throttling, though. Without a fence, this could queue > > an > > > infinite number of frames, faster than the GPU or display can consume. The > > fence > > > suggested above may be enough. > > > > Wouldn't this call throttle by blocking if the next image is still being owned > > by the presentation engine? > > The presentation engine may very well never wait on the GPU, so it's not enough > to throttle. Acknowledged.
https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/180001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:50: &current_image_data->render_semaphore)) { On 2016/03/26 00:33:30, David Yen wrote: > On 2016/03/25 21:13:22, piman wrote: > > On 2016/03/25 17:36:14, David Yen wrote: > > > On 2016/03/25 02:01:27, piman wrote: > > > > You need a fence here, to know when you can reuse the command buffer. the > > next > > > > time around. One per image, and you probably want to wait on that fence > > after > > > > vkAcquireNextImageKHR returns the same image. > > > > > > Since vkAcquireNextImageKHR is blocking on the presentation engine, doesn't > > that > > > mean when the presentation engine releases an image the command buffer is > also > > > done being used and can be reused? Or are they not synchronized that way? > > > > That's not enough. The blocking that vkAcquireNextImageKHR is actually > > potentially unrelated to GPU<->CPU communication, and has to do with the > > app<->compositor communication. > > > > For example if the presentation engine uses a blit model (i.e. it copies the > > data from the presentation image), you can conceptually see vkQueuePresentKHR > as > > just enqueuing a blit, and after that the image is available again. > > > > IOW, the following API sequence: > > Submit(queue, null /*sema*/, command_buffer1, sema1); > > QueuePresent(queue, sema1, swapchain, image_index); > > AcquireNextImage(device, swap_chain, sema2, &image_index); > > Submit(queue, sema2, command_buffer2, null /*sema*/); > > > > Is really equivalent, under the hood, to: > > Submit(queue, null /*sema*/, command_buffer1, sema1); > > Submit(queue, sema1, blit_command_buffer, sema2); > > Submit(queue, sema2, command_buffer2, null /*sema*/); > > > > No fence needs to be involved.
That would mean that AcquireNextImage can > return > > as soon as blit_command_buffer is submitted, with the same image, but that > > doesn't mean that command_buffer1 is done executing, or any previous command > > buffer, really. In fact the presentation engine may never even wait on the > GPU. > > > > > Otherwise, would just waiting on the present_semaphore be enough? I think > > > previously I had this wait on the present_semaphore but after I added the > > > UINT64_MAX timeout I removed it since the present_semaphore didn't seem > > > necessary anymore. > > > > You need a GPU->CPU synchronization, that's a VkFence. You can't wait on a > > VkSemaphore from the CPU. A semaphore is more like an ordering primitive > > (similar to chrome sync tokens). > > Ok I see. I added the submission fence. Since the AcquireNextImage call also > enqueues commands, that means we still would need to wait on the present > semaphore right? Waiting on the submission fence is not enough to guarantee the > present semaphore has signaled yet. Correct. > I wonder if we should just make this a feature in VulkanCommandBuffer, where we > can set a VkFence which always gets waited upon before submission. Seems like we > would always need something like this. We can add an check for the fence status > too for asynchronous usage patterns. I think associating the VkFence with the VkCommandBuffer is a good idea, but keep in mind that you want to wait for it before reusing the command buffer, not before submitting it. https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:51: // Make sure the current command buffer is not in use. You need to do that before resetting/recording the command buffer, though... That's why I suggested doing that after vkAcquireNextImageKHR. 
https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:231: &image_data->submission_fence); Maybe you want to create these ones with VK_FENCE_CREATE_SIGNALED_BIT for the first frame. Otherwise your first vkWaitForFences in SwapBuffers will block. (but obviously, not for layout_fence since you submit it before waiting for it).
I have just started to learn Vulkan. I roughly went through the spec and the samples in the SDK. I found this CL a few days ago and have been studying it since then. It's amazing and has benefited my Vulkan study a lot. As a newbie to Vulkan, my comments might be superficial or simply noisy. Please ignore them in such cases! BTW, if you have any relatively simple or plain coding task to be offloaded, like adding more unit tests, I am willing to take it! https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:191: image_memory_barrier.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT; Should we use VK_ACCESS_MEMORY_READ_BIT here for the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout below?
https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:191: image_memory_barrier.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT; On 2016/03/28 07:47:23, jchen10 wrote: > Should we use VK_ACCESS_MEMORY_READ_BIT here for the below > VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout? True, this is inconsistent. Actually, we're not allowed to operate on swapchain images that we haven't acquired, so we should do the transition as part of the commands of the first frame that operates on each image.
https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... File gpu/vulkan/vulkan_swap_chain.cc (right): https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:51: // Make sure the current command buffer is not in use. On 2016/03/26 01:38:01, piman wrote: > You need to do that before resetting/recording the command buffer, though... > That's why I suggested doing that after vkAcquireNextImageKHR. Right, I was trying to delay it to be as late as possible. Putting it here doesn't make sense... I think placing it in VulkanCommandBuffer before we reset the command buffer is the best place. https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:191: image_memory_barrier.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT; On 2016/03/28 07:47:23, jchen10 wrote: > Should we use VK_ACCESS_MEMORY_READ_BIT here for the below > VK_IMAGE_LAYOUT_SRC_KHR layout? Done. I can just queue the command for each command buffer which will get submitted after the acquire commands. https://codereview.chromium.org/1776453003/diff/250001/gpu/vulkan/vulkan_swap... gpu/vulkan/vulkan_swap_chain.cc:231: &image_data->submission_fence); On 2016/03/26 01:38:01, piman wrote: > Maybe you want to create these ones with VK_FENCE_CREATE_SIGNALED_BIT for the > first frame. Otherwise your first vkWaitForFences in SwapBuffers will block. > > (but obviously, not for layout_fence since you submit it before waiting for it). Done inside VulkanCommandBuffer. The implementation didn't need it for this particular problem (VulkanCommandBuffer has state tracking for when a submission took place), but this just makes it so Wait() and SubmissionFinished() can be called before the first Submit() call.
On 2016/03/28 07:47:23, jchen10 wrote: > I have just started to learn Vulkan. I roughly went through the Spec and samples > in SDK. I found this CL a few days ago and have been studying it since then. > It's amazing and has benefited me a lot regarding my Vulkan study. As a newbie > to Vulkan, my comments might be superficial or simply noisy. Please ignore them > in such cases! > BTW, if you have any relatively simple or plain coding task like adding more > unit-tests to be offloaded, I am willing to take it! > Thank you for spotting this. I am also just reading the spec for the first time so am also learning :) We are definitely still prototyping and figuring out how best to utilize the Vulkan feature set. Since this is still very early in our efforts we haven't quite figured out how things will look yet. We are probably not yet at the point where we can parallelize the efforts. Definitely continue keeping an eye on this though!
Whoops, forgot to upload the patch. Patch #15 contains the changes.
One last thing and then I think it's good to go. https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:45: fence_create_info.flags = VK_FENCE_CREATE_SIGNALED_BIT; I don't think you want it to be created signaled, if you initialize in "RECORD_TYPE_EMPTY" state, because ResetIfDirty will skip waiting in that case - so you'll have a signalled fence when you submit (which is illegal, and incorrect anyway). https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:87: VkResult result = vkQueueSubmit(queue, 1, &submit_info, submission_fence_); Thought for later (nothing to do in this CL, it's big enough as it is): we may want at some point to submit several command buffers at once - the submit overhead is high on at least some platforms (Windows in particular). When we get to that, we'll need to share the fence across command buffers. It could be something along the lines of: the fence is owned by the image data in the swap chain, and it's given to the command buffer after the multi-submit.
https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:45: fence_create_info.flags = VK_FENCE_CREATE_SIGNALED_BIT; On 2016/03/29 19:09:21, piman wrote: > I don't think you want it to be created signaled, if you initialize in > "RECORD_TYPE_EMPTY" state, because ResetIfDirty will skip waiting in that case - > so you'll have a signalled fence when you submit (which is illegal, and > incorrect anyway). It's interesting that my tests passed. Maybe validation would have caught it? I will add validation layers on the next patch. Having it signaled makes some things very convenient though (for example being able to always call SubmissionFinished() such as my DCHECK in the destroy). Maybe we should reset the fence before submission instead? That would make the behavior of Wait() and SubmissionFinished() slightly different, but I can change the comments to reflect that. https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:87: VkResult result = vkQueueSubmit(queue, 1, &submit_info, submission_fence_); On 2016/03/29 19:09:21, piman wrote: > Thought for later (nothing to do in this CL, it's big enough as it is): we may > want at some point to submit several command buffers at once - the submit > overhead is high on at least some platforms (Windows in particular). When we get > to that, we'll need to share the fence across command buffers. It could be > something along the lines of: the fence is owned by the image data in the swap > chain, and it's given to the command buffer after the multi-submit. Would we want to submit multiple command buffers which set up render passes? Otherwise we can submit multiple secondary command buffers within a single primary command buffer, right?
https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:45: fence_create_info.flags = VK_FENCE_CREATE_SIGNALED_BIT; On 2016/03/29 19:27:31, David Yen wrote: > On 2016/03/29 19:09:21, piman wrote: > > I don't think you want it to be created signaled, if you initialize in > > "RECORD_TYPE_EMPTY" state, because ResetIfDirty will skip waiting in that case > - > > so you'll have a signalled fence when you submit (which is illegal, and > > incorrect anyway). > > That's interesting that my tests passed. Maybe validation would have caught it? > I will add validation layers on the next patch. I think they would have caught it. > Having it signaled makes some things very convenient though (for example being > able to always call SubmissionFinished() such as my DCHECK in the destroy). > Maybe we should reset the fence before submission instead? That would make the > behavior of Wait() and SubmissionFinished() slightly different, but I can change > the comments to reflect that. Another option is to add one more state, RECORD_TYPE_SUBMITTED. Wait could return immediately except in that state, and SubmissionFinished would return true. Either one would transition to RECORD_TYPE_DIRTY after the fence is passed, and reset the fence. ResetIfDirty could call Wait(infinite) and then reset the command buffer if dirty. https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... 
gpu/vulkan/vulkan_command_buffer.cc:87: VkResult result = vkQueueSubmit(queue, 1, &submit_info, submission_fence_); On 2016/03/29 19:27:31, David Yen wrote: > On 2016/03/29 19:09:21, piman wrote: > > Thought for later (nothing to do in this CL, it's big enough as it is): we may > > want at some point to submit several command buffers at once - the submit > > overhead is high on at least some platforms (Windows in particular). When we > get > > to that, we'll need to share the fence across command buffers. It could be > > something along the lines of: the fence is owned by the image data in the swap > > chain, and it's given to the command buffer after the multi-submit. > > Would we want to submit multiple command buffers which setup render passes? Possibly. For example, we'll need one render pass per tile we raster. It could make sense to create those on multiple threads, which will require separate command buffers. > Otherwise we can submit multiple secondary command buffers within a single > primary command buffer right? The other things are non-render things, such as texture upload, which don't go into render passes.
https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... File gpu/vulkan/vulkan_command_buffer.cc (right): https://codereview.chromium.org/1776453003/diff/270001/gpu/vulkan/vulkan_comm... gpu/vulkan/vulkan_command_buffer.cc:45: fence_create_info.flags = VK_FENCE_CREATE_SIGNALED_BIT; On 2016/03/29 20:42:39, piman wrote: > On 2016/03/29 19:27:31, David Yen wrote: > > On 2016/03/29 19:09:21, piman wrote: > > > I don't think you want it to be created signaled, if you initialize in > > > "RECORD_TYPE_EMPTY" state, because ResetIfDirty will skip waiting in that > case > > - > > > so you'll have a signalled fence when you submit (which is illegal, and > > > incorrect anyway). > > > > That's interesting that my tests passed. Maybe validation would have caught > it? > > I will add validation layers on the next patch. > > I think they would have caught it. > > > Having it signaled makes some things very convenient though (for example being > > able to always call SubmissionFinished() such as my DCHECK in the destroy). > > Maybe we should reset the fence before submission instead? That would make the > > behavior of Wait() and SubmissionFinished() slightly different, but I can > change > > the comments to reflect that. > > Another option is to add one more state, RECORD_TYPE_SUBMITTED. Wait could > return immediately except in that state, and SubmissionFinished would return > true. Either one would transition to RECORD_TYPE_DIRTY after the fence is > passed, and reset the fence. ResetIfDirty could call Wait(infinite) and then > reset the command buffer if dirty. It's kind of already like that, but because of the state tracking we have two states for the submitted case: RECORD_TYPE_RECORDED for multi-use and DIRTY for single-use, although DIRTY is also used when the command buffer is cleared. I think keeping the current scheme is simplest to understand, and it only adds one extra reset for the very first submission.
The fence reset would just be a client-side function, right?
Ok, LGTM.
The CQ bit was checked by dyen@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1776453003/310001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1776453003/310001
Message was sent while issue was closed.
Description was changed from ========== Added initial implementation of Vulkan Render Passes. An initial implementation for various Vulkan rendering objects has been done here. I tried to do the minimal amount of work which would still compile/run but that ended up being quite a bit of work. We have now have Vulkan implementations of command buffers, swap chains, render passes, and surfaces. A Vulkan test has also been added which tests that all these functionalities run and return expected return values. BUG=582558 CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel ========== to ========== Added initial implementation of Vulkan Render Passes. An initial implementation for various Vulkan rendering objects has been done here. I tried to do the minimal amount of work which would still compile/run but that ended up being quite a bit of work. We have now have Vulkan implementations of command buffers, swap chains, render passes, and surfaces. A Vulkan test has also been added which tests that all these functionalities run and return expected return values. BUG=582558 CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel ==========
Message was sent while issue was closed.
Committed patchset #17 (id:310001)
Message was sent while issue was closed.
Description was changed from ========== Added initial implementation of Vulkan Render Passes. An initial implementation for various Vulkan rendering objects has been done here. I tried to do the minimal amount of work which would still compile/run but that ended up being quite a bit of work. We have now have Vulkan implementations of command buffers, swap chains, render passes, and surfaces. A Vulkan test has also been added which tests that all these functionalities run and return expected return values. BUG=582558 CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel ========== to ========== Added initial implementation of Vulkan Render Passes. An initial implementation for various Vulkan rendering objects has been done here. I tried to do the minimal amount of work which would still compile/run but that ended up being quite a bit of work. We have now have Vulkan implementations of command buffers, swap chains, render passes, and surfaces. A Vulkan test has also been added which tests that all these functionalities run and return expected return values. BUG=582558 CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel Committed: https://crrev.com/4ec04eccd55ebfabc916eae2f417e5fa483e1c3e Cr-Commit-Position: refs/heads/master@{#384125} ==========
Message was sent while issue was closed.
Patchset 17 (id:??) landed as https://crrev.com/4ec04eccd55ebfabc916eae2f417e5fa483e1c3e Cr-Commit-Position: refs/heads/master@{#384125} |