Your First Triangle in Vulkan
This is the big one. A single colored triangle in explicit Vulkan is on the order of a thousand lines, because you declare every piece yourself: the instance, the device and its queues, the swapchain, the pipeline, the command buffers, and the synchronization that keeps the CPU and GPU out of each other's way. We build it in C++ and Rust (ash), with dynamic rendering, and put the part everyone gets wrong (semaphores vs fences) front and center.
01 Why ~1000 lines
OpenGL hid the instance, the device, the framebuffer, and the synchronization behind global state and the driver. Vulkan is explicit by design: you declare all of it. That's why a from-scratch, validation-clean colored triangle runs on the order of a thousand lines[14], and why the payoff is control over CPU-side submission and multi-threaded command recording, not a higher frame rate for one triangle.
Beginners and many shipping indie engines use vk-bootstrap to collapse instance, device, and swapchain creation from ~400 lines to under 50[12], and VMA for memory allocation[13]. We write the longhand here so the machine is visible, then point at the shortcut. And if you want portability and safety over control, wgpu (Rust) is the higher-level layer that sits on top of exactly this[15].
Vulkan object lifetimes are the dependency order, so the order we build in is also the teardown order reversed. The first widget is that dependency graph; come back to it as each object appears.
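The "teardown is creation order reversed" rule can be sketched without any Vulkan calls. The DeletionQueue class and teardown_order_demo function below are hypothetical helpers invented for illustration, not part of Vulkan or of this tutorial's code; the sketch assumes each created object registers one cleanup callback.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a Vulkan type): stores one cleanup callback per
// created object and runs them newest-first, i.e. in reverse creation order.
class DeletionQueue {
public:
    void push(std::function<void()> destroy) { destroyers_.push_back(std::move(destroy)); }

    void flush() {
        for (auto it = destroyers_.rbegin(); it != destroyers_.rend(); ++it) (*it)();
        destroyers_.clear();
    }

private:
    std::vector<std::function<void()>> destroyers_;
};

// Stand-in for real setup: logs names instead of calling vkCreate*/vkDestroy*.
std::vector<std::string> teardown_order_demo() {
    std::vector<std::string> log;
    DeletionQueue queue;
    for (const char* name : {"instance", "device", "swapchain", "pipeline"}) {
        // A real renderer would create the object here, then register its destroy call.
        queue.push([&log, name] { log.push_back(name); });
    }
    queue.flush();  // runs pipeline, swapchain, device, instance
    return log;
}
```

In a real renderer each callback would wrap the matching vkDestroy* call, and flush would run once at shutdown after the device is idle.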
02 Instance & validation
The VkInstance is the per-application Vulkan connection. You give it an app info (with the API version you target), the instance extensions (the surface extensions came from the Platform & Window tutorial), and the validation layer.
Turn on the single meta-layer VK_LAYER_KHRONOS_validation (from the Vulkan SDK) in debug builds and off in release[1]. Wire a VK_EXT_debug_utils messenger so its messages reach your log. Two traps: validation is a separate layer that must be installed (without the SDK, requesting it makes vkCreateInstance fail with VK_ERROR_LAYER_NOT_PRESENT, and code that quietly skips a missing layer runs with validation silently off), and to catch errors during instance creation itself you chain a bootstrap messenger (a VkDebugUtilsMessengerCreateInfoEXT) via the instance create info's pNext. Also: apiVersion is the version you target; it does not enable newer features (those are gated at device creation). And the surface is an instance extension while the swapchain is a device extension, a classic mix-up.
VkApplicationInfo appInfo{ VK_STRUCTURE_TYPE_APPLICATION_INFO };
appInfo.apiVersion = VK_API_VERSION_1_3; // targeted version, not a feature switch
const char* layers[] = { "VK_LAYER_KHRONOS_validation" }; // debug builds only
// extensions = surface extensions from the platform layer + VK_EXT_debug_utils
VkInstanceCreateInfo ci{ VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
ci.pApplicationInfo = &appInfo;
ci.enabledLayerCount = 1; ci.ppEnabledLayerNames = layers;
ci.enabledExtensionCount = (uint32_t)exts.size(); ci.ppEnabledExtensionNames = exts.data();
vkCreateInstance(&ci, nullptr, &instance);
// ash 0.38: ::default() + consuming setters (no more ::builder()).
let app_info = vk::ApplicationInfo::default()
    .api_version(vk::make_api_version(0, 1, 3, 0)); // targeted version
let layers = [c"VK_LAYER_KHRONOS_validation".as_ptr()]; // debug only
let create_info = vk::InstanceCreateInfo::default()
    .application_info(&app_info)
    .enabled_layer_names(&layers)
    .enabled_extension_names(&extension_ptrs); // surface exts + debug_utils
let instance = unsafe { entry.create_instance(&create_info, None)? };
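One way to handle the "layer must be installed" trap is to decide the layer list before filling in the create info. The sketch below is a minimal illustration in plain C++: layers_to_enable is a hypothetical helper, and the installed list is assumed to hold the layer names you would collect from vkEnumerateInstanceLayerProperties.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns the layers to pass in ppEnabledLayerNames.
// `installed` is assumed to come from vkEnumerateInstanceLayerProperties.
std::vector<const char*> layers_to_enable(const std::vector<std::string>& installed,
                                          bool debugBuild) {
    static constexpr const char* kValidation = "VK_LAYER_KHRONOS_validation";
    std::vector<const char*> layers;
    if (!debugBuild) return layers;  // release builds: validation off
    // Only request the layer when it is actually present; requesting a
    // missing layer would make instance creation fail.
    if (std::find(installed.begin(), installed.end(), kValidation) != installed.end())
        layers.push_back(kValidation);
    return layers;
}
```

If the returned list is empty in a debug build, logging a warning is probably worthwhile, since that is exactly the case where validation is off without anyone noticing.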
03 Devices & queues
Enumerate the physical devices, score them, and pick one that supports the swapchain extension. Then find the queue families you need: a graphics family (a capability bit) and a present family (a surface-specific query). Create the logical VkDevice from one queue-create-info per unique family index.
The graphics and present families are not guaranteed to be distinct, and not guaranteed to be the same[1]. Store each as an optional index; if they differ, create two queues (and the swapchain uses concurrent sharing or an explicit ownership transfer). The official tutorial simplifies to "one family that does both"; a real engine handles both cases. Two more traps: dedupe the family indices (passing the same index twice in the queue-create-info array is invalid), and remember that a feature must be enabled, not just available. dynamicRendering is a field in VkPhysicalDeviceVulkan13Features (core in 1.3)[2], or comes from the VK_KHR_dynamic_rendering extension before that.
// present support is a SEPARATE, surface-specific query:
VkBool32 presentSupport = VK_FALSE;
vkGetPhysicalDeviceSurfaceSupportKHR(phys, familyIndex, surface, &presentSupport);
std::set<uint32_t> unique = { graphicsFamily, presentFamily }; // dedupe!
std::vector<VkDeviceQueueCreateInfo> qcis;
float prio = 1.0f;
for (uint32_t f : unique) qcis.push_back({ VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO, nullptr, 0, f, 1, &prio });
VkPhysicalDeviceVulkan13Features f13{ VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VULKAN_1_3_FEATURES };
f13.dynamicRendering = VK_TRUE; // enable the feature, not just check it
const char* devExts[] = { VK_KHR_SWAPCHAIN_EXTENSION_NAME }; // DEVICE extension
VkDeviceCreateInfo dci{ VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO, &f13 };
dci.queueCreateInfoCount = (uint32_t)qcis.size(); dci.pQueueCreateInfos = qcis.data();
dci.enabledExtensionCount = 1; dci.ppEnabledExtensionNames = devExts;
vkCreateDevice(phys, &dci, nullptr, &device);
vkGetDeviceQueue(device, graphicsFamily, 0, &graphicsQueue);
vkGetDeviceQueue(device, presentFamily, 0, &presentQueue);
let surface_loader = ash::khr::surface::Instance::new(&entry, &instance);
// present support is a separate, surface-specific query (i = queue family index):
let present = unsafe { surface_loader.get_physical_device_surface_support(phys, i, surface)? };
use std::collections::HashSet;
let unique: HashSet<u32> = [graphics_family, present_family].into_iter().collect(); // dedupe
let qcis: Vec<_> = unique.iter()
    .map(|&f| vk::DeviceQueueCreateInfo::default()
        .queue_family_index(f)
        .queue_priorities(&[1.0]))
    .collect();
let mut f13 = vk::PhysicalDeviceVulkan13Features::default().dynamic_rendering(true);
let dev_exts = [ash::khr::swapchain::NAME.as_ptr()]; // DEVICE extension
let dci = vk::DeviceCreateInfo::default()
    .queue_create_infos(&qcis)
    .enabled_extension_names(&dev_exts)
    .push_next(&mut f13);
let device = unsafe { instance.create_device(phys, &dci, None)? };
let graphics_queue = unsafe { device.get_device_queue(graphics_family, 0) };
let present_queue = unsafe { device.get_device_queue(present_family, 0) }; // same queue if the families match
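The family search itself is plain bookkeeping, so it can be sketched without the Vulkan headers. In the sketch below, FamilyInfo, QueueFamilies, pick_families, and unique_families are hypothetical names; FamilyInfo is assumed to hold, per family, whether VK_QUEUE_GRAPHICS_BIT is set and what vkGetPhysicalDeviceSurfaceSupportKHR reported. Preferring a single family that does both is this sketch's choice, not a requirement.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <vector>

// Hypothetical per-family summary gathered from the two queries named above.
struct FamilyInfo {
    bool graphics;  // VK_QUEUE_GRAPHICS_BIT is set in queueFlags
    bool present;   // surface-specific present support
};

struct QueueFamilies {
    std::optional<uint32_t> graphics;
    std::optional<uint32_t> present;
    bool complete() const { return graphics.has_value() && present.has_value(); }
};

QueueFamilies pick_families(const std::vector<FamilyInfo>& families) {
    QueueFamilies out;
    for (uint32_t i = 0; i < families.size(); ++i) {
        // A family that does both avoids sharing the swapchain across families.
        if (families[i].graphics && families[i].present) {
            out.graphics = i;
            out.present = i;
            return out;
        }
        // Otherwise remember the first family seen for each role.
        if (families[i].graphics && !out.graphics) out.graphics = i;
        if (families[i].present && !out.present) out.present = i;
    }
    return out;
}

// One VkDeviceQueueCreateInfo per entry of this set; the set removes duplicates.
std::set<uint32_t> unique_families(const QueueFamilies& q) {
    std::set<uint32_t> unique;
    if (q.graphics) unique.insert(*q.graphics);
    if (q.present) unique.insert(*q.present);
    return unique;
}
```

A device whose result is not complete() would be skipped during scoring; when the two indices differ, the set has two entries and two queues are created.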
04 The swapchain
The swapchain (VK_KHR_swapchain) is the queue of presentable images tied to the surface. You choose a surface format (prefer B8G8R8A8_SRGB with the sRGB color space, but enumerate vkGetPhysicalDeviceSurfaceFormatsKHR and fall back, because the pair is not guaranteed), a present mode, an image count, and an extent, then create one VkImageView per image to render into.
- FIFO is the only present mode guaranteed available (v-synced, no tearing). MAILBOX and IMMEDIATE are optional; query vkGetPhysicalDeviceSurfacePresentModesKHR and fall back to FIFO[6].
- Extent has a special case. If currentExtent is a concrete value you must use it; if it's the sentinel 0xFFFFFFFF (Wayland), you pick the size, clamped to min/maxImageExtent[5]. Use the framebuffer pixel size, not the logical window size.
- Image count: request minImageCount + 1, and clamp to maxImageCount only when it's nonzero (0 means no limit). The number of swapchain images is not the same as frames in flight (§9).
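The three selection rules above can be sketched as small pure functions. Extent, SurfaceCaps, and PresentMode below are hypothetical stand-ins that mirror the relevant fields of VkExtent2D, VkSurfaceCapabilitiesKHR, and VkPresentModeKHR so the sketch compiles without the Vulkan headers; preferring MAILBOX when it is offered is an assumption of this sketch, not something the text prescribes.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the Vulkan structs named in the lead-in.
struct Extent { uint32_t width, height; };
struct SurfaceCaps {
    uint32_t minImageCount, maxImageCount;  // maxImageCount == 0 means "no limit"
    Extent currentExtent, minImageExtent, maxImageExtent;
};
enum class PresentMode { Immediate, Mailbox, Fifo };

constexpr uint32_t kUndefinedExtent = 0xFFFFFFFFu;  // "you pick the size" sentinel

Extent choose_extent(const SurfaceCaps& caps, Extent framebufferPixels) {
    // A concrete currentExtent must be used as-is.
    if (caps.currentExtent.width != kUndefinedExtent) return caps.currentExtent;
    // Otherwise clamp the framebuffer pixel size into the allowed range.
    return { std::clamp(framebufferPixels.width,
                        caps.minImageExtent.width, caps.maxImageExtent.width),
             std::clamp(framebufferPixels.height,
                        caps.minImageExtent.height, caps.maxImageExtent.height) };
}

uint32_t choose_image_count(const SurfaceCaps& caps) {
    uint32_t count = caps.minImageCount + 1;
    // Clamp only when a maximum exists.
    if (caps.maxImageCount != 0 && count > caps.maxImageCount) count = caps.maxImageCount;
    return count;
}

PresentMode choose_present_mode(const std::vector<PresentMode>& available) {
    if (std::find(available.begin(), available.end(), PresentMode::Mailbox) != available.end())
        return PresentMode::Mailbox;
    return PresentMode::Fifo;  // the only mode guaranteed to exist
}
```

In real code the inputs would come from vkGetPhysicalDeviceSurfaceCapabilitiesKHR and vkGetPhysicalDeviceSurfacePresentModesKHR, and the framebuffer size from the windowing layer.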
05 Shaders → SPIR-V
Vulkan does not take GLSL. It consumes SPIR-V bytecode, which you precompile offline with glslc (from the SDK); HLSL and Slang are also valid SPIR-V sources[7]. For the hardcoded triangle there's no vertex buffer at all: the vertex shader indexes a constant array with the built-in gl_VertexIndex, and the draw is vkCmdDraw(cmd, 3, 1, 0, 0).
// triangle.vert, compiled with: glslc triangle.vert -o triangle.vert.spv
#version 450
layout(location = 0) out vec3 fragColor;
vec2 positions[3] = vec2[](vec2(0.0, -0.5), vec2(0.5, 0.5), vec2(-0.5, 0.5));
vec3 colors[3] = vec3[](vec3(1,0,0), vec3(0,1,0), vec3(0,0,1));
void main() {
    gl_Position = vec4(positions[gl_VertexIndex], 0.0, 1.0); // gl_VertexIndex, not gl_VertexID
    fragColor = colors[gl_VertexIndex];
}
Read the .spv as uint32-aligned bytes (ash has ash::util::read_spv[3]) and wrap it in a VkShaderModule. Note: Vulkan's clip space is Y-down with a 0..1 depth range, so a triangle that looks right in OpenGL renders upside down here; for the hardcoded triangle we just pick positions that look right, and the camera tutorial handles the projection properly.
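On the C++ side, reading the file into 32-bit words might look like the sketch below. load_spirv is a hypothetical helper name; the magic-number check assumes the file was written in the host's byte order, which is the usual case for glslc output loaded on the machine that compiled it.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

constexpr uint32_t kSpirvMagic = 0x07230203u;  // first word of a SPIR-V module

// Hypothetical helper: loads a .spv file as 32-bit words, ready for pCode.
std::vector<uint32_t> load_spirv(const std::string& path) {
    std::ifstream file(path, std::ios::binary | std::ios::ate);  // open at the end
    if (!file) throw std::runtime_error("cannot open " + path);

    const auto size = static_cast<std::streamsize>(file.tellg());
    if (size <= 0 || size % 4 != 0)
        throw std::runtime_error("size is not a positive multiple of 4: " + path);

    std::vector<uint32_t> words(static_cast<std::size_t>(size) / 4);
    file.seekg(0);
    file.read(reinterpret_cast<char*>(words.data()), size);
    if (!file) throw std::runtime_error("short read: " + path);

    // Assumes host byte order; a byte-swapped file would fail this check.
    if (words[0] != kSpirvMagic) throw std::runtime_error("not a SPIR-V module: " + path);
    return words;
}
```

When filling VkShaderModuleCreateInfo, codeSize is in bytes (words.size() * sizeof(uint32_t)) while pCode is a uint32_t pointer (words.data()), which is why a word vector is a convenient container.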
06 The graphics pipeline
The graphics pipeline (VkPipeline) is the concrete form of the "baked state object" from the GPU Pipeline tutorial: it freezes the shader stages and the fixed-function state (input assembly, rasterizer, multisample, color blend, the pipeline layout) into one object[8].
If you bake the viewport into the pipeline, every window resize forces a new pipeline. Instead declare VK_DYNAMIC_STATE_VIEWPORT and VK_DYNAMIC_STATE_SCISSOR and set them at record time[8]. Also: the VkPipelineLayout (the descriptor and push-constant interface, empty for this triangle) is a separate object from the pipeline; beginners conflate them.
07 Dynamic rendering
We use dynamic rendering (core in Vulkan 1.3), which removes VkRenderPass and VkFramebuffer entirely. At pipeline creation you chain a VkPipelineRenderingCreateInfo (with the color attachment format) into the pipeline's pNext and set renderPass = VK_NULL_HANDLE; at record time you bracket drawing with vkCmdBeginRendering / vkCmdEndRendering[9].
Forgetting the image-layout transitions is the number-one dynamic-rendering bug. The render pass used to transition the swapchain image automatically (UNDEFINED → COLOR_ATTACHMENT_OPTIMAL before drawing, → PRESENT_SRC_KHR after). Without a render pass, you insert those transitions yourself with a pipeline barrier each frame[9]. And the attachment format in VkPipelineRenderingCreateInfo must match the format of the image view you pass to vkCmdBeginRendering. Dynamic rendering is the desktop default, but render passes still matter on tile-based mobile GPUs and for subpass techniques, so it's not universally "better."
// No render pass: declare the attachment format the pipeline renders to.
let color_formats = [swapchain_format]; // named binding: the create-info borrows this slice
let mut rendering = vk::PipelineRenderingCreateInfo::default()
    .color_attachment_formats(&color_formats); // must match begin_rendering
let pipeline_info = vk::GraphicsPipelineCreateInfo::default()
    .stages(&stages)
    // ... fixed-function state (input assembly, rasterizer, multisample, color blend) elided ...
    .dynamic_state(&dynamic_state) // VIEWPORT + SCISSOR
    .layout(pipeline_layout)
    .push_next(&mut rendering); // render_pass stays NULL
let pipeline = unsafe {
    device.create_graphics_pipelines(vk::PipelineCache::null(), &[pipeline_info], None)
}.unwrap()[0];
08 Command buffers
A VkCommandPool (created for the graphics queue family) allocates command buffers. Each frame you record: transition the image to color-attachment, vkCmdBeginRendering, bind the pipeline, set the dynamic viewport and scissor, vkCmdDraw(cmd, 3, 1, 0, 0), vkCmdEndRendering, transition to present.
A command buffer can only be submitted to the queue family its pool was created for; that binding is fixed. And command pools are not internally synchronized: you can't record into buffers from one pool on two threads at once, so the standard pattern is one pool per thread and per frame-in-flight. The three triangle vertices come from gl_VertexIndex 0/1/2, not a bound buffer.
09 The render loop & sync
Per frame: wait on the in-flight fence, acquire the next swapchain image, record, submit, present. (The loop below is shown in Rust/ash; the C++ is the identical vk* calls in the same order.) The synchronization is the part everyone gets wrong, so be precise about the two primitives[4]:
- Semaphores order GPU work, queue to queue. A binary semaphore is not waited on by the CPU.
- Fences are the CPU-GPU sync: the CPU calls vkWaitForFences to know the GPU finished.
Trying to vkWaitForFences on a semaphore, or expecting a binary semaphore to block the CPU, is the classic mistake: semaphores order GPU work, only fences (or timeline semaphores) are CPU-waitable[4]. The canonical pair is an image-available semaphore (signaled by acquire, waited by the submit) and a render-finished semaphore (signaled by the submit, waited by present, so present never shows a half-drawn frame). Frames in flight (usually 2) means duplicating the command buffer and sync objects per frame so the CPU can record frame N+1 while the GPU works on N; that count is not the same as the swapchain image count[10]. (One simplification here: a fully robust renderer ties the render-finished semaphore to the swapchain image, not the frame-in-flight, so it never re-signals a semaphore a prior present is still waiting on.)
unsafe {
    device.wait_for_fences(&[in_flight_fence], true, u64::MAX)?; // CPU waits the FENCE
    let (image_index, _suboptimal) = swapchain_loader
        .acquire_next_image(swapchain, u64::MAX, image_available, vk::Fence::null())?;
    // Reset only once a submit is certain: resetting before an acquire that
    // fails (OUT_OF_DATE) would leave the fence unsignaled and stall the next wait.
    device.reset_fences(&[in_flight_fence])?;
    // ... record cmd: barrier→begin_rendering→bind→draw(3,1,0,0)→end_rendering→barrier ...
    // Named arrays: the info structs borrow these slices, so they must outlive them.
    let wait_semaphores = [image_available];
    let wait_stages = [vk::PipelineStageFlags::COLOR_ATTACHMENT_OUTPUT];
    let command_buffers = [cmd];
    let signal_semaphores = [render_finished];
    let submit = vk::SubmitInfo::default()
        .wait_semaphores(&wait_semaphores) // GPU waits the SEMAPHORE
        .wait_dst_stage_mask(&wait_stages)
        .command_buffers(&command_buffers)
        .signal_semaphores(&signal_semaphores);
    device.queue_submit(graphics_queue, &[submit], in_flight_fence)?; // fence signaled on GPU done
    let swapchains = [swapchain];
    let image_indices = [image_index];
    let present = vk::PresentInfoKHR::default()
        .wait_semaphores(&signal_semaphores) // present waits render-finished
        .swapchains(&swapchains)
        .image_indices(&image_indices);
    swapchain_loader.queue_present(present_queue, &present)?; // match OUT_OF_DATE → recreate
}
Two points to take away: the CPU can only wait on a fence or a timeline semaphore (not a binary semaphore or a barrier); and a missing present mode is a capability you must query for, not a VRAM or validation issue.
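The difference between "per frame in flight" and "per swapchain image" comes down to which index selects each object. The sketch below is a minimal illustration with hypothetical names (SyncSlots, slots_for); it assumes two frames in flight and follows the more robust scheme mentioned above, where the render-finished semaphore is chosen by the acquired image index.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kFramesInFlight = 2;  // assumed value; "usually 2" in the text

// Hypothetical result type: which array slot to use for each kind of object.
struct SyncSlots {
    std::size_t frameSlot;           // command buffer, image-available semaphore, in-flight fence
    std::size_t renderFinishedSlot;  // render-finished semaphore, one per swapchain image
};

// frameNumber counts rendered frames; acquiredImageIndex comes from the acquire call.
SyncSlots slots_for(std::uint64_t frameNumber, std::uint32_t acquiredImageIndex) {
    return { static_cast<std::size_t>(frameNumber % kFramesInFlight),
             static_cast<std::size_t>(acquiredImageIndex) };
}
```

With three swapchain images and two frames in flight, frame 3 might acquire image 0: it reuses frame slot 1 but render-finished semaphore 0, which shows why the two counts are sized independently.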
10 Resize & shutdown
When the window resizes, the swapchain goes stale and the acquire/present calls report it: VK_ERROR_OUT_OF_DATE_KHR means you must recreate; VK_SUBOPTIMAL_KHR still presents but should be recreated soon[11].
- VK_SUBOPTIMAL_KHR is a success code (≥ 0); a blanket result != VK_SUCCESS check mishandles it.
- Don't rely only on the error codes; some platforms don't report them, so also keep an explicit framebufferResized flag from the window callback[11].
- On minimize the extent is (0,0); pause (wait for events) until it's nonzero before recreating.
- Recreate after vkDeviceWaitIdle; destroy in reverse creation order. In Rust/ash, handles have no Drop; you call destroy_* yourself, in order. C++ Vulkan-Hpp RAII destroys automatically.
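The result handling in the list above can be sketched as a small decision function. The constants below mirror the numeric values of VK_SUCCESS, VK_SUBOPTIMAL_KHR, and VK_ERROR_OUT_OF_DATE_KHR so the sketch compiles without the Vulkan headers; FrameAction, after_acquire, after_present, and can_recreate are hypothetical names chosen for illustration.

```cpp
#include <cstdint>

// Mirrors of three VkResult values (real code would use the Vulkan enum).
constexpr int32_t kSuccess = 0;
constexpr int32_t kSuboptimalKHR = 1000001003;       // positive: a success code
constexpr int32_t kErrorOutOfDateKHR = -1000001004;  // negative: an error code

enum class FrameAction { Continue, RecreateSwapchain, Fatal };

// After acquire: OUT_OF_DATE means no image was acquired, so skip the frame.
// SUBOPTIMAL still delivered an image, so render it and recreate after present.
FrameAction after_acquire(int32_t result) {
    if (result == kErrorOutOfDateKHR) return FrameAction::RecreateSwapchain;
    if (result == kSuccess || result == kSuboptimalKHR) return FrameAction::Continue;
    return FrameAction::Fatal;
}

// After present: also honor the explicit resize flag from the window callback.
FrameAction after_present(int32_t result, bool framebufferResized) {
    if (result == kErrorOutOfDateKHR || result == kSuboptimalKHR || framebufferResized)
        return FrameAction::RecreateSwapchain;
    return result == kSuccess ? FrameAction::Continue : FrameAction::Fatal;
}

// Minimized windows report a (0,0) extent; wait for events until this is true.
bool can_recreate(uint32_t width, uint32_t height) { return width != 0 && height != 0; }
```

Note that neither function tests result != kSuccess up front, which is the check that would misclassify the suboptimal case as a failure.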
11 Pitfalls
12 What's next
There's a triangle on screen, and the entire scaffold (instance, device, swapchain, pipeline, sync) is reusable. Next, Textures & Materials adds images, samplers, and descriptor sets so the triangle can be textured, then the 2D Renderer turns this into a sprite batcher for the 2D-game capstone. The full path is on the series hub.
- Khronos. Vulkan Tutorial. docs.vulkan.org/tutorial. The canonical step-by-step: validation layers, physical-device/queue-family selection, and the explicit-API model.
- Victor Blanco. Vulkan Guide. vkguide.dev. Vulkan 1.3 with dynamic rendering, structured for an engine; logical-device feature enablement.
- ash (Rust). docs.rs/ash and the README. Version 0.38: ::default() + consuming setters, push_next, ash::khr / ash::ext loader modules.
- The Khronos Group. Vulkan Specification, "Synchronization and Cache Control." docs.vulkan.org. Semaphores order GPU queue work (not CPU-waited); fences are the device-to-host sync.
- The Khronos Group. Vulkan Specification, WSI / VkSurfaceCapabilitiesKHR. docs.vulkan.org. The currentExtent == 0xFFFFFFFF sentinel and the min/max extent clamp.
- The Khronos Group. Vulkan Specification, VkPresentModeKHR. docs.vulkan.org. FIFO is the only present mode required to be supported; MAILBOX/IMMEDIATE are optional.
- Khronos. Vulkan Tutorial, "Shader modules." docs.vulkan.org. Vulkan consumes SPIR-V bytecode (via glslc), not GLSL source; the hardcoded-triangle shader and gl_VertexIndex.
- Khronos. Vulkan Tutorial, "Fixed functions." docs.vulkan.org. The baked pipeline state and dynamic viewport/scissor so resize needn't recreate it.
- Lesley Lai. "Vulkan dynamic rendering." lesleylai.info. VkPipelineRenderingCreateInfo wiring and the manual image-layout barriers dynamic rendering requires.
- Khronos. Vulkan Tutorial, "Frames in flight." docs.vulkan.org. Per-frame command buffers and sync objects; frames in flight is not the swapchain image count.
- Khronos. Vulkan Tutorial, "Swap chain recreation." docs.vulkan.org. OUT_OF_DATE vs SUBOPTIMAL, the resize flag, and the minimize (0,0) case.
- Charles Giessen / LunarG. vk-bootstrap. github.com/charles-lunarg/vk-bootstrap. Collapses instance/device/swapchain creation from hundreds of lines to dozens.
- AMD GPUOpen. Vulkan Memory Allocator. gpuopen.com. The de-facto allocator most non-AAA codebases use instead of hand-rolled vkAllocateMemory.
- Sascha Willems. Vulkan C++ examples. github.com/SaschaWillems/Vulkan. The canonical example corpus, including a 1.3 dynamic-rendering triangle.
- gfx-rs. wgpu. wgpu.rs. The safe, cross-backend (Vulkan/Metal/D3D12/WebGPU) Rust layer that sits on top of explicit APIs like this one.