Build a Game Engine · Rendering

Your First Triangle in Vulkan

This is the big one. A single colored triangle in explicit Vulkan is on the order of a thousand lines, because you declare every piece yourself: the instance, the device and its queues, the swapchain, the pipeline, the command buffers, and the synchronization that keeps the CPU and GPU out of each other's way. We build it in C++ and Rust (ash), with dynamic rendering, and put the part everyone gets wrong (semaphores vs fences) front and center.

Time: ~75 min
Level: Senior
Prereqs: The GPU & Graphics Pipeline (the model this configures) and Platform & Window (which already created the window and surface)
Stack: C++ (Vulkan) · Rust (ash)

01 Why ~1000 lines

OpenGL hid the instance, the device, the framebuffer, and the synchronization behind global state and the driver. Vulkan is explicit by design: you declare all of it. That's why a from-scratch, validation-clean colored triangle runs on the order of a thousand lines[14], and why the payoff is control over CPU-side submission and multi-threaded command recording, not a higher frame rate for one triangle.

Be honest: most people use helpers

Beginners and many shipping indie engines use vk-bootstrap to collapse instance, device, and swapchain creation from ~400 lines to under 50[12], and VMA for memory allocation[13]. We write the longhand here so the machine is visible, then point at the shortcut. And if you want portability and safety over control, wgpu (Rust) is the higher-level layer that sits on top of exactly this[15].

Vulkan object lifetimes follow the dependency order: every object is created after its prerequisites and must be destroyed before them, so teardown is the build order reversed. Keep that graph in mind as each object appears below. Destroy the device before its children, or the surface before the swapchain, and validation will flag it immediately.
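To make the dependency rule concrete, here is a small Vulkan-free sketch in Rust: each object lists its prerequisites, a creation order falls out of a simple topological sort, and teardown is that order reversed. The object labels, `DEPS`, and the helper functions are hypothetical illustrations, not Vulkan or ash APIs; a real engine usually just hard-codes this order.

```rust
/// Each object and the objects that must already exist when it is created.
/// Hypothetical labels for the handles this tutorial builds, not Vulkan API names.
pub const DEPS: &[(&str, &[&str])] = &[
    ("instance", &[]),
    ("debug_messenger", &["instance"]),
    ("surface", &["instance"]),
    ("device", &["instance"]),
    ("swapchain", &["device", "surface"]),
    ("image_views", &["swapchain"]),
    ("pipeline_layout", &["device"]),
    ("pipeline", &["pipeline_layout"]),
    ("command_pool", &["device"]),
    ("sync_objects", &["device"]),
];

/// Creation order: repeatedly take the first object whose prerequisites all exist.
pub fn build_order(deps: &[(&'static str, &[&'static str])]) -> Vec<&'static str> {
    let mut order: Vec<&'static str> = Vec::new();
    while order.len() < deps.len() {
        let next = deps
            .iter()
            .find(|(name, reqs)| !order.contains(name) && reqs.iter().all(|r| order.contains(r)))
            .expect("dependency cycle or missing prerequisite");
        order.push(next.0);
    }
    order
}

/// Teardown is the creation order reversed: children are destroyed before their parents.
pub fn teardown_order(deps: &[(&'static str, &[&'static str])]) -> Vec<&'static str> {
    let mut order = build_order(deps);
    order.reverse();
    order
}

/// Position of `name` in an order (panics if absent, which is fine for a sketch).
pub fn index_of(order: &[&str], name: &str) -> usize {
    order.iter().position(|n| *n == name).expect("unknown object")
}
```

In the teardown list the image views come before the swapchain, the swapchain before the surface, and the instance last, which is the order the `vkDestroy*` calls should run in after `vkDeviceWaitIdle`.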

02 Instance & validation

The VkInstance is the per-application Vulkan connection. You give it an app info (with the API version you target), the instance extensions (the surface extensions came from the Platform & Window tutorial), and the validation layer.

Validation layers are how you learn Vulkan

Turn on the single meta-layer VK_LAYER_KHRONOS_validation (from the Vulkan SDK) in debug builds and off in release[1]. Wire a VK_EXT_debug_utils messenger so its messages reach your log. Two traps: validation is a separate layer that must be installed (request it without the SDK and vkCreateInstance fails with VK_ERROR_LAYER_NOT_PRESENT; quietly drop the request instead and you run unvalidated without noticing, so check vkEnumerateInstanceLayerProperties and log the outcome), and to catch errors during instance creation itself you chain a bootstrap VkDebugUtilsMessengerCreateInfoEXT into the instance's pNext. Also: apiVersion is what you target; it does not enable newer features (those are gated at device creation). And the surface is an instance extension while the swapchain is a device extension, a classic mix-up.

Create the instance with validation
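#include <vulkan/vulkan.h>
#include <stdexcept>
#include <vector>

VkApplicationInfo appInfo{ VK_STRUCTURE_TYPE_APPLICATION_INFO };
appInfo.apiVersion = VK_API_VERSION_1_3;                 // targeted version, not a feature switch

const char* layers[] = { "VK_LAYER_KHRONOS_validation" };   // debug builds only
// Surface extensions from the platform layer + debug utils.
// `surfaceExtensions` (std::vector<const char*>) is assumed to come from the Platform & Window code.
std::vector<const char*> exts = surfaceExtensions;
exts.push_back(VK_EXT_DEBUG_UTILS_EXTENSION_NAME);
VkInstanceCreateInfo ci{ VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
ci.pApplicationInfo = &appInfo;
ci.enabledLayerCount = 1; ci.ppEnabledLayerNames = layers;
ci.enabledExtensionCount = (uint32_t)exts.size(); ci.ppEnabledExtensionNames = exts.data();
VkInstance instance = VK_NULL_HANDLE;
if (vkCreateInstance(&ci, nullptr, &instance) != VK_SUCCESS)   // e.g. VK_ERROR_LAYER_NOT_PRESENT
    throw std::runtime_error("vkCreateInstance failed");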
// ash 0.38: ::default() + consuming setters (no more ::builder()).
let app_info = vk::ApplicationInfo::default()
    .api_version(vk::make_api_version(0, 1, 3, 0));         // targeted version
let layers = [c"VK_LAYER_KHRONOS_validation".as_ptr()];      // debug only
let create_info = vk::InstanceCreateInfo::default()
    .application_info(&app_info)
    .enabled_layer_names(&layers)
    .enabled_extension_names(&extension_ptrs);                 // surface exts + debug_utils
let instance = unsafe { entry.create_instance(&create_info, None)? };
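Requesting a layer that isn't installed makes instance creation fail with VK_ERROR_LAYER_NOT_PRESENT, so a common pattern is to enumerate the available layers first (vkEnumerateInstanceLayerProperties, or `entry.enumerate_instance_layer_properties()` in ash) and decide before building the create-info. This sketch keeps that decision free of Vulkan so it can be tested on its own; `available` stands in for the enumerated layer names, and `select_layers` / `LayerChoice` are hypothetical names.

```rust
pub const VALIDATION_LAYER: &str = "VK_LAYER_KHRONOS_validation";

#[derive(Debug, PartialEq)]
pub enum LayerChoice {
    /// Release build: never request validation.
    Disabled,
    /// Debug build and the SDK layer is installed: request it.
    Enabled(Vec<&'static str>),
    /// Debug build but the layer is missing: run without it, and say so in the log.
    Missing,
}

/// `available` stands in for the layer names returned by vkEnumerateInstanceLayerProperties.
pub fn select_layers(available: &[&str], debug_build: bool) -> LayerChoice {
    if !debug_build {
        return LayerChoice::Disabled;
    }
    if available.iter().any(|name| *name == VALIDATION_LAYER) {
        LayerChoice::Enabled(vec![VALIDATION_LAYER])
    } else {
        LayerChoice::Missing
    }
}
```

On `Missing`, log a warning and pass zero layers, so a machine without the SDK still runs, visibly unvalidated.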

03 Devices & queues

Enumerate the physical devices, score them, and pick one that supports the swapchain extension. Then find the queue families you need: a graphics family (a capability bit) and a present family (a surface-specific query). Create the logical VkDevice from one queue-create-info per unique family index.

Graphics and present may be the same family, or not

They are not guaranteed to be distinct, and not guaranteed to be the same[1]. Store each as an optional index; if they differ, create two queues (and the swapchain uses concurrent sharing or an explicit ownership transfer). The official tutorial simplifies to "one family that does both"; a real engine handles both cases. Two more: dedupe the indices (passing the same family index twice in the queue-create array is invalid), and a feature must be enabled, not just available: dynamicRendering is a field in VkPhysicalDeviceVulkan13Features (core in 1.3)[2], or comes from the VK_KHR_dynamic_rendering extension before that.

Logical device, unique queues, dynamic-rendering feature
// present support is a SEPARATE, surface-specific query:
VkBool32 presentSupport = VK_FALSE;
vkGetPhysicalDeviceSurfaceSupportKHR(phys, familyIndex, surface, &presentSupport);

std::set<uint32_t> unique = { graphicsFamily, presentFamily };  // dedupe!
std::vector<VkDeviceQueueCreateInfo> qcis;
float prio = 1.0f;
for (uint32_t f : unique) qcis.push_back({ VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO, nullptr, 0, f, 1, &prio });

VkPhysicalDeviceVulkan13Features f13{ VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VULKAN_1_3_FEATURES };
f13.dynamicRendering = VK_TRUE;                          // enable the feature, not just check it
const char* devExts[] = { VK_KHR_SWAPCHAIN_EXTENSION_NAME };  // DEVICE extension
VkDeviceCreateInfo dci{ VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO, &f13 };
dci.queueCreateInfoCount = (uint32_t)qcis.size(); dci.pQueueCreateInfos = qcis.data();
dci.enabledExtensionCount = 1; dci.ppEnabledExtensionNames = devExts;
vkCreateDevice(phys, &dci, nullptr, &device);
vkGetDeviceQueue(device, graphicsFamily, 0, &graphicsQueue);
vkGetDeviceQueue(device, presentFamily, 0, &presentQueue);
let surface_loader = ash::khr::surface::Instance::new(&entry, &instance);
let present = unsafe { surface_loader.get_physical_device_surface_support(phys, i, surface)? };

use std::collections::HashSet;
let unique: HashSet<u32> = [graphics_family, present_family].into_iter().collect();  // dedupe
let qcis: Vec<_> = unique.iter().map(|&f| vk::DeviceQueueCreateInfo::default()
    .queue_family_index(f).queue_priorities(&[1.0])).collect();

let mut f13 = vk::PhysicalDeviceVulkan13Features::default().dynamic_rendering(true);
let dev_exts = [ash::khr::swapchain::NAME.as_ptr()];           // DEVICE extension
let dci = vk::DeviceCreateInfo::default()
    .queue_create_infos(&qcis).enabled_extension_names(&dev_exts).push_next(&mut f13);
let device = unsafe { instance.create_device(phys, &dci, None)? };
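let graphics_queue = unsafe { device.get_device_queue(graphics_family, 0) };
let present_queue = unsafe { device.get_device_queue(present_family, 0) };  // same queue if the families match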
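The family search itself can be sketched without a GPU. `FamilyInfo` is a hypothetical summary of what vkGetPhysicalDeviceQueueFamilyProperties (the graphics capability bit) and vkGetPhysicalDeviceSurfaceSupportKHR (the surface-specific present query) report for each family. The sketch prefers one family that does both, falls back to two separate families, and returns the deduplicated indices for the queue-create array.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-family summary: the GRAPHICS bit and the present-support query result.
#[derive(Clone, Copy)]
pub struct FamilyInfo {
    pub graphics: bool,
    pub present: bool,
}

#[derive(Debug, PartialEq)]
pub struct QueueFamilies {
    pub graphics: u32,
    pub present: u32,
}

impl QueueFamilies {
    /// One entry per unique index: the queue-create array must not repeat a family.
    pub fn unique(&self) -> Vec<u32> {
        let set: BTreeSet<u32> = [self.graphics, self.present].iter().copied().collect();
        set.into_iter().collect()
    }
}

pub fn find_queue_families(families: &[FamilyInfo]) -> Option<QueueFamilies> {
    // Prefer a single family that can do both: one queue, exclusive swapchain sharing.
    if let Some(i) = families.iter().position(|f| f.graphics && f.present) {
        return Some(QueueFamilies { graphics: i as u32, present: i as u32 });
    }
    // Otherwise take any graphics family and any present family (they will differ).
    let graphics = families.iter().position(|f| f.graphics)? as u32;
    let present = families.iter().position(|f| f.present)? as u32;
    Some(QueueFamilies { graphics, present })
}
```

When the two indices differ, the swapchain is created with VK_SHARING_MODE_CONCURRENT over both families (or you transfer image ownership explicitly), matching the two cases described above.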

04 The swapchain

The swapchain (VK_KHR_swapchain) is the queue of presentable images tied to the surface. You choose a surface format (prefer B8G8R8A8_SRGB with the sRGB nonlinear color space, but enumerate vkGetPhysicalDeviceSurfaceFormatsKHR and fall back, since that pair is not guaranteed), a present mode, an image count, and an extent, then create one VkImageView per swapchain image to render into.
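The four choices can be sketched as plain functions over a hypothetical mirror of the queried data: `Caps` stands in for the VkSurfaceCapabilitiesKHR fields used here, and the enums stand in for VkFormat and VkPresentModeKHR. Three rules carry over to the real API: a `currentExtent` of 0xFFFFFFFF means the swapchain picks the size, so you clamp the framebuffer size to the min/max extent[5]; `maxImageCount == 0` means no upper limit; and FIFO is the only present mode guaranteed to exist[6]. Preferring MAILBOX when it is offered is a policy choice, not a requirement.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Format { B8G8R8A8Srgb, B8G8R8A8Unorm, R8G8B8A8Unorm }

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SurfaceFormat {
    pub format: Format,
    pub srgb_nonlinear: bool, // stands in for VK_COLOR_SPACE_SRGB_NONLINEAR_KHR
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PresentMode { Immediate, Mailbox, Fifo, FifoRelaxed }

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Extent { pub width: u32, pub height: u32 }

/// Hypothetical mirror of the VkSurfaceCapabilitiesKHR fields used below.
#[derive(Clone, Copy)]
pub struct Caps {
    pub current_extent: Extent,
    pub min_extent: Extent,
    pub max_extent: Extent,
    pub min_image_count: u32,
    pub max_image_count: u32, // 0 = no upper limit
}

/// Prefer B8G8R8A8_SRGB + sRGB nonlinear; otherwise take the first pair (assumes a non-empty list).
pub fn choose_format(formats: &[SurfaceFormat]) -> SurfaceFormat {
    formats
        .iter()
        .copied()
        .find(|f| f.format == Format::B8G8R8A8Srgb && f.srgb_nonlinear)
        .unwrap_or(formats[0])
}

/// MAILBOX if offered, else FIFO, which every implementation must support.
pub fn choose_present_mode(modes: &[PresentMode]) -> PresentMode {
    if modes.contains(&PresentMode::Mailbox) { PresentMode::Mailbox } else { PresentMode::Fifo }
}

/// One more than the minimum gives the driver some slack; respect the max when there is one.
pub fn choose_image_count(caps: &Caps) -> u32 {
    let wanted = caps.min_image_count + 1;
    if caps.max_image_count > 0 { wanted.min(caps.max_image_count) } else { wanted }
}

/// Use currentExtent unless it is the 0xFFFFFFFF sentinel; then clamp the framebuffer size.
pub fn choose_extent(caps: &Caps, fb_width: u32, fb_height: u32) -> Extent {
    if caps.current_extent.width != u32::MAX {
        return caps.current_extent;
    }
    Extent {
        width: fb_width.clamp(caps.min_extent.width, caps.max_extent.width),
        height: fb_height.clamp(caps.min_extent.height, caps.max_extent.height),
    }
}
```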

Three swapchain traps

05 Shaders → SPIR-V

Vulkan does not take GLSL. It consumes SPIR-V bytecode, which you precompile offline with glslc (from the SDK); HLSL and Slang can also be compiled to SPIR-V[7]. For the hardcoded triangle there's no vertex buffer at all: the vertex shader indexes a constant array with the built-in gl_VertexIndex, and the draw is vkCmdDraw(cmd, 3, 1, 0, 0).

The hardcoded-triangle vertex shader (GLSL → SPIR-V)
// triangle.vert, compiled with: glslc triangle.vert -o triangle.vert.spv
#version 450
layout(location = 0) out vec3 fragColor;
vec2 positions[3] = vec2[](vec2(0.0, -0.5), vec2(0.5, 0.5), vec2(-0.5, 0.5));
vec3 colors[3]    = vec3[](vec3(1,0,0), vec3(0,1,0), vec3(0,0,1));
void main() {
    gl_Position = vec4(positions[gl_VertexIndex], 0.0, 1.0);  // gl_VertexIndex, not gl_VertexID
    fragColor = colors[gl_VertexIndex];
}

Read the .spv as uint32-aligned bytes (ash has ash::util::read_spv[3]) and wrap it in a VkShaderModule. Note: Vulkan's clip space is Y-down with a 0..1 depth range, so a triangle that looks right in OpenGL renders upside down here; for the hardcoded triangle we just pick positions that look right, and the camera tutorial handles the projection properly.
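What that loading step checks is small enough to sketch by hand. vkCreateShaderModule takes `codeSize` in bytes and `pCode` as 32-bit words, so the file must be a non-zero multiple of 4 bytes and start with the SPIR-V magic number 0x07230203. The function below is a hypothetical stand-in (`spv_words` is not an ash function); it also accepts a byte-swapped module by swapping every word.

```rust
/// SPIR-V magic number: the first word of every module.
pub const SPIRV_MAGIC: u32 = 0x0723_0203;

/// Turn raw .spv file bytes into host-order u32 words ready for vkCreateShaderModule
/// (codeSize = words.len() * 4).
pub fn spv_words(bytes: &[u8]) -> Result<Vec<u32>, String> {
    if bytes.is_empty() || bytes.len() % 4 != 0 {
        return Err(format!("SPIR-V size {} is not a non-zero multiple of 4", bytes.len()));
    }
    let mut words: Vec<u32> = bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    if words[0] == SPIRV_MAGIC.swap_bytes() {
        // The module was written with the opposite byte order: swap every word.
        for w in &mut words {
            *w = w.swap_bytes();
        }
    } else if words[0] != SPIRV_MAGIC {
        return Err(format!("bad magic number {:#010x}", words[0]));
    }
    Ok(words)
}
```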

06 The graphics pipeline

The graphics pipeline (VkPipeline) is the concrete form of the "baked state object" from the GPU Pipeline tutorial: it freezes the shader stages and the fixed-function state (input assembly, rasterizer, multisample, color blend, the pipeline layout) into one object[8].

Make viewport and scissor dynamic so a resize doesn't recreate the pipeline

If you bake the viewport into the pipeline, every window resize forces a new pipeline. Instead declare VK_DYNAMIC_STATE_VIEWPORT and VK_DYNAMIC_STATE_SCISSOR and set them at record time[8]. Also: the VkPipelineLayout (the descriptor and push-constant interface, empty for this triangle) is a separate object from the pipeline; beginners conflate them.

07 Dynamic rendering

We use dynamic rendering (core in Vulkan 1.3), which removes VkRenderPass and VkFramebuffer entirely. At pipeline creation you chain a VkPipelineRenderingCreateInfo (with the color attachment format) into the pipeline's pNext and set renderPass = VK_NULL_HANDLE; at record time you bracket drawing with vkCmdBeginRendering / vkCmdEndRendering[9].

Dynamic rendering hands you the layout transitions

This is the number-one dynamic-rendering bug. The render pass used to transition the swapchain image for you (UNDEFINED → COLOR_ATTACHMENT_OPTIMAL before drawing, → PRESENT_SRC_KHR after). Without a render pass, you insert those transitions yourself with a pipeline barrier each frame[9]. Also, the color attachment format in VkPipelineRenderingCreateInfo must match the format of the image view you pass to vkCmdBeginRendering. Dynamic rendering is the desktop default, but render passes still matter on tile-based mobile GPUs and for subpass techniques, so it is not universally "better."
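The two transitions differ mainly in their stage and access masks. The sketch records values commonly used with synchronization2 (VkImageMemoryBarrier2 recorded via vkCmdPipelineBarrier2) for the swapchain image; both barriers use the COLOR aspect over the image's single mip level and layer. The types are hypothetical mirrors, and the strings abbreviate VK_PIPELINE_STAGE_2_* / VK_ACCESS_2_* names. Starting the first barrier at COLOR_ATTACHMENT_OUTPUT chains it to the acquire semaphore, whose submit waits at that same stage.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Layout { Undefined, ColorAttachmentOptimal, PresentSrc }

/// Hypothetical mirror of the VkImageMemoryBarrier2 fields that differ between the transitions.
/// Strings abbreviate VK_PIPELINE_STAGE_2_* and VK_ACCESS_2_* names.
#[derive(Debug, PartialEq)]
pub struct Barrier {
    pub old_layout: Layout,
    pub new_layout: Layout,
    pub src_stage: &'static str,
    pub src_access: &'static str,
    pub dst_stage: &'static str,
    pub dst_access: &'static str,
}

pub fn swapchain_barrier(old_layout: Layout, new_layout: Layout) -> Option<Barrier> {
    match (old_layout, new_layout) {
        // Before drawing: old contents are discarded; start at the stage the acquire semaphore gates.
        (Layout::Undefined, Layout::ColorAttachmentOptimal) => Some(Barrier {
            old_layout,
            new_layout,
            src_stage: "COLOR_ATTACHMENT_OUTPUT",
            src_access: "NONE",
            dst_stage: "COLOR_ATTACHMENT_OUTPUT",
            dst_access: "COLOR_ATTACHMENT_WRITE",
        }),
        // After drawing: the color writes must finish before the image changes layout for present.
        (Layout::ColorAttachmentOptimal, Layout::PresentSrc) => Some(Barrier {
            old_layout,
            new_layout,
            src_stage: "COLOR_ATTACHMENT_OUTPUT",
            src_access: "COLOR_ATTACHMENT_WRITE",
            dst_stage: "NONE", // usually BOTTOM_OF_PIPE with the original vkCmdPipelineBarrier
            dst_access: "NONE",
        }),
        _ => None,
    }
}
```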

Wire dynamic rendering into the pipeline (Rust · ash)
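// No render pass: declare the attachment format the pipeline renders to.
let color_formats = [swapchain_format];                    // bound to a local so the borrow outlives the builder
let mut rendering = vk::PipelineRenderingCreateInfo::default()
    .color_attachment_formats(&color_formats);            // must match the image view's format at begin_rendering
let pipeline_info = vk::GraphicsPipelineCreateInfo::default()
    .stages(&stages)                                       // vertex + fragment shader stages
    // fixed-function state, each built the usual way (not shown):
    .vertex_input_state(&vertex_input)                     // empty: no vertex buffers
    .input_assembly_state(&input_assembly)                 // TRIANGLE_LIST
    .viewport_state(&viewport_state)                       // counts = 1, values set dynamically
    .rasterization_state(&rasterizer)
    .multisample_state(&multisample)
    .color_blend_state(&color_blend)
    .dynamic_state(&dynamic_state)                         // VIEWPORT + SCISSOR
    .layout(pipeline_layout)
    .push_next(&mut rendering);                            // render_pass stays NULL
let pipeline = unsafe {
    device.create_graphics_pipelines(vk::PipelineCache::null(), &[pipeline_info], None)
}.map_err(|(_, err)| err)?[0];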

08 Command buffers

A VkCommandPool (created for the graphics queue family) allocates command buffers. Each frame you record: transition the image to color-attachment, vkCmdBeginRendering, bind the pipeline, set the dynamic viewport and scissor, vkCmdDraw(3, 1, 0, 0), vkCmdEndRendering, transition to present.
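The recording order can be checked without a GPU. `Recorder` is a hypothetical stand-in for a command buffer that logs each command and tracks the swapchain image's layout, so it rejects a begin-rendering on an image that was never transitioned (the dynamic-rendering blank-screen bug) and a draw outside rendering or before the dynamic viewport/scissor are set. The method names mirror the vkCmd* calls but are not ash APIs.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Layout { Undefined, ColorAttachmentOptimal, PresentSrc }

/// Hypothetical stand-in for a command buffer: logs commands, tracks the swapchain image layout.
pub struct Recorder {
    pub layout: Layout,
    pub inside_rendering: bool,
    pub dynamic_state_set: bool,
    pub vertices_drawn: u32,
    pub log: Vec<&'static str>,
}

impl Recorder {
    pub fn new() -> Self {
        Recorder {
            layout: Layout::Undefined,
            inside_rendering: false,
            dynamic_state_set: false,
            vertices_drawn: 0,
            log: Vec::new(),
        }
    }
    /// Image layout transition (a pipeline barrier in the real code).
    pub fn barrier(&mut self, new_layout: Layout) {
        self.layout = new_layout;
        self.log.push("barrier");
    }
    pub fn begin_rendering(&mut self) -> Result<(), String> {
        if self.layout != Layout::ColorAttachmentOptimal {
            return Err(format!("begin_rendering while the image is {:?}", self.layout));
        }
        self.inside_rendering = true;
        self.log.push("begin_rendering");
        Ok(())
    }
    pub fn bind_pipeline(&mut self) {
        self.log.push("bind_pipeline");
    }
    pub fn set_viewport_scissor(&mut self) {
        self.dynamic_state_set = true;
        self.log.push("set_viewport_scissor");
    }
    pub fn draw(&mut self, vertex_count: u32) -> Result<(), String> {
        if !self.inside_rendering || !self.dynamic_state_set {
            return Err("draw outside rendering or before viewport/scissor".to_string());
        }
        self.vertices_drawn += vertex_count;
        self.log.push("draw");
        Ok(())
    }
    pub fn end_rendering(&mut self) {
        self.inside_rendering = false;
        self.log.push("end_rendering");
    }
}

/// One frame, in the order listed above.
pub fn record_frame(r: &mut Recorder) -> Result<(), String> {
    r.barrier(Layout::ColorAttachmentOptimal); // UNDEFINED -> COLOR_ATTACHMENT_OPTIMAL
    r.begin_rendering()?;
    r.bind_pipeline();
    r.set_viewport_scissor(); // dynamic state, set at record time
    r.draw(3)?; // vkCmdDraw(cmd, 3, 1, 0, 0): positions come from gl_VertexIndex
    r.end_rendering();
    r.barrier(Layout::PresentSrc); // COLOR_ATTACHMENT_OPTIMAL -> PRESENT_SRC_KHR
    Ok(())
}
```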

Two command-buffer facts

A command buffer can only be submitted to a queue of the family its pool was created for; that binding is fixed. And command pools are not internally synchronized: you can't record into buffers from one pool on two threads at once, so the standard pattern is one pool per thread and per frame in flight. The three triangle vertices come from gl_VertexIndex 0/1/2, not a bound buffer.

09 The render loop & sync

Per frame: wait the in-flight fence, acquire the next swapchain image, record, submit, present. (The loop below is shown in Rust/ash; the C++ is the identical vk* calls in the same order.) The synchronization is the part everyone gets wrong, so be precise about the two primitives[4]:

The #1 confusion, and frames in flight

Trying to vkWaitForFences on a semaphore, or expecting a binary semaphore to block the CPU, is the classic mistake: semaphores order GPU work, only fences (or timeline semaphores) are CPU-waitable[4]. The canonical pair is an image-available semaphore (signaled by acquire, waited by the submit) and a render-finished semaphore (signaled by the submit, waited by present, so present never shows a half-drawn frame). Frames in flight (usually 2) duplicates the command buffer and sync objects per frame so the CPU can record frame N+1 while the GPU works on N, and is not the same as the swapchain image count[10]. (One simplification here: a fully robust renderer ties the render-finished semaphore to the swapchain image, not the frame-in-flight, so it never re-signals a semaphore a prior present is still waiting on.)

The per-frame draw, submit, present (Rust · ash)
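unsafe {
    device.wait_for_fences(&[in_flight_fence], true, u64::MAX)?;   // CPU waits the FENCE

    let (image_index, _suboptimal) = match swapchain_loader
        .acquire_next_image(swapchain, u64::MAX, image_available, vk::Fence::null())
    {
        Ok(pair) => pair,
        Err(vk::Result::ERROR_OUT_OF_DATE_KHR) => return Ok(()),  // skip the frame, recreate the swapchain
        Err(e) => return Err(e),
    };
    // Reset only once we know we will submit; resetting before an early return
    // would leave the fence unsignaled and deadlock the next wait.
    device.reset_fences(&[in_flight_fence])?;
    // ... record cmd: barrier→begin_rendering→bind→viewport/scissor→draw(3,1,0,0)→end_rendering→barrier ...

    // Arrays bound to locals: the builders borrow them until submit/present.
    let wait_sems = [image_available];
    let wait_stages = [vk::PipelineStageFlags::COLOR_ATTACHMENT_OUTPUT];
    let cmds = [cmd];
    let signal_sems = [render_finished];
    let submit = vk::SubmitInfo::default()
        .wait_semaphores(&wait_sems)                                // GPU waits the SEMAPHORE
        .wait_dst_stage_mask(&wait_stages)
        .command_buffers(&cmds)
        .signal_semaphores(&signal_sems);
    device.queue_submit(graphics_queue, &[submit], in_flight_fence)?;  // fence signaled on GPU done

    let swapchains = [swapchain];
    let image_indices = [image_index];
    let present = vk::PresentInfoKHR::default()
        .wait_semaphores(&signal_sems)                              // present waits render-finished
        .swapchains(&swapchains)
        .image_indices(&image_indices);
    swapchain_loader.queue_present(present_queue, &present)?;          // match OUT_OF_DATE → recreate
}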
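Frames in flight is just a ring of per-frame objects indexed by a counter that wraps at MAX_FRAMES_IN_FLIGHT, kept separate from the swapchain images. The sketch uses fake integer handles (in ash they would come from create_fence, create_semaphore, and allocate_command_buffers) and follows the more robust layout noted above: the fence, the image-available semaphore, and the command buffer live per frame slot, while render-finished semaphores live per swapchain image and are picked by the acquired image_index. `FrameRing` and its methods are hypothetical.

```rust
pub const MAX_FRAMES_IN_FLIGHT: usize = 2;

/// Fake handle type standing in for vk::Fence, vk::Semaphore and vk::CommandBuffer.
pub type Handle = u64;

/// Objects duplicated per frame slot.
pub struct PerFrame {
    pub in_flight_fence: Handle,
    pub image_available: Handle,
    pub command_buffer: Handle,
}

pub struct FrameRing {
    pub frames: Vec<PerFrame>,        // indexed by frame slot
    pub render_finished: Vec<Handle>, // indexed by swapchain image
    pub current: usize,
}

impl FrameRing {
    pub fn new(swapchain_image_count: usize) -> Self {
        // Hand out distinct fake handles.
        let mut next_handle: Handle = 0;
        let mut fresh = || {
            next_handle += 1;
            next_handle
        };
        let frames: Vec<PerFrame> = (0..MAX_FRAMES_IN_FLIGHT)
            .map(|_| PerFrame {
                in_flight_fence: fresh(),
                image_available: fresh(),
                command_buffer: fresh(),
            })
            .collect();
        let render_finished: Vec<Handle> = (0..swapchain_image_count).map(|_| fresh()).collect();
        FrameRing { frames, render_finished, current: 0 }
    }

    /// Per-frame objects for the frame being recorded now.
    pub fn frame(&self) -> &PerFrame {
        &self.frames[self.current]
    }

    /// Render-finished semaphore for the acquired image, chosen by image_index.
    pub fn render_finished_for(&self, image_index: u32) -> Handle {
        self.render_finished[image_index as usize]
    }

    /// Move to the next slot once the frame has been presented.
    pub fn advance(&mut self) {
        self.current = (self.current + 1) % MAX_FRAMES_IN_FLIGHT;
    }
}
```

Call `advance()` after `queue_present`; the next iteration then waits on the other slot's fence, which is what lets the CPU record one frame ahead.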

With one frame in flight the CPU blocks on the fence every frame (no overlap, lower throughput); with 2 or 3 the CPU records ahead while the GPU works. Only the fence touches the CPU; the semaphores gate GPU work and present. Drop the acquire wait and the GPU can render into an image the presentation engine still owns, the tearing/validation-error class of bug.
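A toy timing model makes the stall concrete (plain Rust, no Vulkan; `simulate` and its millisecond costs are made up for illustration). Frame i may start recording only after the GPU has finished frame i - frames_in_flight, which is what waiting on that slot's fence means, and the GPU runs submitted frames back to back.

```rust
/// Toy model, not a Vulkan API. Returns (time the last frame finishes on the GPU,
/// total CPU time spent blocked on fences).
pub fn simulate(frames_in_flight: usize, frames: usize, cpu_ms: f64, gpu_ms: f64) -> (f64, f64) {
    let mut gpu_done: Vec<f64> = Vec::with_capacity(frames);
    let mut cpu_clock = 0.0;
    let mut stalled = 0.0;
    for i in 0..frames {
        // wait_for_fences: this slot is free once the GPU finished frame i - frames_in_flight.
        if i >= frames_in_flight {
            let ready = gpu_done[i - frames_in_flight];
            if ready > cpu_clock {
                stalled += ready - cpu_clock;
                cpu_clock = ready;
            }
        }
        cpu_clock += cpu_ms; // record + submit
        // The GPU starts after the submit and after it finishes the previous frame.
        let gpu_start = cpu_clock.max(gpu_done.last().copied().unwrap_or(0.0));
        gpu_done.push(gpu_start + gpu_ms);
    }
    (gpu_done.last().copied().unwrap_or(0.0), stalled)
}
```

With 4 ms of CPU work and 6 ms of GPU work per frame, ten frames finish at 100 ms with one frame in flight and at 64 ms with two, because the CPU records frame N+1 while the GPU is still on frame N.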
FIFO queues frames and blocks the CPU when the queue is full: v-synced, no tearing, and the only mode guaranteed to exist[6]. MAILBOX replaces the waiting image with the newest one for lower latency, at the cost of discarded (wasted) frames. IMMEDIATE presents without waiting for vertical blank and can tear. The others must be queried; always fall back to FIFO.

Two common misconceptions: the CPU can only wait on a fence (not a binary semaphore or a barrier), and a missing present mode is a capability you must query, not a VRAM or validation issue.

10 Resize & shutdown

When the window resizes, the swapchain goes stale and the acquire/present calls report it: VK_ERROR_OUT_OF_DATE_KHR means the swapchain can no longer be used and must be recreated; VK_SUBOPTIMAL_KHR means it still presents but should be recreated soon[11].
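The resize rules fit in three small decisions. `SwapResult` and `Action` are hypothetical mirrors of the VkResult codes acquire and present return and of what the loop does next; the `resized` flag comes from the window's resize event, since a driver is not guaranteed to report OUT_OF_DATE after every resize[11]. A minimized window has a 0x0 framebuffer, which can't back a swapchain, so you wait for events instead of recreating. Recreation itself starts with vkDeviceWaitIdle before the old image views and swapchain are destroyed.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SwapResult { Success, Suboptimal, OutOfDate }

#[derive(Debug, PartialEq)]
pub enum Action { Continue, RecreateNow, RecreateAfterPresent, WaitWhileMinimized }

/// Decision after vkAcquireNextImageKHR.
pub fn after_acquire(r: SwapResult) -> Action {
    match r {
        SwapResult::OutOfDate => Action::RecreateNow, // no usable image: skip this frame
        SwapResult::Suboptimal => Action::RecreateAfterPresent, // image is still presentable
        SwapResult::Success => Action::Continue,
    }
}

/// Decision after vkQueuePresentKHR; `resized` is the window-event flag.
pub fn after_present(r: SwapResult, resized: bool) -> Action {
    if resized || r != SwapResult::Success { Action::RecreateNow } else { Action::Continue }
}

/// Before recreating: a 0-sized extent is invalid, so wait until the window is restored.
pub fn before_recreate(width: u32, height: u32) -> Action {
    if width == 0 || height == 0 { Action::WaitWhileMinimized } else { Action::RecreateNow }
}
```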

Resize and shutdown traps

11 Pitfalls

Waiting on a semaphore from the CPU: semaphores order GPU work; the CPU waits a fence. The #1 sync mistake.
MAILBOX assumed: only FIFO is guaranteed; query present modes and fall back.
Dynamic rendering, blank screen: you skipped the manual UNDEFINED→COLOR→PRESENT layout barriers the render pass used to do.
Swapchain extension in the instance list: swapchain is a device extension; surface is the instance one.
Same queue index twice: dedupe graphics/present family indices in the queue-create array.
Pipeline recreated on resize: make viewport/scissor dynamic so a resize doesn't need a new pipeline.
frames-in-flight == image count: different things; frames in flight is command-buffer and sync-object duplication (usually 2), not the swapchain image count.
Validation errors at exit: destroy in reverse order after vkDeviceWaitIdle; ash handles need manual destroy.

12 What's next

There's a triangle on screen, and the entire scaffold (instance, device, swapchain, pipeline, sync) is reusable. Next, Textures & Materials adds images, samplers, and descriptor sets so the triangle can be textured, then the 2D Renderer turns this into a sprite batcher for the 2D-game capstone. The full path is on the series hub.

  1. Khronos. Vulkan Tutorial. docs.vulkan.org/tutorial. The canonical step-by-step: validation layers, physical-device/queue-family selection, and the explicit-API model.
  2. Victor Blanco. Vulkan Guide. vkguide.dev. Vulkan 1.3 with dynamic rendering, structured for an engine; logical-device feature enablement.
  3. ash (Rust). docs.rs/ash and the README. Version 0.38: ::default() + consuming setters, push_next, ash::khr/ash::ext loader modules.
  4. The Khronos Group. Vulkan Specification, "Synchronization and Cache Control." docs.vulkan.org. Semaphores order GPU queue work (not CPU-waited); fences are the device-to-host sync.
  5. The Khronos Group. Vulkan Specification, WSI / VkSurfaceCapabilitiesKHR. docs.vulkan.org. The currentExtent == 0xFFFFFFFF sentinel and the min/max extent clamp.
  6. The Khronos Group. Vulkan Specification, VkPresentModeKHR. docs.vulkan.org. FIFO is the only present mode required to be supported; MAILBOX/IMMEDIATE are optional.
  7. Khronos. Vulkan Tutorial, "Shader modules." docs.vulkan.org. Vulkan consumes SPIR-V bytecode (via glslc), not GLSL source; the hardcoded-triangle shader and gl_VertexIndex.
  8. Khronos. Vulkan Tutorial, "Fixed functions." docs.vulkan.org. The baked pipeline state and dynamic viewport/scissor so resize needn't recreate it.
  9. Lesley Lai. "Vulkan dynamic rendering." lesleylai.info. VkPipelineRenderingCreateInfo wiring and the manual image-layout barriers dynamic rendering requires.
  10. Khronos. Vulkan Tutorial, "Frames in flight." docs.vulkan.org. Per-frame command buffers and sync objects; frames in flight is not the swapchain image count.
  11. Khronos. Vulkan Tutorial, "Swap chain recreation." docs.vulkan.org. OUT_OF_DATE vs SUBOPTIMAL, the resize flag, and the minimize (0,0) case.
  12. Charles Giessen / LunarG. vk-bootstrap. github.com/charles-lunarg/vk-bootstrap. Collapses instance/device/swapchain creation from hundreds of lines to dozens.
  13. AMD GPUOpen. Vulkan Memory Allocator. gpuopen.com. The de-facto allocator most non-AAA codebases use instead of hand-rolled vkAllocateMemory.
  14. Sascha Willems. Vulkan C++ examples. github.com/SaschaWillems/Vulkan. The canonical example corpus, including a 1.3 dynamic-rendering triangle.
  15. gfx-rs. wgpu. wgpu.rs. The safe, cross-backend (Vulkan/Metal/D3D12/WebGPU) Rust layer that sits on top of explicit APIs like this one.
