Going 3D: Perspective, Depth & Meshes
The 2D renderer already has a device, a swapchain, a pipeline, and a draw loop. Going 3D is additive, not a rewrite: swap the orthographic projection for a perspective one, make the depth buffer mandatory, give vertices a Z and a normal, and feed a model matrix per object. We wire all of it into the existing Vulkan scaffold and put a glTF mesh on screen, in C++ and Rust.
01 From 2D to 3D
Four concrete changes turn the sprite renderer into a 3D one. Nothing about the instance, device, swapchain, or synchronization changes; this is additive.
- Orthographic → perspective. The projection now produces a non-unit w, and the GPU's perspective divide gives foreshortening: distant things shrink.
- The depth buffer becomes mandatory. In 2D, back-to-front draw order sufficed (the 2D renderer's painter's approach); in 3D, arbitrary overlap needs per-pixel depth (the depth test).
- Vertices gain Z and a normal. Position is a vec3; a normal rides along for lighting later.
- A model matrix per object. 2D pushed one view-projection; 3D needs an M per object, which forces the question of how to feed many matrices to the GPU.
02 Perspective projection
The matrix maps a view frustum (defined by vertical FOV, aspect, near, and far) into clip space such that the later divide by w produces foreshortening. The 3D Math tutorial derives the matrix; here are the Vulkan-specific deltas that bite.
Three independent things get conflated. Vulkan's NDC is right-handed; its depth range is 0 to 1 (OpenGL is −1 to 1); and its framebuffer is Y-down. The Y-down mapping is separate from both the depth range and your world handedness[4]. So GLM_FORCE_DEPTH_ZERO_TO_ONE fixes the depth range but not Y: you still flip proj[1][1] *= -1 (or use a negative viewport height). In Rust, glam::Mat4::perspective_rh already targets 0 to 1 and perspective_rh_gl is the −1 to 1 one; picking the wrong one silently breaks depth[3].
A plain [0,1] depth buffer wastes precision: the 1/z mapping bunches values near the camera, so far geometry z-fights. Reversed-Z (map near→1, far→0) with a float depth buffer nearly cancels that nonlinearity; Reed's conclusion is "in any perspective projection situation, just use a floating-point depth buffer with reversed-Z"[2]. It is one more change on top of the 0-to-1 range you already have: swap near↔far in the projection, flip the depth compare to GREATER (or GREATER_OR_EQUAL), and clear depth to 0 (the GPU Pipeline tutorial's framing). It needs a float format; on a UNORM buffer the math doesn't help.
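To put numbers on the precision argument, here is a minimal Rust sketch (standard library only) of the two depth mappings. The function names and the near/far values are illustrative assumptions, not part of glm or glam; `f32_steps_between` counts how many representable float values separate two depths, a rough proxy for how well a float depth buffer can tell two surfaces apart.

```rust
/// Standard 0-to-1 depth for a point at positive view-space distance `z`:
/// near maps to 0, far maps to 1.
fn depth_standard(z: f32, near: f32, far: f32) -> f32 {
    far * (z - near) / (z * (far - near))
}

/// Reversed-Z: the same mapping with near and far swapped,
/// so near maps to 1 and far maps to 0.
fn depth_reversed(z: f32, near: f32, far: f32) -> f32 {
    near * (far - z) / (z * (far - near))
}

/// Number of representable f32 values between two non-negative depths.
/// For non-negative floats the bit patterns are ordered like the values.
fn f32_steps_between(a: f32, b: f32) -> u32 {
    a.to_bits().abs_diff(b.to_bits())
}
```

With near = 0.1 and far = 1000, comparing the depths of surfaces at 500 and 501 units shows the effect: under the standard mapping both land just below 1.0, where floats are coarse, while reversed-Z puts them near 0.0001, where floats are orders of magnitude denser.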
[Interactive demo: drag the FOV, near, and far to reshape the frustum and watch the projected objects foreshorten; toggle to orthographic for parallel projection.]
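#define GLM_FORCE_RADIANS
#define GLM_FORCE_DEPTH_ZERO_TO_ONE // both must be defined BEFORE including glm
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp> // glm::perspective, glm::lookAt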
glm::mat4 projection = glm::perspective(glm::radians(60.0f), aspect, 0.1f, 1000.0f);
projection[1][1] *= -1.0f; // flip Y for Vulkan's Y-down NDC (depth macro does NOT do this)
glm::mat4 view = glm::lookAt(cameraPos, cameraTarget, glm::vec3(0,1,0));
glm::mat4 clipFromModel = projection * view * model; // P * V * M, column vectors
// glam: perspective_rh already targets 0..1 depth (perspective_rh_gl is the -1..1 one)
use glam::{Mat4, Vec3};
let mut projection = Mat4::perspective_rh(60.0_f32.to_radians(), aspect, 0.1, 1000.0);
projection.y_axis.y *= -1.0; // flip Y for Vulkan's Y-down NDC
let view = Mat4::look_at_rh(camera_pos, camera_target, Vec3::Y);
let clip_from_model = projection * view * model; // P * V * M
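To make the two Vulkan deltas visible without a math library, the following is a hand-rolled sketch of a right-handed, 0-to-1-depth perspective matrix with the Y flip folded in, followed by the perspective divide. The names `perspective_vk`, `transform`, and `project_to_ndc` are hypothetical helpers written for this example; the element layout follows the usual right-handed zero-to-one form, stored column-major so that `m[1][1]` is the same element as glm's `proj[1][1]`.

```rust
type Mat4 = [[f32; 4]; 4]; // column-major: m[column][row]

/// Right-handed perspective with 0..1 depth and the Vulkan Y flip applied.
fn perspective_vk(fov_y: f32, aspect: f32, near: f32, far: f32) -> Mat4 {
    let f = 1.0 / (fov_y * 0.5).tan();
    let mut m = [[0.0; 4]; 4];
    m[0][0] = f / aspect;
    m[1][1] = -f; // negated: NDC Y points down in Vulkan
    m[2][2] = far / (near - far);
    m[2][3] = -1.0; // copies -z into w, which drives the perspective divide
    m[3][2] = -(far * near) / (far - near);
    m
}

/// Matrix times column vector.
fn transform(m: &Mat4, v: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0; 4];
    for row in 0..4 {
        for col in 0..4 {
            out[row] += m[col][row] * v[col];
        }
    }
    out
}

/// Project a view-space point to NDC: multiply, then divide by w.
fn project_to_ndc(m: &Mat4, view_pos: [f32; 3]) -> [f32; 3] {
    let clip = transform(m, [view_pos[0], view_pos[1], view_pos[2], 1.0]);
    [clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]]
}
```

Projecting the same (x, y) at twice the distance halves its NDC x and y, which is the foreshortening; a point on the near plane lands at depth 0 and one on the far plane at depth 1; and a point above the camera axis gets a negative NDC y because of the flip.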
03 The camera & MVP
The view matrix is the inverse of the camera's world transform; lookAt(eye, center, up) builds it directly. Two cameras cover most needs: an orbit camera (spherical coordinates around a target) and a free-fly camera (position plus yaw/pitch, WASD and mouse).
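As a sketch of the orbit camera, the eye position can be derived from spherical coordinates and then handed to lookAt as the eye, with the orbit target as the center. The function name `orbit_eye`, the yaw/pitch parameterization (yaw about +Y, pitch toward +Y), and the 0.01-radian pole margin are assumptions made for this example, not a fixed convention.

```rust
/// Eye position for an orbit camera at `radius` around `target`.
/// `yaw` rotates about +Y and `pitch` lifts toward +Y, both in radians.
fn orbit_eye(target: [f32; 3], radius: f32, yaw: f32, pitch: f32) -> [f32; 3] {
    // Stay just short of the poles so the view direction never becomes
    // parallel to the up vector passed to lookAt.
    let limit = std::f32::consts::FRAC_PI_2 - 0.01;
    let pitch = pitch.clamp(-limit, limit);
    [
        target[0] + radius * pitch.cos() * yaw.sin(),
        target[1] + radius * pitch.sin(),
        target[2] + radius * pitch.cos() * yaw.cos(),
    ]
}
```

With yaw and pitch at zero the eye sits on the +Z side of the target, looking down −Z in a right-handed view; mouse drag would then adjust yaw and pitch, and the scroll wheel the radius.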
The full transform, for column vectors, reads right to left: model, then view, then projection (the 3D Math convention): clip = P · V · M · position.
Multiply the vertex by model first (place it in the world), then view (move it into the camera's frame), then projection (apply perspective). With column vectors the matrices sit on the left and apply right-to-left, so the chain is written in the reverse of the order it runs. P and V are the same for every vertex in a frame; M changes per object, which is the next problem.
(Row-vector, row-major libraries reverse the order to position · M · V · P; stick to whichever convention your math library uses.)
Need a refresher on "read right to left"?
With column vectors (the convention glm and glam use), a point is a column and matrices stack on its left: P · V · M · position. Matrix multiplication is associative, so the product can be evaluated from the vector outward, and the matrix nearest the vector acts first. The vector meets M first, then V, then P, even though the line reads P-V-M left to right.
The flip is purely the convention. Row-vector libraries (DirectX's XMMATRIX) write the same transform as position · M · V · P, left to right in application order. Same result, mirror-image notation. The 3D Math tutorial derives both.
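The order dependence is easy to check numerically. This sketch uses a tiny column-major matrix type written for the example (the helpers `mul`, `translation`, `scale`, and `transform_point` are hypothetical, not glam's API) and stands a uniform scale in for the model matrix and a translation in for the view matrix.

```rust
type Mat4 = [[f32; 4]; 4]; // column-major: m[column][row]

const IDENTITY: Mat4 = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
];

fn translation(t: [f32; 3]) -> Mat4 {
    let mut m = IDENTITY;
    m[3] = [t[0], t[1], t[2], 1.0];
    m
}

fn scale(s: f32) -> Mat4 {
    let mut m = IDENTITY;
    for i in 0..3 {
        m[i][i] = s;
    }
    m
}

/// a * b for column vectors: b is applied to the vector first, then a.
fn mul(a: &Mat4, b: &Mat4) -> Mat4 {
    let mut out = [[0.0; 4]; 4];
    for col in 0..4 {
        for row in 0..4 {
            for k in 0..4 {
                out[col][row] += a[k][row] * b[col][k];
            }
        }
    }
    out
}

fn transform_point(m: &Mat4, p: [f32; 3]) -> [f32; 3] {
    let v = [p[0], p[1], p[2], 1.0];
    let mut out = [0.0; 4];
    for row in 0..4 {
        for col in 0..4 {
            out[row] += m[col][row] * v[col];
        }
    }
    [out[0], out[1], out[2]]
}
```

With `model = scale(2.0)` and `view = translation([0.0, 0.0, -5.0])`, the product `mul(&view, &model)` scales the point and then moves it, while `mul(&model, &view)` moves it and then scales the offset too, so the same point ends up twice as far away.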
04 The depth buffer
This is the densest delta over the color-only triangle. Four steps: pick a supported depth format, create the depth image and view, enable depth in the pipeline, and wire the depth attachment into dynamic rendering[1].
- Query the format with vkGetPhysicalDeviceFormatProperties, checking VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT against candidates D32_SFLOAT, D32_SFLOAT_S8_UINT, D24_UNORM_S8_UINT. Don't assume a format is available.
- Create the depth image + view (usage DEPTH_STENCIL_ATTACHMENT, aspect DEPTH, device-local), and recreate it on swapchain resize.
- Enable depth in the pipeline: depthTestEnable and depthWriteEnable true, depthCompareOp = LESS (or GREATER_OR_EQUAL for reversed-Z).
- Wire dynamic rendering: set depthAttachmentFormat at pipeline creation, and add a depth VkRenderingAttachmentInfo with loadOp = CLEAR[9].
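The format query in the first step boils down to "take the first candidate the device accepts". The sketch below shows only that selection logic; the `DepthFormat` enum and the `supports_depth_attachment` closure are hypothetical stand-ins for the real Vulkan format enum and for a check of `optimalTilingFeatures` returned by vkGetPhysicalDeviceFormatProperties.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum DepthFormat {
    D32Sfloat,
    D32SfloatS8Uint,
    D24UnormS8Uint,
}

/// Return the first candidate, in preference order, that the device supports
/// as a depth-stencil attachment, or None if the device supports none of them.
fn pick_depth_format(
    candidates: &[DepthFormat],
    supports_depth_attachment: impl Fn(DepthFormat) -> bool,
) -> Option<DepthFormat> {
    candidates
        .iter()
        .copied()
        .find(|&format| supports_depth_attachment(format))
}
```

Listing the float format first means it wins wherever it is available, which also keeps the reversed-Z option open; the `None` case is the one to surface as a startup error rather than assuming a default.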
The depth attachment needs loadOp = CLEAR with depth 1.0 (or 0.0 for reversed-Z); skip it and you test against last frame's depth, which gives garbage occlusion (objects flickering or vanishing). And the test needs both depthTestEnable (the comparison happens) and depthWriteEnable (passing fragments write their depth). Test-on/write-off is a real mode: it's how blended transparents are depth-tested against opaques without occluding each other (see the 2D renderer's draw order).
[Interactive demo: two overlapping 3D objects, with toggles for the depth test and the per-frame clear, and a control that closes the depth gap until the surfaces z-fight.]
// pipeline creation: add the depth format alongside the color format (the delta over the triangle)
let color_formats = [swapchain_format]; // bind the array first: the builder borrows it, so a temporary would be dropped too early
let mut rendering = vk::PipelineRenderingCreateInfo::default()
    .color_attachment_formats(&color_formats)
    .depth_attachment_format(depth_format); // NEW
let depth_stencil = vk::PipelineDepthStencilStateCreateInfo::default()
    .depth_test_enable(true)
    .depth_write_enable(true)
    .depth_compare_op(vk::CompareOp::LESS); // GREATER_OR_EQUAL for reversed-Z
// record time: a depth attachment beside the color one, cleared each frame
let depth_attachment = vk::RenderingAttachmentInfo::default()
    .image_view(depth_image_view)
    .image_layout(vk::ImageLayout::DEPTH_ATTACHMENT_OPTIMAL)
    .load_op(vk::AttachmentLoadOp::CLEAR) // MUST clear, or garbage occlusion
    .store_op(vk::AttachmentStoreOp::DONT_CARE)
    .clear_value(vk::ClearValue {
        depth_stencil: vk::ClearDepthStencilValue { depth: 1.0, stencil: 0 }, // depth 0.0 for reversed-Z
    });
let rendering_info = vk::RenderingInfo::default()
    .color_attachments(&color_attachments)
    .depth_attachment(&depth_attachment);
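What the two enable flags and the clear actually do can be modelled on the CPU with a single pixel. This is a toy model under stated assumptions (a LESS compare, one pixel, made-up `DepthState` and `Pixel` types), not how a driver implements it; it does mirror the rule that depth writes only happen when the depth test is enabled.

```rust
#[derive(Clone, Copy)]
struct DepthState {
    test_enable: bool,
    write_enable: bool,
}

/// One pixel of a color target plus its depth-buffer entry.
struct Pixel {
    depth: f32,
    color: u32,
}

impl Pixel {
    /// Equivalent of loadOp = CLEAR with depth 1.0 (the far plane).
    fn cleared() -> Self {
        Pixel { depth: 1.0, color: 0 }
    }

    /// Draw one fragment with a LESS compare; returns true if color was written.
    fn draw(&mut self, state: DepthState, frag_depth: f32, frag_color: u32) -> bool {
        if state.test_enable {
            if !(frag_depth < self.depth) {
                return false; // fails the depth test: fragment discarded
            }
            if state.write_enable {
                self.depth = frag_depth; // passing fragment becomes the new nearest
            }
        }
        self.color = frag_color;
        true
    }
}
```

With test and write both on, the nearest fragment wins regardless of draw order. With the test off, the last draw wins, which is the painter's-order dependence. Starting from a stale depth instead of `cleared()` rejects fragments that should be visible. Test-on/write-off lets two transparent fragments in front of an opaque one both draw, while anything behind the opaque one is still rejected.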
05 The 3D vertex
The vertex grows to position (vec3), normal (vec3), and UV (vec2). The vertex input description gains two attributes:
struct Vertex { glm::vec3 position; glm::vec3 normal; glm::vec2 uv; };
// binding 0: one interleaved Vertex per vertex
VkVertexInputBindingDescription binding{ 0, sizeof(Vertex), VK_VERTEX_INPUT_RATE_VERTEX };
// each entry is { location, binding, format, offset }; offsetof comes from <cstddef>
VkVertexInputAttributeDescription attrs[3] = {
    { 0, 0, VK_FORMAT_R32G32B32_SFLOAT, offsetof(Vertex, position) },
    { 1, 0, VK_FORMAT_R32G32B32_SFLOAT, offsetof(Vertex, normal) },
    { 2, 0, VK_FORMAT_R32G32_SFLOAT, offsetof(Vertex, uv) },
};
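use std::mem::{offset_of, size_of};

#[repr(C)] // C layout, so field offsets match what the attribute descriptions declare
struct Vertex { position: [f32; 3], normal: [f32; 3], uv: [f32; 2] }
let binding = vk::VertexInputBindingDescription::default()
    .binding(0)
    .stride(size_of::<Vertex>() as u32)
    .input_rate(vk::VertexInputRate::VERTEX);
// offsets come from the struct itself (0, 12, 24 here) instead of hand-counted bytes
let attrs = [
    vk::VertexInputAttributeDescription::default().location(0).format(vk::Format::R32G32B32_SFLOAT).offset(offset_of!(Vertex, position) as u32),
    vk::VertexInputAttributeDescription::default().location(1).format(vk::Format::R32G32B32_SFLOAT).offset(offset_of!(Vertex, normal) as u32),
    vk::VertexInputAttributeDescription::default().location(2).format(vk::Format::R32G32_SFLOAT).offset(offset_of!(Vertex, uv) as u32),
];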
The normal needs the inverse-transpose of the model's upper-3×3 under non-uniform scale (the 3D Math tutorial covers why). The lighting that uses it lands in the next module; here the vertex just carries and transforms it.
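A small numeric check shows why the plain model matrix is not enough. The sketch below computes the inverse-transpose of a 3×3 matrix via cofactors and compares it with the naive transform under a non-uniform scale; `Mat3`, `mul_vec`, `dot`, and `inverse_transpose` are helpers written for this example, and in a real renderer the same matrix would normally come from the math library.

```rust
type Mat3 = [[f32; 3]; 3]; // column-major: m[column][row]

fn mul_vec(m: &Mat3, v: [f32; 3]) -> [f32; 3] {
    let mut out = [0.0; 3];
    for row in 0..3 {
        for col in 0..3 {
            out[row] += m[col][row] * v[col];
        }
    }
    out
}

fn dot(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Inverse-transpose = cofactor matrix divided by the determinant.
/// Returns None for a singular matrix.
fn inverse_transpose(m: &Mat3) -> Option<Mat3> {
    let mut cof = [[0.0f32; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            // Cyclic indexing yields the signed cofactor directly.
            let (i1, i2, j1, j2) = ((i + 1) % 3, (i + 2) % 3, (j + 1) % 3, (j + 2) % 3);
            cof[i][j] = m[i1][j1] * m[i2][j2] - m[i1][j2] * m[i2][j1];
        }
    }
    let det = m[0][0] * cof[0][0] + m[0][1] * cof[0][1] + m[0][2] * cof[0][2];
    if det.abs() < f32::EPSILON {
        return None;
    }
    for column in cof.iter_mut() {
        for value in column.iter_mut() {
            *value /= det;
        }
    }
    Some(cof)
}
```

Take a surface with tangent (1, 1, 0) and normal (1, −1, 0) and scale Y by 4. The tangent becomes (1, 4, 0); transforming the normal with the model matrix gives (1, −4, 0), which is no longer perpendicular to it, while the inverse-transpose gives (1, −0.25, 0), which is. The result still needs normalizing before lighting.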
06 Loading a mesh
Prove the pipeline with a hardcoded cube (24 vertices for per-face normals, 36 indices in an index buffer), then load real geometry. The standard interchange format is glTF 2.0, "the JPEG of 3D": a runtime-ready, royalty-free format that shipping engines and viewers load[5].
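One way to build that test cube is to generate it face by face, which makes the 24/36 counts and the winding fall out of the construction. This is a sketch: the `Vertex` struct mirrors the layout from the previous section, and `unit_cube`, `FACES`, and `CORNERS` are names invented for the example. Each face stores a normal and two in-plane axes chosen so that u × v equals the normal, which makes the triangles counter-clockwise when seen from outside.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vertex {
    position: [f32; 3],
    normal: [f32; 3],
    uv: [f32; 2],
}

/// (normal, u axis, v axis) per face, chosen so that u x v = normal.
const FACES: [([f32; 3], [f32; 3], [f32; 3]); 6] = [
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]),
    ([-1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]),
    ([0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]),
    ([0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]),
    ([0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
    ([0.0, 0.0, -1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]),
];

/// Corner signs along (u, v), in counter-clockwise order.
const CORNERS: [(f32, f32); 4] = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)];

/// A cube spanning -0.5..0.5: 4 vertices per face (so each face has its own
/// normal) and 2 triangles per face.
fn unit_cube() -> (Vec<Vertex>, Vec<u16>) {
    let mut vertices = Vec::with_capacity(24);
    let mut indices = Vec::with_capacity(36);
    for (normal, u, v) in FACES {
        let base = vertices.len() as u16;
        for (cu, cv) in CORNERS {
            let position = [0usize, 1, 2].map(|i| 0.5 * (normal[i] + cu * u[i] + cv * v[i]));
            let uv = [0.5 * (cu + 1.0), 0.5 * (cv + 1.0)];
            vertices.push(Vertex { position, normal, uv });
        }
        indices.extend_from_slice(&[base, base + 1, base + 2, base + 2, base + 3, base]);
    }
    (vertices, indices)
}
```

Corners cannot be shared between faces because each face needs its own normal, hence 24 vertices rather than 8; a 16-bit index type is plenty at this size.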
A buffer is a blob of bytes (a .bin or the binary chunk of a .glb); a bufferView is an (offset, length, stride) slice of it; an accessor types that slice (component type, element type like VEC3, count). Positions, normals, and indices are each accessors[6]. glTF is right-handed, +Y up, +Z forward, meters, with counter-clockwise front faces (the winding the culler reads). Three traps: indices may be unsigned byte/short/int (handle the component type), attributes can be interleaved or sparse (respect the stride), and importing into a left-handed engine flips an axis and inverts winding. glTF is a delivery format, not an authoring one; artists export to it.
let (document, buffers, _images) = gltf::import("model.gltf")?; // resolves .bin / GLB
for mesh in document.meshes() {
    for primitive in mesh.primitives() {
        let reader = primitive.reader(|b| Some(&buffers[b.index()]));
        // Every attribute and the index accessor are optional in glTF, so these
        // unwraps panic on meshes that omit them; handle None in real code.
        let positions: Vec<[f32; 3]> = reader.read_positions().unwrap().collect();
        let normals: Vec<[f32; 3]> = reader.read_normals().unwrap().collect();
        let uvs: Vec<[f32; 2]> = reader.read_tex_coords(0).unwrap().into_f32().collect();
        let indices: Vec<u32> = reader.read_indices().unwrap().into_u32().collect(); // widens u8/u16 to u32
        // upload positions/normals/uvs into a vertex buffer + indices into an index buffer
    }
}
// cgltf: single-file C99 loader
#define CGLTF_IMPLEMENTATION // in exactly one translation unit
#include "cgltf.h"

cgltf_options options = {0};
cgltf_data* data = NULL;
if (cgltf_parse_file(&options, "model.gltf", &data) == cgltf_result_success) {
    // reads .bin, decodes data URIs
    if (cgltf_load_buffers(&options, data, "model.gltf") == cgltf_result_success) {
        // walk data->meshes[i].primitives[j].attributes[k].data (accessor) -> buffer_view -> buffer
    }
    cgltf_free(data);
}
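The loaders above hide the accessor arithmetic, so here is a dependency-free sketch of two of the traps: widening indices by component type, and honoring the stride when reading interleaved positions. `read_indices` and `read_vec3` are hypothetical helpers that take already-resolved byte offsets; the numeric component-type codes are the ones glTF defines, and buffer data is little-endian. The sketch assumes index data is tightly packed and treats a stride of 0 as "tightly packed" for attributes.

```rust
// glTF accessor componentType codes that are valid for index data.
const UNSIGNED_BYTE: u32 = 5121;
const UNSIGNED_SHORT: u32 = 5123;
const UNSIGNED_INT: u32 = 5125;

/// Read `count` tightly packed indices and widen them to u32.
/// Returns None for an unsupported component type or an out-of-range slice.
fn read_indices(buffer: &[u8], byte_offset: usize, count: usize, component_type: u32) -> Option<Vec<u32>> {
    let size = match component_type {
        UNSIGNED_BYTE => 1,
        UNSIGNED_SHORT => 2,
        UNSIGNED_INT => 4,
        _ => return None,
    };
    let end = byte_offset.checked_add(count.checked_mul(size)?)?;
    let bytes = buffer.get(byte_offset..end)?;
    Some(
        bytes
            .chunks_exact(size)
            .map(|c| match size {
                1 => c[0] as u32,
                2 => u16::from_le_bytes([c[0], c[1]]) as u32,
                _ => u32::from_le_bytes([c[0], c[1], c[2], c[3]]),
            })
            .collect(),
    )
}

/// Read `count` VEC3 float elements, stepping by `byte_stride` between them.
fn read_vec3(buffer: &[u8], byte_offset: usize, byte_stride: usize, count: usize) -> Option<Vec<[f32; 3]>> {
    let stride = if byte_stride == 0 { 12 } else { byte_stride };
    (0..count)
        .map(|i| {
            let start = byte_offset + i * stride;
            let b = buffer.get(start..start + 12)?;
            Some([
                f32::from_le_bytes([b[0], b[1], b[2], b[3]]),
                f32::from_le_bytes([b[4], b[5], b[6], b[7]]),
                f32::from_le_bytes([b[8], b[9], b[10], b[11]]),
            ])
        })
        .collect()
}
```

For a buffer that interleaves position and normal, the position accessor would be read with a stride of 24 and the normal accessor with the same stride and a byte offset 12 bytes later. Sparse accessors are not handled here.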
07 Per-object data
One object: push the model matrix (or premultiplied MVP) as a push constant; a 64-byte mat4 fits within the 128-byte minimum every device guarantees for push constants (the 2D renderer did this). Many objects need more room, and there's a ladder.
- UBO: one uniform buffer (std140 layout), bound by a descriptor.
- Dynamic UBO: one buffer, a per-object offset (UNIFORM_BUFFER_DYNAMIC) that must be a multiple of minUniformBufferOffsetAlignment[8].
- SSBO: a storage buffer (std430), indexed by gl_InstanceIndex or a draw ID; this is what GPU-driven renderers use.
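In std140 a vec3 aligns to 16 bytes (not 12), a mat4 is 64, and array elements and nested structs round up to 16[8]. A C or Rust struct with two consecutive bare vec3 fields (or a float followed by a vec3) won't match the shader's block unless you pad: the CPU side packs the second vec3 at offset 12 while std140 places it at 16. That is the classic garbled-transform bug. (A vec3 followed by a single float happens to line up, because the float fills the vec3's fourth slot.) Pad explicitly, or avoid vec3 in uniform blocks. And UBOs use std140, not std430 (that's SSBO/push-constant territory). For a dynamic UBO, query the offset alignment rather than hard-coding it (it can be 256 on some hardware).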
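Both rules can be pinned down in code. The sketch below mirrors a hypothetical block `layout(std140) uniform ObjectUniforms { mat4 model; vec3 tint; vec3 light_dir; };` with explicit padding and checks the offsets at compile time, then rounds a per-object size up to the device's offset alignment. The struct, its field names, and `dynamic_stride` are assumptions for illustration; the alignment value would come from the physical-device limits at runtime.

```rust
use std::mem::{offset_of, size_of};

/// CPU mirror of the hypothetical std140 block described above.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
struct ObjectUniforms {
    model: [[f32; 4]; 4], // mat4: offset 0, 64 bytes
    tint: [f32; 3],       // vec3: offset 64, 12 bytes of data
    _pad0: f32,           // pad so the next vec3 starts on a 16-byte boundary
    light_dir: [f32; 3],  // vec3: offset 80
    _pad1: f32,           // pad the struct size up to a multiple of 16
}

// Compile-time checks: the build fails if the layout drifts from std140.
const _: () = assert!(offset_of!(ObjectUniforms, tint) == 64);
const _: () = assert!(offset_of!(ObjectUniforms, light_dir) == 80);
const _: () = assert!(size_of::<ObjectUniforms>() == 96);

/// Per-object stride inside a dynamic uniform buffer: the object size rounded
/// up to the device's minimum offset alignment (a power of two).
fn dynamic_stride(object_size: u64, min_alignment: u64) -> u64 {
    debug_assert!(min_alignment.is_power_of_two());
    (object_size + min_alignment - 1) & !(min_alignment - 1)
}
```

Object `i` then lives at byte offset `i * dynamic_stride(...)`, and that same value is what gets passed as the dynamic offset when binding the descriptor set. On a device reporting an alignment of 256, a 96-byte block costs 256 bytes per object, which is one reason the SSBO rung of the ladder becomes attractive as object counts grow.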
08 Culling & winding
Back-face culling skips triangles facing away: cullMode = BACK plus a frontFace winding. glTF declares counter-clockwise front faces. But there's a gotcha that makes the model invisible or inside-out.
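Vulkan decides facing from framebuffer coordinates, so flipping Y (either proj[1][1] *= -1 or a negative viewport height) reverses every triangle's on-screen winding relative to the unflipped pipeline. If frontFace was chosen before the flip went in, the faces you meant to keep get culled and the model renders inside-out or vanishes. The fix is to flip frontFace (CCW↔CW) to compensate[4]. With the Y-flip in place the image is upright, so glTF's CCW triangles appear counter-clockwise on screen and frontFace = COUNTER_CLOCKWISE is the matching setting; without the flip the same data would need CLOCKWISE. This is not "Vulkan culls backwards"; it's the Y-flip interaction, and it applies whichever Y-flip method you chose.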
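The winding reversal itself is a one-line fact about signed area, which a short sketch can confirm. `signed_area` follows the sign convention of the Vulkan specification's polygon-area formula (a leading minus sign, evaluated in framebuffer coordinates), under which COUNTER_CLOCKWISE treats positive area as front-facing; `flip_y` is a stand-in for either Y-flip method. Both function names are made up for this example.

```rust
/// Signed area of a 2D triangle, with the leading minus sign used by the
/// Vulkan specification's polygon-area formula.
fn signed_area(tri: [[f32; 2]; 3]) -> f32 {
    let mut sum = 0.0;
    for i in 0..3 {
        let (a, b) = (tri[i], tri[(i + 1) % 3]);
        sum += a[0] * b[1] - b[0] * a[1];
    }
    -0.5 * sum
}

/// Mirror a triangle vertically, as a Y flip does to everything on screen.
fn flip_y(tri: [[f32; 2]; 3]) -> [[f32; 2]; 3] {
    tri.map(|p| [p[0], -p[1]])
}
```

For any non-degenerate triangle, `signed_area(flip_y(tri))` is exactly the negative of `signed_area(tri)`: the vertex order in the buffer has not changed, but the facing the rasterizer computes has, which is why frontFace has to be re-matched whenever the flip is added or removed.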
09 The scene graph
Objects rarely live in world space directly; they hang off a hierarchy, the scene graph. Each node has a local transform; its world transform is computed by traversing parents before children. glTF's node tree maps straight onto this.
A node's place in the world is its parent's world transform composed with its own local one: world = parent.world · local. Because parent.world is on the left, the parent must be computed first; walk the tree top-down (parents before children) and every node sees a finished parent. Flip the two factors and the child orbits the world origin instead of its parent, the classic bug.
The optimization is dirty flags: only recompute a subtree's world transforms when something in it moved.
[Interactive demo: a parent driving its children, comparing the cost of recomputing everything with recomputing just the dirty subtree.]
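A minimal sketch of that traversal with dirty flags, using a flat array where every parent is stored before its children so a single forward pass is a top-down walk. The `Node` layout, the index-based parent link, and the helper names are assumptions for this example; real engines often keep a separate hierarchy structure or walk glTF's node tree recursively.

```rust
type Mat4 = [[f32; 4]; 4]; // column-major: m[column][row]

const IDENTITY: Mat4 = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
];

fn translation(t: [f32; 3]) -> Mat4 {
    let mut m = IDENTITY;
    m[3] = [t[0], t[1], t[2], 1.0];
    m
}

/// a * b for column vectors: b applies first, then a.
fn mul(a: &Mat4, b: &Mat4) -> Mat4 {
    let mut out = [[0.0; 4]; 4];
    for col in 0..4 {
        for row in 0..4 {
            for k in 0..4 {
                out[col][row] += a[k][row] * b[col][k];
            }
        }
    }
    out
}

struct Node {
    parent: Option<usize>, // index of the parent; must be smaller than this node's index
    local: Mat4,
    world: Mat4,
    dirty: bool,
}

impl Node {
    fn new(parent: Option<usize>, local: Mat4) -> Self {
        Node { parent, local, world: IDENTITY, dirty: true }
    }
}

/// Recompute world transforms for dirty nodes and everything below them.
/// Returns how many nodes were recomputed.
fn update_world(nodes: &mut [Node]) -> usize {
    let mut recomputed = 0;
    for i in 0..nodes.len() {
        let parent = nodes[i].parent.map(|p| (nodes[p].world, nodes[p].dirty));
        let parent_dirty = matches!(parent, Some((_, true)));
        if nodes[i].dirty || parent_dirty {
            let world = match parent {
                Some((parent_world, _)) => mul(&parent_world, &nodes[i].local), // parent on the left
                None => nodes[i].local,
            };
            nodes[i].world = world;
            nodes[i].dirty = true; // lets children visited later see the change
            recomputed += 1;
        }
    }
    for node in nodes.iter_mut() {
        node.dirty = false;
    }
    recomputed
}
```

After changing a node's `local`, set its `dirty` flag and call `update_world` once per frame: only that node and its descendants are recomputed, and a frame in which nothing moved recomputes nothing.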
Two misdiagnoses to rule out: Vulkan's NDC is right-handed, so the usual symptoms come from the Y-flip and the 0-to-1 depth range, not from handedness or the depth format; and an inside-out model is a winding/cull issue from the Y-flip, not a depth or index-width issue.
10 Pitfalls
11 What's next
There's a lit-ready mesh on screen with correct depth. Next is making it look good: PBR Materials & Lighting uses the normal we carried and the metallic-roughness textures to shade the surface physically, then shadows, then deferred rendering. The full 3D path is on the series hub.
- Khronos. Vulkan Tutorial, "Depth Buffering." docs.vulkan.org. The format query, the depth image/view, the depth-stencil state, and the dynamic-rendering depth attachment with clear.
- Nathan Reed. "Depth Precision Visualized." reedbeta.com. The 1/z precision problem and reversed-Z with a float depth buffer.
- glam (Rust). docs.rs/glam. perspective_rh (0 to 1) vs perspective_rh_gl (−1 to 1), and look_at_rh. (Cite by API; glam moves fast across minor versions.)
- AnKi 3D Engine. "Vulkan's coordinate system." anki3d.org. Y-down NDC, 0-to-1 depth, the negative viewport height, and the frontFace inversion (the inside-out symptom).
- The Khronos Group. glTF 2.0 Specification. registry.khronos.org. The runtime interchange format: right-handed, +Y up, meters, CCW front faces, and the buffer/bufferView/accessor model.
- The Khronos Group. glTF Tutorial, "Buffers, BufferViews, Accessors." github.khronos.org. The three-level data model, explained plainly.
- cgltf (github.com/jkuhlmann/cgltf, single-file C) and the gltf crate (docs.rs/gltf, Rust). The loaders: cgltf_parse_file / cgltf_load_buffers and gltf::import + the primitive reader.
- The Khronos Group. Vulkan guide, "Shader Memory Layout" and "Descriptor Dynamic Offset." docs.vulkan.org. std140 alignment (vec3→16, mat4→64) and dynamic-UBO offset alignment.
- The Khronos Group. Vulkan Specification, VkPipelineRenderingCreateInfo. registry.khronos.org. The depthAttachmentFormat for dynamic rendering, which must match the bound depth view.
- Sascha Willems. Vulkan glTF scene rendering and Vulkan-glTF-PBR. github.com/SaschaWillems. The production reference for a glTF-to-engine loader and per-object data.