Build a Game Engine · 3D Rendering

Post-Processing & Anti-Aliasing

The lighting pass wrote an HDR image; this is the back half of the frame that turns it into the final picture. A chain of full-screen passes (bloom, exposure, grading) ends at tone-map and present, and an anti-aliasing stage fights the jaggies. The two ideas worth getting right: bloom is a multi-scale pyramid, not a blur, and modern AA is temporal, with all the ghosting that implies.

Time: ~55 min. Level: Senior. Prereqs: The Deferred Rendering tutorial (full-screen passes, the HDR target, motion vectors) and PBR (HDR + tone mapping). Stack: GLSL · C++ & Rust

01 The post chain

A post-processing chain is an ordered sequence of full-screen passes over the HDR render target the lighting pass produced. Each pass reads one texture and writes another, so you ping-pong between two HDR targets (A→B, then B→A): a pass never reads the image it's writing. The chain stays HDR/linear up to the tone-map, which is the HDR→LDR gate; a few effects (FXAA, LDR grading, grain, vignette) run after it in LDR.
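The ping-pong pattern above can be sketched on the CPU. This is a minimal illustration, not engine code: `PingPong`, `run_pass`, and the use of a `Vec<f32>` as a stand-in for a GPU render target are all hypothetical names and assumptions made for the example.

```rust
/// Two targets used alternately: each pass reads one and writes the other.
/// A Vec<f32> stands in for an HDR GPU image (an assumption of this sketch).
struct PingPong {
    targets: [Vec<f32>; 2],
    read: usize, // index of the target the next pass samples
}

impl PingPong {
    fn new(initial: Vec<f32>) -> Self {
        let scratch = vec![0.0; initial.len()];
        PingPong { targets: [initial, scratch], read: 0 }
    }

    /// Run one full-screen pass: `pass` maps each source pixel to a destination pixel.
    /// The source and destination are always different targets.
    fn run_pass(&mut self, pass: impl Fn(f32) -> f32) {
        let (a, b) = self.targets.split_at_mut(1);
        let (src, dst) = if self.read == 0 {
            (&a[0], &mut b[0])
        } else {
            (&b[0], &mut a[0])
        };
        for (d, s) in dst.iter_mut().zip(src.iter()) {
            *d = pass(*s);
        }
        self.read = 1 - self.read; // swap roles: A→B, then B→A
    }

    /// The target holding the output of the most recent pass.
    fn result(&self) -> &[f32] {
        &self.targets[self.read]
    }
}
```

After two passes the result is back in the first target, which is the A→B, B→A alternation described above.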

Order matters, and you can't sample the target you're writing

The exact order varies by engine, but one rule is invariant: energy-linear effects (bloom, exposure) go before tone-map; display-space effects (FXAA, LDR grading) go after. Add bloom after tone-map and it's no longer light energy, so it looks wrong. And in Vulkan you can't sample an image you're currently rendering into within the same rendering pass without VK_KHR_dynamic_rendering_local_read (and that's tile-local); the portable pattern is a separate pass plus an image barrier (COLOR_ATTACHMENT → SHADER_READ_ONLY) before the next pass samples it.

02 Bloom

Bloom is the glow around bright sources. Three stages: a bright-pass extracts highlights from the HDR image, a downsample/upsample pyramid spreads them across scales, and a composite adds the result back into the HDR image before tone-map. The teaching point: bloom is not a blur.

A single big Gaussian is wrong; use the pyramid, and tame fireflies

One large Gaussian on one mip is both expensive (a wide kernel) and visibly blocky and pulsating. The modern method (Jimenez, Call of Duty: Advanced Warfare; LearnOpenGL's "Physically Based Bloom") is a progressive downsample then progressive upsample with a small custom kernel per level, which reads like real light scatter and is temporally stable^[1]^[2]. And a naive HDR bright-pass lets one super-bright sub-pixel flicker (a firefly); fix it with a Karis average (weight each tap by 1/(1+luma)) on the first downsample only.

[Interactive widget: the bright-pass, the mip pyramid, and the composite. Dragging the threshold makes fireflies appear; toggling the naive single-blur shows it go blocky.]

The progressive downsample / tent upsample (GLSL)

// 13-tap downsample (36 texels via bilinear). e = center; a..m are the 13 tap positions.
vec3 down = e*0.125
          + (a+c+g+i)*0.03125
          + (b+d+f+h)*0.0625
          + (j+k+l+m)*0.125;          // weights sum to 1
// On the FIRST downsample only, Karis-average each 2x2 group to kill fireflies:
//   weight(c) = 1.0 / (1.0 + dot(c, vec3(0.2126,0.7152,0.0722)));

// 3x3 tent upsample, added back into the next-larger mip (additive).
vec3 up = (e*4.0 + (b+d+f+h)*2.0 + (a+c+g+i)) * (1.0/16.0);
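To see why the Karis weighting in the comment above suppresses fireflies, here is a small CPU sketch in Rust. It is an illustration under stated assumptions, not the shader itself: `luma`, `karis_average`, and `plain_average` are hypothetical helper names, and the four samples stand in for one 2x2 group of the first downsample.

```rust
/// Rec. 709 relative luminance of a linear RGB color.
fn luma(c: [f32; 3]) -> f32 {
    0.2126 * c[0] + 0.7152 * c[1] + 0.0722 * c[2]
}

/// Karis average: weight each sample by 1 / (1 + luma), then normalize
/// by the sum of the weights so the result stays an average.
fn karis_average(samples: &[[f32; 3]]) -> [f32; 3] {
    let mut sum = [0.0_f32; 3];
    let mut weight_sum = 0.0_f32;
    for &s in samples {
        let w = 1.0 / (1.0 + luma(s));
        for k in 0..3 {
            sum[k] += s[k] * w;
        }
        weight_sum += w;
    }
    if weight_sum == 0.0 {
        return [0.0; 3]; // no samples
    }
    [sum[0] / weight_sum, sum[1] / weight_sum, sum[2] / weight_sum]
}

/// Unweighted mean, for comparison.
fn plain_average(samples: &[[f32; 3]]) -> [f32; 3] {
    let mut sum = [0.0_f32; 3];
    for &s in samples {
        for k in 0..3 {
            sum[k] += s[k];
        }
    }
    let n = samples.len().max(1) as f32;
    [sum[0] / n, sum[1] / n, sum[2] / n]
}
```

With three dim samples at 0.1 and one firefly at 1000, the plain mean is about 250 while the Karis average stays below 1, so a single sub-pixel highlight that appears and disappears between frames no longer swings the whole bloom pyramid.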

03 Exposure

Tone mapping (the HDR→LDR operator, Reinhard or ACES) was covered in PBR, and it's not a clamp^[3]. What's new here is exposure, a linear pre-scale on scene radiance before the operator, and auto-exposure (eye adaptation): measure the scene's luminance, pick a target exposure, and ease toward it.

Use a log-average, and smooth it, or it pumps

Measure luminance with a log-average / geometric mean (a luminance histogram in a compute shader, or a log-luminance downsample), not an arithmetic mean, so a few bright pixels don't dominate^[4]. And the adaptation must be temporally smoothed (ease toward the target by 1 − exp(−dt · speed)), or the exposure pumps, flashing over-bright then correcting as the average jumps. Real eyes adapt asymmetrically (dark-adapt slow, light-adapt fast); model it if you want, but say it's a choice.
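The two rules above, log-average metering and exponential smoothing, can be sketched in a few lines of Rust. This is a CPU illustration only (a real implementation would build the histogram or downsample on the GPU); the function names, the `1e-4` epsilon, and the idea of a middle-grey `key` value are assumptions of the sketch, not something the chain above prescribes.

```rust
/// Log-average (geometric mean) luminance of a non-empty set of pixels.
/// The small epsilon avoids ln(0) on pure black pixels.
fn log_average_luminance(lum: &[f32]) -> f32 {
    let eps = 1e-4_f32;
    let sum_log: f32 = lum.iter().map(|&l| (l + eps).ln()).sum();
    (sum_log / lum.len() as f32).exp()
}

/// Exposure that maps the metered average to a chosen key value
/// (for example 0.18 for middle grey; the key is an artistic choice).
fn target_exposure(avg_lum: f32, key: f32) -> f32 {
    key / avg_lum.max(1e-4)
}

/// Ease `current` toward `target` by 1 - exp(-dt * speed).
/// The exponential form makes the result independent of frame rate.
fn adapt(current: f32, target: f32, dt: f32, speed: f32) -> f32 {
    current + (target - current) * (1.0 - (-dt * speed).exp())
}
```

For pixels `[1, 1, 1, 10000]` the arithmetic mean is about 2500 but the log-average is about 10, which is why a few bright pixels stop dominating. Because the easing is exponential, two half-length steps land on the same value as one full step, so the adaptation speed does not change with frame rate.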

[Interactive widget: walk the scene from dark to bright and watch the exposure adapt; turning smoothing off shows the pump.]

04 Color grading

Color grading applies a final artistic color transform, usually baked into a 3D LUT (a 32³ RGBA texture) sampled per pixel. The one decision: where in the chain it runs.

LUT placement changes the result

The classic pipeline grades in LDR after tone-map; modern engines (Unreal) grade in a log or linear working space before/at tone-map so the look is display-agnostic. Both ship; they give different results, and grading after tone-map can't recover clipped highlights while grading before can. State which one you're shipping rather than assuming there's one right place.
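A 3D LUT lookup is just a trilinear interpolation between the eight nearest table entries. The Rust sketch below is a CPU illustration under stated assumptions: `Lut3D`, its red-fastest memory layout, and the RGB-only entries are hypothetical choices for the example (the texture described above is RGBA and sampled by GPU hardware).

```rust
/// A size^3 RGB LUT, stored with red varying fastest (layout is an assumption).
struct Lut3D {
    size: usize, // must be at least 2
    data: Vec<[f32; 3]>,
}

impl Lut3D {
    /// Identity LUT: each entry holds the color it is indexed by.
    fn identity(size: usize) -> Self {
        let n = (size - 1) as f32;
        let mut data = Vec::with_capacity(size * size * size);
        for b in 0..size {
            for g in 0..size {
                for r in 0..size {
                    data.push([r as f32 / n, g as f32 / n, b as f32 / n]);
                }
            }
        }
        Lut3D { size, data }
    }

    fn texel(&self, r: usize, g: usize, b: usize) -> [f32; 3] {
        self.data[(b * self.size + g) * self.size + r]
    }

    /// Trilinear lookup. Input is clamped to [0, 1], as an LDR LUT expects.
    fn sample(&self, c: [f32; 3]) -> [f32; 3] {
        let n = (self.size - 1) as f32;
        let mut i0 = [0usize; 3];
        let mut t = [0.0_f32; 3];
        for k in 0..3 {
            let x = c[k].clamp(0.0, 1.0) * n;
            i0[k] = (x.floor() as usize).min(self.size - 2);
            t[k] = x - i0[k] as f32;
        }
        let mut out = [0.0_f32; 3];
        for corner in 0..8usize {
            let (dr, dg, db) = (corner & 1, (corner >> 1) & 1, (corner >> 2) & 1);
            let w = (if dr == 1 { t[0] } else { 1.0 - t[0] })
                * (if dg == 1 { t[1] } else { 1.0 - t[1] })
                * (if db == 1 { t[2] } else { 1.0 - t[2] });
            let s = self.texel(i0[0] + dr, i0[1] + dg, i0[2] + db);
            for k in 0..3 {
                out[k] += w * s[k];
            }
        }
        out
    }
}
```

The clamp in `sample` is the placement issue in miniature: an input of 2.0 and an input of 1.0 return the same entry, so a LUT applied after tone-map has no way to tell clipped highlights apart.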

05 The AA problem

Aliasing has three distinct causes, and they need different fixes^[6]:

Geometric edge aliasing: point-sampling triangle coverage at pixel centers makes stair-stepped edges.
Shading / specular aliasing: a high-frequency BRDF response (a sharp specular on a bumpy normal map) sampled once per pixel sparkles.
Temporal aliasing: sub-pixel features and slow motion make edges crawl and shimmer frame to frame.

Don't blame one cause

"Aliasing is caused by X" is wrong; these are separate problems. Geometric aliasing is fixed by coverage supersampling (MSAA); specular aliasing is a shading-frequency problem fixed by normal/roughness prefiltering (Toksvig, LEAN), not by MSAA; temporal crawl is fixed by accumulating over time (TAA). One technique rarely solves all three.
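As a concrete instance of normal/roughness prefiltering, here is a sketch of the Toksvig factor in Rust. It assumes the Blinn-Phong exponent formulation, where the length of the mip-averaged (unnormalized) normal measures how much the normals under a texel disagree; the function names are hypothetical, and a roughness-based PBR pipeline would need the equivalent remapping for its own parameterization.

```rust
/// Length of the average of a set of unit normals. It is 1.0 when they all
/// agree and shrinks as they diverge (this is what mip filtering produces).
fn averaged_normal_length(normals: &[[f32; 3]]) -> f32 {
    let mut sum = [0.0_f32; 3];
    for n in normals {
        for k in 0..3 {
            sum[k] += n[k];
        }
    }
    let inv = 1.0 / normals.len() as f32;
    let avg = [sum[0] * inv, sum[1] * inv, sum[2] * inv];
    (avg[0] * avg[0] + avg[1] * avg[1] + avg[2] * avg[2]).sqrt()
}

/// Toksvig-adjusted specular exponent: ft = |Na| / (|Na| + s * (1 - |Na|)),
/// and the effective exponent is ft * s.
fn toksvig_exponent(spec_power: f32, avg_len: f32) -> f32 {
    let len = avg_len.clamp(1e-4, 1.0);
    let ft = len / (len + spec_power * (1.0 - len));
    ft * spec_power
}
```

A flat surface (average length 1.0) keeps its exponent of 64; two normals tilted about 37° either side of vertical average to length 0.8, and the exponent drops to roughly 3.8. The highlight widens instead of sparkling, which no amount of extra coverage samples would achieve.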

06 MSAA

Multisample anti-aliasing tests coverage and depth at N sample positions per pixel, but runs the pixel shader once per pixel (per covered triangle) and writes that result to the covered samples; a resolve averages the samples down^[5]. So MSAA supersamples coverage at edges cheaply.
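The per-pixel behavior described above can be modeled in a few lines of Rust. This is a toy model, assuming a simple box resolve and ignoring the per-sample depth test; `write_covered` and `resolve` are hypothetical names.

```rust
/// The pixel shader ran once and produced `shaded`; write it only to the
/// samples whose bit is set in the coverage mask.
fn write_covered(samples: &mut [[f32; 3]], coverage_mask: u32, shaded: [f32; 3]) {
    for (i, s) in samples.iter_mut().enumerate() {
        if (coverage_mask & (1u32 << i)) != 0 {
            *s = shaded;
        }
    }
}

/// Box resolve: average the stored samples of one pixel down to one color.
fn resolve(samples: &[[f32; 3]]) -> [f32; 3] {
    let mut sum = [0.0_f32; 3];
    for s in samples {
        for k in 0..3 {
            sum[k] += s[k];
        }
    }
    let inv = 1.0 / samples.len() as f32;
    [sum[0] * inv, sum[1] * inv, sum[2] * inv]
}
```

With 4 samples on a black background and a red triangle covering three of them (mask `0b0111`), the resolve yields 75% red: a smooth edge from one shader invocation. Note that the shaded color itself is never resampled, which is why shading aliasing passes straight through.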

Geometry edges only, and expensive with deferred

Because shading is per-pixel, MSAA antialiases geometric edges only, not shader/specular aliasing, and not alpha-test cutouts (those need alpha-to-coverage). It's cheap-ish in forward (only edge pixels pay extra). But in deferred it needs a per-sample G-buffer (N× the fattest bandwidth in the pipeline) and per-sample lighting at edges, which is exactly why the deferred era pivoted to post and temporal AA (see the cost discussion in the Deferred Rendering tutorial). MSAA is not SSAA: SSAA shades every sample (fixing shading aliasing too) at full cost.

07 FXAA & SMAA

Post-process AA works on the rendered image instead of geometry. FXAA (Lottes, NVIDIA) is a single pass on the final LDR image (after tone-map and grading): convert to luma, detect edges from local luma contrast, and blur along the edge^[7]. SMAA (Jimenez et al.) is smarter morphological AA: contrast edge detection, pattern classification (straight/diagonal/corner), then a targeted blend, sharper than FXAA, with spatial and temporal tiers^[8].

Cheap, and lossy

FXAA has no geometry or temporal data, so it blurs texture detail (it can't tell a real edge from a high-contrast texture feature) and can't fix temporal crawl; it only smooths a static frame. It's a fine cheap fallback, not artifact-free. SMAA 1x is sharper but still spatial-only; its T2x tier adds temporal reprojection to catch crawl.
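The luma-contrast edge test at the front of an FXAA-style pass can be sketched as follows. This is a simplified illustration, not the shipped shader: the function names are hypothetical, the Rec. 601 luma weights are one common choice, and the thresholds used in the note below (0.166 relative, 0.0833 minimum) are illustrative tuning values.

```rust
/// Perceptual luma from an LDR color (Rec. 601 weights, an assumption here).
fn luma_601(c: [f32; 3]) -> f32 {
    0.299 * c[0] + 0.587 * c[1] + 0.114 * c[2]
}

/// True when the local luma range around a pixel is large enough to treat as
/// an edge. The relative threshold scales with brightness; the minimum
/// threshold stops the filter firing on noise in dark regions.
fn is_edge(
    center: f32,
    north: f32,
    south: f32,
    east: f32,
    west: f32,
    threshold: f32,
    threshold_min: f32,
) -> bool {
    let lo = center.min(north).min(south).min(east).min(west);
    let hi = center.max(north).max(south).max(east).max(west);
    (hi - lo) >= threshold_min.max(hi * threshold)
}
```

The test sees only luma values. A silhouette edge and a sharp stripe inside a texture produce the same numbers, which is the reason the filter softens texture detail along with geometry.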

08 TAA

Temporal anti-aliasing (TAA) is the modern default. Jitter the projection sub-pixel each frame (a different sample inside the pixel), then accumulate over time: reproject the previous frame via motion vectors and blend (an exponential moving average, ~5 to 10% current, the rest history)^[9]. Over several frames you get many sub-pixel samples: supersampling at the cost of one sample per frame.

The hard part is the history

Reprojected history is stale on motion and disocclusion, so you rectify it: clamp or clip the history color into the AABB of the current 3×3 neighborhood (often in YCoCg) before blending^[9]. Even so, TAA ghosts on disocclusion, transparency, and particles, and it softens the image (history resampling), which is why it's paired with a sharpening pass. Two correctness rules: the jitter must be a low-discrepancy sequence (Halton)^[10], and the jitter must be removed from the motion vectors or reconstruction blurs.

A TAA resolve (GLSL) and the Halton jitter (CPU)

vec3 current = texture(currentColor, uv).rgb;
vec2 histUv = uv - texture(motionVectors, uv).rg;   // reproject current -> previous
vec3 history = sampleCatmullRom(historyColor, histUv);  // sharper than bilinear

// Rectify: clamp history into the current 3x3 neighborhood AABB (ghost suppression).
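vec2 texel = 1.0 / vec2(textureSize(currentColor, 0));
vec3 lo = current, hi = current;
for (int y=-1;y<=1;++y) for (int x=-1;x<=1;++x) {
    // textureOffset requires a constant-expression offset, so step one texel in UV instead.
    vec3 s = texture(currentColor, uv + vec2(x, y) * texel).rgb;   // YCoCg in production
    lo = min(lo, s); hi = max(hi, s);
}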
history = clamp(history, lo, hi);                         // reject stale history
float alpha = 0.1;                                     // ~5-10% current
if (histUv != clamp(histUv, 0.0, 1.0)) alpha = 1.0;       // off-screen: no history
outColor = mix(history, current, alpha);

// Halton radical inverse: a low-discrepancy sub-pixel jitter sequence.
fn halton(mut i: u32, base: u32) -> f32 {
    let (mut r, inv) = (0.0_f32, 1.0 / base as f32);
    let mut f = inv;
    while i > 0 { r += (i % base) as f32 * f; i /= base; f *= inv; }
    r
}
let idx = frame_index % 8;
let jitter_x = halton(idx + 1, 2) - 0.5;   // base 2
let jitter_y = halton(idx + 1, 3) - 0.5;   // base 3
// add to clip: proj.z_axis.x += 2.0*jitter_x/width;  z_axis.y += 2.0*jitter_y/height;
// CRITICAL: build motion vectors WITHOUT this jitter, or reconstruction blurs.
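The resolve above clamps in RGB and notes that production code works in YCoCg. Here is a CPU sketch of that variant in Rust, mirroring the shader logic; the function names are hypothetical, and the non-reversible (floating-point) YCoCg transform is assumed.

```rust
/// RGB -> YCoCg (luma, orange-blue chroma, green-magenta chroma).
fn rgb_to_ycocg(c: [f32; 3]) -> [f32; 3] {
    let (r, g, b) = (c[0], c[1], c[2]);
    [
        0.25 * r + 0.5 * g + 0.25 * b,
        0.5 * r - 0.5 * b,
        -0.25 * r + 0.5 * g - 0.25 * b,
    ]
}

/// Exact inverse of `rgb_to_ycocg`.
fn ycocg_to_rgb(c: [f32; 3]) -> [f32; 3] {
    let (y, co, cg) = (c[0], c[1], c[2]);
    [y + co - cg, y + cg, y - co - cg]
}

/// Clamp the history color into the YCoCg bounding box of the current
/// neighborhood (expected to be non-empty), then convert back to RGB.
fn rectify_history(history_rgb: [f32; 3], neighborhood_rgb: &[[f32; 3]]) -> [f32; 3] {
    let mut lo = [f32::INFINITY; 3];
    let mut hi = [f32::NEG_INFINITY; 3];
    for &n in neighborhood_rgb {
        let s = rgb_to_ycocg(n);
        for k in 0..3 {
            lo[k] = lo[k].min(s[k]);
            hi[k] = hi[k].max(s[k]);
        }
    }
    let h = rgb_to_ycocg(history_rgb);
    let clamped = [
        h[0].max(lo[0]).min(hi[0]),
        h[1].max(lo[1]).min(hi[1]),
        h[2].max(lo[2]).min(hi[2]),
    ];
    ycocg_to_rgb(clamped)
}
```

The box is usually tighter in YCoCg than in RGB because luma and chroma are separated, so a stale red history over a grey neighborhood is pulled all the way back to grey rather than to a tinted compromise.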

[Interactive comparison: the AA modes run on a moving edge so the temporal artifacts are visible. No-AA crawls, MSAA has clean edges but specular highlights still sparkle, FXAA blurs, TAA trails.]

09 Upscaling & more

Temporal upscalers (DLSS, FSR2, XeSS, TAAU) are TAA that also upscales: render at lower resolution and reconstruct a higher-res image from jittered history plus motion vectors, the same machinery as §8^[11]^[12]. DLSS 2+ reconstructs with a neural network; FSR2 is analytic; both take color, depth, motion vectors, and exposure (FSR2 adds a reactive mask for alpha and particles).

Not free resolution

Temporal upscalers reconstruct, so they ghost, shimmer, and smear like TAA when the history is wrong, especially on transparency and disocclusion. They're a strong quality-per-cost win, not magic free pixels, and DLSS (ML) and FSR2 (analytic) are different algorithms, so scope claims to each. The same jitter and motion-vector rules from TAA apply.
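Rendering below display resolution changes two numbers on the engine side: the jitter sequence has to be longer, and textures need a negative mip bias so they keep display-resolution detail. The Rust sketch below uses the rules of thumb from AMD's FSR2 documentation as best understood here (phase count 8 × (display/render)², mip bias log2(render/display) − 1); treat both formulas as assumptions to verify against the upscaler version you ship, and the function names as hypothetical.

```rust
/// Render-target size for a per-axis upscale factor (for example 1.5).
fn render_extent(display_w: u32, display_h: u32, scale: f32) -> (u32, u32) {
    (
        (display_w as f32 / scale).round() as u32,
        (display_h as f32 / scale).round() as u32,
    )
}

/// Length of the jitter sequence: 8 phases at native resolution, scaled by
/// the area ratio so each display pixel still collects about 8 samples.
fn jitter_phase_count(render_w: u32, display_w: u32) -> u32 {
    let ratio = display_w as f32 / render_w as f32;
    (8.0 * ratio * ratio) as u32
}

/// Texture mip bias to apply while rendering at the lower resolution.
fn mip_bias(render_w: u32, display_w: u32) -> f32 {
    (render_w as f32 / display_w as f32).log2() - 1.0
}
```

At a 1.5× upscale to 1920 wide, the render target is 1280 wide, the sequence grows from 8 to 18 phases, and the mip bias is about −1.58. At native resolution the phase count falls back to 8, matching the `frame_index % 8` used for plain TAA.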

The rest of the post chain is garnish. Motion blur (gather along the per-pixel velocity, with silhouette artifacts at object edges) and depth of field (a per-pixel circle of confusion from depth via the thin-lens model, a gather-based approximation) usually run in HDR before tone-map; vignette, chromatic aberration (per-channel UV offset), and film grain run after it.
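The thin-lens circle of confusion mentioned above reduces to one formula. A minimal Rust sketch, assuming all distances share one unit (metres in the note below) and that `aperture` is the aperture diameter (focal length divided by the f-number); the function name is hypothetical.

```rust
/// Thin-lens circle-of-confusion diameter for a point at `depth` when the
/// lens is focused at `focus_dist`:
///   coc = | A * f * (d - s) / (d * (s - f)) |
/// with A = aperture diameter, f = focal length, s = focus distance, d = depth.
fn circle_of_confusion(aperture: f32, focal_length: f32, focus_dist: f32, depth: f32) -> f32 {
    (aperture * focal_length * (depth - focus_dist)
        / (depth * (focus_dist - focal_length)))
        .abs()
}
```

For a 50 mm lens at f/2 (aperture 0.025 m) focused at 2 m, a point at 2 m has zero blur and a point at 4 m has a blur circle of about 0.32 mm on the sensor. A depth-of-field pass converts that diameter to pixels and uses it as the gather radius.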

Wrong answers, and why: more MSAA samples can't fix specular (shading) aliasing; and TAA ghosting is fixed by history rectification (clamp + jitter-free motion vectors), not by leaning harder on history or dropping to FXAA.

10 Pitfalls

Bloom is "a blur": A single big Gaussian is blocky. Use the progressive downsample/upsample pyramid.

Firefly flicker in bloom: Unstable HDR bright-pass. Karis-average the first downsample.

Bloom after tone-map: It's light energy; add it in HDR before the operator.

Exposure pumping: Temporally smooth auto-exposure; use a log-average, not arithmetic.

MSAA for specular sparkle: MSAA is geometry edges only. Prefilter normals/roughness or use TAA.

Jitter left in motion vectors: Breaks TAA reconstruction. Remove the jitter from the velocity buffer.

More history to fix ghosting: Makes it worse. Rectify history with a neighborhood clamp.

"Upscalers are free resolution": They reconstruct and can ghost/smear. A quality-per-cost win, not free.

11 What's next

That completes the rendering track: from a triangle in Vulkan to a tone-mapped, anti-aliased, post-processed image. The series now turns from how things look to how they move: Skeletal Animation & Skinning, then gameplay, AI, networking, tooling, and the 3D-game capstone. The full path is on the series hub.

[1] Jorge Jimenez. "Next Generation Post Processing in Call of Duty: Advanced Warfare." SIGGRAPH 2014. iryoku.com. The modern post chain and the progressive downsample/upsample bloom with the Karis average.
[2] Joey de Vries. LearnOpenGL, "Physically Based Bloom." learnopengl.com. The 13-tap downsample and 3×3 tent upsample GLSL and the firefly fix.
[3] Erik Reinhard et al. "Photographic Tone Reproduction for Digital Images." SIGGRAPH 2002. cs.utah.edu. The tone-map operator (recap; covered in PBR).
[4] Krzysztof Narkowicz. "Automatic Exposure." knarkowicz.wordpress.com. Log-average luminance and temporally smoothed eye adaptation.
[5] Matt Pettineo. "A Quick Overview of MSAA." therealmjp.github.io. Coverage at the sample rate, shading once per pixel, per-sample storage, the resolve.
[6] Tomas Akenine-Möller, Eric Haines, Naty Hoffman, et al. Real-Time Rendering, 4th ed., ch. 5 & 12. realtimerendering.com. The antialiasing and image-space/post surveys.
[7] Timothy Lottes. "FXAA" (NVIDIA whitepaper). developer.nvidia.com. Single-pass image-space luma-edge AA on the final LDR image.
[8] Jorge Jimenez et al. "SMAA: Enhanced Subpixel Morphological Antialiasing." Eurographics 2012. iryoku.com/smaa. Morphological edge AA and the 1x/S2x/T2x/4x tiers.
[9] Brian Karis. "High Quality Temporal Supersampling." SIGGRAPH 2014 (Unreal Engine 4). cloudfront.net. The canonical TAA: jitter, motion-vector reprojection, and neighborhood clamping.
[10] Matt Pharr, Wenzel Jakob, Greg Humphreys. Physically Based Rendering, "The Halton Sampler." pbr-book.org. Why a low-discrepancy sequence beats grid or pseudo-random jitter.
[11] AMD GPUOpen. FidelityFX Super Resolution 2 (FSR2). gpuopen.com. Temporal upscaling: Halton jitter, motion vectors without jitter, the reactive mask, and the inputs.
[12] NVIDIA. "DLSS 2.0: A Big Leap in AI Rendering." developer.nvidia.com. DLSS 2+ as a temporal upscaler reconstructing from jittered low-res input, motion vectors, and depth.