Tutorial 04 · Engine Programming

Vertex Animation Textures

Bake a simulation into pixels. Sample the pixels from a vertex shader. Render the same animation on ten thousand mesh instances at zero CPU cost. That trick is called Vertex Animation Textures, and it's the reason crowds, swarms, shattering glass, and rolling fluid surfaces ship cheaply in modern engines. We start with one rabbit and one texture, work up to four production VAT modes (soft body, rigid body, fluid, sprite), and live-decode the texture in your browser.

Time: ~45 min
Level: Junior to mid graphics / tech-artist
Prereqs: You can read HLSL or GLSL. You've heard of UVs and vertex shaders.
Hardware: Knowledge that the GPU has a vertex shader and a texture sampler.

01 Why a VAT

A skeletal mesh is the standard way to animate a character. Each frame, the CPU evaluates a bone hierarchy (typically 80 to 200 bones, sometimes more), produces a palette of skinning matrices, hands the palette to the GPU, and the vertex shader skins each vertex by blending the bones it's weighted to. That works, has worked since 1998, and is how most player characters ship. It also costs around 50 to 200 microseconds of CPU per character per frame even before IK, foot placement, or animation blending get involved, and it forces a draw call per character because the bone palette is per-instance state. Try to put a thousand characters on screen and the CPU is the bottleneck before the GPU even wakes up.

A VAT sidesteps the bone palette entirely. Bake the animation once (every vertex position at every frame) into the pixels of a texture. Ship the mesh as a static mesh. Replay the animation in the vertex shader by sampling the texture at (vertex_id, current_frame) and using the sampled value as the vertex position. There is no skeleton at runtime, no matrix palette, no per-instance state worth speaking of. The GPU does all the work; the CPU pushes one draw call for the whole crowd.

What you'll have by the end

A working VAT shader in HLSL that loads soft-body, rigid-body, fluid, and sprite animations from baked textures, with browser-side decoders that draw the same animations from the same encoding. By the end you'll know why Houdini's Labs ROP outputs three textures plus a mesh, why The Matrix Awakens's pedestrians cost roughly nothing past a hundred metres, why an 8K position texture is a hard ceiling on vertex count × frame count, and why the rotation channel of a rigid-body VAT is a quaternion instead of a matrix.

Three jobs VAT does well

Crowds. The driving use case. Ten thousand pedestrians in The Matrix Awakens are vertex-animated static meshes; only the few dozen near the camera fall back to true skeletal meshes^[1]. The City Sample's AnimToTexture plugin ships as the official UE5.1+ workflow for exactly this.
Baked simulations. A Houdini cloth, particle, or rigid-body sim has no rig to import. The mesh changes per frame in ways no skeleton can describe. Bake the frames into pixels and the game engine can replay the sim without knowing it was ever a sim. Half-Life: Alyx's Workshop Tools document a direct Houdini-to-Source VAT pipeline^[2]; Wildlife Studios published a teardown of using VAT on mobile to ship 4,900 animated soldiers per frame on a 2017 phone where the skeletal version managed 441^[3].
Effects that aren't rigs. Shatters, splashes, foliage gusts, flock-of-birds Niagara emitters. Anything where the motion is a recording, not a rule. SideFX's Labs VAT 3.0 node^[4] is the de-facto reference encoder; OpenVAT^[5] is the Blender-native equivalent, shipped in 2025.

The widget below races the CPU cost of skeletal animation against VAT as character count scales. Drag the slider; the skeletal curve is roughly linear in characters because the CPU has to evaluate one rig per character, while the VAT curve is flat because the work doesn't touch the CPU:

[Interactive widget: sliders for crowd size (default 2000) and bones per rig (default 120); readouts for skeletal frame cost, VAT frame cost, and speedup.]
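The scaling argument can be checked with back-of-envelope arithmetic. The sketch below is CPU-side Python, not shader code. It uses the 50 to 200 µs per-character range quoted above; the 60 fps budget and the fixed VAT submission cost are illustrative assumptions, not measurements.

```python
FRAME_BUDGET_MS = 1000.0 / 60.0  # assumed 60 fps target, for illustration only


def skeletal_cpu_ms(character_count, microseconds_per_character):
    """CPU time per frame when every character evaluates its own rig."""
    return character_count * microseconds_per_character / 1000.0


def vat_cpu_ms(character_count, fixed_submit_ms=0.05):
    """CPU time per frame for VAT: one instanced draw, whatever the crowd size.

    fixed_submit_ms is a hypothetical placeholder, not a measured number.
    """
    return fixed_submit_ms


for count in (100, 1000, 10000):
    low = skeletal_cpu_ms(count, 50)
    high = skeletal_cpu_ms(count, 200)
    print(f"{count:>6} characters: skeletal {low:.1f} to {high:.1f} ms, "
          f"VAT {vat_cpu_ms(count):.2f} ms, budget {FRAME_BUDGET_MS:.1f} ms")
```

Under these assumptions a thousand skeletal characters already need several frame budgets of CPU time, while the VAT line does not move.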

02 A short history of putting animation in a texture

The pattern of using a texture as a database that the vertex shader can read is older than the name "VAT". A quick tour, because the current shape only makes sense in context:

2007

Bryan Dudash, "Animated Crowd Rendering," NVIDIA SDK 10.^[6] The earliest published version of the idea. A DirectX 10 demo packs every bone matrix of every frame of every animation into one big texture, samples it in the vertex shader using SV_InstanceID, and instances 9,547 characters at 34 fps on a GeForce 8800. Republished as GPU Gems 3, Chapter 2^[7]. The vertex shader is reading the animation out of a texture; it's still skeletal under the hood, but the texture-as-animation-database shape is here.

2008

Crysis vegetation.^[8] Tiago Sousa's GPU Gems 3 chapter on Crysis vegetation describes procedural wind animation done entirely in the vertex shader, with per-vertex bend parameters baked into vertex colors. Not a texture, but the same governing idea: the per-frame deformation lives in shipped vertex data, and the vertex shader animates it.

2015

Unreal Engine ships Pivot Painter 2.^[9] Epic's MAXScript bakes per-leaf pivot points, axis vectors, and bounds sizes into texture channels. The shader reads those and rotates each leaf around its own pivot in response to wind. It's a vertex animation, baked offline, replayed by a vertex shader, and Epic's docs spell out that the system was designed to compose with "Vertex Animation tools," meaning the technique below.

2017

Houdini Labs Vertex Animation Textures 1.0. Luiz Kruel and the SideFX Labs team ship the first version of the now-canonical VAT export ROP. The node bakes a Houdini sim to a static mesh plus a position texture and a normal texture, with companion shader graphs for Unreal and Unity. The pattern crystallizes: X axis is vertex ID, Y axis is frame number, RGB is normalized position.

2019

Labs VAT 2.0.^[10] Adds RBD mode (rigid-body dynamics, with per-piece quaternion rotation), a fluid mode for topology-changing meshes, and a sprite mode for points. Half-Life: Alyx ships with Source 2 VAT shaders in its Workshop Tools^[2].

2021

Unreal Engine 5 City Sample.^[1] Epic publishes the demo project that demonstrates ten thousand pedestrians and a thousand cars at playable framerates. The crowds are vertex-animated static meshes; the underlying baker is the AnimToTexture plugin, released alongside the sample and folded into UE 5.1 the next year.

2022

Labs VAT 3.0.^[4] The current SideFX node. Replaces the deprecated normal texture with a rotation texture that carries normal + tangent, supports tangent-space normal maps correctly for the first time, adds advanced frame interpolation modes (cubic, smoothstep) and up to 9 channels of custom data. The version most production VAT pipelines run on today.

2025

OpenVAT 1.0.^[5] A free, MIT-licensed Blender add-on that bakes VAT directly from Blender's dependency graph (anything that visually deforms in the viewport, including geometry nodes, modifiers, shape keys, and simulations). Ships 16-bit EXR and PNG output, plus engine-side decoders for Unity, Unreal, and Godot. The first widely-adopted open alternative to the proprietary Houdini pipeline.

2026

VAT as a peer technique to mesh-shader and compute-skinning paths. Modern engines layer three approaches: traditional skeletal (CPU palette + GPU skinning) for hero characters; compute-shader skinning (a compute pass writes a per-frame vertex buffer) for cases that need IK or runtime modification; VAT for crowds and baked sims. Each has a clear niche; the era of "one animation system rules them all" is over.

The recurring pattern: the texture-as-database idea is from 2007, the canonical VAT shape (vertex on X, frame on Y) from 2017, and the four-mode taxonomy (soft / rigid / fluid / sprite) from 2019. By 2026, the pattern is integrated into the major engines as a peer technique rather than a workaround, and the open tools have caught up with the proprietary ones.

03 The core idea: a texture is just a 2D array

Forget for a moment that textures are usually pictures. To the GPU, a texture is a 2D array of small numeric tuples (RGB or RGBA). The sampler hardware does nothing more than: given (u, v) coordinates, return the tuple at that location. There's no rule that the tuple has to be a color. It can be a position, a normal, a quaternion, or anything else four floats can carry.

The whole VAT trick is in choosing what to store and where:

Each pixel = one vertex at one moment in time. Picture a grid: the X axis is the index of the vertex (0 to N-1), the Y axis is the frame of the animation (0 to F-1). The pixel at (i, j) contains the position of vertex i at frame j.
The mesh becomes a static mesh. The triangles never change; only the positions of their corners do. You ship the mesh in some neutral pose (often the first frame of the animation, but any pose works) and let the shader move its vertices around.
The vertex shader reads its own position out of the texture. A vertex needs to know which vertex it is to know where to sample. The traditional answer is to encode the vertex ID into a UV channel of the mesh; the modern answer is to use the built-in SV_VertexID system value. Either way, the shader has the column coordinate. The row (frame) coordinate comes from the engine, usually (time × framerate) mod frame_count.

That's the entire idea. The rest of this tutorial is about details: how to encode positions so the precision is acceptable, how to handle normals and tangents, what to do when topology changes per frame, how to compress per-piece rigid rotations into a quaternion, and how to keep the textures small enough to ship.
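As a minimal CPU-side sketch of the "texture as a 2D array" idea, the Python below stores raw positions in a list of rows, one row per frame and one entry per vertex. The function names `bake_vat` and `lookup` are made up for illustration, and no bounding-box encoding is applied yet.

```python
def bake_vat(frames):
    """frames[j][i] is the (x, y, z) position of vertex i at frame j.

    Returns the "texture" as a list of rows: one row per frame,
    one texel per vertex, exactly the (vertex, frame) grid described above.
    """
    vertex_count = len(frames[0])
    texture = []
    for frame in frames:
        if len(frame) != vertex_count:
            raise ValueError("every frame must have the same vertex count")
        texture.append([tuple(position) for position in frame])
    return texture


def lookup(texture, vertex_id, frame_index):
    """Column = vertex, row = frame."""
    return texture[frame_index][vertex_id]


# Two vertices, three frames: vertex 1 rises by 0.5 per frame.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
]
texture = bake_vat(frames)
print(lookup(texture, vertex_id=1, frame_index=2))
```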

What's SV_VertexID and why does the shader need to know which vertex it is?

When the GPU rasterizes a mesh, the vertex shader runs once per vertex. The shader sees the per-vertex inputs you bound (position, normal, UVs, etc.), but it has no idea which vertex it's currently shading unless you tell it.

SV_VertexID is a system-generated integer in HLSL (GLSL spells it gl_VertexID) that gives every vertex shader invocation a unique index, starting at 0. It's a one-line way to ask "which vertex am I?" without baking the answer into the mesh.

The catch: SV_VertexID indexes the vertex buffer the engine actually uploads (for an indexed draw it is the value fetched from the index buffer), not the point numbering the baker saw. A vertex shared by three triangles keeps one ID, but importers routinely split a point into several vertices along UV seams and hard-normal edges, and mesh optimizers reorder the buffer, so the runtime IDs rarely match the bake. For VAT you want the baked mesh's vertex ID, identical across every copy of a split vertex. The portable answer is to bake the desired ID into a UV channel (typically UV1 or UV2) so it survives import and reordering. SideFX Labs VAT does exactly this: it writes the normalized vertex index into UV1 of the exported mesh^[4].

04 The texture layout, in detail

A VAT texture is a 2D array of vertex states, arranged with a specific convention almost every implementation agrees on. The convention exists because the GPU's bilinear filter happens to do something useful when the texture is laid out one way and something useless when it's laid out the other way.

Columns are vertices. A texture's X axis runs from 0 to (width − 1). Each column holds one vertex's data across every frame.
Rows are frames. Each row holds one frame of the animation, with every vertex's data in order along the row. A 64-frame animation produces a texture 64 pixels tall (or 64 × frame stride if you reserve multiple rows per frame for extra channels).
The mesh carries a normalized vertex ID in a spare UV channel. UV0 is reserved for the material's diffuse mapping, so VAT puts its lookup coordinate in UV1 (sometimes UV2 if UV1 is taken by lightmaps). The X coordinate is the vertex index, normalized to [0, 1] with a half-pixel offset to land on pixel centers.

eq. 1 · vertex index → texture U coordinate

$$u = \frac{\text{vertex\_id} + 0.5}{\text{vertex\_count}}$$

The + 0.5 is essential. Without it, sampling at u = i / N lands on the seam between pixels i − 1 and i; bilinear filtering averages two unrelated vertices and the mesh stretches between neighbors. With it, the sample lands on the center of pixel i, the filter has no neighbor to blend with, and you get exact values.

For the frame coordinate, the engine drives time. The V coordinate is:

eq. 2 · time → texture V coordinate

$$v = \frac{t \times \text{framerate} + 0.5}{\text{frame\_count}}$$

Same half-pixel offset, for the same reason. The fractional part of t × framerate is what gives the bilinear filter useful work: it blends between frame j and frame j+1, so the animation interpolates smoothly even though it's only sampled at the bake rate.
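The effect of the half-pixel offset can be checked on the CPU. The Python sketch below computes eq. 1 and eq. 2 (with the frame index wrapped to the animation length) and then mimics how a bilinear filter picks its two taps along one axis. `vat_uv` and `bilinear_footprint` are illustrative names, and the footprint model assumes the usual texel-center convention (centers at half-integer texel coordinates).

```python
import math


def vat_uv(vertex_id, vertex_count, time_seconds, framerate, frame_count):
    """eq. 1 and eq. 2, with the frame index wrapped to the animation length."""
    u = (vertex_id + 0.5) / vertex_count
    frame = math.fmod(time_seconds * framerate, frame_count)
    v = (frame + 0.5) / frame_count
    return u, v


def bilinear_footprint(coord, size):
    """Return (lower texel, upper texel, weight of the upper texel) for a
    normalized coordinate along an axis that is `size` texels long."""
    texel_space = coord * size - 0.5
    lower = math.floor(texel_space)
    return lower, lower + 1, texel_space - lower


u, v = vat_uv(vertex_id=5, vertex_count=16,
              time_seconds=0.4125, framerate=30.0, frame_count=24)
print("with offset   :", bilinear_footprint(u, 16))       # weight on vertex 6 is zero
print("without offset:", bilinear_footprint(5 / 16, 16))  # vertices 4 and 5 blend equally
print("frame axis    :", bilinear_footprint(v, 24))       # frames 12 and 13 blend
```

With the offset, the U tap lands entirely on vertex 5; without it, vertices 4 and 5 are averaged. On the V axis the blend weight is the fractional part of the frame index, which is the interpolation you want.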

Step through the grid in the widget. Each square is one pixel; the texture is a literal database keyed on (vertex, frame). Hover anywhere to see which vertex and which frame that pixel represents:

[Interactive widget: sliders for vertex count (default 16), frame count (default 24), and play head v (default 0.50); readouts for pixels stored and the frame at the current v.]

Mipmaps are not your friend

Position data has no spatial meaning between adjacent columns: vertex 5 and vertex 6 are not "neighbors" in any geometric sense, just neighbors in vertex order. A mipmapped texture would average them and produce nonsense. The same logic applies vertically across non-adjacent frames. VAT textures must be authored and bound with mipmaps disabled. Samplers filter both axes the same way, so the choice is between point filtering (no frame blending) and bilinear filtering, where the half-pixel offset on U keeps the horizontal blend weight at zero and only the vertical blend between adjacent frames does any work^[4].

05 Encoding positions: the bounding-box trick

A vertex position is three floats in world (or object) space. They can be anything: (123.4, −5.6, 7.8), (0.001, 1234.5, 0.0), anything. Texture channels traditionally are not arbitrary floats. The most common pixel format, R8G8B8A8_UNORM, only holds values in [0, 1] at 8-bit precision. Even R16G16B16A16_FLOAT doesn't natively understand the units of your world.

The fix is the bounding-box remap. Compute the min and max of every vertex's position over every frame of the animation. Now every position fits in a known cube. Remap each component into [0, 1] by lerping against that cube, write the result to the texture, ship the min and max as shader constants, and decode on the way out:

eq. 3 · the encode

$$\text{pixel} = \frac{\text{position} - \text{bbox\_min}}{\text{bbox\_max} - \text{bbox\_min}}$$

Per-component subtract and divide. The output is in [0, 1] by construction, which is exactly what the texture format wants.

eq. 4 · the decode (run in the vertex shader)

$$\text{position} = \text{pixel} \times (\text{bbox\_max} - \text{bbox\_min}) + \text{bbox\_min}$$

Two constants per axis is six floats total; modern engines bundle them as two float4 uniforms. The decode is one multiply-add per vertex. The cost is essentially free.
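A bake-side sketch of eq. 3 and eq. 4 in Python, assuming positions arrive as `frames[j][i] = (x, y, z)`. The function names are illustrative. The guard for an axis with zero extent (a perfectly flat mesh) is an added assumption, since eq. 3 would otherwise divide by zero.

```python
def bounding_box(frames):
    """Min and max corner over every vertex of every frame."""
    points = [p for frame in frames for p in frame]
    bbox_min = tuple(min(p[axis] for p in points) for axis in range(3))
    bbox_max = tuple(max(p[axis] for p in points) for axis in range(3))
    return bbox_min, bbox_max


def encode(position, bbox_min, bbox_max):
    """eq. 3: remap each component into [0, 1]."""
    return tuple(
        (position[a] - bbox_min[a]) / (bbox_max[a] - bbox_min[a])
        if bbox_max[a] > bbox_min[a] else 0.0  # assumption: flat axis encodes as 0
        for a in range(3)
    )


def decode(pixel, bbox_min, bbox_max):
    """eq. 4: the multiply-add the vertex shader performs."""
    return tuple(pixel[a] * (bbox_max[a] - bbox_min[a]) + bbox_min[a]
                 for a in range(3))


frames = [
    [(0.0, -1.0, 2.0), (4.0, 1.0, 2.0)],
    [(2.0, 3.0, 2.0), (1.0, 0.0, 2.0)],
]
bbox_min, bbox_max = bounding_box(frames)
pixel = encode((1.0, 0.0, 2.0), bbox_min, bbox_max)
print(pixel, decode(pixel, bbox_min, bbox_max))
```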

How much precision do you actually get?

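A bounding box of side length L mapped onto an 8-bit channel gives L / 255 ≈ L / 256 units per code. For a character standing 2 m tall, that's 7.8 mm per quantization step in each axis, enough to be visible as a per-frame "swimming" or "popping" of vertices, especially during slow animations where the eye has time to register the steps^[11]. A 16-bit integer channel gives 65,536 evenly spaced values across the same range, a 256× improvement (about 30 µm). 16-bit half-float is less uniform: its 11 significant bits give steps of 1/2048 of the box (about 1 mm here) for values in the upper half of [0, 1], growing finer toward 0. Either way the steps drop well below the 8-bit case and the swimming disappears.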
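The precision figures can be measured instead of estimated. The Python sketch below round-trips normalized values through 8-bit UNORM, 16-bit UNORM, and IEEE half precision (the standard library's `struct` format `'e'`) and reports the worst error in a 2 m box. The helper names and the sample count are arbitrary choices for illustration.

```python
import struct


def quantize_unorm(x, bits):
    """Snap x in [0, 1] to the nearest code of an unsigned normalized channel."""
    levels = (1 << bits) - 1
    return round(x * levels) / levels


def quantize_half(x):
    """Round-trip x through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def worst_error_mm(quantize, box_mm=2000.0, samples=4001):
    """Largest round-trip error, in millimetres, over evenly spaced samples."""
    worst = 0.0
    for k in range(samples):
        x = k / (samples - 1)
        worst = max(worst, abs(quantize(x) - x) * box_mm)
    return worst


print("8-bit UNORM :", worst_error_mm(lambda x: quantize_unorm(x, 8)), "mm")
print("16-bit UNORM:", worst_error_mm(lambda x: quantize_unorm(x, 16)), "mm")
print("half float  :", worst_error_mm(quantize_half), "mm")
```

The worst error is half a quantization step, so the 8-bit result comes out near 4 mm. The half-float result sits between the two UNORM cases because its steps are coarsest near 1.0.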

The five common storage choices, with their tradeoffs:

Format	Bits/channel	Precision in a 2 m box	Bytes/vertex/frame	Notes
R8G8B8A8_UNORM	8	~7.8 mm	4	Smallest, visibly quantized. Acceptable for large props at a distance.
Split 8-bit (two textures)	16 (two 8-bit lerps)	~30 µm	8	Two RGB8 textures storing high and low bytes per axis; reassembled in the shader. Works on hardware without HDR sampling.
R16G16B16A16_FLOAT	16	~1 mm near the top of the range, finer toward 0	8	The modern default. EXR on disk, half-float in VRAM. Houdini Labs VAT 3.0's recommended output^[4].
R32G32B32A32_FLOAT	32	well below microns	16	Overkill for almost everything. Useful only for rigid-body pivots where the precision is multiplicative downstream.
BC6H	~8 effective	~10 mm (lossy)	1	HDR block-compressed format. 8× smaller than half-float, lossy, visible on slow deformations. Worth it for crowds; not worth it for hero meshes.

The split-8-bit format deserves a closer look because it shows up in mobile and WebGL contexts where HDR samplers aren't reliable. The trick is to take a 16-bit value and split it across two channels:

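eq. 5 · split-byte encode

$$\text{value} \to (\text{high}, \text{low}), \qquad \text{high} = \lfloor \text{value} / 256 \rfloor, \quad \text{low} = \text{value} \bmod 256$$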

Decode is value = high × 256 + low. Wildlife Studios' implementation uses two RGB8 textures and packs X-high, X-low, Y-high into the first, Y-low, Z-high, Z-low into the second^[3]. Slightly fiddly, but it works on every device.
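A small Python sketch of the split-byte round trip for one axis, assuming the normalized position is first scaled to a 16-bit integer. The function names are illustrative, and the channel packing order across the two textures is left out.

```python
def split16(value):
    """Split a 16-bit integer into (high, low) bytes."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    return value >> 8, value & 0xFF


def join16(high, low):
    """The decode from the text: value = high * 256 + low."""
    return high * 256 + low


def encode_axis(normalized):
    """Normalized [0, 1] coordinate -> two 8-bit channel values."""
    return split16(round(normalized * 65535))


def decode_axis(high, low):
    """Two 8-bit channel values -> normalized [0, 1] coordinate."""
    return join16(high, low) / 65535


high, low = encode_axis(0.37)
print(high, low, decode_axis(high, low))
```

In a shader the two channels arrive as UNORM floats (byte / 255), so each would be scaled back to its byte value before the same multiply-add.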

The widget shows the encoding live. Pick a position with the sliders; the bounding-box remap, the 8-bit-quantized pixel, and the 16-bit-quantized pixel are all displayed. Notice how the 8-bit version snaps to discrete steps:

[Interactive widget: sliders for position X (default 0.50) and position Y (default 0.30); readouts for encoded pixel, decoded position, and round-trip error.]

Why include every frame in the bounding box, not just the rest pose?

The bounding box has to enclose every position the animation ever produces, including extreme deformations. If you measure the box from the rest pose and the animation later swings vertices outside it, those vertices encode at values outside [0, 1], which a UNORM texture clamps to the nearest end of the range. The result is a vertex that visibly stops at the box wall.

Houdini Labs VAT scans every frame of the input and emits the global min/max as part of the export metadata. OpenVAT does the same. The shader reads those constants and the bounding box is always correct, by construction.

06 Decoding in the shader

The shader-side decode is small. Here it is in HLSL, the way Unreal and most D3D-targeted engines write it:

vat_decode.hlsl · vertex shader entry point

// Uniforms supplied by the engine, the same for every vertex of every
// instance using this VAT asset.
cbuffer VatConstants {
  float3 bboxMin;            // minimum corner of the position bounding box
  float3 bboxMax;            // maximum corner of the position bounding box
  float  frameCount;         // total number of frames stored in the texture
  float  bakeFramerate;      // the rate the simulation was sampled at (e.g. 30 fps)
  float  vertexCount;        // total number of mesh vertices
  float  currentTime;        // elapsed seconds since the animation began (engine clock)
  float4x4 worldViewProjection; // object space to clip space; used in step 5
};

// Bind a 2D texture with a bilinear sampler and mipmaps disabled. uv1.x sits
// on a texel center, so the filter never blends across vertices (X); on the
// frame axis (Y) it interpolates between adjacent frames for free.
Texture2D<float4> positionTexture;
SamplerState vatSampler;

struct VertexInput {
  float3 position : POSITION;     // the static-mesh rest pose; mostly ignored at runtime
  float2 uv0      : TEXCOORD0;    // material UVs
  float2 uv1      : TEXCOORD1;    // VAT lookup: uv1.x is normalized vertex index + half-pixel
};

struct VertexOutput {
  float4 clipPosition : SV_POSITION;
  float2 materialUv   : TEXCOORD0;
};

VertexOutput VatVertexShader(VertexInput input) {
  // 1. Compute the V coordinate: which row of the texture are we sampling?
  // frameIndexFloat is fractional: e.g. 12.37 means "between frame 12 and 13".
  // Bilinear filtering on V will give us the lerp between those two rows for free.
  float frameIndexFloat = fmod(currentTime * bakeFramerate, frameCount);
  float sampleV = (frameIndexFloat + 0.5) / frameCount;

  // 2. The U coordinate was pre-baked into uv1.x by the exporter. It already
  // includes the half-pixel offset, so we use it directly.
  float2 sampleUv = float2(input.uv1.x, sampleV);

  // 3. Read the normalized position from the texture. SampleLevel with mip 0
  // guarantees we never get a wrongly-filtered mip even if the asset was
  // authored with mips on by accident.
  float3 normalizedPosition = positionTexture.SampleLevel(
      vatSampler, sampleUv, 0).rgb;

  // 4. Un-remap through the stored bounding box. This is the inverse of the
  // bake-time normalize.
  float3 objectSpacePosition = lerp(bboxMin, bboxMax, normalizedPosition);

  // 5. Transform to clip space the usual way. From here it's a normal vertex shader.
  VertexOutput output;
  output.clipPosition = mul(worldViewProjection, float4(objectSpacePosition, 1));
  output.materialUv   = input.uv0;
  return output;
}

Five lines of math. One texture sample, one lerp, and the standard projection. The shader doesn't know whether the animation was a cloth sim, a Houdini destruction, or a baked character cycle: those are all the same shape of texture and the same five lines.

Why SampleLevel, not Sample

The Sample intrinsic chooses a mip level from screen-space UV derivatives, which only exist in a pixel shader. A vertex shader has no derivatives, so HLSL does not allow Sample there at all, and even if it did, a VAT UV is a vertex index and a frame number, not a screen-space gradient, so any implicit mip selection would be nonsense. SampleLevel takes the mip level as an explicit argument; pass 0 and you always get the base level. The same reasoning applies in GLSL: use textureLod(..., 0.0), not texture(...).

The GLSL version

Identical math, different syntax. The half-pixel offset on V is the same; the texture lookup uses textureLod instead of SampleLevel:

vat_decode.glsl · GLSL 4.5 vertex shader

#version 450

layout(location=0) in vec3 inPosition;
layout(location=1) in vec2 inUv0;
layout(location=2) in vec2 inUv1;     // VAT lookup, normalized vertex index in .x

layout(binding=0) uniform VatConstants {
  vec3  bboxMin;
  vec3  bboxMax;
  float frameCount;
  float bakeFramerate;
  float vertexCount;
  float currentTime;
  mat4  worldViewProjection;
};

layout(binding=1) uniform sampler2D positionTexture;

layout(location=0) out vec2 outUv;

void main() {
  float frameIndexFloat = mod(currentTime * bakeFramerate, frameCount);
  float sampleV         = (frameIndexFloat + 0.5) / frameCount;
  vec2  sampleUv        = vec2(inUv1.x, sampleV);

  // textureLod forces mip 0 so the GPU doesn't pick a useless mipmap.
  vec3 normalizedPosition = textureLod(positionTexture, sampleUv, 0.0).rgb;
  vec3 objectSpacePosition = mix(bboxMin, bboxMax, normalizedPosition);

  gl_Position = worldViewProjection * vec4(objectSpacePosition, 1.0);
  outUv       = inUv0;
}

07 Normals: three approaches and their tradeoffs

A vertex moves; its normal usually has to move with it. A cloth waves and the lighting on each face changes. A character bends an arm and the normal on the inside of the elbow flips. Skipping normals (using the static-mesh normals) makes the lighting look obviously wrong: the geometry says the surface is curving, the normal says it's flat, and you get visible shading seams.

VAT has three ways to ship per-frame normals, in increasing order of fidelity (and cost):

1. Compress into the position texture's alpha channel

A unit normal has three components but only two degrees of freedom (it's on the unit sphere). You can reduce it to two numbers with one of the spherical-projection codecs (octahedral encoding is the modern favorite), quantize and pack the pair into a single value, and stash it in the alpha channel of the position texture. One texture, one sample, normals come along for free.

Houdini Labs VAT calls this "compress to position alpha" and documents it as the lowest-memory but lowest-quality option^[4]. The lossiness is visible on highly specular surfaces where small normal errors get amplified; for matte or diffuse materials it's usually fine.
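For reference, here is a generic octahedral codec as a Python sketch. It illustrates the family of encodings mentioned above and is not claimed to match the exact packing Houdini Labs uses. The 8 bits per component in `angle_error_degrees` is an arbitrary choice for measuring the loss.

```python
import math


def sign_not_zero(x):
    return 1.0 if x >= 0.0 else -1.0


def oct_encode(normal):
    """Unit normal (x, y, z) -> two values in [-1, 1]."""
    x, y, z = normal
    l1 = abs(x) + abs(y) + abs(z)
    px, py = x / l1, y / l1
    if z < 0.0:
        # Fold the lower hemisphere outward over the diagonals of the square.
        px, py = ((1.0 - abs(py)) * sign_not_zero(px),
                  (1.0 - abs(px)) * sign_not_zero(py))
    return px, py


def oct_decode(encoded):
    """Two values in [-1, 1] -> unit normal (x, y, z)."""
    ex, ey = encoded
    z = 1.0 - abs(ex) - abs(ey)
    if z < 0.0:
        ex, ey = ((1.0 - abs(ey)) * sign_not_zero(ex),
                  (1.0 - abs(ex)) * sign_not_zero(ey))
    length = math.sqrt(ex * ex + ey * ey + z * z)
    return ex / length, ey / length, z / length


def quantize_signed(value, bits):
    """Snap a value in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = (1 << bits) - 1
    return round((value * 0.5 + 0.5) * levels) / levels * 2.0 - 1.0


def angle_error_degrees(normal, bits):
    """Angle between a normal and its quantized round trip."""
    ex, ey = oct_encode(normal)
    decoded = oct_decode((quantize_signed(ex, bits), quantize_signed(ey, bits)))
    dot = sum(a * b for a, b in zip(normal, decoded))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


print(angle_error_degrees((0.6, 0.0, -0.8), bits=8))
```

Fewer bits per component raise the angular error, which is where the visible loss on specular surfaces comes from.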

2. A separate normal texture

Author a second texture, same vertex × frame layout, RGB = normal × 0.5 + 0.5 (the standard remap from [-1, 1] to [0, 1]). 8-bit RGB is sufficient because normals don't have units; only their direction matters and 8 bits per axis is well within the precision visible on a screen^[11].

Two textures, two samples per vertex, but each component is point-sampled separately and the decode is independent of position. This is the SideFX VAT 2.0 default and what most production pipelines used through ~2022.

3. A rotation texture with tangent

Per-frame normals are enough to light the surface but not enough to do tangent-space normal mapping. A normal map (the kind made from a high-poly bake) is defined in the tangent space of the surface, and that tangent space rotates as the surface deforms. Without the tangent, the normal map's details rotate wrong: a brick wall lit in tangent space ends up with its bricks appearing to slide as the wall flexes.

VAT 3.0 ships a rotation texture that encodes the rotation from the rest-pose tangent frame to the current-frame tangent frame as a quaternion (4 components, 16-bit half-float each). The shader reads the rotation, applies it to the rest-pose tangent and bitangent, and produces a fully-correct per-frame tangent space. Normal maps work; specular highlights track the deformation; the geometry can pretend it has surface detail again^[4].

Object-space normal maps are a useful shortcut

If you don't need tangent-space normal maps (say, the surface details are baked into a procedural that doesn't care about tangent orientation), you can use an object-space normal map instead. Object-space normals don't rotate with the surface; they encode absolute orientations in the mesh's local frame. You combine the static object-space normal map with the per-frame VAT normal (rotating the static normal by the per-frame normal's deflection from rest) and avoid the tangent texture entirely. The OpenVAT docs recommend exactly this pattern^[5].

Why three options?

Memory. A 4096 × 1024 half-float position texture is already 32 MiB. Doubling it for normals is real money in a memory-constrained title. The three options exist so the artist can pick the tradeoff that fits the asset's role: a hero cloth gets the rotation texture; a distant crowd member gets normals-in-alpha; some ground debris gets nothing and accepts faceted shading.

08 Frame interpolation

A 60 fps game running a 30 fps bake samples in between frames half the time. The simplest answer is to round to the nearest frame, which produces visibly stepped animation. The standard answer is to linearly interpolate between two adjacent frames, which the GPU's bilinear filter does for free when you sample with the half-pixel-offset V coordinate from §4. Lerp the wrong way and you get artifacts; lerp the right way and the animation looks smooth at any playback rate.

The math, written explicitly:

eq. 6 · frame blending

$$\text{position} = (1 - \alpha)\, P_{\text{floor}} + \alpha\, P_{\text{ceil}}, \qquad \alpha = \operatorname{frac}(t \times \text{framerate})$$

With bilinear filtering on V, the GPU does this for you automatically when the sample coordinate falls between two pixel centers. With point filtering on V, you read both P_floor and P_ceil by hand and lerp.
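The by-hand path can be sketched on the CPU as follows. This Python version assumes the texture is a list of rows (one per frame), that time is non-negative, and that the animation loops, so the frame after the last one is frame 0. The function name is illustrative.

```python
import math


def sample_blended(texture, vertex_id, time_seconds, framerate):
    """eq. 6 with explicit point reads of the two neighboring frames."""
    frame_count = len(texture)
    frame = math.fmod(time_seconds * framerate, frame_count)
    floor_index = int(math.floor(frame))
    ceil_index = (floor_index + 1) % frame_count  # wrap to frame 0 at the loop point
    alpha = frame - floor_index
    p_floor = texture[floor_index][vertex_id]
    p_ceil = texture[ceil_index][vertex_id]
    return tuple((1.0 - alpha) * a + alpha * b for a, b in zip(p_floor, p_ceil))


# One vertex moving along X over three frames.
texture = [[(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)], [(4.0, 0.0, 0.0)]]
print(sample_blended(texture, 0, 0.25, 2.0))  # halfway between frames 0 and 1
print(sample_blended(texture, 0, 1.25, 2.0))  # halfway between frame 2 and frame 0
```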

The widget below shows what the difference looks like. Toggle between point and bilinear filtering; at 30 fps bake and 60 fps playback, point produces stepped motion while bilinear is smooth. Speed the playback up to 5× the bake rate and even bilinear shows its limits, because the filter only blends two frames at a time and the rest are skipped:

[Interactive widget: point versus bilinear frame filtering, with a playback rate slider (default 1.0×).]

Looping cleanly

Most baked animations loop: the bake's last frame matches its first, and the texture wraps. Hardware texture wrap modes (WRAP or REPEAT) handle this, but only if the frame stride lines up exactly with the texture height. The standard precaution is to bake an extra copy of frame 0 at the very bottom of the texture, so a bilinear sample at v = 1 − ε blends the actual last frame with the first frame as if they were adjacent. Without that copy, the sample at the loop point blends the last frame with whatever row the wrap happens to land on, and you get a single bad frame.

Don't lerp across topology changes

If the animation has cuts (say, a character swaps to a new pose between frame 30 and frame 31), bilinear filtering averages the two unrelated positions and produces a frame of in-between garbage. The fix is either to insert blank frames at cuts or to switch to point sampling for that asset. Houdini Labs VAT's fluid mode (§11) avoids this with a different trick: it resets the UVs every frame so there's no expectation of vertex correspondence across rows.

09 Soft-body VAT: the simple case

Soft-body mode is what we've been describing so far. The mesh has stable topology (the same vertex IDs from start to finish), and only positions (and optionally normals) change per frame. Cloth, jiggle, banner waves, character cycles where you don't need IK, deforming foliage. The pipeline is:

Bake. Houdini, Blender, or Unreal samples the simulation at the chosen rate (typically 30 fps) and writes the per-frame position to row j, column i, of the position texture. Normals (or rotations) go to a second texture. The bounding box is computed across all frames.
Export the mesh. The exporter writes a static mesh with the same vertex count as the simulation, in rest pose, with the normalized vertex ID written into UV1.x. Material UVs, vertex colors, and other per-vertex attributes pass through unchanged.
Sample at runtime. The shader does the §6 decode. The vertex shader's only job beyond a standard transform is one texture lookup and one lerp.

The widget plays back a 32-frame banner cloth. Drag the wind direction; the bake is fixed, so the cloth only knows the motion it was baked with, but the playback speed and looping point are runtime parameters:

[Interactive widget: banner cloth playback with controls for playback speed (default 1.0×) and position precision (default 16-bit).]

What it costs

Per instance: one draw call's worth of state (the same texture binding for every instance) and however many vertices the mesh has. A 192-vertex cloth with a 192 × 32 half-float position texture costs 192 × 32 × 8 = 49,152 bytes (48 KiB) of texture data, shared across every cloth on screen. The GPU runs one vertex shader per vertex per instance; the per-vertex cost is one sample and one lerp on top of the standard transform.

Compared to skeletal cloth (which requires either a physics simulation per instance or a per-vertex skinning weight against many bones), this is a massive simplification, at the cost of every cloth playing the same baked motion. For props in the environment that's fine; for hero cloth you'd ship simulated cloth instead.
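The memory arithmetic generalizes to any asset. A minimal Python sketch, using the bytes-per-texel figures from the format table in §5 (the dictionary and function names are illustrative):

```python
BYTES_PER_TEXEL = {
    "R8G8B8A8_UNORM": 4,
    "R16G16B16A16_FLOAT": 8,
    "R32G32B32A32_FLOAT": 16,
}


def vat_texture_bytes(vertex_count, frame_count, pixel_format):
    """Uncompressed size of one vertex-by-frame texture, without mipmaps."""
    return vertex_count * frame_count * BYTES_PER_TEXEL[pixel_format]


cloth = vat_texture_bytes(192, 32, "R16G16B16A16_FLOAT")
large = vat_texture_bytes(4096, 1024, "R16G16B16A16_FLOAT")
print(f"banner cloth: {cloth} bytes ({cloth / 1024:.0f} KiB)")
print(f"4096 x 1024 : {large} bytes ({large / 2**20:.0f} MiB)")
```

The cost is paid once per asset, not once per instance, which is why the small cloth texture is negligible and the large one is worth budgeting for.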

10 Rigid-body VAT: when pieces rotate as units

A shattering window is a different problem. The mesh is made of fragments: discrete pieces, each rigid, each moving and rotating as a unit. Storing per-vertex positions for thousands of pieces' worth of vertices is wasteful: every vertex on a fragment carries the same translation and rotation, so most of the bits are redundant.

Rigid-body VAT (RBD) factors the data the way a physics engine would:

One texel per piece per frame, not one per vertex. Each fragment is one column. The position texture stores the piece's pivot translation. A separate rotation texture stores its quaternion orientation.
The mesh carries the piece index and the per-piece pivot. Every vertex on fragment 17 has the same piece ID (17), written into a UV channel. The vertex also carries its offset from its piece's pivot, which is just the rest-pose position minus the piece's rest pivot. The exporter does that subtraction once at bake time.
The shader rotates and translates. Read the piece's rotation and translation, apply the rotation to the vertex's pivot-relative offset, add the translation. The vertex ends up in the right place; the fragment is rigid; the cost is one quaternion-rotate per vertex.

eq. 7 · rigid-body vertex transform

$$P = T_{\text{piece}} + Q_{\text{piece}} \cdot v_{\text{offset}}$$

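The dot is a quaternion-vector rotation, which costs about 18 multiplies and 12 adds in the usual two-cross-product form. That is roughly double the 9 multiplies and 6 adds of a 3×3 matrix-vector product, but the extra arithmetic is negligible next to the texture fetch. The win is storage: 4 floats instead of 9, less than half, or a third in practice if you would otherwise store an affine 3×4 (12 floats).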
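A CPU-side Python sketch of eq. 7, using the two-cross-product form of quaternion rotation. The (x, y, z, w) component order and the function names are assumptions made for illustration; exporters differ on component order.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q = (x, y, z, w)."""
    axis_part, w = q[:3], q[3]
    t = tuple(2.0 * c for c in cross(axis_part, v))
    u = cross(axis_part, t)
    return tuple(v[i] + w * t[i] + u[i] for i in range(3))


def rigid_vertex(translation, rotation, offset):
    """eq. 7: P = T_piece + Q_piece . v_offset"""
    rotated = quat_rotate(rotation, offset)
    return tuple(translation[i] + rotated[i] for i in range(3))


# A piece rotated 90 degrees about Z and moved 10 units along X.
s = math.sqrt(0.5)
quarter_turn_z = (0.0, 0.0, s, s)
print(rigid_vertex((10.0, 0.0, 0.0), quarter_turn_z, (1.0, 0.0, 0.0)))
```

The vertex one unit along X from the pivot ends up one unit along Y from the piece's new position, as expected for a quarter turn about Z.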

Why quaternions instead of matrices

Three reasons. First, storage: 4 floats vs 9 (or 12 for an affine row). Second, interpolation: SLERP between two quaternions produces a clean rotation; lerping two matrices produces non-rigid intermediates that have to be re-orthogonalized. The bilinear filter on the rotation texture gives you a plain lerp of the two quaternions; renormalize the result in the shader and you have NLERP (normalized linear interpolation), which is good enough for almost all real animations, provided the baker keeps consecutive quaternions on the same hemisphere (q and −q encode the same rotation, and a lerp between opposite signs passes near zero). Third, numerical stability: both representations drift under repeated composition, but a quaternion is repaired by a cheap renormalization, while a matrix needs a full re-orthogonalization.
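A minimal NLERP sketch in Python, assuming (x, y, z, w) component order. The sign flip is written inline here for clarity. A hardware filter cannot perform it, so in a texture-filtered pipeline the same check would have to run at bake time, flipping each frame's quaternion to agree with the previous frame.

```python
import math


def nlerp(q0, q1, alpha):
    """Normalized linear interpolation between two unit quaternions."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q are the same rotation; pick the copy nearest to q0.
        q1 = tuple(-c for c in q1)
    blended = tuple((1.0 - alpha) * a + alpha * b for a, b in zip(q0, q1))
    length = math.sqrt(sum(c * c for c in blended))
    return tuple(c / length for c in blended)


identity = (0.0, 0.0, 0.0, 1.0)
s = math.sqrt(0.5)
quarter_turn_z = (0.0, 0.0, s, s)

halfway = nlerp(identity, quarter_turn_z, 0.5)
print(halfway)  # a 45 degree turn about Z
```

At the midpoint NLERP and SLERP agree exactly; elsewhere NLERP follows the same path at a slightly uneven speed, which is rarely visible between adjacent baked frames.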

Houdini Labs VAT 2.0 and 3.0 both write the quaternion into UV channels of the mesh as packed 16-bit values^[10], with "high precision" and "very high precision" modes that swap between unencoded and encoded representations per piece. The choice is exposed because some scenes have rotations that compress badly under the encoding scheme.

The pivot is baked into the mesh, not the texture

Each piece's rest-pose pivot (the point around which the piece rotates) has to be exposed to the shader somehow. The standard answer is to write it into yet another UV channel as a constant per piece (every vertex on piece 17 has the same UV3 value). That way the shader can compute the rest-relative offset on the fly: v_offset = inputPosition − pivot[pieceId]. Some exporters bake the subtraction at export time and store v_offset directly in the vertex position, which is slightly faster at runtime and slightly less flexible.

11Fluid VAT: when topology changes per frame

A water splash or a smoke wisp doesn't have a stable mesh. Marching cubes (or whatever isosurface extractor the sim uses) produces a different vertex count and a different connectivity every frame. Vertex IDs from frame 12 are meaningless in frame 13. The whole "column = vertex" assumption collapses.

Fluid VAT handles this with two changes from the soft-body design:

The mesh is the union of all frames' geometry, with enough triangles to cover the worst frame. A "vertex" no longer has a stable identity; it's just a generic surface point.
UVs are flattened per frame. The bake re-unwraps each frame's mesh independently and stores the unwrap as one row of an indirection (lookup) texture. At sample time, the shader reads the indirection texture to find the position texture coordinates, then reads the position. Two samples instead of one, but the topology problem dissolves.
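
The two-sample lookup can be sketched on the CPU with plain arrays standing in for the textures. The layout below (integer texel addresses in the indirection rows, spare slots repeating an address) is a hypothetical simplification for illustration, not the encoding any particular exporter writes.

```javascript
// Hypothetical CPU-side stand-ins for the two textures:
//   lookupRows[frame][slot]     -> [column, row] texel address
//   positionTexels[row][column] -> decoded [x, y, z]
function sampleFluidVertex(lookupRows, positionTexels, frame, slot) {
  const [column, row] = lookupRows[frame][slot]; // sample 1: indirection
  return positionTexels[row][column];            // sample 2: position
}

// Two frames, three vertex slots. Frame 1 only needs two distinct points,
// so its spare slot repeats an address (an assumed convention for unused slots).
const positionTexels = [
  [[0, 0, 0], [1, 0, 0], [0, 1, 0]], // row 0: surface points for frame 0
  [[0, 0, 1], [2, 0, 1], [0, 0, 0]], // row 1: surface points for frame 1
];
const lookupRows = [
  [[0, 0], [1, 0], [2, 0]],
  [[1, 1], [0, 1], [0, 1]],
];

console.log(sampleFluidVertex(lookupRows, positionTexels, 1, 0));
```

Slot 0 means a different surface point in every frame, which is exactly the loss of stable vertex identity described above.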

Houdini's exporter sets things up so the shipped mesh has high vertex count and a per-frame UV unwrap that gets refreshed every frame the same way the positions do^[4]. Snap's documentation calls this "Dynamic Remeshing" mode and notes that it requires the input geometry to be re-unwrapped per frame in the source DCC^[12].

Memory grows quickly because the worst-case mesh has to be shipped, but the technique is the only way to render a baked fluid surface from a static-mesh runtime. The alternative (shipping the whole sim as an Alembic and reading it on the CPU per frame) is much more expensive on bandwidth and CPU time.

12Sprite VAT: just points

The fourth and simplest mode. There's no animated mesh; there are particles, each one a position. The position texture stores one column per particle, with one texel per particle per frame. The mesh you ship is a single quad (the sprite card) and it's instanced once per particle. The vertex shader reads the particle's texel for the current frame, places the quad at that position, optionally rotates it to face the camera, and you have an animated particle system.
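
A small JavaScript sketch of the "place the quad, face the camera" step. It assumes the engine supplies the camera's world-space right and up vectors as unit vectors; the function name and corner order are illustrative only.

```javascript
// Expand one particle position into the four corners of a camera-facing quad.
// cameraRight and cameraUp are assumed to be unit vectors in world space.
function billboardCorners(center, cameraRight, cameraUp, halfSize) {
  const cornerSigns = [[-1, -1], [1, -1], [1, 1], [-1, 1]];
  return cornerSigns.map(([sx, sy]) =>
    center.map((c, i) => c + (sx * cameraRight[i] + sy * cameraUp[i]) * halfSize)
  );
}

// Particle position as it would come out of the decoded position texture.
const particlePosition = [2, 3, 4];
const corners = billboardCorners(particlePosition, [1, 0, 0], [0, 1, 0], 0.5);
console.log(corners);
```

In a shader the corner signs would typically come from the quad's own vertex data and the particle position from the texture sample; only the addition changes form.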

Houdini's Labs VAT names this "Particle Sprites" mode and lets you pick the card shape (square, triangle, hexagon, or custom)^[4]. The texture costs are small: 2,048 particles × 256 frames is 4 MB at half-float RGBA (8 bytes per texel) and 512 KB with BC6H (§15). The whole effect (sparks, snow, fireflies) ships as one draw call and one texture.

In practice, sprite VAT competes with classical particle systems where the engine evaluates particle motion at runtime. Sprite VAT loses the ability to react to gameplay (a particle can't ricochet off geometry), but wins on cost (no per-frame CPU) and reproducibility (the bake is identical every play).

13A complete shader, with normals and frame blending

Putting §5 through §10 together. Below is a soft-body VAT vertex shader that does position decode, manual frame blending (so the engine controls the lerp regardless of the texture's sampler state), and object-space normal decoding from a second texture. About 60 lines of HLSL, not counting comments. The Rust+WGSL version is structurally identical; only the syntax changes.

vat_full.hlsl · soft-body with blended frames and object-space normals

// Engine-supplied uniforms. Constant across every vertex of every instance
// using this VAT asset.
cbuffer VatConstants {
  float3   bboxMin;            // position bounding box minimum corner
  float3   bboxMax;            // position bounding box maximum corner
  float    frameCount;         // total number of frames stored
  float    bakeFramerate;      // sample rate of the bake (typically 30)
  float    currentTime;        // elapsed seconds since the animation began
  float4x4 worldViewProjection;
};

// Position texture: each pixel encodes (x, y, z) normalized into the bounding box.
// Bound with mipmaps disabled and point sampling so we control filtering manually.
Texture2D<float4> positionTexture;

// Normal texture: each pixel encodes a unit normal as (n * 0.5 + 0.5).
Texture2D<float4> normalTexture;

SamplerState pointSampler;

struct VertexInput {
  float3 restPosition : POSITION;     // rest-pose position (unused at runtime)
  float2 materialUv   : TEXCOORD0;    // UVs for diffuse / normal mapping
  float2 vatLookup    : TEXCOORD1;    // vatLookup.x = normalized vertex ID (with +0.5 baked in)
};

struct VertexOutput {
  float4 clipPosition  : SV_POSITION;
  float2 materialUv    : TEXCOORD0;
  float3 worldNormal   : TEXCOORD1;
};

// Sample two adjacent frames and lerp between them. We do this manually instead
// of relying on the sampler's bilinear filter so the wrap-around case is correct:
// the (frameCount - 1) → 0 transition is a loop, and the bilinear filter would
// otherwise blend with the next row of pixels (which doesn't exist).
float3 SamplePositionBlended(float vertexU, float frameIndexFloat) {
  float  frameLo    = floor(frameIndexFloat);
  float  frameHi    = fmod(frameLo + 1.0, frameCount);   // wraps cleanly at the loop point
  float  blendAlpha = frac(frameIndexFloat);

  float  vLo        = (frameLo + 0.5) / frameCount;
  float  vHi        = (frameHi + 0.5) / frameCount;

  float3 normalizedLo = positionTexture.SampleLevel(pointSampler,
                                float2(vertexU, vLo), 0).rgb;
  float3 normalizedHi = positionTexture.SampleLevel(pointSampler,
                                float2(vertexU, vHi), 0).rgb;

  float3 normalized   = lerp(normalizedLo, normalizedHi, blendAlpha);
  return lerp(bboxMin, bboxMax, normalized);
}

// Same as above but for the normal texture. Normals don't need a bounding box;
// they decode with the standard *2 - 1 inverse remap.
float3 SampleNormalBlended(float vertexU, float frameIndexFloat) {
  float frameLo    = floor(frameIndexFloat);
  float frameHi    = fmod(frameLo + 1.0, frameCount);
  float blendAlpha = frac(frameIndexFloat);

  float vLo = (frameLo + 0.5) / frameCount;
  float vHi = (frameHi + 0.5) / frameCount;

  float3 encodedLo = normalTexture.SampleLevel(pointSampler,
                          float2(vertexU, vLo), 0).rgb;
  float3 encodedHi = normalTexture.SampleLevel(pointSampler,
                          float2(vertexU, vHi), 0).rgb;

  // lerp(encoded, encoded) then *2 - 1 is the same as lerping the decoded values,
  // since both are affine. Normalize because lerped unit vectors aren't unit anymore.
  float3 encoded   = lerp(encodedLo, encodedHi, blendAlpha);
  return normalize(encoded * 2.0 - 1.0);
}

VertexOutput VatVertexShader(VertexInput input) {
  float  frameIndexFloat = fmod(currentTime * bakeFramerate, frameCount);
  float  vertexU         = input.vatLookup.x;

  float3 objectPosition  = SamplePositionBlended(vertexU, frameIndexFloat);
  float3 objectNormal    = SampleNormalBlended(vertexU, frameIndexFloat);

  VertexOutput output;
  output.clipPosition = mul(worldViewProjection, float4(objectPosition, 1));
  output.materialUv   = input.materialUv;
  output.worldNormal  = objectNormal;     // still object space: rotate by the instance's world matrix if instances are rotated
  return output;
}
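
For checking a bake offline, the position path of the shader can be mirrored on the CPU. The sketch below is a JavaScript stand-in, assuming the texture is available as a nested array `frames[frame][vertex]` of normalized [x, y, z] values; that array layout and the function name are assumptions for the example.

```javascript
// CPU mirror of SamplePositionBlended: blend two adjacent frames, wrapping at
// the loop point, then decode from normalized [0, 1] into the bounding box.
function samplePositionBlended(frames, vertexIndex, frameIndexFloat, bboxMin, bboxMax) {
  const frameCount = frames.length;
  const frameLo = Math.floor(frameIndexFloat) % frameCount;
  const frameHi = (frameLo + 1) % frameCount; // last frame blends toward frame 0
  const blendAlpha = frameIndexFloat - Math.floor(frameIndexFloat);

  const lo = frames[frameLo][vertexIndex];
  const hi = frames[frameHi][vertexIndex];
  return lo.map((value, axis) => {
    const normalized = value + (hi[axis] - value) * blendAlpha;
    return bboxMin[axis] + (bboxMax[axis] - bboxMin[axis]) * normalized;
  });
}

// One vertex, two frames: the vertex moves from the box's min corner to its max corner.
const frames = [[[0, 0, 0]], [[1, 1, 1]]];
const quarter = samplePositionBlended(frames, 0, 0.25, [-1, -1, -1], [1, 1, 1]);
const wrapped = samplePositionBlended(frames, 0, 1.5, [-1, -1, -1], [1, 1, 1]);
console.log(quarter, wrapped);
```

The second call sits halfway between the last frame and frame 0, which exercises the wrap-around branch the shader comment describes.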

What's intentionally missing

This implementation is meant to read clearly, not be every-feature complete.

No tangent-frame reconstruction. Adding tangent-space normal maps requires a third texture (or a rotation quaternion) and an extra mul by the per-frame tangent basis. The VAT 3.0 rotation-texture path covers this.
No animation indexing. The shader plays one bake; a real implementation has an animation-state-machine equivalent that selects between several baked clips stacked vertically in the same texture, with offsets per clip.
No per-instance time offset. To desynchronize an instanced crowd, each instance needs its own currentTime; in practice that's a per-instance attribute the engine passes through.
No rigid-body path. Add a piece-ID UV, a rotation texture, and the §10 transform.
No streaming or paging of the texture itself. A huge VAT bound to many instances costs the CPU nothing extra but can stress the GPU's texture cache and VRAM budget. Production engines page the texture data or use BC6H to shrink it.
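
To make the animation-indexing item concrete, here is one possible way to address clips stacked vertically in a single texture. The clip table, its field names, and the row layout are all hypothetical; a real exporter would publish its own metadata format.

```javascript
// Hypothetical clip table: each clip occupies a contiguous block of rows.
const clips = {
  idle: { firstRow: 0, frameCount: 32 },
  walk: { firstRow: 32, frameCount: 64 },
};
const textureHeight = 96; // total rows across all clips

// V coordinate of the texel-row center for a clip at a given time.
// The frame index wraps inside the clip, so a looping clip never reads
// rows that belong to its neighbor.
function clipRowV(clip, timeSeconds, bakeFramerate, heightInRows) {
  const frame = Math.floor(timeSeconds * bakeFramerate) % clip.frameCount;
  return (clip.firstRow + frame + 0.5) / heightInRows;
}

console.log(clipRowV(clips.walk, 0, 30, textureHeight));
console.log(clipRowV(clips.walk, 2.5, 30, textureHeight));
```

At 2.5 seconds and 30 fps the walk clip is on its frame 11 (75 mod 64), which is row 43 of the shared texture. Blending between frames would use the same wrap for the upper row.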

14VAT vs skeletal animation: when to use which

Both approaches survive in production because they optimize for different things. The summary, with citations for the specific numbers:

Axis	Skeletal animation	Vertex animation textures
CPU cost per character	~50-200 µs (bone evaluation, blend tree, IK, palette upload). Linear in characters.	Near zero. The work is one texture binding and one per-instance time uniform.
Draw calls	Typically one per character (per-instance bone palette is per-character state).	One per visible mesh, every instance batched. UE5 City Sample renders ~10,000 pedestrians in a few dozen draws^[1].
VRAM per asset	~1-5 MB per skeletal mesh + per-clip animation data (bone tracks).	Depends on vertex × frame product. A 5,000-vertex character with 60 frames at half-float ≈ 5,000 × 60 × 8 = 2.3 MB per clip.
Runtime flexibility	Full: any blend tree, IK, foot placement, ragdoll. Animations compose arbitrarily.	Limited: play the bake, scrub the bake, lerp between two baked frames. No procedural composition.
Authoring complexity	Rig + skin weight setup. Familiar workflow for character animators.	Bake-time setup in the source DCC (Houdini/Blender/Unreal). Has to be re-baked when the animation changes.
Memory growth pattern	Linear in clip length, sub-linear in mesh size (bones don't scale with vertices).	Linear in both clip length and mesh size. Long animations on dense meshes hit the 8K texture cap quickly.
Supports topology changes	No (mesh topology is fixed; vertex weights are per-vertex).	Yes, in fluid mode (§11), with a memory penalty for worst-case vertex count.
Supports IK / runtime modification	Yes.	No. The animation is what was baked.

The City Sample's mixed strategy is the canonical example: every pedestrian has two representations, a high-fidelity skeletal mesh and a baked vertex-animated static mesh. Within a few tens of metres of the camera, characters are skeletal so the player can see their faces, their IK foot placement, their reactive animations. Past that radius the system swaps to VAT, the CPU cost drops to negligible, and ten thousand of them fit in the frame budget^[1]. Neither approach alone would carry the demo.

A useful rule of thumb

If the character will ever be controlled by the player, or interact with the player, or appear in a cutscene close to the camera, ship it skeletal. If the character is one of many, far away, deterministic, and visually identical to its peers, ship it VAT. If you're not sure, ship both and switch at a distance threshold. That's the modern default.
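
The rule of thumb reduces to a few lines of selection logic. This is a sketch only: the field names and the 30-metre threshold used in the example call are placeholders, and a production system would usually add hysteresis so characters near the threshold don't flip back and forth.

```javascript
// Pick a representation for one character. All field names are hypothetical.
function chooseRepresentation(character, switchDistanceMetres) {
  const needsRig =
    character.playerControlled ||
    character.interactsWithPlayer ||
    character.inCloseCutscene;
  if (needsRig) return 'skeletal';
  return character.distanceMetres < switchDistanceMetres ? 'skeletal' : 'vat';
}

const farPedestrian = { playerControlled: false, interactsWithPlayer: false, inCloseCutscene: false, distanceMetres: 120 };
const nearPedestrian = { ...farPedestrian, distanceMetres: 8 };
const questGiver = { ...farPedestrian, interactsWithPlayer: true };

console.log(chooseRepresentation(farPedestrian, 30));
console.log(chooseRepresentation(nearPedestrian, 30));
console.log(chooseRepresentation(questGiver, 30));
```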

15The memory wall: 8K × 8K and what fits

VAT memory scales as vertex_count × frame_count × bytes_per_vertex. Each of those three terms grows independently, and the product can get large fast. Worse, the GPU has a hard cap on texture dimensions: most modern hardware allows 16,384 × 16,384, but practical pipelines stop at 8,192 × 8,192 to stay portable^[13]. Above 8K you're outside what mobile, last-gen consoles, and lower-tier laptop GPUs can guarantee.

Concretely, a half-float position texture at 8,192 × 8,192 is 8,192 × 8,192 × 8 = 512 MB. No one ships that. A more realistic budget is a few tens of MB per VAT asset, which buys you something like:

Asset	Vertices	Frames	Format	Position texture size
Banner cloth (loop)	192	32	R16G16B16A16	48 KB
Crowd pedestrian (one clip)	2,048	64	R16G16B16A16	1.0 MB
Crowd pedestrian (BC6H)	2,048	64	BC6H	128 KB
Shattering building (RBD)	128 pieces	120	R16G16B16A16 + rotation	120 KB (position only)
Fluid splash (worst-case mesh)	8,192	96	R16G16B16A16	6.0 MB
Hero cloth (4-second loop)	4,096	120	R16G16B16A16	3.75 MB

The widget below lets you dial in vertex count, frame count, and format and reports the resulting texture size and whether it fits inside the 8K × 8K cap. It also reports the implied storage if the asset is one of N instances all using the same texture (the win of VAT is exactly that the texture is shared):

[Interactive widget: VAT texture-size calculator. Inputs: vertex count (default 2048), frame count (default 64), storage format (default R16G16B16A16). Outputs: texture dims, texture size, fits 8K cap.]
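
The same arithmetic is easy to reproduce offline. The sketch below assumes the simplest layout, one texel per vertex per frame with no row wrapping, and the two formats discussed in this section; the function names are made up for the example.

```javascript
const MAX_DIMENSION = 8192; // the practical portability cap discussed above

// Size of one position texture, assuming width = vertices and height = frames.
function vatTextureStats(vertexCount, frameCount, format) {
  let bytes;
  if (format === 'R16G16B16A16') {
    bytes = vertexCount * frameCount * 8;            // 4 half-floats per texel
  } else if (format === 'BC6H') {
    // 16 bytes per 4x4 block; partial blocks round up.
    bytes = Math.ceil(vertexCount / 4) * Math.ceil(frameCount / 4) * 16;
  } else {
    throw new Error(`unknown format: ${format}`);
  }
  return {
    width: vertexCount,
    height: frameCount,
    bytes,
    fitsCap: vertexCount <= MAX_DIMENSION && frameCount <= MAX_DIMENSION,
  };
}

// The texture is shared, so the per-instance share shrinks with the crowd.
function sharedBytesPerInstance(stats, instanceCount) {
  return stats.bytes / instanceCount;
}

const pedestrian = vatTextureStats(2048, 64, 'R16G16B16A16');
console.log(pedestrian, sharedBytesPerInstance(pedestrian, 10000));
```

For the crowd pedestrian this gives 1,048,576 bytes, or about 105 bytes per instance when ten thousand instances share it.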

Compression: BC6H buys you 8×

Half-float RGBA at 8 bytes per pixel is the safe choice but rarely the budget-friendly one. BC6H compresses HDR data with the standard 4×4 block scheme of the BCn family, giving you about 1 byte per pixel: an 8× reduction in VRAM at the cost of some precision loss^[14]. For crowd VAT and other distance-viewed assets the loss is invisible; for hero-mesh cloth it's borderline. The decision is per-asset and almost always worth it for crowds.

BC6H is not always usable: it requires HDR input, can't represent values outside its dynamic range without clamping, and isn't supported on the very oldest GPUs. The fallback is to stay at half-float for hero assets and BC6H for the rest. Both ship together in many AAA pipelines.

16Try it yourself

The playground below runs a JavaScript port of the VAT decode against a procedurally generated bake. The library is exposed as MPGVat; you can adjust vertex count, frame count, playback rate, position precision, and frame interpolation mode and watch the same mesh respond. Press Run (or Ctrl+Enter / Cmd+Enter). The right pane animates the result:

▸ playground.js · live VAT decode in your browser

// Build a procedural soft-body VAT: a 24-vertex cloth strip, 64 frames,
// driven by a sine wave plus a wind gust.
const vat = MPGVat.create({
  vertexCount: 24,
  frameCount: 64,
  bakeFramerate: 30,
  precision: '16-bit',   // try '8-bit' to see vertex swimming
  interpolation: 'linear', // try 'point' or 'cubic'
});

// Define the bake. Each call sets (vertex, frame) -> position.
for (let frame = 0; frame < 64; frame++) {
  const t = frame / 64;
  for (let v = 0; v < 24; v++) {
    const u = v / 23;
    const wave = Math.sin(u * 6 + t * Math.PI * 2) * 0.2;
    const gust = Math.sin(t * Math.PI * 4) * u * 0.4;
    vat.set(v, frame, [u, wave + gust, 0]);
  }
}

// Play it back at 1x speed for 2 seconds; report stats.
vat.play({ speedX: 1.0, durationSeconds: 2 });

Try dropping the precision to '8-bit' and re-running. The round-trip error climbs into the millimeters and the animation visibly steps. Bump the playback speed past 3× and switch the interpolation to 'point'; even the silky 16-bit version starts looking jittery because the bake rate can't keep up. The settings interact, which is the point.
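
The size of the round-trip error can be estimated without the playground. This sketch assumes unsigned-normalized integer storage (each channel quantized to 2^bits − 1 steps across the bounding box) and a 1-metre extent; it does not use MPGVat, and the function names are illustrative.

```javascript
// Distance between two adjacent representable positions along one axis.
function quantizationStep(extentMetres, bits) {
  return extentMetres / (2 ** bits - 1);
}

// Encode a coordinate to an integer code and decode it again.
function roundTrip(value, min, max, bits) {
  const levels = 2 ** bits - 1;
  const code = Math.round(((value - min) / (max - min)) * levels);
  return min + (code / levels) * (max - min);
}

const step8 = quantizationStep(1.0, 8);   // about 3.9 mm
const step16 = quantizationStep(1.0, 16); // about 0.015 mm
const error8 = Math.abs(roundTrip(0.3333, 0, 1, 8) - 0.3333);
const error16 = Math.abs(roundTrip(0.3333, 0, 1, 16) - 0.3333);
console.log({ step8, step16, error8, error16 });
```

The worst-case error is half a step, so an 8-bit bake of a 1-metre box can be off by about 2 mm per axis. A larger bounding box scales the error up proportionally.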

17How Unreal does it

Unreal Engine 5 ships two overlapping VAT-shaped systems. Most projects use both depending on the asset.

AnimToTexture plugin.^[1] The official path for character VAT. Bakes a Skeletal Mesh plus a set of Animation Sequences into a Static Mesh and a pair of textures (Bone Position + Bone Rotation, or Vertex Position depending on mode). Generates a material function that does the sampling and decoding. Released alongside the City Sample (2021), became part of UE 5.1 (2022), still ships as a default plugin in 5.4+^[15]. The City Sample's pedestrian crowds are AnimToTexture under the hood.
Houdini Engine + Labs VAT 3.0.^[4] The path for everything else: baked sims, cloth, splashes, shatters. SideFX ships the Unreal-side material functions inside the Houdini Engine plugin; the workflow is to bake in Houdini and load the resulting mesh + textures into Unreal. Used by every studio with a Houdini-heavy effects pipeline.

City Sample's two-tier crowd

The instructive part of the City Sample is how the two systems compose. Every pedestrian asset exists twice: once as a skeletal mesh with a full animation blueprint, and once as an AnimToTexture-baked static mesh with the same animations. The MASS framework decides per-instance which representation to use based on distance to the player. Near-camera pedestrians animate per-frame with the rig; far-camera pedestrians sample the baked texture and cost almost nothing. The handoff happens at around 30 metres and is largely invisible because the bake captures the same animations the rig produces.

Niagara mesh particles

Niagara, Unreal's particle system, can emit static-mesh instances. When the mesh is VAT-animated, the per-particle "look" is a baked animation that plays per-instance: flocks of birds, schools of fish, swarms of insects. The Niagara mesh-particle module exposes a per-instance time offset so the swarm doesn't synchronize. Total runtime cost is one draw call for the swarm.

18How Unity does it

Unity doesn't ship an official AnimToTexture equivalent, so the VAT story is a mosaic of community tools and engine-supplied building blocks.

Houdini Engine + Labs VAT 3.0. Same as the Unreal story. The Houdini-baked mesh and textures import into Unity, and the Unity-side shader graph nodes that ship with the SideFX Labs Unity package handle the decode^[4].
OpenVAT.^[5] The open-source Blender baker ships a Unity decoder alongside its Unreal and Godot ones. The shader graphs are designed to be readable and extendable; the maintainers' goal is a portable VAT spec rather than a vendor-specific format.
Shader Graph subgraphs. Most production Unity VAT pipelines wrap the sampling logic in a Shader Graph subgraph that takes vertex UV1 and time as inputs and outputs the displaced position and normal. The subgraph can then be dropped into any URP or HDRP master node.
DOTS / Entities Graphics. Unity's data-oriented runtime, which is the path Unity recommends for tens-of-thousands-of-instances workloads, integrates with VAT through Shader Graph + a per-entity material property override (the per-instance time). Bonjour Interactive Lab's Unity3D-VATUtils^[16] is a common open-source reference.

Between them, the SideFX-supplied Shader Graph nodes and the OpenVAT shader graphs cover the same ground as AnimToTexture. The gap is mostly tooling and editor integration rather than runtime capability.

19Pitfalls and how to spot them

VAT failures usually look like the animation, but wrong. A list of the classes I've watched ship.

sRGB on the position texture

VAT textures encode non-color data: positions, normals, quaternions. They must be flagged as linear in the engine's texture import settings, never sRGB. If the engine applies the standard sRGB-to-linear conversion on a VAT texture, the encoded values come back through a 2.2-power curve and the entire animation snaps into wrong positions. The fix is one checkbox in the importer; the symptom (mesh squashed into a corner of its bounding box) is unmistakable once you've seen it.
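
The damage is easy to quantify with the standard sRGB decode function. The bounding box of −1 to 1 below is an arbitrary example value.

```javascript
// Standard sRGB-to-linear transfer function for one channel in [0, 1].
function srgbToLinear(c) {
  return c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
}

// Bounding-box decode, as in the VAT shader.
function decode(normalized, bboxMin, bboxMax) {
  return bboxMin + (bboxMax - bboxMin) * normalized;
}

const stored = 0.5;                               // a vertex at the middle of the box
const intended = decode(stored, -1, 1);           // 0
const corrupted = decode(srgbToLinear(stored), -1, 1);
console.log({ intended, corrupted });
```

The stored 0.5 comes back as roughly 0.214, so a vertex meant for the center of the box lands more than halfway toward the minimum corner. Every mid-range value is pulled the same direction, which is why the whole mesh squashes toward one corner.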

Mipmaps generated by default

Most engines generate mipmaps automatically on import. Mipmaps on a VAT texture average together unrelated vertices, which produces nonsense at any reduced mip. The asset starts looking fine, then a different vertex shader path picks a non-zero mip and the mesh collapses to a smear. The fix is to disable mip generation at import; the runtime SampleLevel(..., 0) backs this up but isn't sufficient on its own (some shader paths still pick mips unintentionally).

Bilinear filtering on X

Bilinear is right for the Y (frame) axis and wrong for the X (vertex) axis. Bilinear on X averages adjacent vertex columns, blending unrelated positions; you get a mesh whose vertices appear to slide toward their neighbors. The fix is to set the sampler to point filtering on the X axis or to add the half-pixel offset (§4) so the sample lands at a pixel center every time. Most engines don't expose per-axis sampler state, so the half-pixel-offset path is the portable answer.
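
A quick way to see why the half-pixel offset works is to compute which texels a bilinear filter would blend for a given U. The sketch below models the filter's X-axis taps under the usual texel-center convention; the function names are made up for the example.

```javascript
// U coordinate that lands on the center of a vertex's column.
function vertexLookupU(vertexIndex, textureWidth) {
  return (vertexIndex + 0.5) / textureWidth;
}

// The two columns a bilinear filter reads along X, and the weight of the right one.
function bilinearTapsX(u, textureWidth) {
  const x = u * textureWidth - 0.5; // position measured in texel centers
  const left = Math.floor(x);
  return { left, right: left + 1, rightWeight: x - left };
}

const centered = bilinearTapsX(vertexLookupU(5, 8), 8); // with the offset
const uncentered = bilinearTapsX(5 / 8, 8);             // naive id / width
console.log(centered, uncentered);
```

With the offset, all of the weight falls on column 5. Without it, the sample sits on the boundary between columns 4 and 5 and the filter returns the average of two unrelated vertices.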

Tangent-space normal maps that look right at rest, slide during animation

A normal map baked in the rest pose's tangent space stays correct only as long as the tangent space doesn't move. Under VAT deformation the surface's tangent frame rotates, and the normal-map detail should rotate with it. It doesn't, because the shader only has the rest-pose tangent, not the per-frame one. You get sliding-detail artifacts: surfaces look like they have a moving texture rather than fixed normal detail. The two fixes are (a) ship a rotation texture and reconstruct the tangent frame (VAT 3.0's approach^[4]) or (b) use an object-space normal map, which doesn't care about tangent orientation.

Wrong bounding box

If the bake's bounding box doesn't enclose every vertex of every frame, the out-of-range vertices clamp to the box wall and visibly stop. The symptom is a deformation that looks correct until a peak frame, when part of the mesh appears stuck on a plane. The fix is to recompute the bounding box across all frames and re-export; SideFX VAT and OpenVAT both do this automatically, but a hand-built exporter is easy to get wrong.
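
For a hand-built exporter, the safe pattern is to compute the box from every vertex of every frame before encoding anything. A minimal JavaScript sketch, assuming the animation is held as `frames[frame][vertex] = [x, y, z]` (a layout chosen for this example):

```javascript
// Bounding box over every vertex of every frame.
function computeBounds(frames) {
  const min = [Infinity, Infinity, Infinity];
  const max = [-Infinity, -Infinity, -Infinity];
  for (const frame of frames) {
    for (const position of frame) {
      for (let axis = 0; axis < 3; axis++) {
        min[axis] = Math.min(min[axis], position[axis]);
        max[axis] = Math.max(max[axis], position[axis]);
      }
    }
  }
  return { min, max };
}

// Normalize a position into [0, 1] per axis; flat axes encode as 0.
function encodePosition(position, bounds) {
  return position.map((value, axis) => {
    const extent = bounds.max[axis] - bounds.min[axis];
    return extent === 0 ? 0 : (value - bounds.min[axis]) / extent;
  });
}

const bakeFrames = [
  [[0, 0, 0], [1, 0, 0]],
  [[0, 2, 0], [1, -1, 0]], // the peak frame stretches the box on Y
];
const bounds = computeBounds(bakeFrames);
console.log(bounds, encodePosition([1, -1, 0], bounds));
```

Computing the box from frame 0 alone would give a Y range of zero here, and both vertices of the second frame would be out of range.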

Loop seam

A bake without a copy of frame 0 at the texture's end has a discontinuity at the loop point: the bilinear filter blends the actual last frame with the wrong row (either clamped or wrapped to the wrong texel). The symptom is one bad frame at the loop point that looks like a pose jump. The fix is to append frame 0 a second time at the bottom of the texture so the wrap is clean (§8).
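
The padding step and its effect on the last blend can be sketched as follows, with arrays standing in for texture rows. The function names are hypothetical, and the blend models what a hardware linear filter along Y would do.

```javascript
// Return a new row list with frame 0 repeated after the last frame.
function appendLoopRow(frames) {
  return [...frames, frames[0]];
}

// Linear blend between adjacent rows, as a filter along Y would do.
// frameIndexFloat must be in [0, originalFrameCount).
function sampleLooped(paddedFrames, frameIndexFloat) {
  const lower = Math.floor(frameIndexFloat);
  const weight = frameIndexFloat - lower;
  return paddedFrames[lower].map(
    (value, i) => value + (paddedFrames[lower + 1][i] - value) * weight
  );
}

// Four frames of a single scalar channel: 0, 1, 2, 3, then back to 0.
const loopFrames = [[0], [1], [2], [3]];
const padded = appendLoopRow(loopFrames);
console.log(sampleLooped(padded, 3.5)); // halfway between frame 3 and frame 0
```

Playback still wraps time at the original frame count (four here), not at the padded row count, and the V coordinate is computed against the padded height.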

Single-instance time on a crowd

If every instance reads the same global time uniform, every instance plays the same frame, and a crowd of pedestrians all walks in synchronized lockstep. The fix is to bind a per-instance time offset (Unreal exposes this through PerInstanceCustomData, Unity through material property blocks or DOTS entity properties) and add it to the global time before computing the V coordinate. A few hundred milliseconds of random offset per instance breaks the lockstep without needing more bakes.
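
One way to derive a stable offset is to hash the instance ID. The hash below is an arbitrary integer mix chosen for this sketch, not something either engine provides; in practice the offset is usually generated once on the CPU and uploaded as the per-instance value.

```javascript
// Map an integer instance ID to a repeatable offset in [0, maxOffsetSeconds).
function instanceTimeOffset(instanceId, maxOffsetSeconds) {
  const hashed = Math.imul(instanceId ^ 0x9e3779b9, 0x85ebca6b) >>> 0; // unsigned 32-bit
  return (hashed / 4294967296) * maxOffsetSeconds;
}

// Fractional frame index for one instance.
function instanceFrame(globalTimeSeconds, instanceId, bakeFramerate, frameCount) {
  const localTime = globalTimeSeconds + instanceTimeOffset(instanceId, 0.5);
  return (localTime * bakeFramerate) % frameCount;
}

const crowdFrames = [];
for (let id = 0; id < 8; id++) {
  crowdFrames.push(instanceFrame(1.0, id, 30, 64));
}
console.log(crowdFrames.map((f) => f.toFixed(2)));
```

Because the offset depends only on the ID, each pedestrian keeps the same phase from frame to frame and from run to run.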

Decompressed-too-soon precision

Some shaders do their lerp before the bounding-box decode, others after. Because the decode is affine, the two orderings are mathematically identical and agree in practice up to floating-point rounding. They differ only in where clamping can bite: if the decoded values are ever carried through a [0, 1]-bounded intermediate (a UNORM target or a saturate), they clip, because decoded values are in object space and not bounded to [0, 1]. Lerp the normalized values first and decode afterwards for safety.
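
A numeric check of both claims, using an arbitrary bounding box of −2 to 2. The `saturate` call stands in for any [0, 1]-bounded intermediate.

```javascript
const lerp = (a, b, t) => a + (b - a) * t;
const decode = (normalized, min, max) => min + (max - min) * normalized;
const saturate = (x) => Math.min(1, Math.max(0, x));

const bboxMin = -2;
const bboxMax = 2;
const encodedLo = 0.1;
const encodedHi = 0.9;
const alpha = 0.25;

// Order 1: blend the normalized texels, then decode.
const lerpThenDecode = decode(lerp(encodedLo, encodedHi, alpha), bboxMin, bboxMax);

// Order 2: decode each texel, then blend.
const decodeThenLerp = lerp(
  decode(encodedLo, bboxMin, bboxMax),
  decode(encodedHi, bboxMin, bboxMax),
  alpha
);

// Order 2 with the decoded values squeezed through a [0, 1] intermediate.
const clampedPath = lerp(
  saturate(decode(encodedLo, bboxMin, bboxMax)),
  saturate(decode(encodedHi, bboxMin, bboxMax)),
  alpha
);

console.log({ lerpThenDecode, decodeThenLerp, clampedPath });
```

The first two results agree at about −0.8. The clamped path returns 0.25, a position that was never in the animation.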

20Where to go from here

VAT is a small, well-understood technique. Once you have the pattern in your head, the practical learning is reading other people's exporters and shader graphs to see the variations.

Read these tools

SideFX Labs VAT 3.0.^[4] The reference implementation. The Houdini node is open-source (HDA inside SideFXLabs); reading the network is a tour of every encoding choice in the field.
OpenVAT.^[5] The Blender-native baker. Source is MIT, the shader graphs for Unity / Unreal / Godot ship alongside. Smaller than the SideFX equivalent, more readable.
Unreal AnimToTexture plugin. Ships with UE 5.1+, sources are in the engine repo under Engine/Plugins/Animation/AnimToTexture. The material functions are the canonical reference for the Unreal-side decode.
Bonjour Interactive Lab's Unity3D-VATUtils.^[16] Unity-side decoders for Houdini VAT, including the rigid-body quaternion path.

Read these references

Dudash, B. (2007). Skinned Instancing, NVIDIA SDK^[6]; republished as GPU Gems 3 Chapter 2^[7]. The earliest published version of the texture-as-animation-database idea.
Vasconcelos, L.O. Texture Animation: Applying Morphing and Vertex Animation Techniques.^[3] The Wildlife Studios mobile case study; concrete numbers on a phone-tier device.
Dimitrov, S. (2021). Vertex Animation Texture (VAT).^[11] A clear walkthrough of the encoding tradeoffs.
Valve. Half-Life: Alyx Workshop Tools / Houdini Vertex Animation.^[2] Source 2's documented VAT pipeline. Useful as a sanity check that the pattern is engine-portable.

The final exam

Five questions on the whole tutorial. If you can answer all five without scrolling back, you've got the fundamentals.

21Sources & further reading

Numbered citations refer to the superscripts above. Everything below is freely available on the open web or linked from a vendor's documentation page.

A note on originality

The prose, code, CSS, and interactive demos on this page are original writing. The bounding-box remap and the (vertex × frame) layout follow SideFX Labs VAT 3.0 [4]; the precursor "texture-as-bone-database" pattern is from Dudash's NVIDIA work [6][7]. Performance comparisons against skeletal animation cite Wildlife Studios' mobile teardown [3] and Unreal's City Sample [1]. The four-mode taxonomy (soft body, rigid body, fluid, sprite) is Houdini Labs' convention [4]; OpenVAT [5] covers the open-source Blender path.

[1] Epic Games. (2021). City Sample & AnimToTexture plugin. Unreal Engine. unrealengine.com. The crowd-rendering reference for AAA scale; the AnimToTexture plugin was released alongside this demo and shipped as part of UE 5.1.
[2] Valve. Half-Life: Alyx Workshop Tools, Modeling / Houdini Vertex Animation. Valve Developer Community. developer.valvesoftware.com. The Source 2 documentation for importing Houdini-baked VAT, including the texture conventions.
[3] Vasconcelos, L.O. Texture Animation: Applying Morphing and Vertex Animation Techniques. Wildlife Studios Tech Blog. medium.com. Mobile case study: 4,900 VAT-animated soldiers in one draw call on a Galaxy S8 where skeletal could manage 441.
[4] SideFX. Labs Vertex Animation Textures 3.0 render node. Houdini documentation. sidefx.com. The current canonical VAT exporter; documents the four modes (Soft, Rigid, Fluid, Sprite), the rotation texture, and the encoding options.
[5] sharpen3d. OpenVAT: Vertex Animation Toolkit. openvat.org; github.com/sharpen3d/openvat. MIT-licensed Blender-native VAT baker with engine-side decoders for Unity, Unreal, and Godot. 16-bit EXR output, JSON metadata for the remap.
[6] Dudash, B. (2007). Skinned Instancing. NVIDIA SDK 10 whitepaper. PDF. The earliest published version of using a texture as the bone-matrix database read from a vertex shader; the structural precursor to all modern VAT.
[7] Dudash, B. (2008). Animated Crowd Rendering. GPU Gems 3, Chapter 2. developer.nvidia.com. The republished version of the 2007 whitepaper; ~9,547 instanced animated characters at 34 fps on a GeForce 8800.
[8] Sousa, T. (2008). Vegetation Procedural Animation and Shading in Crysis. GPU Gems 3, Chapter 16. developer.nvidia.com. Per-vertex wind-bend parameters in vertex colors, vertex-shader-driven animation. Same era as Dudash; structurally similar idea.
[9] Epic Games. Pivot Painter Tool 2.0. Unreal Engine documentation. dev.epicgames.com. Bakes per-leaf pivots and axis vectors into textures, designed to compose with vertex-animation systems.
[10] SideFX. Labs Vertex Animation Textures 2.0 render node. Houdini documentation. sidefx.com. The 2.0 version introduced the rigid-body, fluid, and sprite modes; superseded by 3.0 but documents the original four-mode design.
[11] Dimitrov, S. (2021). Vertex Animation Texture (VAT). stoyan3d.wordpress.com. Walkthrough of the encoding choices including the bounding-box normalization and the 8-bit precision tradeoff.
[12] Snap Inc. Vertex Animation Textures Guide. Lens Studio documentation. developers.snap.com. Lens Studio's VAT pipeline; documents the four modes (Softbody, Rigidbody, Fluid, Sprite) and their texture outputs.
[13] SideFX. vertex animation texture limit? SideFX Forums. sidefx.com/forum. Discussion of the 8K hard cap on practical VAT texture dimensions and how vertex count × frame count factors against it.
[14] Microsoft Learn. BC6H Texture Block Compression. learn.microsoft.com. The HDR BCn format; 16 bytes per 4×4 block (1 byte per pixel), supports half-float dynamic range.
[15] Epic Games. (2024). AnimToTexture in UE 5.4 and Niagara / MASS integration. Unrealcode.net. unrealcode.net. Walkthrough of the modern plugin usage with Niagara and the MASS framework for large crowd systems.
[16] Bonjour Interactive Lab. Unity3D-VATUtils. GitHub. github.com/Bonjour-Interactive-Lab. Unity-side shaders and utilities for consuming Houdini-baked VAT including the rigid-body quaternion path.
[17] Microsoft Learn. SV_VertexID semantic. HLSL documentation. learn.microsoft.com. The system-generated vertex-index value that some VAT implementations use in place of a baked UV1 lookup.
[18] SideFX. Vertex Animation Textures in Unreal Engine 5. Tutorial series. sidefx.com/tutorials. Walkthrough of the Houdini → Unreal VAT workflow with the Labs 3.0 node and the Unreal material functions.
[19] Cocos Creator. Vertex Animation Texture (VAT). Cocos documentation. docs.cocos.com. Another engine's VAT implementation, useful for cross-checking the shape of the pipeline against Unity / Unreal.
[20] keijiro. HdrpVatExample. GitHub. github.com/keijiro/HdrpVatExample. Reference Unity HDRP project for VAT with Shader Graph and Visual Effect Graph.