
Animation: Blending, State Machines & IK

The Skeletal Animation module built the machinery: a skeleton, the matrix palette, clip sampling with slerp, and the cross-fade. This module covers everything upstream of the palette: how the engine decides which pose to produce each frame (a state machine or motion matching), how it blends and layers that pose, and how it adjusts it with IK before it ever reaches the skin.

Time: ~55 min · Level: Senior · Prereqs: The Skeletal Animation module (the palette, slerp, the cross-fade, additive-as-delta, two-bone IK) and Behavior Trees (decisions drive transitions) · Stack: C++ & Rust

01 Where this fits

The Skeletal Animation module ends at: a per-joint local pose → walk the hierarchy to globals → build the matrix palette → skin the vertices. Everything in this module produces or modifies that local pose, then hands it to the same matrix palette. The full per-frame pipeline:

pose source (state machine or motion matching) → layer blends → IK adjusts → globals → palette → skin

Each stage reuses the skeletal module's primitives: the local-space per-joint cross-fade (slerp rotations, lerp translation/scale), additive-as-delta, and the two-bone analytic IK. This module goes deeper on what that one previewed; it doesn't re-derive them.

02 The state machine

An animation state machine is the standard authoring model (Unreal's AnimGraph, Unity's Mecanim)^[2]^[3]. Each state produces one pose (from a clip or a blend tree). Each transition is {toState, condition, blendDuration}. The key point: a transition is a timed cross-fade between the source state's pose and the destination's, with weight clamp(elapsed / blendDuration, 0, 1). It is the exact local-space cross-fade from Skeletal Animation, driven by time instead of a slider.

A transition is not an instant switch, and inertialization is the cheaper alternative

A hard cut (weight jumps 0→1) pops: a visible discontinuity in joint angles. Blend in local space per joint, then recompute globals; blending global matrices gives non-rigid, shrinking in-betweens (the same failure mode as linear blend skinning, LBS). A dual-pose cross-fade evaluates both states during the transition, which doubles the cost; the standard production alternative is inertialization (Bollo, Gears of War 4): evaluate only the destination and blend via a decaying offset curve, so you don't pay for two pose graphs mid-transition^[5]. Blend durations are authored per transition (roughly 0.2 s for locomotion, 0.05 to 0.1 s for hit reactions), not a fixed constant.
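As a rough illustration of the decaying-offset idea, the sketch below tracks a single scalar channel (say, one joint angle). At the moment of transition it records the difference between the old pose and the new one, then adds a decaying copy of that difference on top of the destination pose each frame. The decay curve here is a critically damped spring, which is an assumption chosen for brevity rather than the curve from Bollo's talk; `InertializedChannel` and `halfLife` are hypothetical names, and `halfLife` is assumed to be greater than zero.

```cpp
#include <cmath>

// One inertialized scalar channel (e.g. a single joint angle, in radians).
// Hypothetical sketch: a real system does this per joint for translation and rotation.
struct InertializedChannel {
    float offset = 0.0f;    // source-minus-destination difference at transition time
    float velocity = 0.0f;  // rate of change of that difference at transition time
    float elapsed = 0.0f;   // seconds since the transition started

    // Call once when the state changes: capture the discontinuity to hide.
    void beginTransition(float sourceValue, float sourceVelocity,
                         float destValue, float destVelocity) {
        offset = sourceValue - destValue;
        velocity = sourceVelocity - destVelocity;
        elapsed = 0.0f;
    }

    // Call every frame with the destination's freshly sampled value.
    // Only the destination is evaluated; the source is never sampled again.
    float apply(float destValue, float dt, float halfLife) {
        elapsed += dt;
        // Critically damped spring toward zero (an assumed decay curve):
        // x(t) = (x0 + (v0 + y*x0) * t) * exp(-y*t)
        const float y = 2.0f * 0.69314718f / halfLife;  // damping rate from a nominal half-life
        const float j = velocity + offset * y;
        const float decayed = (offset + j * elapsed) * std::exp(-y * elapsed);
        return destValue + decayed;
    }
};
```

At `elapsed = 0` the output equals the source value, so there is no pop; as time passes the offset decays toward zero and the output converges on the destination.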


An animation state machine (transitions are timed cross-fades)

struct Transition { int toState; ConditionFn condition; float blendDuration; };
struct State { PoseSource source; std::vector<Transition> transitions; };  // source = clip or blend tree

Pose StateMachine::evaluate(float dt, const Params& p) {
    if (targetState < 0)                                  // not mid-transition: check transitions in priority order
        for (auto& t : states[currentState].transitions)
            if (t.condition(p)) { targetState = t.toState; elapsed = 0; blend = t.blendDuration; break; }

    Pose from = states[currentState].source.sample(dt, p);  // local-space pose (from M21)
    if (targetState < 0) return from;

    Pose to = states[targetState].source.sample(dt, p);
    elapsed += dt;
    // A zero blend duration means a hard cut; guard it so we never divide by zero.
    float weight = blend > 0.0f ? clamp(elapsed / blend, 0.0f, 1.0f) : 1.0f;
    Pose blended = crossFade(from, to, weight);          // M21: per-joint slerp/lerp, LOCAL space
    if (weight >= 1.0f) { currentState = targetState; targetState = -1; }  // transition finished
    return blended;
}

struct Transition { to_state: usize, condition: fn(&Params) -> bool, blend_duration: f32 }
struct State { source: PoseSource, transitions: Vec<Transition> }   // source = clip or blend tree

fn evaluate(&mut self, dt: f32, p: &Params) -> Pose {
    if self.target.is_none() {                          // not mid-transition: check transitions in priority order
        for t in &self.states[self.current].transitions {
            if (t.condition)(p) { self.target = Some(t.to_state); self.elapsed = 0.0; self.blend = t.blend_duration; break; }
        }
    }
    let from = self.states[self.current].source.sample(dt, p);
    let Some(target) = self.target else { return from };
    let to = self.states[target].source.sample(dt, p);
    self.elapsed += dt;
    // A zero blend duration means a hard cut; guard it so we never divide by zero.
    let weight = if self.blend > 0.0 { (self.elapsed / self.blend).clamp(0.0, 1.0) } else { 1.0 };
    let blended = cross_fade(&from, &to, weight);          // M21 local-space blend
    if weight >= 1.0 { self.current = target; self.target = None; }  // transition finished
    blended
}

03 Blend spaces

A state's pose is often a blend space, not a single clip. A 1D blend places a parameter (speed) on a line of sorted samples (idle, walk, run); find the two samples that bracket the parameter and blend between them by the normalized fraction. A 2D blend mixes by two parameters (e.g. forward-speed × strafe-speed) and needs a real 2D weighting scheme^[3].

2D blending is not two 1D blends

You can't decompose a 2D blend into two independent 1D blends: off the axes (a diagonal strafe) the weights come out wrong. Unity ships Gradient Band Interpolation (Johansen): each sample's influence is the minimum over all other samples j ≠ i of [1 − (p−p_i)·(p_j−p_i) / |p_j−p_i|²], clamped and then normalized across samples, which is connectivity-free and density-invariant^[4]. Barycentric (triangulation) is a valid alternative but needs a mesh of the samples and extrapolates badly outside the hull. Directional locomotion uses the polar variant (angle + magnitude) so a slow-walk sample doesn't bleed past a fast-run sample in the same direction. Whichever scheme you choose, it only computes weights; the blend itself is still the local-space per-joint one.


04 Layered animation

Real characters do two things at once: run and reload. Layered animation stacks poses with a bone mask selecting which joints a layer affects. An upper-body layer (aim, reload) plays over a lower-body locomotion layer; the mask is what stops them fighting over the spine and hips.

Override layers vs additive layers

An override layer replaces the masked joints' pose (up to the layer weight). An additive layer adds a pose difference, a delta from a reference pose (the additive-as-delta rule), composed on top of any base: one aim-offset, lean, or recoil delta works over walk, run, and crouch without authoring every combination. They are different operations, and a layer is one or the other; additive is not "playing clip B over clip A."

05 Motion matching

Motion matching (Clavet, Ubisoft, For Honor) is the modern data-driven alternative to a hand-built state machine^[6]. Instead of authoring states and transitions, you keep a database of mocap frames, each tagged with a feature vector (the future trajectory, plus foot positions/velocities and hip velocity). On each search, build a query vector from the desired trajectory plus the current pose, find the nearest database frame by a weighted cost, and blend to it (via inertialization).

Not magic, and it didn't kill the state machine

The search often runs only a few times a second rather than every frame (some engines search more often); between searches the chosen clip just plays, and inertialization smooths the jump^[7]. It's not authoring-free: you curate a large, clean mocap database (the "dance cards": walks, plants, and strafes at a spread of angles). The nearest-neighbor search has real cost (accelerated with KD-trees and, at scale, learned models). And it does not replace state machines everywhere: in For Honor the game logic still ran as a state machine, and only animation selection was automated. Both ship. Feature-vector contents vary by engine; there is no single canonical schema.
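The search step itself can be sketched as a brute-force weighted nearest-neighbor scan. The flattened layout, the weighted squared-distance cost, and the name `findBestFrame` are assumptions made for illustration; a production system would put a KD-tree or similar structure in front of this loop.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Brute-force nearest-frame search over a feature database.
// `features` stores one row per frame, `dim` floats per row, flattened.
// `weights` scales each feature dimension's contribution to the cost.
// Returns the index of the lowest-cost frame. Assumes at least one frame.
std::size_t findBestFrame(const std::vector<float>& features, std::size_t dim,
                          const std::vector<float>& query,
                          const std::vector<float>& weights) {
    const std::size_t frameCount = features.size() / dim;
    std::size_t best = 0;
    float bestCost = std::numeric_limits<float>::max();
    for (std::size_t f = 0; f < frameCount; ++f) {
        float cost = 0.0f;
        for (std::size_t d = 0; d < dim; ++d) {
            const float diff = features[f * dim + d] - query[d];
            cost += weights[d] * diff * diff;      // weighted squared distance
        }
        if (cost < bestCost) { bestCost = cost; best = f; }
    }
    return best;
}
```

Changing the weights changes the winner: the same query can select a different frame when one feature dimension is weighted to zero, which is why tuning the weights is part of the authoring work.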

06 IK in depth

Forward kinematics gives positions from angles; inverse kinematics (IK) finds the angles that reach a target. Two-bone IK is analytic and exact (law of cosines + a pole vector). General chains use iterative solvers:

FABRIK (Aristidou & Lasenby): a forward pass (end effector → root) and a backward pass (root → end effector), each placing joints on a line at the original bone lengths. The authors report that it converges in fewer iterations and at lower cost than the other common solvers^[8].
CCD (cyclic coordinate descent): rotate one joint at a time, end → root, to aim the end effector at the target; iterate. Simple and fast, but can produce unnatural poses, because the joints nearest the end effector absorb most of the rotation^[9].
Jacobian methods (conceptual): linearize end-effector motion with respect to joint angles and step; smooth and able to handle many DOF, but slow and singularity-prone.
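Of the three solvers listed above, CCD is the shortest to write down. The sketch below is a 2D version that works directly on joint positions, with no joint limits; `Vec2`, `solveCCD2D`, and the default iteration count and tolerance are hypothetical choices for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// 2D CCD on joint positions. pos[0] is the anchored root, pos.back() the end effector.
// Rotating joint i moves every joint after it rigidly, so bone lengths are preserved.
void solveCCD2D(std::vector<Vec2>& pos, Vec2 target, int maxIters = 16, float tol = 1e-3f) {
    const std::size_t n = pos.size();
    if (n < 2) return;
    for (int it = 0; it < maxIters; ++it) {
        const Vec2 tip = pos[n - 1];
        if (std::hypot(tip.x - target.x, tip.y - target.y) < tol) return;  // converged
        for (std::size_t i = n - 1; i-- > 0;) {           // joints n-2 ... 0 (end -> root)
            const Vec2 pivot = pos[i];
            const Vec2 end = pos[n - 1];
            const float ex = end.x - pivot.x,    ey = end.y - pivot.y;      // pivot -> end effector
            const float tx = target.x - pivot.x, ty = target.y - pivot.y;  // pivot -> target
            // Signed angle that turns (pivot -> end) onto (pivot -> target).
            const float angle = std::atan2(ex * ty - ey * tx, ex * tx + ey * ty);
            const float c = std::cos(angle), s = std::sin(angle);
            for (std::size_t k = i + 1; k < n; ++k) {      // rotate the rest of the chain
                const float rx = pos[k].x - pivot.x, ry = pos[k].y - pivot.y;
                pos[k] = { pivot.x + c * rx - s * ry, pivot.y + s * rx + c * ry };
            }
        }
    }
}
```

The root never moves because it is only ever used as a pivot. Note how the last joint is adjusted first in every sweep, which is where the tendency to bend the chain near its tip comes from.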

Iterative solvers are approximate, clamp on unreachable, and FABRIK must reset the root

Unlike two-bone IK, FABRIK and CCD converge over iterations (more iterations get closer but cost more), and neither can reach an unreachable target: clamp to maximum reach (FABRIK stretches the chain into a straight line in one pass). FABRIK's backward pass must reset the root to its anchor, or the whole chain drifts off; this is the most common FABRIK bug. Foot IK is a combination: raycast down per foot, drop the pelvis to the lowest foot, run two-bone leg IK to plant each foot, rotate the ankle to the surface normal, and interpolate the offsets over time, because applying them instantly gives "robot legs." Look-at distributes rotation across spine joints with limits (no 180° head snaps). IK runs after the pose is produced and before the palette is built: it modifies the pose, never replaces it. No solver is universally best.
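The "interpolate the offsets over time" step is the part most often skipped, so here is a minimal sketch of it, covering vertical offsets only. The raycast results are taken as inputs, and `FootIkState`, `smoothToward`, `updateFootIk`, and the 0.08 s half-life are all hypothetical; the leg IK and ankle alignment would consume the smoothed values afterwards.

```cpp
#include <algorithm>
#include <cmath>

// Vertical offsets only; a hypothetical slice of a foot-IK pass.
// groundOffsetL/R: how far the raycast hit lies above (+) or below (-) where the
// animation placed each foot, in metres. Producing them is left to the physics query.
struct FootIkState {
    float pelvisOffset = 0.0f;   // smoothed values carried across frames
    float leftOffset = 0.0f;
    float rightOffset = 0.0f;
};

// Frame-rate independent exponential approach: covers half the remaining
// distance every `halfLife` seconds, and never overshoots the goal.
inline float smoothToward(float current, float goal, float dt, float halfLife) {
    const float keep = std::exp2(-dt / halfLife);   // fraction of the gap that remains
    return goal + (current - goal) * keep;
}

inline void updateFootIk(FootIkState& s, float groundOffsetL, float groundOffsetR,
                         float dt, float halfLife = 0.08f) {
    // Drop the pelvis to the lower foot so that leg can still reach the ground.
    const float pelvisGoal = std::min(groundOffsetL, groundOffsetR);
    s.pelvisOffset = smoothToward(s.pelvisOffset, pelvisGoal, dt, halfLife);
    s.leftOffset   = smoothToward(s.leftOffset,  groundOffsetL, dt, halfLife);
    s.rightOffset  = smoothToward(s.rightOffset, groundOffsetR, dt, halfLife);
}
```

Assigning the raycast results directly, with no smoothing, is the "robot legs" case: every step edge turns into a one-frame jump in the pelvis and feet.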

FABRIK (forward + backward passes; the backward pass resets the root)

// pos[0] = root (anchored); pos[n-1] = end effector.
// boneLen[i] = length of the bone from pos[i] to pos[i+1], captured at bind.
void solveFABRIK(std::vector<vec3>& pos, const std::vector<float>& boneLen,
                 vec3 target, int maxIters = 10, float tol = 1e-3f) {
    const size_t n = pos.size();
    if (n < 2) return;                                   // no bones, nothing to solve
    const vec3 root = pos[0];
    float reach = 0; for (float L : boneLen) reach += L;
    if (length(target - root) > reach) {                 // UNREACHABLE: stretch straight (1 pass)
        for (size_t i = 0; i + 1 < n; ++i)
            pos[i+1] = pos[i] + normalize(target - pos[i]) * boneLen[i];
        return;
    }
    for (int it = 0; it < maxIters; ++it) {
        if (length(pos[n-1] - target) < tol) break;     // converged
        pos[n-1] = target;                              // FORWARD: end -> target, inward
        for (int i = (int)n-2; i >= 0; --i)
            pos[i] = pos[i+1] + normalize(pos[i] - pos[i+1]) * boneLen[i];
        pos[0] = root;                                  // BACKWARD: reset root (critical!), outward
        for (size_t i = 0; i + 1 < n; ++i)
            pos[i+1] = pos[i] + normalize(pos[i+1] - pos[i]) * boneLen[i];
    }
}

// pos[0] = root (anchored); pos[n-1] = end effector.
// bone_len[i] = length of the bone from pos[i] to pos[i+1], captured at bind.
fn solve_fabrik(pos: &mut [Vec3], bone_len: &[f32], target: Vec3, max_iters: u32, tol: f32) {
    let n = pos.len();
    if n < 2 { return; }                                 // no bones; also avoids n-1 underflow
    let root = pos[0];
    let reach: f32 = bone_len.iter().sum();
    if (target - root).length() > reach {                // UNREACHABLE: stretch straight
        for i in 0..n-1 { pos[i+1] = pos[i] + (target - pos[i]).normalize() * bone_len[i]; }
        return;
    }
    for _ in 0..max_iters {
        if (pos[n-1] - target).length() < tol { break; }   // converged
        pos[n-1] = target;                              // FORWARD: inward
        for i in (0..n-1).rev() { pos[i] = pos[i+1] + (pos[i] - pos[i+1]).normalize() * bone_len[i]; }
        pos[0] = root;                                  // BACKWARD: reset root (critical!), outward
        for i in 0..n-1 { pos[i+1] = pos[i] + (pos[i+1] - pos[i]).normalize() * bone_len[i]; }
    }
}


07 Combining it all

The full per-frame animation pipeline, end to end:

A state machine or motion matching produces a base local pose.
Layers blend on top (masked override + additive deltas).
IK adjusts the result (foot plant, look-at, hand reach), modifying the pose.
Recompute globals from the local pose (walk the hierarchy).
Build the matrix palette and skin in the vertex shader (the Skeletal Animation module).

Every stage here is upstream of the palette; the skeletal module is where it all lands. The behavior tree sits one level up again: it sets the parameters and conditions that drive the state machine's transitions.

Two common wrong answers, and why. A transition pop is caused by a missing cross-fade, not by clip length, and the fix blends local TRS, never global matrices. FABRIK is iterative and approximate, with a root reset in the backward pass; it isn't exact, and you clamp unreachable targets rather than stretch bones.

08 Pitfalls

Hard-cut transitions: Pop. A transition is a timed cross-fade (or inertialization).

Blending global matrices: Non-rigid in-betweens. Blend local TRS, then recompute globals.

2D blend as two 1D blends: Wrong on diagonals. Use a real 2D scheme (gradient band).

Layer without a mask: Upper and lower body fight. Mask which joints a layer drives.

"Motion matching is magic": Curated database, few-Hz search; game logic stays a state machine.

FABRIK without the root reset: The chain drifts off its anchor. Reset the root in the backward pass.

Instant foot IK: Robot legs. Raycast + pelvis drop + leg IK, interpolated over time.

"FABRIK is exact": It's iterative; it clamps on unreachable. Two-bone is the exact one.

09 What's next

That's the upstream half of character animation: decide the pose (state machine or motion matching), blend and layer it, and adjust it with IK, all feeding the matrix palette. With this, the engine series covers every subsystem from the cache line to a shipped 3D game.

[1] Jason Gregory. Game Engine Architecture, 3rd ed., "Animation Systems." gameenginebook.com. Clips, blend trees, layered/additive blending, and the action state machine.
[2] Epic Games. "State Machines" and "Transition Rules" (Unreal Engine). dev.epicgames.com. States as AnimGraphs producing a pose; transition blend duration; Standard / Inertialization / Custom blend logic.
[3] Unity Technologies. "Animation State Machines" and "2D Blend Trees." docs.unity3d.com. Mecanim states, transitions with a blend duration, Any State, and the three 2D blend modes.
[4] Rune Skovbo Johansen. Automated Semi-Procedural Animation for Character Locomotion (MSc thesis, 2009), §6.3 Gradient Band Interpolation. runevision.com. The 2D blend-weight algorithm Unity ships; Cartesian vs polar; why barycentric/RBF fall short.
[5] David Bollo. "Inertialization: High-Performance Animation Transitions in Gears of War." GDC 2018. gdcvault.com. Handling transitions as a decaying post-process offset instead of evaluating two pose graphs.
[6] Simon Clavet. "Motion Matching and The Road to Next-Gen Animation." GDC 2016 (Ubisoft, For Honor). gdcvault.com. Declarative animation; runtime search for the frame matching the current pose + desired future.
[7] O3DE. "Motion Matching in O3DE, a Data-Driven Animation Technique." docs.o3de.org. A concrete feature schema, KD-tree search, and the few-times-per-second search rate.
[8] Andreas Aristidou and Joan Lasenby. "FABRIK: A fast, iterative solver for the Inverse Kinematics problem." Graphical Models 73(5), 2011. andreasaristidou.com. The forward/backward point-on-line solver; lowest cost / fewest iterations; unreachable → straight line.
[9] Ryan Juckett. "Cyclic Coordinate Descent in 2D." ryanjuckett.com. CCD: iterate joints end to root, rotating each to aim the end effector at the target.
[10] Daniel Holden. "Simple Two Joint IK." theorangeduck.com. The analytic two-bone IK (law of cosines + bend axis) this module recaps.
[11] Guillaume Blanc. ozz-animation "foot_ik" sample. guillaumeblanc.github.io. Raycast + pelvis adjustment + two-bone leg IK + ankle-to-normal foot planting.