Graph Traversal
Every system in an engine that has dependencies is a graph: the scene hierarchy, the render-pass DAG, the asset import chain, the navmesh, the AI behavior tree. The two algorithms that walk them are breadth-first search and depth-first search. From those two come topological sort, strongly connected components, shortest paths, A*, and the rest of the pathfinding family. We build each one from scratch, run them live on the same maze so the difference is visible, and end on the AAA case studies that decide which variant ships.
01Why traverse a graph
A modern engine's frame is a stack of graph walks. The scene graph is a tree (or a DAG, once you allow instancing): the engine traverses it every frame to compute world transforms. The render graph[1] is a DAG of passes whose dependencies decide barrier insertion, resource lifetimes, and async-compute scheduling, built by a topological sort, walked depth-first. The asset import pipeline is a DAG of source files, import settings, and cooked artifacts; cooking visits it bottom-up so dependencies are baked before dependents. The navmesh is a planar graph of triangles or polygons that A* searches whenever an NPC decides to walk. AI behavior trees are DFS-walked every tick. Animation blend graphs are evaluated by topological order. Even the shader compile cache is a graph: an HLSL include-tree the toolchain walks to compute a content hash.
All of that runs on two procedures: breadth-first search and depth-first search. Everything else is a specialization. Dijkstra is BFS with a priority queue. A* is Dijkstra with a heuristic. Topological sort is DFS with a postorder. Tarjan's SCC is DFS with a lowlink trick. Get the two base algorithms right and the rest reduces to "what data structure do I use for the frontier."
A working mental model of every graph walk that ships in a real engine. Adjacency representations and when each one is right. BFS and DFS with the three-color convention, edge classification, and the recursion-depth bug that crashes the scene graph on deeply nested prefabs. Topological sort by Kahn's algorithm and by DFS postorder. Strongly connected components by Tarjan's lowlink. Dijkstra, BellmanโFord, and the A* admissibility theorem. Bidirectional search, jump-point search, hierarchical pathfinding (HPA*). And the engine case studies: how Recast and Detour [2] build the navmesh Unreal, Unity, and Godot ship, why F.E.A.R. ran A* over actions instead of waypoints[3], and how a render-graph DAG turns into a barrier list.
The cost of the wrong traversal
A scene graph with 50,000 entities, traversed once per frame at 60 fps, gives you 333 ns per node to compute the world transform and push it onto the GPU upload buffer. A BFS using std::queue<Entity*> with the default std::deque allocator burns several hundred nanoseconds per push and pop on heap traffic alone, well before you touch the node itself. The same walk implemented as a depth-first stack over a flat array of preorder-sorted entities is a sequential scan: one cache line per few nodes, no allocation, no pointer chase. The asymptotic class is the same. The constant factor is the entire frame.
The widget below puts the two on the same axis: entity count along X, microseconds along Y. The "pointer-chase BFS" curve assumes one DRAM-miss-equivalent per node (a hundred-ish nanoseconds on modern hardware, Norvig's ballpark[4]). The "flat-array DFS preorder" curve assumes one L1 hit per node (about half a nanosecond). The gap is between two and three orders of magnitude:
02A short history
Graph theory predates computing by two centuries (Euler, Seven Bridges of Kรถnigsberg, 1736). Graph algorithms arrived with electromechanical switching and didn't slow down. The dates that matter for engine work:
ARecastNavMesh), Godot, O3DE, and adopted into Unity's NavMesh build pipeline. The dominant CPU-pathfinding codebase in shipped games since.
03Graphs and how to store them
A graph is a set of vertices and a set of edges between them. Engine-side, vertices are entities, render passes, scene nodes, navmesh polygons; edges are parent-child links, resource dependencies, navmesh adjacencies, behavior-tree transitions. The shape of the graph and what you do with it dictates the storage:
- Directed vs undirected. An edge from u to v may or may not imply an edge from v to u. Scene graph: directed (parent โ child). Navmesh adjacency: undirected (you can walk both ways). Render-pass dependency: directed and acyclic.
- Weighted vs unweighted. An edge may carry a cost: distance, time, fuel, an animation blend weight, a navmesh polygon area. Unweighted graphs use BFS to find the shortest path. Weighted graphs need Dijkstra (non-negative weights) or BellmanโFord (any weights, including negative).
- Sparse vs dense. The classification is the ratio of |E| to |V|ยฒ. A planar navmesh is sparse: each polygon has 3 to 8 neighbors, so |E| = O(|V|). A full complete graph like a "distance from every city to every city" matrix is dense: |E| = ฮ(|V|ยฒ). The representation you pick has to match.
Three storage layouts cover every case you'll meet:
| Layout | Storage | "Edge exists?" | "Neighbors of v" | When |
|---|---|---|---|---|
| Adjacency list | ฮ(|V| + |E|) | O(deg v) | O(deg v) | Sparse graphs (most engine graphs). Default choice. |
| Adjacency matrix | ฮ(|V|ยฒ) | O(1) | ฮ(|V|) | Dense graphs, small fixed-size graphs (8ร8 connectivity tables), FloydโWarshall. |
| Edge list | ฮ(|E|) | O(|E|) | O(|E|) | Bulk-edit, sorting edges by weight (Kruskal MST), input/output to disk. |
Adjacency list is the default. Mehlhorn and Sanders[21] give the full asymptotic comparison; for sparse graphs the matrix wastes space at ฮ(|V|ยฒ) when most cells are zero, and neighbor enumeration becomes ฮ(|V|) per vertex (a full row scan) instead of ฮ(deg v). The break-even is around |E| โ |V|ยฒ / 32 on a bit-packed matrix, but you'll never hit it on a navmesh or render graph.
The cache-friendly adjacency list is not a std::vector<std::vector<int>>. It's two flat arrays in CSR layout: one offsets array and one edges array, both contiguous. SNAP[22], Boost.Graph, and most production graph libraries default to CSR for traversal:
// CSR adjacency list. Used by SNAP, Boost.Graph, Galois, every BFS benchmark.
struct CsrGraph {
std::vector<int> offsets; // size |V| + 1
std::vector<int> edges; // size |E|
};
// Neighbors of vertex v: contiguous span in 'edges'.
inline std::span<const int> neighbors(const CsrGraph& g, int v) {
return { &g.edges[g.offsets[v]],
static_cast<size_t>(g.offsets[v + 1] - g.offsets[v]) };
}
The reason CSR is universal: neighbor enumeration is one indirection and a sequential read. A std::vector<std::vector<int>> hits DRAM on the outer indirection and the inner vector's heap buffer, two cache misses per vertex visit. On the Graph500[23] benchmark inputs that show up in academic papers (Kronecker R-MAT graphs at scales 22 through 30), BFS on a CSR adjacency list runs at hundreds of millions of edges per second on a single core; the same algorithm on the nested-vector layout is typically 5โ10ร slower[24].
The widget below shows the same 32-vertex graph stored three ways. Pick a vertex; the representation highlights the work each one does to answer "what are v's neighbors?"
04Breadth-first search
BFS visits vertices in increasing order of edge-distance from the source. Level 0 is the source. Level 1 is everything one edge away. Level 2 is everything two edges away that wasn't already on level 1. The data structure that enforces this order is a FIFO queue.
The three-color convention (white = unvisited, gray = in the queue, black = expanded) comes from Cormen, Leiserson, and Rivest's 1990 first edition of Introduction to Algorithms[25]. Earlier sources, including Tarjan 1972, use a two-state distinction; the three-color version makes the queue invariant explicit and is the version every textbook now teaches.
// BFS. Returns parent[] and dist[]. Visits each vertex once, each edge once.
// Complexity: O(|V| + |E|), space O(|V|).
//
// 'parent[v] = -1' for unreached vertices; the source has parent = source.
// 'dist[v]' is the edge-count from source to v.
void bfs(const CsrGraph& graph, int source,
std::vector<int>& parent,
std::vector<int>& dist)
{
const int vertexCount = graph.offsets.size() - 1;
parent.assign(vertexCount, -1);
dist .assign(vertexCount, -1);
std::queue<int> frontier; // gray vertices: in the queue, not yet expanded
parent[source] = source; // sentinel: source is its own parent
dist [source] = 0;
frontier.push(source);
while (!frontier.empty()) {
int current = frontier.front(); frontier.pop();
// 'current' transitions from gray to black here.
for (int neighbor : neighbors(graph, current)) {
if (parent[neighbor] != -1) continue; // already gray or black
parent[neighbor] = current;
dist [neighbor] = dist[current] + 1;
frontier.push(neighbor); // newly gray
}
}
}
A few things are worth pointing out that the textbook usually skims past:
- The check is on discovery, not on expansion. Once a vertex is in the queue, it must not be added again, or the queue size blows up to ฮ(|E|) instead of ฮ(|V|). The textbook version of this bug shows up as out-of-memory on dense graphs.
- BFS finds the shortest path in edge count, not in weight. Unweighted shortest path is BFS. Weighted shortest path needs Dijkstra (ยง10). Running BFS on a weighted graph and pretending the weights aren't there is a real bug seen in shipped pathfinding code.
- The complexity is ฮ(|V| + |E|). Not |V| + |E|; we touch every vertex once and every edge once. On a CSR graph this is a single sweep through both arrays.
- Reconstructing the path is a backward walk through
parentfrom the goal until you hit the source. The path comes out reversed; either build a forward array andstd::reverseit, or carry it on a stack as you walk.
The widget below paints a maze in BFS levels. Level 0 is the source. Level 1 is everything one edge away. The wavefront expands radially because every cell on level k is exactly k moves from the start. The color band corresponds to the BFS dist[] field:
Bidirectional BFS
If you know both endpoints (source s and goal g) you can launch two BFS frontiers, one from each, and stop when they meet[15]. The reason this is faster is geometric: on a uniform graph with branching factor b, single-direction BFS explores ~bd vertices to find a goal at depth d; bidirectional BFS explores ~2bd/2, the square root of the single-direction count. The catch is implementation discipline: the two frontiers must alternate, and detecting the meeting point requires checking each newly-gray vertex against the other direction's visited set.
05Depth-first search
DFS swaps the queue for a stack and otherwise looks identical. The change in data structure changes the order of visits: instead of expanding outward in layers, DFS picks one neighbor, recurses into it, follows that branch as deep as it goes, and only backtracks when there's nothing left to expand. The order of discovery is a preorder traversal of a spanning tree rooted at the source.
DFS comes in two equivalent forms: recursive (uses the call stack as the data structure) and explicit-stack iterative (uses a std::vector as the stack). The recursive form is what every algorithm textbook shows. The iterative form is what every shipping engine actually runs.
// Recursive DFS. Reads cleanly. Crashes on a deep enough graph.
//
// Each recursive call adds one stack frame (~64 bytes on x86-64 with no inlining).
// The default thread stack is 1 MiB on Windows, 8 MiB on Linux. Divide by 64,
// and your scene-graph traversal blows up around 16k to 130k deep.
void dfsRecursive(const CsrGraph& graph, int current,
std::vector<int>& visited)
{
visited[current] = 1;
for (int neighbor : neighbors(graph, current)) {
if (visited[neighbor]) continue;
dfsRecursive(graph, neighbor, visited);
}
}
// Iterative DFS with an explicit stack. Same order of visits, no recursion limit.
// The 'stack' here is a heap-allocated std::vector, bounded by graph depth, not by
// the thread stack. This is the version that ships.
void dfsIterative(const CsrGraph& graph, int source,
std::vector<int>& visited)
{
std::vector<int> stack;
stack.push_back(source);
while (!stack.empty()) {
int current = stack.back(); stack.pop_back();
if (visited[current]) continue; // popped a stale entry
visited[current] = 1;
// Push neighbors in reverse so the leftmost neighbor is expanded first.
// Without this, the order is reversed compared to the recursive version.
auto adj = neighbors(graph, current);
for (auto it = adj.rbegin(); it != adj.rend(); ++it) {
if (!visited[*it]) stack.push_back(*it);
}
}
}
Both versions are O(|V| + |E|). The iterative version has one wrinkle: by the time you pop a vertex off the stack, it may have been visited by another path that pushed earlier. The if (visited[current]) continue; check guards that case. Forgetting it doesn't break correctness on a tree, but on a general graph with cross edges it visits the same vertex multiple times.
The three colors and edge classification
The three-color version of DFS, marking vertices white (unvisited), gray (on the recursion stack), or black (finished and popped), gives you something BFS doesn't: every edge gets a classification by when its other endpoint changed color[25]:
- Tree edge. The edge that first discovered the gray vertex on the other side. The set of tree edges across a DFS forms the DFS spanning forest.
- Back edge. Edge from a gray vertex to one of its ancestors in the DFS tree. The presence of a back edge is exactly the condition for a cycle. This is the textbook cycle detector, and the one your engine's render-graph compiler uses to refuse a cyclic dependency.
- Forward edge. Edge from a vertex to a black descendant in its own subtree. Only appears in directed graphs.
- Cross edge. Edge from one DFS subtree to a different, already-finished subtree. Only appears in directed graphs.
Maintaining the three colors is a one-line change: set the color to gray on entry to the recursive call, set it to black on exit. The cost is one extra write per vertex. In return you get cycle detection, topological sort (ยง8), strongly connected components (ยง9), and bridge / articulation-point detection, all for the price of one extra array.
The most common DFS bug in shipped engine code is recursive DFS on user-authored content. A scene with 50,000 entities nested as a linear chain (easy to author by accident in a procedural-prop placer) overflows the thread stack on the recursive traversal. Symptoms: a stack-overflow crash in retail, never reproduced internally because the test scenes have shallow hierarchies. Fix: switch to the iterative version, or set an explicit depth limit and return an error. Unity, Unreal, and Godot have all shipped fixes for this exact failure mode at one point or another.
06BFS vs DFS, side by side
The shape of the frontier is the visible difference. BFS expands layer by layer; DFS spelunks one branch. Both visit every reachable vertex exactly once. Both are O(|V| + |E|). They find different answers to the question "how do I get from s to g": BFS finds the path with the fewest edges; DFS finds any path, often a long one. The widget runs both on the same labyrinth, same source, same goal, at the same pace, so the difference is unambiguous:
07When to use which
The decision rule, ordered roughly by how often it comes up:
| Question | Answer | Why |
|---|---|---|
| Shortest path in edge count, unweighted graph? | BFS | BFS visits in order of distance from source. The first time you pop a vertex, it's the shortest path. |
| Shortest path with non-negative weights? | Dijkstra (ยง10) | BFS but the queue is a priority queue keyed by accumulated cost. |
| Shortest path with a geometry-based heuristic? | A* (ยง11) | Dijkstra with the priority bumped by an admissible estimate of remaining distance. |
| Detect a cycle in a directed graph? | DFS | A back edge (gray-to-gray) exists iff the graph has a cycle. One pass of DFS, one extra color array. |
| Order tasks so dependencies finish first (build system, render graph, animation blend)? | Topological sort (ยง8) | Kahn (BFS-flavored, indegree queue) or DFS postorder. Either works on a DAG. |
| Group nodes by mutual reachability (which assets form a cycle in your import chain)? | Tarjan's SCC (ยง9) | DFS with lowlink. Outputs the cycle membership for every node in one pass. |
| Memory-constrained tree search for an exact solution at large depth? | Iterative deepening (ยง14) | Repeated DFS at increasing depth limits. O(d) memory instead of BFS's O(b^d). |
| Just need to visit every reachable vertex? | Either | Both are O(|V| + |E|). Use DFS for cache locality on a flat array; use BFS for layer ordering. |
Two patterns aren't on the table because they're more subtle. Tree problems with sibling order that matters (behavior trees, the <canvas> render tree, UI hierarchies) want DFS preorder: parent, then children left-to-right. BFS gives layer order, which is almost never what UI or AI wants. Game-state search trees with branching that's too wide for full exploration (chess, Go, RTS opponent modeling) want neither pure BFS nor pure DFS; they want best-first search like Monte Carlo Tree Search or alpha-beta pruning, both of which are BFS/DFS variants with smarter expansion order.
08Topological sort
A DAG always has a linear order on its vertices that respects every edge: every dependency before every dependent. The classic example is a build system: compile util.cpp before main.cpp if main depends on it. The classic engine example is the render graph: the depth pre-pass must finish before the opaque pass that samples its output. Topological sort produces that order.
Two algorithms exist. Kahn's algorithm (1962[12]) is the BFS-flavored one: maintain an indegree count for each vertex, push every zero-indegree vertex onto a queue, pop and emit, decrement the indegree of each neighbor, push any neighbor whose indegree just hit zero. The DFS-postorder variant (CLRS chapter 20[25]; the algorithm-as-such is folklore from the DFS framework, no single 1960s paper claims it) runs DFS and emits each vertex when it turns black; the reversed sequence is a topological order.
// Kahn's topological sort. Returns an order respecting every edge,
// or an empty vector if the graph has a cycle.
//
// Complexity: O(|V| + |E|). Each vertex is enqueued once, each edge is
// inspected once when its source is processed.
std::vector<int> topoSortKahn(const CsrGraph& graph) {
const int vertexCount = graph.offsets.size() - 1;
std::vector<int> indegree(vertexCount, 0);
// First pass: compute every vertex's indegree.
for (int v = 0; v < vertexCount; ++v)
for (int neighbor : neighbors(graph, v))
indegree[neighbor]++;
std::queue<int> ready; // vertices with no remaining prerequisites
for (int v = 0; v < vertexCount; ++v)
if (indegree[v] == 0) ready.push(v);
std::vector<int> order;
order.reserve(vertexCount);
while (!ready.empty()) {
int current = ready.front(); ready.pop();
order.push_back(current);
for (int neighbor : neighbors(graph, current)) {
// 'Consume' the edge from current to neighbor: one fewer prerequisite.
if (--indegree[neighbor] == 0) ready.push(neighbor);
}
}
// If we did not emit every vertex, the unprocessed ones are in a cycle.
if ((int)order.size() != vertexCount) return {};
return order;
}
Kahn's algorithm has a free side benefit: if you ever exit the loop with vertices still unprocessed, those vertices are in (or downstream of) a cycle. Build systems use this to print a useful error: "circular dependency between A, B, C" with the cycle members enumerated. The DFS-postorder version detects cycles by spotting a gray-to-gray edge (a back edge); both work but Kahn's is easier to write the "tell me which nodes are in the cycle" version of.
The widget runs Kahn's on an animation blend graph. Each node has an indegree counter (how many prerequisites haven't shipped); only zero-indegree nodes can be emitted. The order it produces is one valid topological order; there are usually many, and Kahn's picks the one driven by queue order (which itself depends on the initial scan):
09Strongly connected components
A strongly connected component is a maximal set of vertices where every pair can reach each other via directed edges. In a DAG, every SCC is a single vertex (no cycles). In a general directed graph, the SCCs are exactly the cycles, contracted. Engine relevance: when a build system, asset cooker, or render graph reports "circular dependency," what it wants to tell you is "the following SCC has more than one node, here are the names." The two-line answer above ("find the back edge") tells you a cycle exists; the SCC algorithm tells you exactly which nodes are in it.
Two algorithms ship. Tarjan's (1972[16]) runs in one DFS pass, maintaining a discovery-index per vertex and a "lowlink", the smallest discovery index reachable through the DFS subtree. A vertex whose lowlink equals its own discovery index is the root of an SCC, and the stack contents above it form the component. KosarajuโSharir's (Kosaraju 1978 unpublished; Sharir 1981[17]) runs two DFS passes: one to get a finish-order on the original graph, a second on the transposed graph in reverse finish-order, where each DFS tree is one SCC. Both are O(|V| + |E|). Tarjan's is faster in practice because it touches the graph once and skips the transpose; Kosaraju's is shorter to write.
// Tarjan's SCC algorithm. One DFS pass, output one component per call to onComponent.
// 'discovery' is the order each vertex was first seen; 'lowlink' is the smallest
// discovery index reachable from this vertex's subtree.
void tarjanScc(const CsrGraph& graph, std::function<void(std::span<const int>)> onComponent) {
const int vertexCount = graph.offsets.size() - 1;
std::vector<int> discovery(vertexCount, -1);
std::vector<int> lowlink (vertexCount, -1);
std::vector<bool> onStack (vertexCount, false);
std::vector<int> componentStack;
int nextIndex = 0;
auto strongconnect = [&](auto& self, int current) -> void {
discovery[current] = nextIndex;
lowlink [current] = nextIndex;
nextIndex++;
componentStack.push_back(current);
onStack[current] = true;
for (int neighbor : neighbors(graph, current)) {
if (discovery[neighbor] == -1) {
self(self, neighbor); // tree edge: recurse
lowlink[current] = std::min(lowlink[current], lowlink[neighbor]);
} else if (onStack[neighbor]) {
// Back edge into the current SCC under construction.
lowlink[current] = std::min(lowlink[current], discovery[neighbor]);
}
// Else: cross or forward edge into a *finished* SCC. Skip.
}
// If 'current' is the root of an SCC, pop it off the stack.
if (lowlink[current] == discovery[current]) {
std::vector<int> component;
while (true) {
int popped = componentStack.back();
componentStack.pop_back();
onStack[popped] = false;
component.push_back(popped);
if (popped == current) break;
}
onComponent(component);
}
};
for (int v = 0; v < vertexCount; ++v)
if (discovery[v] == -1) strongconnect(strongconnect, v);
}
The recursive form blows up on deep graphs the same way recursive DFS does. Production code uses an explicit stack with a small state machine; there are well-tested iterative versions in the LLVM (used to identify SCCs in the call graph for inlining) and Boost.Graph codebases. The version above is the textbook one. Use the iterative form in shipping code, especially when the graph can be authored by users.
10Shortest paths
BFS already solves the shortest-path problem when every edge costs the same: the first time it pops a vertex, that vertex is as close to the source as it will ever be. Weights break that. Give one edge a cost of 10 while its neighbors cost 1, and BFS, which counts edges and not cost, hands back a two-edge path costing 11 over a four-edge path costing 4. Dijkstra's algorithm is the fix, and A* (ยง11) is just Dijkstra with a heuristic bolted on top.
What Dijkstra does
Dijkstra's algorithm[9] finds the lowest-cost path from one source to every other vertex, on a graph whose edge weights are all non-negative. It keeps one number per vertex, a tentative distance (0 for the source, โ for everything else), and a growing set of settled vertices whose distance is already final. Then it repeats a single move until no unsettled vertex is reachable:
- Settle the nearest. Take the unsettled vertex with the smallest tentative distance. Its tentative distance is now final: mark it settled.
- Relax its edges. For each neighbor reached by an edge of weight w, check whether (the settled vertex's distance) + w beats the neighbor's current tentative distance. If it does, lower the neighbor's distance and record the settled vertex as its parent.
That is the entire algorithm. Structurally it is BFS with the frontier changed from a FIFO queue to a min-priority queue keyed on tentative distance. BFS's FIFO queue always returns the vertex that has waited longest, which on a unit-weight graph is exactly the nearest unsettled vertex; keying on accumulated cost instead makes "nearest" mean cheapest-so-far rather than fewest-edges. Relaxing then does one thing BFS never has to: it can lower a vertex's tentative distance after first discovery, because on a weighted graph a cheaper route can surface later. Give every edge weight 1 and that never fires, so the priority queue hands vertices back in the same distance order BFS's FIFO queue does. That coincidence is what "Dijkstra is BFS with a priority queue" means.
Path reconstruction is the same backward walk BFS uses: each relaxation records a parent, and the shortest path to any vertex is its parent chain followed back to the source. With a binary heap for the priority queue the whole search is O((|V| + |E|) log |V|); a Fibonacci heap lowers the bound to O(|E| + |V| log |V|), and bucket or radix queues do better still when edge weights are small integers. In practice the Fibonacci-heap constants are bad enough that a plain binary heap usually wins anyway.
Why settling the nearest vertex is safe
Dijkstra rests on one claim: the moment you settle the unsettled vertex with the smallest tentative distance, that distance is already final. Nothing found later can beat it. Suppose a cheaper path to that vertex did exist. Trace it from the source; it has to leave the settled set somewhere, at some still-unsettled vertex y. The cost to reach y along that path is at least y's own tentative distance, because relaxing the settled frontier already found the cheapest way in; y is unsettled, so its tentative distance is at least the one you are about to settle; and every edge past y adds a non-negative amount. The "cheaper" path therefore costs at least as much, and the contradiction is gone.[25] That single step is where the non-negative-weight requirement lives: one negative edge past y could drag the total back down, the guarantee fails, and you need BellmanโFord instead (covered below, and flagged again in ยง16).
The widget runs Dijkstra on terrain with cost. Pale cells are cheap (cost 1); the amber-hatched bands are slow (cost 8). It settles one vertex per tick, tints each by its final cost from the source, and labels the accumulated cost. The wavefront is a cost contour, not the even ring BFS painted in ยง4: it surges through cheap ground and lags against the slow bands, because a longer detour over cheap cells outranks a short hop through expensive ones.
// Dijkstra's algorithm with a binary heap.
// Returns a dist[] array; dist[v] = shortest cost from source to v, or +inf.
//
// Complexity: O((|V| + |E|) log |V|). Each vertex is pushed at most |V| times in
// the worst case (the 'lazy deletion' style below); the priority queue handles
// up to |E| insertions. For dense graphs an array-based priority queue is faster.
void dijkstra(const WeightedCsr& graph, int source,
std::vector<double>& dist,
std::vector<int>& parent)
{
const int vertexCount = graph.offsets.size() - 1;
dist .assign(vertexCount, std::numeric_limits<double>::infinity());
parent.assign(vertexCount, -1);
using Entry = std::pair<double, int>; // (cost, vertex)
std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> frontier;
dist[source] = 0.0;
frontier.emplace(0.0, source);
while (!frontier.empty()) {
auto [cost, current] = frontier.top(); frontier.pop();
if (cost > dist[current]) continue; // stale entry, skip
for (auto [neighbor, weight] : weighted_neighbors(graph, current)) {
double newCost = cost + weight;
if (newCost < dist[neighbor]) {
dist [neighbor] = newCost;
parent[neighbor] = current;
frontier.emplace(newCost, neighbor); // lazy: leave stale entries
}
}
}
}
Three implementation details to get right or you'll have a slow or wrong Dijkstra:
- Lazy deletion. When a vertex's distance improves, the old entry stays in the heap. The
if (cost > dist[current]) continue;skips it on pop. The alternative, finding and removing the old entry, needs a heap that supports decrease-key, whichstd::priority_queuedoesn't. - Negative weights kill correctness. Dijkstra's invariant ("the first time we pop v, dist[v] is final") relies on edges being non-negative. A negative edge can update a "finished" vertex retroactively. If your weights are sometimes negative, use BellmanโFord.
- Floating-point ties. Two paths with equal cost can take different routes depending on heap order; on a navmesh with diagonal moves and floating-point edge lengths, the chosen path can flip frame-to-frame for the same query. Round to a stable precision, or break ties on a secondary deterministic key (vertex id).
The rest of the shortest-path family
Dijkstra is the default, but it is one member of a family. Each of the others gives up Dijkstra's single-source scope or its non-negative-weight assumption to buy something Dijkstra can't:
- BFS is Dijkstra for unit weights, at O(|V| + |E|) with a plain FIFO queue instead of a heap. When every edge costs the same, the heap is wasted overhead.
- 0-1 BFS handles edge weights that are only 0 or 1. Use a
std::deque: push 0-weight neighbors to the front, 1-weight neighbors to the back. Still O(|V| + |E|), no heap needed. - BellmanโFord (Bellman 1958[8], Ford 1956[6], Moore 1959[7]) is the one to reach for when weights can be negative. Relax every edge |V|โ1 times. O(|V| ยท |E|) worst case, and it detects negative cycles for free: if any edge still relaxes on the |V|th pass, a negative cycle exists.
- FloydโWarshall computes shortest paths between every pair of vertices in O(|V|ยณ). For a small dense graph (a 16ร16 cost table between AI tactical regions) it's the right tool; for a navmesh with 10,000 polys it's not.
11A* and admissible heuristics
A* is Dijkstra with one change: the priority isn't g(v) (cost from source) but g(v) + h(v), where h(v) is an estimate of the cost from v to the goal[13]. The estimate biases expansion toward the goal: vertices that look close get popped first. If the heuristic is good, A* expands a small fraction of the vertices Dijkstra would. If the heuristic is h(v) = 0, A* is exactly Dijkstra.
The 1968 paper's main contribution wasn't the algorithm (heuristic search was already in the air) but the proof of when it stays optimal:
If the heuristic h is admissible (never overestimates the true remaining cost h*(v)), then A* with that heuristic finds an optimal solution.
If the heuristic is also consistent (sometimes called monotonic): h(v) โค cost(v, u) + h(u) for every edge vโu, then A* never expands a vertex more than once. The 1972 SIGART correction[14] tightens the relationship between these two conditions and the optimality of the open-set behavior.
On a 2D grid, the canonical admissible heuristics are:
- Manhattan distance |ฮx| + |ฮy|: admissible for 4-connected grids (only N/S/E/W moves).
- Octile distance max(|ฮx|, |ฮy|) + (โ2โ1) ยท min(|ฮx|, |ฮy|): admissible for 8-connected grids with diagonal cost โ2.
- Euclidean distance โ(ฮxยฒ + ฮyยฒ): admissible for any grid; loose for the 4- and 8-connected cases. Use only when the grid allows arbitrary-angle moves (rare in tiles, common on navmesh).
On a navmesh, the canonical heuristic is straight-line distance between polygon centroids. It's admissible (actual walking distance โฅ straight-line distance) and works well as long as the polygon decomposition is reasonably uniform. Multiply the heuristic by a constant > 1 (weighted A*) and A* expands fewer nodes but loses optimality; this is the standard speed/quality knob in shipped pathfinding[26].
The widget runs Dijkstra and A* side by side on the same grid, same source and goal. A heuristic-weight slider goes from 0 (= Dijkstra) past 1 (admissible) to 2 (greedy and possibly suboptimal). Watch the expanded set shrink as the heuristic gets stronger, and break optimality once the weight exceeds 1:
12Bidirectional search
Pohl's observation[15]: if a uniform-branching graph has branching factor b and the goal is at depth d, a single-direction BFS explores ~bd vertices to find it. Two BFS frontiers (one from the source, one from the goal, alternating expansion) meet in the middle after expanding ~2 ยท bd/2. The branching factor still applies, but the exponent halves. On a graph with b = 10 and d = 12, the single-direction count is 1012; the bidirectional count is 2 ยท 106. The same speedup carries over to bidirectional A*, with the additional subtlety that the two heuristics have to be consistent across the meeting boundary or the path you reconstruct isn't optimal.
The catch is detection. The frontiers must exchange visited sets; the meeting point isn't necessarily the first vertex one frontier pops that the other has seen; it's the vertex that minimizes distfwd(v) + distback(v). Get that wrong and the path you return isn't shortest.
13JPS and hierarchical pathfinding
Two optimizations turn A* from "works fine on 100ร100 grids" into "works on 4096ร4096 grids at interactive rates":
Jump Point Search (Harabor & Grastien, AAAI 2011[20]) exploits a symmetry property of uniform-cost 8-connected grids: many paths through open space are interchangeable. JPS prunes the search by skipping intermediate cells that no optimal path needs to expand, jumping directly to "interesting" cells (corners of obstacles, the goal). Reported speedups on the standard Dragon Age and HOG2 benchmarks are an order of magnitude over plain A*. JPS is restricted to uniform-cost grids (it breaks if your terrain has variable cost), but for tile-based games with binary walkable/unwalkable terrain it's the dominant choice.
Hierarchical Pathfinding A* (HPA*, Botea, Mรผller, Schaeffer, 2004[19]) partitions the map into clusters at one or more abstraction levels. At cook time, compute the shortest path between every pair of cluster-boundary nodes (the "abstract graph"). At query time, A* runs in three phases: connect the source to its cluster's boundary nodes, A* on the abstract graph between clusters, then refine each abstract edge into a real path. The Botea paper reports about 10ร speedup on standard benchmarks with ~1% path-length suboptimality. Variants (HPA*-LRTA, PRA*, MM*) tune the trade.
Most shipped game pathfinding stacks aren't pure A* or pure JPS or pure HPA*; they're a navmesh built by Recast[2] (cluster-into-polygons-at-cook-time, like HPA* but the partition is geometric) walked at runtime by Detour, which runs weighted A* on the polygon graph with a Euclidean heuristic. The HPA*-style hierarchy is implicit in the navmesh; the A* runs on the polygon graph, which has tens of thousands of polys instead of millions of cells. Recast/Detour is the codebase behind Unreal's ARecastNavMesh class, Unity's NavMesh, Godot's navigation, and dozens of indie titles. Mikko Mononen released it as open source in 2009 under zlib.
14Iterative deepening
BFS finds the shortest path but uses memory proportional to the frontier; for a uniform tree with branching b and goal at depth d, that's O(bd). DFS uses memory proportional to depth (O(d)) but doesn't find shortest paths.
Iterative-deepening DFS (Korf 1985[18]) gets both. Run depth-limited DFS with limit 1; if not found, throw it away and run again with limit 2; then 3; and so on. The repeated work looks wasteful but isn't: on a uniformly branching tree, the last iteration does most of the work (bd nodes), and all earlier iterations combined do less (the geometric sum is bd/(bโ1)). For b = 10 the overhead is about 11%.
IDA* in the same Korf paper combines this with A*'s heuristic: at each iteration, the cutoff is on f(v) = g(v) + h(v), not depth. IDA* is what every classical-AI puzzle solver from the late 1980s onward uses for problems where the state space is too large for A*'s open set to fit in memory (sliding-tile puzzles, Rubik's cube, sokoban). For game-engine navigation it's almost never the right choice (your navmesh fits in memory and A* is faster), but for game-state search trees it's standard.
15Case studies
1 ยท The render-graph compile
Modern engines (Frostbite's FrameGraph[27], Unreal's RDG, Unity's SRP RenderGraph) compile each frame's rendering work into a DAG of passes, where each pass declares the resources it reads and writes. The compile step does three things, all graph traversals:
- Topological sort (ยง8) gives a linear execution order respecting every read-after-write dependency. Kahn's algorithm runs over the indegree of each pass.
- Reverse-reachability culling (BFS from the swap-chain output, backwards along edges) marks every pass that contributes to a final visible output. Passes whose outputs nothing reads are dead-code-eliminated.
- Resource lifetime computation (a forward DFS labeling each resource's first-use and last-use pass) lets the renderer alias transient resources: two render targets whose lifetimes don't overlap can share the same physical allocation.
The cycle detector in step 1 is the same DFS-back-edge check from ยง5. If a pass writes A then reads A in the same frame, that's a cycle, and the compiler refuses it.
2 ยท F.E.A.R.'s GOAP: A* over actions
F.E.A.R. (Monolith, 2005) shipped a planner instead of a hand-authored behavior tree. Jeff Orkin's GDC 2006 talk[3] describes the system: each NPC has a list of actions (each with preconditions and effects), and the AI runs A* on a graph where nodes are world states and edges are actions. The heuristic is the count of unsatisfied goal conditions. The frame-rate cost is bounded by an action cap (~12 actions per agent on PC) so the search space stays tractable. F.E.A.R.'s NPCs felt smart partly because they would compose action sequences the designers hadn't scripted (vault a cover, flank, melee) directly out of the search. The state graph is implicit; A* generates it as it expands.
3 ยท Asset cooking and the SCC report
Every engine's asset cooker walks a dependency DAG: an animation graph references a skeleton, which references a mesh, which references a material, which references shaders and textures. Cooking visits the DAG bottom-up (reverse topological order: each dependency is baked before each dependent). When a content author accidentally introduces a cycle (usually two materials that reference each other through a function), the cooker uses Tarjan's SCC (ยง9) to report the cycle members. The error message "circular dependency detected: M_RustyBarrel โ MF_RustNoise โ M_RustyBarrel" comes directly out of the SCC algorithm: the cycle is the SCC with more than one node, and the components are the lowlink-marked stack pops.
4 ยท Scene-graph traversal: flat preorder beats recursive
Unity and Unreal both store the scene hierarchy in a flat array sorted by preorder traversal: parents appear before their children, and siblings are contiguous. Computing world transforms is a single sequential pass over the array: each entity's world transform is its local transform composed with its parent's world transform, and the parent has been processed already because of the preorder layout. No queue. No stack. No pointer chase. The asymptotic class is the same as recursive DFS; the constant factor is between one and two orders of magnitude better. Unity's DOTS Transform system is the explicit version of this design[28]; Unreal's Mass Entity / ECS subsystem moved toward the same shape over the 5.x cycle.
5 ยท CryEngine and Unreal navigation
CryEngine documents its pathfinding as A* over a triangulated navmesh[29], with a special Volume Navigation Region for full-3D flight in Crysis. Unreal's navigation system documentation[30] describes the navmesh as Recast-derived and the runtime walker as Detour. Unity's NavMesh build pipeline uses a Recast-based mesh construction phase[31]. The dominance of this codebase across AAA engines is one of the cleaner examples of open-source absorbing the field.
16Pitfalls
Bugs in graph-traversal code, ranked by frequency:
- Marking visited on expand instead of on discovery. The BFS bug: if you mark a vertex as visited when you pop it instead of when you push it, the same vertex can be pushed multiple times by different neighbors. The queue grows to ฮ(|E|) instead of ฮ(|V|). On dense graphs this OOMs.
- Recursive DFS on user content. Already covered in ยง5. The fix is the iterative form. If you keep the recursive form, set an explicit depth limit and fail loudly when it's exceeded.
- BFS on a weighted graph. BFS returns the path with the fewest edges, not the fewest weight units. If your edges have weights and you care about cost, use Dijkstra. The bug usually shipped as "the NPC chose the longer road because it had fewer waypoints."
- Dijkstra with negative weights. Dijkstra's correctness depends on the invariant that once a vertex is popped, its distance is final. A negative edge can update a finalized vertex, and Dijkstra never reconsiders it. Use BellmanโFord if your weights are sometimes negative.
- A* with an inadmissible heuristic. If h(v) > h*(v) for some v, A* can return a suboptimal path. Often deliberate (weighted A* is faster), but should be explicit, not an oversight in the heuristic.
- Floating-point ties in shortest-path. Two paths with equal cost can be returned by different runs of the same query, especially with floats. Break ties on a deterministic secondary key (lowest vertex id, last edge added), or round costs to a stable precision.
- SCC report says "no cycle" but the build fails with a circular dependency. The dependency graph the build system sees and the SCC algorithm runs on can disagree, usually because one is computed before a code-gen step and the other after. The SCC algorithm is correct; verify you're feeding it the post-codegen graph.
- Topological sort isn't stable. Kahn's order depends on the queue order, which depends on the initial scan order. If you rely on a specific topological order (for reproducible builds, for the order in which render passes execute), tie-break the queue on a deterministic key like vertex id.
- Adjacency-list reallocation invalidating iterators mid-traversal. Calling
graph.add_edge(...)while a BFS is in progress and the underlyingstd::vector<int>reallocates leaves the BFS iterating into freed memory. Either snapshot the graph, or use a representation that doesn't move (CSR with capacity reserved up front).
17What's next
- Best-first search beyond A*. D* / D* Lite for dynamic environments (paths through a changing graph), Theta* for any-angle paths, MM* for memory-bounded search. Standard reading: Stentz 1995, Daniel et al. 2010.
- Min-cost flow and max flow. EdmondsโKarp (BFS-based augmenting paths), Dinic, push-relabel. The graph-algorithm family that handles allocation problems: bandwidth, scheduling, network reliability.
- Minimum spanning trees. Kruskal (sort edges, union-find), Prim (BFS with a priority queue, weight key). Dijkstra's 1959 paper covers Prim in addition to shortest path; both algorithms are in the same three-page note.
- Graph partitioning and clustering. METIS, KaHIP. For HPA*-style hierarchical pathfinding, scene-graph load balancing across worker threads, and asset-bundle planning.
- GPU graph algorithms. Direction-optimizing BFS (Beamer 2012[24]), Gunrock, nvGRAPH. For per-frame pathfinding on swarms of thousands of units.
- Persistent data structures. Building a graph traversal that snapshots its visited set at each step for time-travel debugging. Debugging in release covers the broader topic.
For the engine programmer, the most useful next step is to look at your engine's render-graph compiler and read the topological sort. It's two screens of code that decide the order in which everything renders. If it's Kahn's algorithm, you'll recognize it.
Sources
- Yuriy O'Donnell. "FrameGraph: Extensible Rendering Architecture in Frostbite." Game Developers Conference 2017. gdcvault.com/play/1024612. The canonical engineering write-up on render-graph compilation, including the topological sort, resource-aliasing, and reverse-reachability culling described in ยง15.
- Mikko Mononen et al. "Recast Navigation." GitHub repository (zlib license), originally released 2009. github.com/recastnavigation/recastnavigation. README explicitly lists Mononen as the original author and notes the library "has shipped in countless indie games, AAA titles, and commercial and open-source game engines such as Unity, Unreal, Godot, and O3DE."
- Jeff Orkin. "Three states and a plan: The A.I. of F.E.A.R." Game Developers Conference 2006. gamedevs.org/uploads/three-states-plan-ai-of-fear.pdf. The source for F.E.A.R.'s GOAP system; the agent FSM is reduced to three states (Goto / Animate / Use Smart Object) and planning is A* over actions.
- Peter Norvig. "Teach Yourself Programming in Ten Years," "Approximate timing for various operations" table. norvig.com/21-days.html. Memory-hierarchy latency ballparks used to estimate per-node traversal cost: L1 ~0.5 ns, branch mispredict ~5 ns, DRAM ~100 ns.
- รdouard Lucas. Rรฉcrรฉations mathรฉmatiques, Vol. I, 2nd ed., Paris: Gauthier-Villars, 1882. The chapter on labyrinths attributes the maze procedure (depth-first search with edge marking) to Charles Pierre Trรฉmaux. archive.org hosts a public-domain scan.
- Lester R. Ford, Jr. Network Flow Theory. Paper P-923, The RAND Corporation, August 14, 1956. rand.org/pubs/papers/P923. The relaxation-style shortest-path procedure that, with Bellman 1958 and Moore 1959, is the joint origin of BellmanโFord.
- Edward F. Moore. "The shortest path through a maze." In Proceedings of the International Symposium on the Theory of Switching, Part II, April 2โ5, 1957, Annals of the Computation Laboratory of Harvard University, Vol. 30, pp. 285โ292. Harvard University Press, 1959. The first published BFS; motivated by routing toll telephone traffic at Bell Labs.
- Richard Bellman. "On a Routing Problem." Quarterly of Applied Mathematics 16(1):87โ90, 1958. DOI: 10.1090/qam/102435. ams.org/journals/qam/1958-16-01. The dynamic-programming functional equation for shortest paths.
- Edsger W. Dijkstra. "A note on two problems in connexion with graphs." Numerische Mathematik 1(1):269โ271, 1959. DOI: 10.1007/BF01386390. link.springer.com/article/10.1007/BF01386390. Three pages, two algorithms: shortest path between two given vertices, and minimum spanning tree.
- Thomas J. Misa, Philip L. Frana. "An Interview with Edsger W. Dijkstra." Communications of the ACM 53(8):41โ47, August 2010. cacm.acm.org/research/an-interview-with-edsger-w-dijkstra. Dijkstra's own account of designing the shortest-path algorithm in about twenty minutes in a 1956 Amsterdam cafรฉ "without pencil and paper."
- C. Y. Lee. "An algorithm for path connections and its applications." IRE Transactions on Electronic Computers EC-10(3):346โ365, 1961. DOI: 10.1109/TEC.1961.5219222. BFS independently re-derived for circuit routing; the "Lee algorithm" of EDA textbooks.
- Arthur B. Kahn. "Topological sorting of large networks." Communications of the ACM 5(11):558โ562, November 1962. DOI: 10.1145/368996.369025. dl.acm.org/doi/10.1145/368996.369025. The indegree-queue algorithm; motivated by ordering a 30,000-activity PERT chart at Westinghouse.
- Peter E. Hart, Nils J. Nilsson, Bertram Raphael. "A Formal Basis for the Heuristic Determination of Minimum Cost Paths." IEEE Transactions on Systems Science and Cybernetics SSC-4(2):100โ107, 1968. DOI: 10.1109/TSSC.1968.300136. ieeexplore.ieee.org/document/4082128. The A* paper; introduces the algorithm and proves the admissibility theorem.
- Peter E. Hart, Nils J. Nilsson, Bertram Raphael. "Correction to 'A formal basis for the heuristic determination of minimum cost paths'." SIGART Newsletter 37:28โ29, 1972. DOI: 10.1145/1056777.1056779. dl.acm.org/doi/10.1145/1056777.1056779. The one-page correction that tightens the admissibility/consistency relationship.
- Ira Pohl. Bi-directional and heuristic search in path problems. Ph.D. dissertation, Stanford University, May 1969. Also issued as SLAC report SLAC-104. slac.stanford.edu/pubs/slacreports/reports04/slac-r-104.pdf. The original framing of bidirectional heuristic search; the conventionally cited "Pohl 1971" is the Machine Intelligence 6 chapter that followed.
- Robert Tarjan. "Depth-first search and linear graph algorithms." SIAM Journal on Computing 1(2):146โ160, 1972. DOI: 10.1137/0201010. epubs.siam.org/doi/10.1137/0201010. The paper that formalizes DFS as a general technique and gives the linear-time SCC and biconnected-components algorithms.
- Micha Sharir. "A strong-connectivity algorithm and its applications in data flow analysis." Computers & Mathematics with Applications 7(1):67โ72, 1981. DOI: 10.1016/0898-1221(81)90008-0. sciencedirect.com/science/article/pii/0898122181900080. The two-pass SCC algorithm; credited to an unpublished 1978 lecture by S. Rao Kosaraju at Johns Hopkins, hence "KosarajuโSharir."
- Richard E. Korf. "Depth-first iterative-deepening: An optimal admissible tree search." Artificial Intelligence 27(1):97โ109, 1985. DOI: 10.1016/0004-3702(85)90084-0. sciencedirect.com/science/article/abs/pii/0004370285900840. The IDDFS paper; also introduces IDA*.
- Adi Botea, Martin Mรผller, Jonathan Schaeffer. "Near optimal hierarchical path-finding." Journal of Game Development 1(1):7โ28, 2004. webdocs.cs.ualberta.ca/~mmueller/ps/2004/hpastar.pdf. The HPA* paper; reports ~10ร speedup on standard benchmarks with ~1% path-length suboptimality.
- Daniel Harabor, Alban Grastien. "Online graph pruning for pathfinding on grid maps." In Proc. AAAI Conference on Artificial Intelligence (AAAI-11), pp. 1114โ1119, 2011. ojs.aaai.org/index.php/AAAI/article/view/7994. The Jump Point Search paper; A*-on-grid speedups via symmetry pruning, order-of-magnitude on HOG2 benchmarks.
- Kurt Mehlhorn, Peter Sanders. Algorithms and Data Structures: The Basic Toolbox. Springer, 2008, chapter 8 "Graph Representation." people.mpi-inf.mpg.de/~mehlhorn/ftp/NewToolbox/grepresent.pdf. The asymptotic comparison of adjacency list, adjacency matrix, and edge list representations.
- Jure Leskovec, Rok Sosiฤ. "SNAP: A general-purpose network analysis and graph-mining library." ACM Transactions on Intelligent Systems and Technology 8(1):1:1โ1:20, 2016. DOI: 10.1145/2898361. snap.stanford.edu/snap. Stanford's reference graph library; uses CSR throughout.
- Graph500 Steering Committee. "Graph500 benchmark specification." graph500.org/?page_id=12. The reference BFS benchmark spec used in the parallel-graph-algorithms literature; defines scale-S as a Kronecker graph with 2S vertices.
- Scott Beamer. Understanding and Improving Graph Algorithm Performance. Ph.D. thesis, UC Berkeley, 2016. escholarship.org/uc/item/4jp775f7. Direction-optimizing BFS, the GAP benchmark suite, and per-graph throughput figures for BFS on real-world inputs (typically 108โ109 edges/s per core depending on graph structure).
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to Algorithms, 4th ed. MIT Press, 2022. ISBN 9780262046305. mitpress.mit.edu/9780262046305. Chapter 20 covers BFS, DFS, topological sort, and SCC. Chapters 22โ23 cover shortest paths. The 1st edition (1990) introduced the white/gray/black DFS notation.
- Amit Patel. "Introduction to A*." Red Blob Games, ongoing. redblobgames.com/pathfinding/a-star/introduction.html. Best-in-class interactive explanation of BFS, Dijkstra, and A* with weighted variants. The companion implementation guide is the reference most game-AI courses point to.
- Yuriy O'Donnell. "FrameGraph: Extensible Rendering Architecture in Frostbite." GDC 2017. Same talk as [1]. The render-graph compiler in ยง15 is described in detail starting at minute 22.
- Unity Technologies. "Transforms package documentation." Unity DOTS / Entities documentation. docs.unity3d.com/Packages/com.unity.entities/transforms-concepts.html. Source for the flat-preorder transform hierarchy layout described in case study 4.