All tutorials Mighty Professional
Tutorial 13 ยท Engine Programming

Graph Traversal

Every system in an engine that has dependencies is a graph: the scene hierarchy, the render-pass DAG, the asset import chain, the navmesh, the AI behavior tree. The two algorithms that walk them are breadth-first search and depth-first search. From those two come topological sort, strongly connected components, shortest paths, A*, and the rest of the pathfinding family. We build each one from scratch, run them live on the same maze so the difference is visible, and end on the AAA case studies that decide which variant ships.

Time~55 min LevelJunior to mid; review for senior PrereqsYou can read C++ or pseudocode. Queues and stacks. Big O (read the prior tutorial if it feels rusty). HardwareNone. A feel for cache locality helps in ยง3 and ยง15.

01Why traverse a graph

A modern engine's frame is a stack of graph walks. The scene graph is a tree (or a DAG, once you allow instancing): the engine traverses it every frame to compute world transforms. The render graph[1] is a DAG of passes whose dependencies decide barrier insertion, resource lifetimes, and async-compute scheduling, built by a topological sort, walked depth-first. The asset import pipeline is a DAG of source files, import settings, and cooked artifacts; cooking visits it bottom-up so dependencies are baked before dependents. The navmesh is a planar graph of triangles or polygons that A* searches whenever an NPC decides to walk. AI behavior trees are DFS-walked every tick. Animation blend graphs are evaluated by topological order. Even the shader compile cache is a graph: an HLSL include-tree the toolchain walks to compute a content hash.

All of that runs on two procedures: breadth-first search and depth-first search. Everything else is a specialization. Dijkstra is BFS with a priority queue. A* is Dijkstra with a heuristic. Topological sort is DFS with a postorder. Tarjan's SCC is DFS with a lowlink trick. Get the two base algorithms right and the rest reduces to "what data structure do I use for the frontier."

What you'll have by the end

A working mental model of every graph walk that ships in a real engine. Adjacency representations and when each one is right. BFS and DFS with the three-color convention, edge classification, and the recursion-depth bug that crashes the scene graph on deeply nested prefabs. Topological sort by Kahn's algorithm and by DFS postorder. Strongly connected components by Tarjan's lowlink. Dijkstra, Bellmanโ€“Ford, and the A* admissibility theorem. Bidirectional search, jump-point search, hierarchical pathfinding (HPA*). And the engine case studies: how Recast and Detour [2] build the navmesh Unreal, Unity, and Godot ship, why F.E.A.R. ran A* over actions instead of waypoints[3], and how a render-graph DAG turns into a barrier list.

The cost of the wrong traversal

A scene graph with 50,000 entities, traversed once per frame at 60 fps, gives you 333 ns per node to compute the world transform and push it onto the GPU upload buffer. A BFS using std::queue<Entity*> with the default std::deque allocator burns several hundred nanoseconds per push and pop on heap traffic alone, well before you touch the node itself. The same walk implemented as a depth-first stack over a flat array of preorder-sorted entities is a sequential scan: one cache line per few nodes, no allocation, no pointer chase. The asymptotic class is the same. The constant factor is the entire frame.

The widget below puts the two on the same axis: entity count along X, microseconds along Y. The "pointer-chase BFS" curve assumes one DRAM-miss-equivalent per node (a hundred-ish nanoseconds on modern hardware, Norvig's ballpark[4]). The "flat-array DFS preorder" curve assumes one L1 hit per node (about half a nanosecond). The gap is between two and three orders of magnitude:

Live ยท Traversal cost at frame scale
pointer-chase BFS
ยทยทยท
flat-array DFS
ยทยทยท
1 ms scene-graph budget
1000 ยตs
Both curves are linear in n; the asymptotic class is identical. The slope is set by the cost of visiting a node, not by the algorithm. Pointer chasing through a heap-allocated std::queue hits DRAM on most pops because the dequeued entity pointer is cold. A preorder-sorted flat array streams from cache. Same traversal, two orders of magnitude apart.

02A short history

Graph theory predates computing by two centuries (Euler, Seven Bridges of Kรถnigsberg, 1736). Graph algorithms arrived with electromechanical switching and didn't slow down. The dates that matter for engine work:

1882
Trรฉmaux's maze rule. ร‰douard Lucas publishes Rรฉcrรฉations mathรฉmatiques[5] and attributes a labyrinth-solving procedure to Charles Pierre Trรฉmaux, a 19th-century French telegraph engineer. The rule (mark each passage as you enter and leave, never re-enter a passage marked twice) is depth-first search on a planar graph, eighty years before anyone wrote it as code.
1956
Ford publishes a relaxation-style shortest-path procedure. Lester R. Ford Jr., Network Flow Theory, RAND paper P-923[6]. The functional equation that, two years later, becomes one half of the Bellmanโ€“Ford algorithm.
1957
Moore presents BFS at Harvard. "The shortest path through a maze" at the International Symposium on the Theory of Switching, April 1957; published in the proceedings volume in 1959[7]. Moore's motivation was routing toll telephone traffic at Bell Labs. The algorithm is the first published BFS.
1958
Bellman publishes the DP functional equation. Richard Bellman, "On a Routing Problem," Quarterly of Applied Mathematics 16(1):87โ€“90[8]. With Ford 1956 and Moore 1959, this becomes the textbook "Bellmanโ€“Ford" shortest-path algorithm. Some authors call it "Bellmanโ€“Fordโ€“Moore" for accurate attribution.
1959
Dijkstra writes his shortest-path algorithm in a cafรฉ. Edsger Dijkstra, "A note on two problems in connexion with graphs," Numerische Mathematik 1:269โ€“271[9]. By his own account in a 2010 CACM interview[10] he designed the algorithm in about twenty minutes, in a cafรฉ in Amsterdam with his fiancรฉe, without pencil or paper. The paper covers two problems: shortest path between two given vertices, and the minimum spanning tree.
1961
Lee re-derives BFS for circuit routing. C.Y. Lee, "An algorithm for path connections and its applications," IRE Transactions on Electronic Computers EC-10(3)[11]. The "Lee algorithm" is BFS on a 2D grid; it is still in the textbook for EDA maze routing and is mathematically identical to Moore's procedure.
1962
Kahn publishes topological sort. Arthur B. Kahn, "Topological sorting of large networks," Communications of the ACM 5(11):558โ€“562[12]. Motivated by ordering activities in a 30,000-node PERT chart at Westinghouse Baltimore. The indegree-zero queue is what every build system since (make, Bazel, Ninja, Unreal's render graph) ultimately runs.
1968
Hart, Nilsson, and Raphael publish A*. Peter Hart, Nils Nilsson, Bertram Raphael, "A formal basis for the heuristic determination of minimum cost paths," IEEE Transactions on Systems Science and Cybernetics SSC-4(2):100โ€“107[13]. Written at SRI to drive Shakey the Robot. The 1972 correction in SIGART Newsletter[14] tightens the admissibility/consistency conditions for the optimality proof.
1969
Pohl's bidirectional search dissertation. Ira Pohl, Bi-directional and Heuristic Search in Path Problems, Stanford Ph.D., issued as SLAC report SLAC-104[15]. Two BFS frontiers walk toward each other from the source and the goal; the meeting point is the path. The 1971 Machine Intelligence 6 chapter is the version most papers cite.
1972
Tarjan formalizes DFS. Robert Tarjan, "Depth-first search and linear graph algorithms," SIAM Journal on Computing 1(2):146โ€“160[16]. The paper that turns DFS from a folk procedure into a general technique, with linear-time algorithms for strongly connected components and biconnected components. Every modern engine's cycle detector descends from this.
1981
Sharir publishes the two-pass SCC algorithm. Micha Sharir, "A strong-connectivity algorithm and its applications in data flow analysis," Computers & Mathematics with Applications 7(1):67โ€“72[17]. Sharir credits an unpublished 1978 lecture by S. Rao Kosaraju at Johns Hopkins; the algorithm is now called Kosarajuโ€“Sharir. Simpler than Tarjan's, two DFS passes instead of one.
1985
Korf's iterative-deepening DFS. Richard Korf, "Depth-first iterative-deepening: An optimal admissible tree search," Artificial Intelligence 27(1):97โ€“109[18]. The same paper introduces IDA*. IDDFS gets BFS's optimality with DFS's memory. It's how every classical-AI puzzle solver from the late 1980s onward fits in memory.
2004
Botea, Mรผller, Schaeffer publish HPA*. "Near optimal hierarchical path-finding," Journal of Game Development 1(1):7โ€“28[19]. The paper that turns large-map A* into a multi-level search: cluster the map, solve inter-cluster paths once, do local A* per query. Roughly 10ร— speedup with ~1% path-length suboptimality.
2009
Recast and Detour go open source. Mikko Mononen releases the navmesh generator (Recast) and runtime (Detour) under zlib[2]. Used directly by Unreal (the class is literally ARecastNavMesh), Godot, O3DE, and adopted into Unity's NavMesh build pipeline. The dominant CPU-pathfinding codebase in shipped games since.
2011
Harabor and Grastien publish Jump Point Search. "Online graph pruning for pathfinding on grid maps," AAAI 2011[20]. JPS exploits the symmetry of uniform-cost 8-connected grids to skip entire runs of intermediate nodes A* would otherwise expand. Order-of-magnitude speedups on the HOG2 grid benchmarks; standard in modern grid-based games.

03Graphs and how to store them

A graph is a set of vertices and a set of edges between them. Engine-side, vertices are entities, render passes, scene nodes, navmesh polygons; edges are parent-child links, resource dependencies, navmesh adjacencies, behavior-tree transitions. The shape of the graph and what you do with it dictates the storage:

Three storage layouts cover every case you'll meet:

LayoutStorage"Edge exists?""Neighbors of v"When
Adjacency list ฮ˜(|V| + |E|) O(deg v) O(deg v) Sparse graphs (most engine graphs). Default choice.
Adjacency matrix ฮ˜(|V|ยฒ) O(1) ฮ˜(|V|) Dense graphs, small fixed-size graphs (8ร—8 connectivity tables), Floydโ€“Warshall.
Edge list ฮ˜(|E|) O(|E|) O(|E|) Bulk-edit, sorting edges by weight (Kruskal MST), input/output to disk.

Adjacency list is the default. Mehlhorn and Sanders[21] give the full asymptotic comparison; for sparse graphs the matrix wastes space at ฮ˜(|V|ยฒ) when most cells are zero, and neighbor enumeration becomes ฮ˜(|V|) per vertex (a full row scan) instead of ฮ˜(deg v). The break-even is around |E| โ‰ˆ |V|ยฒ / 32 on a bit-packed matrix, but you'll never hit it on a navmesh or render graph.

The cache-friendly adjacency list is not a std::vector<std::vector<int>>. It's two flat arrays in CSR layout: one offsets array and one edges array, both contiguous. SNAP[22], Boost.Graph, and most production graph libraries default to CSR for traversal:

// CSR adjacency list. Used by SNAP, Boost.Graph, Galois, every BFS benchmark.
struct CsrGraph {
    std::vector<int> offsets;   // size |V| + 1
    std::vector<int> edges;     // size |E|
};

// Neighbors of vertex v: contiguous span in 'edges'.
inline std::span<const int> neighbors(const CsrGraph& g, int v) {
    return { &g.edges[g.offsets[v]],
             static_cast<size_t>(g.offsets[v + 1] - g.offsets[v]) };
}

The reason CSR is universal: neighbor enumeration is one indirection and a sequential read. A std::vector<std::vector<int>> hits DRAM on the outer indirection and the inner vector's heap buffer, two cache misses per vertex visit. On the Graph500[23] benchmark inputs that show up in academic papers (Kronecker R-MAT graphs at scales 22 through 30), BFS on a CSR adjacency list runs at hundreds of millions of edges per second on a single core; the same algorithm on the nested-vector layout is typically 5โ€“10ร— slower[24].

The widget below shows the same 32-vertex graph stored three ways. Pick a vertex; the representation highlights the work each one does to answer "what are v's neighbors?"

Live ยท Three ways to store a graph
storage
ยทยทยท
cells read for lookup
ยทยทยท
neighbors found
ยทยทยท
Adjacency list reads exactly deg(v) cells to enumerate v's neighbors. The matrix reads all |V| cells in v's row whether they are edges or not. The edge list reads every edge in the graph and tests each one. At 15% density the matrix already wastes most of its lookup; at 5% it's a near-empty row scan, and the edge list is doing |E| comparisons for what should be O(deg v) work.

04Breadth-first search

BFS visits vertices in increasing order of edge-distance from the source. Level 0 is the source. Level 1 is everything one edge away. Level 2 is everything two edges away that wasn't already on level 1. The data structure that enforces this order is a FIFO queue.

The three-color convention (white = unvisited, gray = in the queue, black = expanded) comes from Cormen, Leiserson, and Rivest's 1990 first edition of Introduction to Algorithms[25]. Earlier sources, including Tarjan 1972, use a two-state distinction; the three-color version makes the queue invariant explicit and is the version every textbook now teaches.

// BFS. Returns parent[] and dist[]. Visits each vertex once, each edge once.
// Complexity: O(|V| + |E|), space O(|V|).
//
// 'parent[v] = -1' for unreached vertices; the source has parent = source.
// 'dist[v]' is the edge-count from source to v.
void bfs(const CsrGraph& graph, int source,
         std::vector<int>& parent,
         std::vector<int>& dist)
{
    const int vertexCount = graph.offsets.size() - 1;
    parent.assign(vertexCount, -1);
    dist  .assign(vertexCount, -1);

    std::queue<int> frontier;       // gray vertices: in the queue, not yet expanded
    parent[source] = source;        // sentinel: source is its own parent
    dist  [source] = 0;
    frontier.push(source);

    while (!frontier.empty()) {
        int current = frontier.front(); frontier.pop();
        // 'current' transitions from gray to black here.

        for (int neighbor : neighbors(graph, current)) {
            if (parent[neighbor] != -1) continue;   // already gray or black
            parent[neighbor] = current;
            dist  [neighbor] = dist[current] + 1;
            frontier.push(neighbor);                // newly gray
        }
    }
}

A few things are worth pointing out that the textbook usually skims past:

The widget below paints a maze in BFS levels. Level 0 is the source. Level 1 is everything one edge away. The wavefront expands radially because every cell on level k is exactly k moves from the start. The color band corresponds to the BFS dist[] field:

Live ยท BFS level expansion
level reached
0
cells expanded
0
queue peak
0
The peak queue size grows roughly with the perimeter of the expanded region; for a 2D grid that's O(โˆšn) on the area visited. This is why BFS on a 1000ร—1000 grid can hold tens of thousands of vertices in the queue at once. DFS on the same grid holds at most O(n) in the worst case, but for nicely connected regions usually holds far less. The memory profile is the main reason ยง14 (iterative deepening) exists.

Bidirectional BFS

If you know both endpoints (source s and goal g) you can launch two BFS frontiers, one from each, and stop when they meet[15]. The reason this is faster is geometric: on a uniform graph with branching factor b, single-direction BFS explores ~bd vertices to find a goal at depth d; bidirectional BFS explores ~2bd/2, the square root of the single-direction count. The catch is implementation discipline: the two frontiers must alternate, and detecting the meeting point requires checking each newly-gray vertex against the other direction's visited set.

05Depth-first search

DFS swaps the queue for a stack and otherwise looks identical. The change in data structure changes the order of visits: instead of expanding outward in layers, DFS picks one neighbor, recurses into it, follows that branch as deep as it goes, and only backtracks when there's nothing left to expand. The order of discovery is a preorder traversal of a spanning tree rooted at the source.

DFS comes in two equivalent forms: recursive (uses the call stack as the data structure) and explicit-stack iterative (uses a std::vector as the stack). The recursive form is what every algorithm textbook shows. The iterative form is what every shipping engine actually runs.

// Recursive DFS. Reads cleanly. Crashes on a deep enough graph.
//
// Each recursive call adds one stack frame (~64 bytes on x86-64 with no inlining).
// The default thread stack is 1 MiB on Windows, 8 MiB on Linux. Divide by 64,
// and your scene-graph traversal blows up around 16k to 130k deep.
void dfsRecursive(const CsrGraph& graph, int current,
                  std::vector<int>& visited)
{
    visited[current] = 1;
    for (int neighbor : neighbors(graph, current)) {
        if (visited[neighbor]) continue;
        dfsRecursive(graph, neighbor, visited);
    }
}

// Iterative DFS with an explicit stack. Same order of visits, no recursion limit.
// The 'stack' here is a heap-allocated std::vector, bounded by graph depth, not by
// the thread stack. This is the version that ships.
void dfsIterative(const CsrGraph& graph, int source,
                  std::vector<int>& visited)
{
    std::vector<int> stack;
    stack.push_back(source);

    while (!stack.empty()) {
        int current = stack.back(); stack.pop_back();
        if (visited[current]) continue;     // popped a stale entry
        visited[current] = 1;

        // Push neighbors in reverse so the leftmost neighbor is expanded first.
        // Without this, the order is reversed compared to the recursive version.
        auto adj = neighbors(graph, current);
        for (auto it = adj.rbegin(); it != adj.rend(); ++it) {
            if (!visited[*it]) stack.push_back(*it);
        }
    }
}

Both versions are O(|V| + |E|). The iterative version has one wrinkle: by the time you pop a vertex off the stack, it may have been visited by another path that pushed earlier. The if (visited[current]) continue; check guards that case. Forgetting it doesn't break correctness on a tree, but on a general graph with cross edges it visits the same vertex multiple times.

The three colors and edge classification

The three-color version of DFS, marking vertices white (unvisited), gray (on the recursion stack), or black (finished and popped), gives you something BFS doesn't: every edge gets a classification by when its other endpoint changed color[25]:

Maintaining the three colors is a one-line change: set the color to gray on entry to the recursive call, set it to black on exit. The cost is one extra write per vertex. In return you get cycle detection, topological sort (ยง8), strongly connected components (ยง9), and bridge / articulation-point detection, all for the price of one extra array.

Recursion-depth crash

The most common DFS bug in shipped engine code is recursive DFS on user-authored content. A scene with 50,000 entities nested as a linear chain (easy to author by accident in a procedural-prop placer) overflows the thread stack on the recursive traversal. Symptoms: a stack-overflow crash in retail, never reproduced internally because the test scenes have shallow hierarchies. Fix: switch to the iterative version, or set an explicit depth limit and return an error. Unity, Unreal, and Godot have all shipped fixes for this exact failure mode at one point or another.

06BFS vs DFS, side by side

The shape of the frontier is the visible difference. BFS expands layer by layer; DFS spelunks one branch. Both visit every reachable vertex exactly once. Both are O(|V| + |E|). They find different answers to the question "how do I get from s to g": BFS finds the path with the fewest edges; DFS finds any path, often a long one. The widget runs both on the same labyrinth, same source, same goal, at the same pace, so the difference is unambiguous:

Live ยท BFS vs DFS on the same maze
BFS expanded
0
BFS path length
ยทยทยท
DFS expanded
0
DFS path length
ยทยทยท
BFS terminates as soon as it pops the goal off the queue; the path it returns is the shortest in edge count. DFS terminates as soon as it pushes the goal onto the stack; the path it returns is whatever depth-first wandering produced. On a labyrinth with a single goal, DFS often expands more cells than BFS: it spelunks blind alleys to the end before backtracking. On a tree (no cycles) the two visit exactly the same vertices, but in different orders. The maze here has a handful of extra walls knocked down so both algorithms have alternate routes to choose between, which is what makes their exploration patterns diverge visibly.

07When to use which

The decision rule, ordered roughly by how often it comes up:

QuestionAnswerWhy
Shortest path in edge count, unweighted graph? BFS BFS visits in order of distance from source. The first time you pop a vertex, it's the shortest path.
Shortest path with non-negative weights? Dijkstra (ยง10) BFS but the queue is a priority queue keyed by accumulated cost.
Shortest path with a geometry-based heuristic? A* (ยง11) Dijkstra with the priority bumped by an admissible estimate of remaining distance.
Detect a cycle in a directed graph? DFS A back edge (gray-to-gray) exists iff the graph has a cycle. One pass of DFS, one extra color array.
Order tasks so dependencies finish first (build system, render graph, animation blend)? Topological sort (ยง8) Kahn (BFS-flavored, indegree queue) or DFS postorder. Either works on a DAG.
Group nodes by mutual reachability (which assets form a cycle in your import chain)? Tarjan's SCC (ยง9) DFS with lowlink. Outputs the cycle membership for every node in one pass.
Memory-constrained tree search for an exact solution at large depth? Iterative deepening (ยง14) Repeated DFS at increasing depth limits. O(d) memory instead of BFS's O(b^d).
Just need to visit every reachable vertex? Either Both are O(|V| + |E|). Use DFS for cache locality on a flat array; use BFS for layer ordering.

Two patterns aren't on the table because they're more subtle. Tree problems with sibling order that matters (behavior trees, the <canvas> render tree, UI hierarchies) want DFS preorder: parent, then children left-to-right. BFS gives layer order, which is almost never what UI or AI wants. Game-state search trees with branching that's too wide for full exploration (chess, Go, RTS opponent modeling) want neither pure BFS nor pure DFS; they want best-first search like Monte Carlo Tree Search or alpha-beta pruning, both of which are BFS/DFS variants with smarter expansion order.

08Topological sort

A DAG always has a linear order on its vertices that respects every edge: every dependency before every dependent. The classic example is a build system: compile util.cpp before main.cpp if main depends on it. The classic engine example is the render graph: the depth pre-pass must finish before the opaque pass that samples its output. Topological sort produces that order.

Two algorithms exist. Kahn's algorithm (1962[12]) is the BFS-flavored one: maintain an indegree count for each vertex, push every zero-indegree vertex onto a queue, pop and emit, decrement the indegree of each neighbor, push any neighbor whose indegree just hit zero. The DFS-postorder variant (CLRS chapter 20[25]; the algorithm-as-such is folklore from the DFS framework, no single 1960s paper claims it) runs DFS and emits each vertex when it turns black; the reversed sequence is a topological order.

// Kahn's topological sort. Returns an order respecting every edge,
// or an empty vector if the graph has a cycle.
//
// Complexity: O(|V| + |E|). Each vertex is enqueued once, each edge is
// inspected once when its source is processed.
std::vector<int> topoSortKahn(const CsrGraph& graph) {
    const int vertexCount = graph.offsets.size() - 1;
    std::vector<int> indegree(vertexCount, 0);

    // First pass: compute every vertex's indegree.
    for (int v = 0; v < vertexCount; ++v)
        for (int neighbor : neighbors(graph, v))
            indegree[neighbor]++;

    std::queue<int> ready;       // vertices with no remaining prerequisites
    for (int v = 0; v < vertexCount; ++v)
        if (indegree[v] == 0) ready.push(v);

    std::vector<int> order;
    order.reserve(vertexCount);

    while (!ready.empty()) {
        int current = ready.front(); ready.pop();
        order.push_back(current);

        for (int neighbor : neighbors(graph, current)) {
            // 'Consume' the edge from current to neighbor: one fewer prerequisite.
            if (--indegree[neighbor] == 0) ready.push(neighbor);
        }
    }

    // If we did not emit every vertex, the unprocessed ones are in a cycle.
    if ((int)order.size() != vertexCount) return {};
    return order;
}

Kahn's algorithm has a free side benefit: if you ever exit the loop with vertices still unprocessed, those vertices are in (or downstream of) a cycle. Build systems use this to print a useful error: "circular dependency between A, B, C" with the cycle members enumerated. The DFS-postorder version detects cycles by spotting a gray-to-gray edge (a back edge); both work but Kahn's is easier to write the "tell me which nodes are in the cycle" version of.

The widget runs Kahn's on an animation blend graph. Each node has an indegree counter (how many prerequisites haven't shipped); only zero-indegree nodes can be emitted. The order it produces is one valid topological order; there are usually many, and Kahn's picks the one driven by queue order (which itself depends on the initial scan):

Live ยท Kahn's topological sort
emitted
0
queue size
0
status
ready
Hit "add cycle" to insert a back edge. The remaining vertices never reach indegree zero, and the sort halts with a cycle reported. The vertices left in the graph are exactly the ones in or downstream of the cycle. This is the diagnostic every build system and render-graph compiler runs when it refuses a circular dependency.

09Strongly connected components

A strongly connected component is a maximal set of vertices where every pair can reach each other via directed edges. In a DAG, every SCC is a single vertex (no cycles). In a general directed graph, the SCCs are exactly the cycles, contracted. Engine relevance: when a build system, asset cooker, or render graph reports "circular dependency," what it wants to tell you is "the following SCC has more than one node, here are the names." The two-line answer above ("find the back edge") tells you a cycle exists; the SCC algorithm tells you exactly which nodes are in it.

Two algorithms ship. Tarjan's (1972[16]) runs in one DFS pass, maintaining a discovery-index per vertex and a "lowlink", the smallest discovery index reachable through the DFS subtree. A vertex whose lowlink equals its own discovery index is the root of an SCC, and the stack contents above it form the component. Kosarajuโ€“Sharir's (Kosaraju 1978 unpublished; Sharir 1981[17]) runs two DFS passes: one to get a finish-order on the original graph, a second on the transposed graph in reverse finish-order, where each DFS tree is one SCC. Both are O(|V| + |E|). Tarjan's is faster in practice because it touches the graph once and skips the transpose; Kosaraju's is shorter to write.

// Tarjan's SCC algorithm. One DFS pass, output one component per call to onComponent.
// 'discovery' is the order each vertex was first seen; 'lowlink' is the smallest
// discovery index reachable from this vertex's subtree.
void tarjanScc(const CsrGraph& graph, std::function<void(std::span<const int>)> onComponent) {
    const int vertexCount = graph.offsets.size() - 1;
    std::vector<int>  discovery(vertexCount, -1);
    std::vector<int>  lowlink  (vertexCount, -1);
    std::vector<bool> onStack   (vertexCount, false);
    std::vector<int>  componentStack;
    int nextIndex = 0;

    auto strongconnect = [&](auto& self, int current) -> void {
        discovery[current] = nextIndex;
        lowlink  [current] = nextIndex;
        nextIndex++;
        componentStack.push_back(current);
        onStack[current] = true;

        for (int neighbor : neighbors(graph, current)) {
            if (discovery[neighbor] == -1) {
                self(self, neighbor);                       // tree edge: recurse
                lowlink[current] = std::min(lowlink[current], lowlink[neighbor]);
            } else if (onStack[neighbor]) {
                // Back edge into the current SCC under construction.
                lowlink[current] = std::min(lowlink[current], discovery[neighbor]);
            }
            // Else: cross or forward edge into a *finished* SCC. Skip.
        }

        // If 'current' is the root of an SCC, pop it off the stack.
        if (lowlink[current] == discovery[current]) {
            std::vector<int> component;
            while (true) {
                int popped = componentStack.back();
                componentStack.pop_back();
                onStack[popped] = false;
                component.push_back(popped);
                if (popped == current) break;
            }
            onComponent(component);
        }
    };

    for (int v = 0; v < vertexCount; ++v)
        if (discovery[v] == -1) strongconnect(strongconnect, v);
}
What's intentionally missing

The recursive form blows up on deep graphs the same way recursive DFS does. Production code uses an explicit stack with a small state machine; there are well-tested iterative versions in the LLVM (used to identify SCCs in the call graph for inlining) and Boost.Graph codebases. The version above is the textbook one. Use the iterative form in shipping code, especially when the graph can be authored by users.

10Shortest paths

BFS already solves the shortest-path problem when every edge costs the same: the first time it pops a vertex, that vertex is as close to the source as it will ever be. Weights break that. Give one edge a cost of 10 while its neighbors cost 1, and BFS, which counts edges and not cost, hands back a two-edge path costing 11 over a four-edge path costing 4. Dijkstra's algorithm is the fix, and A* (ยง11) is just Dijkstra with a heuristic bolted on top.

What Dijkstra does

Dijkstra's algorithm[9] finds the lowest-cost path from one source to every other vertex, on a graph whose edge weights are all non-negative. It keeps one number per vertex, a tentative distance (0 for the source, โˆž for everything else), and a growing set of settled vertices whose distance is already final. Then it repeats a single move until no unsettled vertex is reachable:

  1. Settle the nearest. Take the unsettled vertex with the smallest tentative distance. Its tentative distance is now final: mark it settled.
  2. Relax its edges. For each neighbor reached by an edge of weight w, check whether (the settled vertex's distance) + w beats the neighbor's current tentative distance. If it does, lower the neighbor's distance and record the settled vertex as its parent.

That is the entire algorithm. Structurally it is BFS with the frontier changed from a FIFO queue to a min-priority queue keyed on tentative distance. BFS's FIFO queue always returns the vertex that has waited longest, which on a unit-weight graph is exactly the nearest unsettled vertex; keying on accumulated cost instead makes "nearest" mean cheapest-so-far rather than fewest-edges. Relaxing then does one thing BFS never has to: it can lower a vertex's tentative distance after first discovery, because on a weighted graph a cheaper route can surface later. Give every edge weight 1 and that never fires, so the priority queue hands vertices back in the same distance order BFS's FIFO queue does. That coincidence is what "Dijkstra is BFS with a priority queue" means.

Path reconstruction is the same backward walk BFS uses: each relaxation records a parent, and the shortest path to any vertex is its parent chain followed back to the source. With a binary heap for the priority queue the whole search is O((|V| + |E|) log |V|); a Fibonacci heap lowers the bound to O(|E| + |V| log |V|), and bucket or radix queues do better still when edge weights are small integers. In practice the Fibonacci-heap constants are bad enough that a plain binary heap usually wins anyway.

Why settling the nearest vertex is safe

Dijkstra rests on one claim: the moment you settle the unsettled vertex with the smallest tentative distance, that distance is already final. Nothing found later can beat it. Suppose a cheaper path to that vertex did exist. Trace it from the source; it has to leave the settled set somewhere, at some still-unsettled vertex y. The cost to reach y along that path is at least y's own tentative distance, because relaxing the settled frontier already found the cheapest way in; y is unsettled, so its tentative distance is at least the one you are about to settle; and every edge past y adds a non-negative amount. The "cheaper" path therefore costs at least as much, and the contradiction is gone.[25] That single step is where the non-negative-weight requirement lives: one negative edge past y could drag the total back down, the guarantee fails, and you need Bellmanโ€“Ford instead (covered below, and flagged again in ยง16).

The widget runs Dijkstra on terrain with cost. Pale cells are cheap (cost 1); the amber-hatched bands are slow (cost 8). It settles one vertex per tick, tints each by its final cost from the source, and labels the accumulated cost. The wavefront is a cost contour, not the even ring BFS painted in ยง4: it surges through cheap ground and lags against the slow bands, because a longer detour over cheap cells outranks a short hop through expensive ones.

Live ยท Dijkstra settling by accumulated cost
vertices settled
0
frontier (heap) size
0
cost to goal
ยทยทยท
Each settled cell's label is its final lowest cost from the source; once settled, it never changes, which is the invariant the safety argument above turns on. The contour is not round: a route of cheap cells that bends around a slow band beats the straight line through it, so the cheap corridors settle early and the slow bands settle late, at high cost. Run BFS (ยง4) on the same map and its wavefront would be even rings that ignore the bands, handing back a path that plows straight through the slow terrain because it has fewer cells. That gap is the bug Dijkstra exists to close.
// Dijkstra's algorithm with a binary heap.
// Returns a dist[] array; dist[v] = shortest cost from source to v, or +inf.
//
// Complexity: O((|V| + |E|) log |V|). Each vertex is pushed at most |V| times in
// the worst case (the 'lazy deletion' style below); the priority queue handles
// up to |E| insertions. For dense graphs an array-based priority queue is faster.
void dijkstra(const WeightedCsr& graph, int source,
              std::vector<double>& dist,
              std::vector<int>&    parent)
{
    const int vertexCount = graph.offsets.size() - 1;
    dist  .assign(vertexCount, std::numeric_limits<double>::infinity());
    parent.assign(vertexCount, -1);

    using Entry = std::pair<double, int>;             // (cost, vertex)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> frontier;

    dist[source] = 0.0;
    frontier.emplace(0.0, source);

    while (!frontier.empty()) {
        auto [cost, current] = frontier.top(); frontier.pop();
        if (cost > dist[current]) continue;            // stale entry, skip

        for (auto [neighbor, weight] : weighted_neighbors(graph, current)) {
            double newCost = cost + weight;
            if (newCost < dist[neighbor]) {
                dist  [neighbor] = newCost;
                parent[neighbor] = current;
                frontier.emplace(newCost, neighbor);   // lazy: leave stale entries
            }
        }
    }
}

Three implementation details to get right or you'll have a slow or wrong Dijkstra:

The rest of the shortest-path family

Dijkstra is the default, but it is one member of a family. Each of the others gives up Dijkstra's single-source scope or its non-negative-weight assumption to buy something Dijkstra can't:

11A* and admissible heuristics

A* is Dijkstra with one change: the priority isn't g(v) (cost from source) but g(v) + h(v), where h(v) is an estimate of the cost from v to the goal[13]. The estimate biases expansion toward the goal: vertices that look close get popped first. If the heuristic is good, A* expands a small fraction of the vertices Dijkstra would. If the heuristic is h(v) = 0, A* is exactly Dijkstra.

The 1968 paper's main contribution wasn't the algorithm (heuristic search was already in the air) but the proof of when it stays optimal:

Hart, Nilsson, Raphael 1968, the admissibility theorem (informal)

If the heuristic h is admissible (never overestimates the true remaining cost h*(v)), then A* with that heuristic finds an optimal solution.

If the heuristic is also consistent (sometimes called monotonic): h(v) โ‰ค cost(v, u) + h(u) for every edge vโ†’u, then A* never expands a vertex more than once. The 1972 SIGART correction[14] tightens the relationship between these two conditions and the optimality of the open-set behavior.

On a 2D grid, the canonical admissible heuristics are:

On a navmesh, the canonical heuristic is straight-line distance between polygon centroids. It's admissible (actual walking distance โ‰ฅ straight-line distance) and works well as long as the polygon decomposition is reasonably uniform. Multiply the heuristic by a constant > 1 (weighted A*) and A* expands fewer nodes but loses optimality; this is the standard speed/quality knob in shipped pathfinding[26].

The widget runs Dijkstra and A* side by side on the same grid, same source and goal. A heuristic-weight slider goes from 0 (= Dijkstra) past 1 (admissible) to 2 (greedy and possibly suboptimal). Watch the expanded set shrink as the heuristic gets stronger, and break optimality once the weight exceeds 1:

Live ยท Dijkstra vs A* with a heuristic-weight slider
Dijkstra expanded
0
A* expanded
0
Dijkstra path
0.0
A* path
0.0
At weight = 0 the two grids match: zero heuristic means A* is Dijkstra. At weight = 1 A* expands a slimmer corridor toward the goal but the path length matches Dijkstra's exactly: optimal, with fewer expansions. Past weight = 1 the heuristic over-estimates and the path A* returns can be longer than Dijkstra's: A* commits to the goal direction before exploring detours that would have been cheaper. The optimality break is visible in the path-length stat.

12Bidirectional search

Pohl's observation[15]: if a uniform-branching graph has branching factor b and the goal is at depth d, a single-direction BFS explores ~bd vertices to find it. Two BFS frontiers (one from the source, one from the goal, alternating expansion) meet in the middle after expanding ~2 ยท bd/2. The branching factor still applies, but the exponent halves. On a graph with b = 10 and d = 12, the single-direction count is 1012; the bidirectional count is 2 ยท 106. The same speedup carries over to bidirectional A*, with the additional subtlety that the two heuristics have to be consistent across the meeting boundary or the path you reconstruct isn't optimal.

The catch is detection. The frontiers must exchange visited sets; the meeting point isn't necessarily the first vertex one frontier pops that the other has seen; it's the vertex that minimizes distfwd(v) + distback(v). Get that wrong and the path you return isn't shortest.

13JPS and hierarchical pathfinding

Two optimizations turn A* from "works fine on 100ร—100 grids" into "works on 4096ร—4096 grids at interactive rates":

Jump Point Search (Harabor & Grastien, AAAI 2011[20]) exploits a symmetry property of uniform-cost 8-connected grids: many paths through open space are interchangeable. JPS prunes the search by skipping intermediate cells that no optimal path needs to expand, jumping directly to "interesting" cells (corners of obstacles, the goal). Reported speedups on the standard Dragon Age and HOG2 benchmarks are an order of magnitude over plain A*. JPS is restricted to uniform-cost grids (it breaks if your terrain has variable cost), but for tile-based games with binary walkable/unwalkable terrain it's the dominant choice.

Hierarchical Pathfinding A* (HPA*, Botea, Mรผller, Schaeffer, 2004[19]) partitions the map into clusters at one or more abstraction levels. At cook time, compute the shortest path between every pair of cluster-boundary nodes (the "abstract graph"). At query time, A* runs in three phases: connect the source to its cluster's boundary nodes, A* on the abstract graph between clusters, then refine each abstract edge into a real path. The Botea paper reports about 10ร— speedup on standard benchmarks with ~1% path-length suboptimality. Variants (HPA*-LRTA, PRA*, MM*) tune the trade.

Real-world AAA nav

Most shipped game pathfinding stacks aren't pure A* or pure JPS or pure HPA*; they're a navmesh built by Recast[2] (cluster-into-polygons-at-cook-time, like HPA* but the partition is geometric) walked at runtime by Detour, which runs weighted A* on the polygon graph with a Euclidean heuristic. The HPA*-style hierarchy is implicit in the navmesh; the A* runs on the polygon graph, which has tens of thousands of polys instead of millions of cells. Recast/Detour is the codebase behind Unreal's ARecastNavMesh class, Unity's NavMesh, Godot's navigation, and dozens of indie titles. Mikko Mononen released it as open source in 2009 under zlib.

14Iterative deepening

BFS finds the shortest path but uses memory proportional to the frontier; for a uniform tree with branching b and goal at depth d, that's O(bd). DFS uses memory proportional to depth (O(d)) but doesn't find shortest paths.

Iterative-deepening DFS (Korf 1985[18]) gets both. Run depth-limited DFS with limit 1; if not found, throw it away and run again with limit 2; then 3; and so on. The repeated work looks wasteful but isn't: on a uniformly branching tree, the last iteration does most of the work (bd nodes), and all earlier iterations combined do less (the geometric sum is bd/(bโˆ’1)). For b = 10 the overhead is about 11%.

IDA* in the same Korf paper combines this with A*'s heuristic: at each iteration, the cutoff is on f(v) = g(v) + h(v), not depth. IDA* is what every classical-AI puzzle solver from the late 1980s onward uses for problems where the state space is too large for A*'s open set to fit in memory (sliding-tile puzzles, Rubik's cube, sokoban). For game-engine navigation it's almost never the right choice (your navmesh fits in memory and A* is faster), but for game-state search trees it's standard.

15Case studies

1 ยท The render-graph compile

Modern engines (Frostbite's FrameGraph[27], Unreal's RDG, Unity's SRP RenderGraph) compile each frame's rendering work into a DAG of passes, where each pass declares the resources it reads and writes. The compile step does three things, all graph traversals:

  1. Topological sort (ยง8) gives a linear execution order respecting every read-after-write dependency. Kahn's algorithm runs over the indegree of each pass.
  2. Reverse-reachability culling (BFS from the swap-chain output, backwards along edges) marks every pass that contributes to a final visible output. Passes whose outputs nothing reads are dead-code-eliminated.
  3. Resource lifetime computation (a forward DFS labeling each resource's first-use and last-use pass) lets the renderer alias transient resources: two render targets whose lifetimes don't overlap can share the same physical allocation.

The cycle detector in step 1 is the same DFS-back-edge check from ยง5. If a pass writes A then reads A in the same frame, that's a cycle, and the compiler refuses it.

2 ยท F.E.A.R.'s GOAP: A* over actions

F.E.A.R. (Monolith, 2005) shipped a planner instead of a hand-authored behavior tree. Jeff Orkin's GDC 2006 talk[3] describes the system: each NPC has a list of actions (each with preconditions and effects), and the AI runs A* on a graph where nodes are world states and edges are actions. The heuristic is the count of unsatisfied goal conditions. The frame-rate cost is bounded by an action cap (~12 actions per agent on PC) so the search space stays tractable. F.E.A.R.'s NPCs felt smart partly because they would compose action sequences the designers hadn't scripted (vault a cover, flank, melee) directly out of the search. The state graph is implicit; A* generates it as it expands.

3 ยท Asset cooking and the SCC report

Every engine's asset cooker walks a dependency DAG: an animation graph references a skeleton, which references a mesh, which references a material, which references shaders and textures. Cooking visits the DAG bottom-up (reverse topological order: each dependency is baked before each dependent). When a content author accidentally introduces a cycle (usually two materials that reference each other through a function), the cooker uses Tarjan's SCC (ยง9) to report the cycle members. The error message "circular dependency detected: M_RustyBarrel โ†’ MF_RustNoise โ†’ M_RustyBarrel" comes directly out of the SCC algorithm: the cycle is the SCC with more than one node, and the components are the lowlink-marked stack pops.

4 ยท Scene-graph traversal: flat preorder beats recursive

Unity and Unreal both store the scene hierarchy in a flat array sorted by preorder traversal: parents appear before their children, and siblings are contiguous. Computing world transforms is a single sequential pass over the array: each entity's world transform is its local transform composed with its parent's world transform, and the parent has been processed already because of the preorder layout. No queue. No stack. No pointer chase. The asymptotic class is the same as recursive DFS; the constant factor is between one and two orders of magnitude better. Unity's DOTS Transform system is the explicit version of this design[28]; Unreal's Mass Entity / ECS subsystem moved toward the same shape over the 5.x cycle.

5 ยท CryEngine and Unreal navigation

CryEngine documents its pathfinding as A* over a triangulated navmesh[29], with a special Volume Navigation Region for full-3D flight in Crysis. Unreal's navigation system documentation[30] describes the navmesh as Recast-derived and the runtime walker as Detour. Unity's NavMesh build pipeline uses a Recast-based mesh construction phase[31]. The dominance of this codebase across AAA engines is one of the cleaner examples of open-source absorbing the field.

16Pitfalls

Bugs in graph-traversal code, ranked by frequency:

17What's next

For the engine programmer, the most useful next step is to look at your engine's render-graph compiler and read the topological sort. It's two screens of code that decide the order in which everything renders. If it's Kahn's algorithm, you'll recognize it.

Sources

  1. Yuriy O'Donnell. "FrameGraph: Extensible Rendering Architecture in Frostbite." Game Developers Conference 2017. gdcvault.com/play/1024612. The canonical engineering write-up on render-graph compilation, including the topological sort, resource-aliasing, and reverse-reachability culling described in ยง15.
  2. Mikko Mononen et al. "Recast Navigation." GitHub repository (zlib license), originally released 2009. github.com/recastnavigation/recastnavigation. README explicitly lists Mononen as the original author and notes the library "has shipped in countless indie games, AAA titles, and commercial and open-source game engines such as Unity, Unreal, Godot, and O3DE."
  3. Jeff Orkin. "Three states and a plan: The A.I. of F.E.A.R." Game Developers Conference 2006. gamedevs.org/uploads/three-states-plan-ai-of-fear.pdf. The source for F.E.A.R.'s GOAP system; the agent FSM is reduced to three states (Goto / Animate / Use Smart Object) and planning is A* over actions.
  4. Peter Norvig. "Teach Yourself Programming in Ten Years," "Approximate timing for various operations" table. norvig.com/21-days.html. Memory-hierarchy latency ballparks used to estimate per-node traversal cost: L1 ~0.5 ns, branch mispredict ~5 ns, DRAM ~100 ns.
  5. ร‰douard Lucas. Rรฉcrรฉations mathรฉmatiques, Vol. I, 2nd ed., Paris: Gauthier-Villars, 1882. The chapter on labyrinths attributes the maze procedure (depth-first search with edge marking) to Charles Pierre Trรฉmaux. archive.org hosts a public-domain scan.
  6. Lester R. Ford, Jr. Network Flow Theory. Paper P-923, The RAND Corporation, August 14, 1956. rand.org/pubs/papers/P923. The relaxation-style shortest-path procedure that, with Bellman 1958 and Moore 1959, is the joint origin of Bellmanโ€“Ford.
  7. Edward F. Moore. "The shortest path through a maze." In Proceedings of the International Symposium on the Theory of Switching, Part II, April 2โ€“5, 1957, Annals of the Computation Laboratory of Harvard University, Vol. 30, pp. 285โ€“292. Harvard University Press, 1959. The first published BFS; motivated by routing toll telephone traffic at Bell Labs.
  8. Richard Bellman. "On a Routing Problem." Quarterly of Applied Mathematics 16(1):87โ€“90, 1958. DOI: 10.1090/qam/102435. ams.org/journals/qam/1958-16-01. The dynamic-programming functional equation for shortest paths.
  9. Edsger W. Dijkstra. "A note on two problems in connexion with graphs." Numerische Mathematik 1(1):269โ€“271, 1959. DOI: 10.1007/BF01386390. link.springer.com/article/10.1007/BF01386390. Three pages, two algorithms: shortest path between two given vertices, and minimum spanning tree.
  10. Thomas J. Misa, Philip L. Frana. "An Interview with Edsger W. Dijkstra." Communications of the ACM 53(8):41โ€“47, August 2010. cacm.acm.org/research/an-interview-with-edsger-w-dijkstra. Dijkstra's own account of designing the shortest-path algorithm in about twenty minutes in a 1956 Amsterdam cafรฉ "without pencil and paper."
  11. C. Y. Lee. "An algorithm for path connections and its applications." IRE Transactions on Electronic Computers EC-10(3):346โ€“365, 1961. DOI: 10.1109/TEC.1961.5219222. BFS independently re-derived for circuit routing; the "Lee algorithm" of EDA textbooks.
  12. Arthur B. Kahn. "Topological sorting of large networks." Communications of the ACM 5(11):558โ€“562, November 1962. DOI: 10.1145/368996.369025. dl.acm.org/doi/10.1145/368996.369025. The indegree-queue algorithm; motivated by ordering a 30,000-activity PERT chart at Westinghouse.
  13. Peter E. Hart, Nils J. Nilsson, Bertram Raphael. "A Formal Basis for the Heuristic Determination of Minimum Cost Paths." IEEE Transactions on Systems Science and Cybernetics SSC-4(2):100โ€“107, 1968. DOI: 10.1109/TSSC.1968.300136. ieeexplore.ieee.org/document/4082128. The A* paper; introduces the algorithm and proves the admissibility theorem.
  14. Peter E. Hart, Nils J. Nilsson, Bertram Raphael. "Correction to 'A formal basis for the heuristic determination of minimum cost paths'." SIGART Newsletter 37:28โ€“29, 1972. DOI: 10.1145/1056777.1056779. dl.acm.org/doi/10.1145/1056777.1056779. The one-page correction that tightens the admissibility/consistency relationship.
  15. Ira Pohl. Bi-directional and heuristic search in path problems. Ph.D. dissertation, Stanford University, May 1969. Also issued as SLAC report SLAC-104. slac.stanford.edu/pubs/slacreports/reports04/slac-r-104.pdf. The original framing of bidirectional heuristic search; the conventionally cited "Pohl 1971" is the Machine Intelligence 6 chapter that followed.
  16. Robert Tarjan. "Depth-first search and linear graph algorithms." SIAM Journal on Computing 1(2):146โ€“160, 1972. DOI: 10.1137/0201010. epubs.siam.org/doi/10.1137/0201010. The paper that formalizes DFS as a general technique and gives the linear-time SCC and biconnected-components algorithms.
  17. Micha Sharir. "A strong-connectivity algorithm and its applications in data flow analysis." Computers & Mathematics with Applications 7(1):67โ€“72, 1981. DOI: 10.1016/0898-1221(81)90008-0. sciencedirect.com/science/article/pii/0898122181900080. The two-pass SCC algorithm; credited to an unpublished 1978 lecture by S. Rao Kosaraju at Johns Hopkins, hence "Kosarajuโ€“Sharir."
  18. Richard E. Korf. "Depth-first iterative-deepening: An optimal admissible tree search." Artificial Intelligence 27(1):97โ€“109, 1985. DOI: 10.1016/0004-3702(85)90084-0. sciencedirect.com/science/article/abs/pii/0004370285900840. The IDDFS paper; also introduces IDA*.
  19. Adi Botea, Martin Mรผller, Jonathan Schaeffer. "Near optimal hierarchical path-finding." Journal of Game Development 1(1):7โ€“28, 2004. webdocs.cs.ualberta.ca/~mmueller/ps/2004/hpastar.pdf. The HPA* paper; reports ~10ร— speedup on standard benchmarks with ~1% path-length suboptimality.
  20. Daniel Harabor, Alban Grastien. "Online graph pruning for pathfinding on grid maps." In Proc. AAAI Conference on Artificial Intelligence (AAAI-11), pp. 1114โ€“1119, 2011. ojs.aaai.org/index.php/AAAI/article/view/7994. The Jump Point Search paper; A*-on-grid speedups via symmetry pruning, order-of-magnitude on HOG2 benchmarks.
  21. Kurt Mehlhorn, Peter Sanders. Algorithms and Data Structures: The Basic Toolbox. Springer, 2008, chapter 8 "Graph Representation." people.mpi-inf.mpg.de/~mehlhorn/ftp/NewToolbox/grepresent.pdf. The asymptotic comparison of adjacency list, adjacency matrix, and edge list representations.
  22. Jure Leskovec, Rok Sosiฤ. "SNAP: A general-purpose network analysis and graph-mining library." ACM Transactions on Intelligent Systems and Technology 8(1):1:1โ€“1:20, 2016. DOI: 10.1145/2898361. snap.stanford.edu/snap. Stanford's reference graph library; uses CSR throughout.
  23. Graph500 Steering Committee. "Graph500 benchmark specification." graph500.org/?page_id=12. The reference BFS benchmark spec used in the parallel-graph-algorithms literature; defines scale-S as a Kronecker graph with 2S vertices.
  24. Scott Beamer. Understanding and Improving Graph Algorithm Performance. Ph.D. thesis, UC Berkeley, 2016. escholarship.org/uc/item/4jp775f7. Direction-optimizing BFS, the GAP benchmark suite, and per-graph throughput figures for BFS on real-world inputs (typically 108โ€“109 edges/s per core depending on graph structure).
  25. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to Algorithms, 4th ed. MIT Press, 2022. ISBN 9780262046305. mitpress.mit.edu/9780262046305. Chapter 20 covers BFS, DFS, topological sort, and SCC. Chapters 22โ€“23 cover shortest paths. The 1st edition (1990) introduced the white/gray/black DFS notation.
  26. Amit Patel. "Introduction to A*." Red Blob Games, ongoing. redblobgames.com/pathfinding/a-star/introduction.html. Best-in-class interactive explanation of BFS, Dijkstra, and A* with weighted variants. The companion implementation guide is the reference most game-AI courses point to.
  27. Yuriy O'Donnell. "FrameGraph: Extensible Rendering Architecture in Frostbite." GDC 2017. Same talk as [1]. The render-graph compiler in ยง15 is described in detail starting at minute 22.
  28. Unity Technologies. "Transforms package documentation." Unity DOTS / Entities documentation. docs.unity3d.com/Packages/com.unity.entities/transforms-concepts.html. Source for the flat-preorder transform hierarchy layout described in case study 4.
  29. Crytek. "Pathfinding Costs." CryEngine V Manual. docs.cryengine.com/display/SDKDOC4/Pathfinding+Costs. Documents A* over a triangulated navmesh with edge-cost designators; the Volume Navigation Region used in Crysis is documented in the same section family.
  30. Epic Games. "Navigation System in Unreal Engine." Unreal Engine documentation. dev.epicgames.com/documentation/en-us/unreal-engine/navigation-system-in-unreal-engine. Documents ARecastNavMesh, Detour as the runtime A* walker, and the Recast-derived build pipeline.
  31. Unity Technologies. "NavMesh Components." GitHub repository and accompanying blog posts. github.com/Unity-Technologies/NavMeshComponents. Source for Unity's Recast-based NavMesh construction pipeline.

See also