Build a Game Engine · Advanced

Open-World Streaming

The File Streaming tutorial got bytes off disk fast; Spatial Partitioning built the structures. This module is the layer that drives them from the camera: cut the world into cells, load a window of them around the player, and keep the frame smooth while doing it. The lesson that surprises people is where the stutter actually comes from. It is almost never the disk.

Time: ~55 min · Level: Senior · Prereqs: The File Streaming tutorial (async I/O, residency pools), Spatial Partitioning (grids/quadtrees), and Job Systems (workers) · Stack: C++ & Rust

01 The world doesn't fit

A modern open world is tens to hundreds of gigabytes on disk; runtime memory is a fraction of that. The streamer's job is to keep resident only the slice of the world near the camera, paging cells in as the player approaches and out as they recede. Mark Cerny framed the PS5 design goal as shrinking that resident window from roughly a 30-second buffer of upcoming gameplay to about one second, by streaming faster[1].

Two budgets, evicted independently

Keep RAM and VRAM separate. Gameplay data, audio, navigation, and collision live in system memory; meshes and textures are uploaded to GPU memory. Each pool has its own budget and its own eviction policy, so a cell can be RAM-resident while its textures are still streaming into VRAM. This tutorial sits above the I/O streamer and the residency pool you built in File Streaming; it decides which cells to ask that streamer for.

02 Partitioning into cells

Cut the world into a grid of cells (sometimes called sectors or tiles): each cell owns the actors, meshes, and collision in its footprint, baked as one loadable unit. The cell the player stands in plus a radius of neighbors is what stays resident. Unreal's World Partition does exactly this with a 2D runtime spatial hash, and it replaced the older hand-managed sublevel workflow[2]. The grid-versus-quadtree mechanics are the Spatial Partitioning tutorial's territory; here the only new state per cell is its residency.
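A minimal sketch of that per-cell bookkeeping, under stated assumptions: `CellCoord`, `Residency`, `CellTable`, and `cell_of` are illustrative names, not any engine's API, and the state names simply mirror the queued / loading / resident stages this tutorial uses. A hash map keyed by cell coordinate stands in for the 2D spatial hash.

```rust
use std::collections::HashMap;

/// Integer coordinate of a cell in the 2D streaming grid (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct CellCoord {
    pub x: i32,
    pub y: i32,
}

/// The only new state per cell: where it is in the streaming pipeline.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Residency {
    Unloaded,
    Queued,
    Loading,
    Resident,
}

/// Sparse grid: only cells that have been touched get an entry.
pub type CellTable = HashMap<CellCoord, Residency>;

/// Map a world-space position (metres) to the cell that contains it.
/// `floor` rather than truncation keeps negative coordinates in the right cell.
pub fn cell_of(world_x: f32, world_y: f32, cell_size: f32) -> CellCoord {
    CellCoord {
        x: (world_x / cell_size).floor() as i32,
        y: (world_y / cell_size).floor() as i32,
    }
}

/// Look up a cell's state; a cell with no entry counts as unloaded.
pub fn residency(table: &CellTable, cell: CellCoord) -> Residency {
    table.get(&cell).copied().unwrap_or(Residency::Unloaded)
}
```

The `floor` matters: truncating toward zero would put positions just west or south of the origin into cell 0 and make that cell twice as wide as the others.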

Cell size is a churn-versus-spike tradeoff, and the structure isn't universal

Cell size has to be small enough that loading one doesn't spike memory or the frame, but large enough that cells don't churn (load and unload rapidly) as the player moves near a boundary[3]. There is no universal number: World Partition defaults to 256 m cells with a 768 m loading range, but that is a per-engine default, not a law[2]. A regular grid is cheap to build and extends in any direction, but wastes resolution where the world is sparse and grows quadratically with radius; a quadtree adapts to uneven density at a higher update cost. Most shipping open-world engines use a grid of cells; some use a hierarchy. Pick for your world, not by reflex.
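To put numbers on the tradeoff, the sketch below counts how many grid cells fall inside a loading range for a given cell size. It assumes a cell is wanted when its centre lies within the range of the centre of the player's cell; real engines may use a different test (for example, any overlap with the streaming volume), so treat the counts as an estimate. `cells_in_range` is a hypothetical helper.

```rust
/// Count grid cells whose centre lies within `range` of the centre of the
/// player's cell. Both arguments are in metres.
pub fn cells_in_range(cell_size: f32, range: f32) -> usize {
    let ratio = range / cell_size;
    let r = ratio.floor() as i32; // search radius in whole cells
    let limit = ratio * ratio; // squared range, measured in cells
    let mut count = 0;
    for dy in -r..=r {
        for dx in -r..=r {
            if (dx * dx + dy * dy) as f32 <= limit {
                count += 1;
            }
        }
    }
    count
}
```

With the 256 m / 768 m defaults quoted above this gives 29 cells; halving the cell size to 128 m at the same range gives 113. Each cell is a quarter of the area, so each load is smaller, but there are roughly four times as many cells to track and boundaries are crossed twice as often.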

The map below streams cells around a wandering player. Cells inside the load radius queue, then load over real time; cells past the eviction radius are freed. Drop the bandwidth or raise the speed until the player outruns the loader and the center dot turns red (that's pop-in):

Amber = queued, blue = loading (filling), green = resident; the dashed ring is the load radius, and cells evict at a larger radius (hysteresis, §6). The loader serves the nearest queued cells first. Outrun it (raise speed or drop bandwidth) and the player enters a cell that hasn't arrived: the dot flashes red (pop-in). Turn off velocity prediction and a sharp turn strands cells loading behind you.

03 Loading radius & priority

The set of cells to keep resident is a streaming volume around each streaming source (usually the camera, sometimes also a fast vehicle or a spectated player). Within that volume, requests are ordered by priority, not issued all at once. The priority that ships in practice is a blend of three signals.

Prediction has a failure mode worth seeing

Predicting ahead of travel only helps while the prediction holds. On a sharp turn the velocity is stale, so a cell finishes loading where the player was about to be, not where they went. In the §2 widget that shows up as a freshly-loaded cell sitting behind the player after a hard turn. The fix is not "predict harder"; it is to keep the load radius large enough that the un-predicted neighbors are resident anyway, and to let mispredicted cells evict cheaply.
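One way to sketch a priority blend is below. It is an illustration under assumptions, not the tutorial's exact formula: it uses only two signals the text names (distance to the player now, and distance to a position predicted from velocity), and `lookahead_s` and `predict_weight` are made-up tuning knobs. Lower scores load sooner.

```rust
#[derive(Debug, Clone, Copy)]
pub struct Vec2 {
    pub x: f32,
    pub y: f32,
}

fn distance(a: Vec2, b: Vec2) -> f32 {
    ((a.x - b.x).powi(2) + (a.y - b.y).powi(2)).sqrt()
}

/// Score a cell for loading: a weighted blend of distance to the player now
/// and distance to where the player will be in `lookahead_s` seconds.
/// `predict_weight` is in 0..=1; 0 disables prediction entirely.
pub fn load_score(
    cell_center: Vec2,
    player: Vec2,
    velocity: Vec2,
    lookahead_s: f32,
    predict_weight: f32,
) -> f32 {
    let predicted = Vec2 {
        x: player.x + velocity.x * lookahead_s,
        y: player.y + velocity.y * lookahead_s,
    };
    let now = distance(cell_center, player);
    let soon = distance(cell_center, predicted);
    (1.0 - predict_weight) * now + predict_weight * soon
}

/// Sort pending cell centres so the most urgent request comes first.
pub fn order_requests(
    cells: &mut [Vec2],
    player: Vec2,
    velocity: Vec2,
    lookahead_s: f32,
    predict_weight: f32,
) {
    cells.sort_by(|a, b| {
        let sa = load_score(*a, player, velocity, lookahead_s, predict_weight);
        let sb = load_score(*b, player, velocity, lookahead_s, predict_weight);
        sa.total_cmp(&sb)
    });
}
```

Keeping `predict_weight` below 1 is the code-level version of the advice above: the current-distance term still pulls in un-predicted neighbors, so a stale velocity costs some ordering quality rather than leaving the player's actual surroundings unrequested.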

04 Getting the bytes

A queued cell's data is read on a worker thread, never on the thread that draws frames. This is the part everyone assumes is the bottleneck. The File Streaming tutorial built the async read path and the residency pool; the streamer dispatches cell loads onto the job system and is handed back a buffer when the read completes.

The platform async-I/O reality

On Windows, open with FILE_FLAG_OVERLAPPED and drive completions through an I/O completion port (IOCP); on Linux, io_uring batches many reads through one io_uring_enter via its submission/completion rings[5]. A Rust honesty note: tokio::fs is not truly async; it runs blocking reads on a spawn_blocking thread pool, because most operating systems offer no async file API the way they do for sockets[6]. For real async file I/O on Linux you reach for an io_uring crate. Either way, the bytes arriving is not the same as the cell being usable, which is §5.
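The portable fallback is the same trick tokio::fs uses: do a blocking read on a thread that is not the frame thread. The sketch below uses only the standard library (`std::thread`, `std::sync::mpsc`, `std::fs::read`). `LoadRequest`, `LoadResult`, and `spawn_io_worker` are hypothetical names, and a single worker with whole-file reads is a simplification; this is not an io_uring or IOCP path.

```rust
use std::fs;
use std::path::PathBuf;
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

pub type CellId = u32;

pub struct LoadRequest {
    pub id: CellId,
    pub path: PathBuf,
}

pub struct LoadResult {
    pub id: CellId,
    pub bytes: std::io::Result<Vec<u8>>,
}

/// Spawn one I/O worker. The frame thread keeps the request sender and the
/// result receiver; the worker owns the other two ends.
pub fn spawn_io_worker() -> (Sender<LoadRequest>, Receiver<LoadResult>) {
    let (req_tx, req_rx) = mpsc::channel::<LoadRequest>();
    let (res_tx, res_rx) = mpsc::channel::<LoadResult>();
    thread::spawn(move || {
        // The loop ends when every request sender has been dropped.
        for req in req_rx {
            let bytes = fs::read(&req.path); // blocking read, off the frame thread
            if res_tx.send(LoadResult { id: req.id, bytes }).is_err() {
                break; // the receiving side went away
            }
        }
    });
    (req_tx, res_rx)
}
```

On the frame thread, poll the receiver with `try_recv()` once per frame so that a slow read never blocks rendering; a cell that arrives this way still has to go through integration before it is usable.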

05 The integration budget

The load-bearing idea of the whole tutorial: a streaming hitch (one frame that blows the budget) is usually not the disk read. On an SSD the bytes are already in RAM by the time you notice. The hitch is the main-thread cost of making the loaded data live: registering components, building uniform buffers, adding primitives to the scene, creating physics bodies. Epic's own level-streaming guide pins the long hitches on the render thread "creating uniform buffers, initializing BSPs, and adding primitive scene infos" when a level becomes visible[7]. Unity says the same of Addressables: the load is async, but "the final part of scene loading requires operation on the main thread"[8].

The fix is to time-slice the integration, not to buy a faster disk

Give the per-frame activation work a time budget and stop when it's spent, finishing the cell over several frames. Unreal exposes this directly: s.LevelStreamingActorsUpdateTimeLimit caps the time spent in AddToWorld per frame, and component registration is sliced by a granularity count; set the limit too high and "you will see a lot of hitches if your levels are content heavy"[9]. A second, distinct main-thread spike is the garbage collector running after a cell streams out, which is why engines spread that incrementally too[9]. The disk is a cause of stalls on a slow HDD or when bandwidth-starved; integration is the common cause on modern hardware. Don't collapse the two.

A cell finishes loading every second or so and brings a chunk of main-thread integration work. Integrate it all at once and the frame bar spikes over the 16.6 ms line, a hitch; time-slice it and the frame stays smooth while the cell takes a few extra frames to go live:

The bytes were already resident in both modes; the disk was never the bottleneck. All-at-once: the cell's activation lands on one frame and the bar punches through 16.6 ms (a hitch). Time-sliced: the same work is metered out under a per-frame budget, so frames stay smooth and the cell goes live a few frames later. Shrinking the budget makes it smoother but slower to appear.
The integration budget: drain activation work to a per-frame time limit
// C++
// Worker thread (NOT the frame thread): read the cell, then queue it for activation.
void onCellLoaded(Cell& cell, Bytes raw) {
    cell.deserialize(raw);                 // parse off-thread; still not "live"
    pendingIntegration.push(&cell);     // hand to the main thread (the queue must be thread-safe)
}

// Main thread, once per frame: spend only the budget, finish the rest next frame.
void integrateStreamedCells(Duration budget) {
    auto frameStart = std::chrono::steady_clock::now();
    while (!pendingIntegration.empty()) {
        if (std::chrono::steady_clock::now() - frameStart > budget) break;  // time-slice (cf. UE's update time limit)
        Cell* cell = pendingIntegration.front();
        if (cell->integrateNextChunk())       // register a few components / upload one batch
            pendingIntegration.pop();           // the cell is fully live
    }
}

// Rust
// Worker thread: read + parse the cell, then hand it to the main thread for activation.
fn on_cell_loaded(cell: &mut Cell, raw: Bytes, pending: &Sender<CellId>) {
    cell.deserialize(raw);                 // parse off-thread; still not "live"
    pending.send(cell.id).ok();          // queue for the frame thread
}

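// Main thread, once per frame: spend only the budget, finish the rest next frame.
fn integrate_streamed_cells(world: &mut World, budget: Duration) {
    let frame_start = Instant::now();
    // Collect cells the workers finished since last frame. `world.loaded` is assumed
    // to be the Receiver<CellId> paired with the worker's Sender above.
    while let Ok(id) = world.loaded.try_recv() {
        world.pending_integration.push_back(id);
    }
    while let Some(&id) = world.pending_integration.front() {
        if frame_start.elapsed() > budget { break; }      // time-slice
        if world.integrate_next_chunk(id) {              // register / upload one quantum
            world.pending_integration.pop_front();      // fully live
        }
    }
}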
What's intentionally missing

This skips the production concerns the streamer also needs: a pin list so a cell in use this frame can't be evicted mid-frame; GPU upload fences (issuing the copy is not the same as the copy being done); dependency ordering so a cell's textures are resident before a mesh references them; double-buffered staging; bandwidth-aware throttling; and decompression-worker backpressure. The File Streaming tutorial owns the residency-pool and eviction internals this builds on.

06 Eviction & persistence

Cells leave memory when the player moves away. A streaming-memory counter (GTA's design: increment on load, decrement on unload, evict the most distant resources when over budget[10]) or a distance/LRU policy decides what goes. The non-obvious part is when, not what.
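The counter-driven policy can be sketched in a few lines. This is a simplified illustration of the idea as described (increment on load, decrement on unload, evict the most distant when over budget), not a reconstruction of any shipped streamer. `StreamingPool` and `ResidentCell` are hypothetical, and the caller is assumed to refresh each cell's `distance` before calling `evict_over_budget`.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ResidentCell {
    pub id: u32,
    pub bytes: usize,
    pub distance: f32, // distance from the streaming source, refreshed by the caller
}

pub struct StreamingPool {
    pub budget: usize,
    pub used: usize, // the streaming-memory counter
    pub cells: Vec<ResidentCell>,
}

impl StreamingPool {
    pub fn new(budget: usize) -> Self {
        StreamingPool { budget, used: 0, cells: Vec::new() }
    }

    /// Increment the counter when a cell becomes resident.
    pub fn on_loaded(&mut self, cell: ResidentCell) {
        self.used += cell.bytes;
        self.cells.push(cell);
    }

    /// While over budget, free the most distant cell and decrement the counter.
    /// Returns the ids that were evicted, farthest first.
    pub fn evict_over_budget(&mut self) -> Vec<u32> {
        // Nearest first, so the farthest cell is at the end and cheap to pop.
        self.cells.sort_by(|a, b| a.distance.total_cmp(&b.distance));
        let mut evicted = Vec::new();
        while self.used > self.budget {
            match self.cells.pop() {
                Some(cell) => {
                    self.used -= cell.bytes;
                    evicted.push(cell.id);
                }
                None => break,
            }
        }
        evicted
    }
}
```

Because RAM and VRAM are budgeted separately, a real streamer would run one such pool per memory type; a pin list for cells in use this frame is left out here.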

Load and evict at different radii (hysteresis)

If you load and evict a cell at the same radius, a player pacing back and forth across a boundary loads and frees the same cell every step: the load-evict thrash. Load at radius R and evict at a larger radius R + margin, so a cell has to be clearly gone before it's dropped. It's the streaming version of a Schmitt trigger: an asymmetric threshold that kills oscillation. The §2 widget evicts farther than it loads for exactly this reason.
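The two-radius rule reduces to a small decision function. In this sketch `Action` and `decide` are illustrative names and the radii in the usage note are arbitrary; the point is only that the load and evict thresholds differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Load,
    Evict,
    Keep,
}

/// Decide what to do with one cell given its distance from the streaming source.
/// Cells load inside `load_radius` and evict only beyond `load_radius + margin`;
/// in the band between the two, whatever state the cell is in is left alone.
pub fn decide(resident: bool, distance: f32, load_radius: f32, margin: f32) -> Action {
    let evict_radius = load_radius + margin;
    if !resident && distance <= load_radius {
        Action::Load
    } else if resident && distance > evict_radius {
        Action::Evict
    } else {
        Action::Keep
    }
}
```

With `load_radius = 768.0` and `margin = 128.0`, a resident cell at 800 m is kept and a non-resident cell at 800 m is not requested, so a player oscillating around the 768 m line triggers at most one load and no evictions.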

A modified cell is a delta over the baseline

The cooked cell on disk is read-only and shared. When the player changes it (opens a door, loots a chest, fells a tree, kills an NPC), you do not rewrite the cooked cell. You record a delta layered on top of the baseline, and that delta is what the save system serializes. This keeps the world's authored data immutable and the per-player mutation tiny. It's the common approach for level-based worlds, not the only one: a single-scene or roguelike game may snapshot whole cells, and an MMO keeps authoritative state on the server.
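A minimal sketch of the layering, assuming objects are addressed by a stable integer id and that per-object state fits in a small enum. `BaselineCell`, `CellDelta`, `ObjectState`, and `effective_state` are all hypothetical; a real game would carry richer state and a versioned save format.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ObjectState {
    Default, // as authored in the cooked cell
    DoorOpen,
    Looted,
    Removed,
}

/// Read-only cooked data: here just the object ids the cell ships with.
pub struct BaselineCell {
    pub objects: Vec<u32>,
}

/// Per-save mutations layered over the baseline. Only this gets serialized.
#[derive(Default)]
pub struct CellDelta {
    pub overrides: HashMap<u32, ObjectState>,
}

impl CellDelta {
    /// Record a player change without touching the baseline.
    pub fn record(&mut self, object: u32, state: ObjectState) {
        self.overrides.insert(object, state);
    }
}

/// Resolve an object's state: the delta wins, otherwise the authored default.
/// Returns `None` for ids the baseline does not contain.
pub fn effective_state(
    baseline: &BaselineCell,
    delta: &CellDelta,
    object: u32,
) -> Option<ObjectState> {
    if !baseline.objects.contains(&object) {
        return None;
    }
    Some(delta.overrides.get(&object).copied().unwrap_or(ObjectState::Default))
}
```

When a cell streams back in, the loader applies its delta during integration; an evicted cell keeps its delta in memory (or in the save) even though the baseline bytes are gone.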

07 LOD, HLOD & the far field

A cell beyond the load radius still has to be seen if it's on the horizon. Rendering it at full detail blows the budget; showing nothing leaves a hole. The answer is detail that falls off with distance: full meshes near, lower at mid range, and a single baked proxy for the unloaded far cells. Unreal generates an HLOD proxy mesh and material offline to stand in for unloaded World Partition cells, cutting draw calls in the distance[11].

Concentric rings of detail around the player. Near cells are full meshes, mid cells drop to a cheaper LOD, far cells collapse to an HLOD proxy. Toggle HLOD off and the far field either reverts to full meshes (the triangle budget explodes) or would have to pop in as you approach:

Green is full mesh near the player, amber is a cheaper LOD at mid range, blue is one HLOD proxy for the far field; the readout is the scene triangle budget. With HLOD on, the distance is a handful of proxy cells and the budget stays bounded. Turn it off and the far cells revert to full meshes and the count jumps into the red: the alternative to a proxy is paying that cost or leaving an empty horizon that pops in as you approach.
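The ring structure can be written as a distance-to-tier function. The sketch assumes two thresholds (`near_radius` for full meshes, `load_radius` for where resident cells end) and models the HLOD toggle the way the widget describes it, with far cells reverting to full meshes when the proxy is off. `DetailTier` and `detail_tier` are illustrative names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DetailTier {
    FullMesh,
    ReducedLod,
    HlodProxy,
}

/// Pick the representation for a cell at `distance` from the camera.
/// Beyond `load_radius` the cell is not resident, so it can only be shown
/// through a baked proxy; with HLOD disabled this sketch falls back to full
/// meshes, which is the expensive case.
pub fn detail_tier(
    distance: f32,
    near_radius: f32,
    load_radius: f32,
    hlod_enabled: bool,
) -> DetailTier {
    if distance <= near_radius {
        DetailTier::FullMesh
    } else if distance <= load_radius {
        DetailTier::ReducedLod
    } else if hlod_enabled {
        DetailTier::HlodProxy
    } else {
        DetailTier::FullMesh
    }
}
```

A production renderer would usually pick LODs from projected screen size rather than raw distance, and would blend or dither the transitions; the thresholds here are placeholders.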

Texture detail streams on the same principle one level down: pick a mip from the on-screen texel-to-pixel ratio and the memory budget, dropping a mip at a time when over budget[4]. Sparse virtual texturing takes that further, mapping a small physical cache to a huge logical texture through a page-table indirection. Many implementations decide which pages to stream from GPU feedback (a readback of which pages the visible pixels needed); others, for predictable content like terrain, pre-determine residency from the geometry and skip the feedback path entirely[12]. The File Streaming tutorial derives SVT and the I/O fast path in full; the point here is that texture streaming is a related but distinct system from world-cell streaming.
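As a rough sketch of the mip rule: the wanted mip is about the base-2 log of how many texels land on one pixel, and a budget pass coarsens textures one mip at a time until the pool fits. The functions below are hypothetical, treat all textures as square and the same size, and count only the top resident level of each texture rather than its whole mip chain.

```rust
/// Mip to request for a square texture `texture_size` texels wide that covers
/// `screen_pixels` pixels along the same axis. Mip 0 is full resolution.
pub fn wanted_mip(texture_size: u32, screen_pixels: f32, mip_count: u32) -> u32 {
    let max_mip = mip_count.saturating_sub(1);
    if screen_pixels <= 0.0 {
        return max_mip; // not visible: the coarsest mip is enough
    }
    let texels_per_pixel = texture_size as f32 / screen_pixels;
    if texels_per_pixel <= 1.0 {
        return 0; // magnified: full resolution is needed
    }
    (texels_per_pixel.log2().floor() as u32).min(max_mip)
}

/// Bytes for a single mip level (top level only; a simplification).
pub fn mip_bytes(texture_size: u32, mip: u32, bytes_per_texel: u64) -> u64 {
    let side = (texture_size >> mip).max(1) as u64;
    side * side * bytes_per_texel
}

/// While the pool is over budget, coarsen the currently finest texture by one mip.
pub fn fit_to_budget(
    mips: &mut [u32],
    texture_size: u32,
    mip_count: u32,
    bytes_per_texel: u64,
    budget: u64,
) {
    let max_mip = mip_count.saturating_sub(1);
    loop {
        let total: u64 = mips
            .iter()
            .map(|&m| mip_bytes(texture_size, m, bytes_per_texel))
            .sum();
        if total <= budget {
            break;
        }
        match mips.iter_mut().filter(|m| **m < max_mip).min() {
            Some(m) => *m += 1, // drop one mip on the finest texture
            None => break,      // everything is already at its coarsest
        }
    }
}
```

A real streamer would weight the drop by visibility and screen coverage instead of always coarsening the finest texture, so the on-screen hero asset is the last to lose detail.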

Pop-in and hitching are different bugs with different fixes

Pop-in is data that hasn't arrived: a low-detail proxy (or nothing) shows, then swaps to full detail once it streams in. Its causes are throughput, priority, and bandwidth; its fixes are more bandwidth, better priority, a larger prefetch radius, or an impostor to cover the gap. Hitching is a single frame that blew its budget, from main-thread integration or GC (§5); its fix is time-slicing. The platform I/O fast path (DirectStorage with GPU decompression on PC, the PS5's hardware decompressor) attacks bytes-to-memory latency, so it helps pop-in and load times, not the integration hitch[13]. A faster SSD will not fix a stutter that lives on the main thread.

Wrong answers, and why: a per-region stutter on an SSD is main-thread integration (time-slice it), not the disk or cell size; a blob-then-detail swap is pop-in (throughput), not a hitch or GC; and boundary thrash is fixed by asymmetric load/evict radii (hysteresis), not by resizing cells or never evicting.

08 Pitfalls

Blaming the disk for hitches: On an SSD the bytes are there; the hitch is main-thread integration. Time-slice it.
Integrating a cell in one frame: One big activation drops a frame. Meter it to a per-frame budget.
Load radius = evict radius: Boundary thrash. Evict at a larger radius (hysteresis).
Predicting harder to beat pop-in: Stale velocity strands cells on turns. Keep the radius large; evict mispredicts cheaply.
One cell size everywhere: Too small churns; too large spikes. Tune to density, or go hierarchical.
Rewriting the cooked cell on change: Persist a delta over the read-only baseline; the save system serializes that.
"DirectStorage fixes stutter": It speeds bytes-to-memory (pop-in), not the integration hitch.
Conflating pop-in with hitching: Throughput vs main-thread. Different causes, different fixes.

09 What's next

Streaming keeps the world resident around the player; the changes the player makes to it have to outlive the session. The next module, Save Games & Serialization, persists that mutable state: the object graph (you cannot write a pointer to disk), versioning so old saves still load, and the atomic write that keeps a crash from corrupting a save. The full path is on the series hub.

  1. Mark Cerny. "The Road to PS5" (2020), as covered by VentureBeat. venturebeat.com. The resident-window framing (about a one-second buffer vs the prior ~30 seconds) and the SSD bandwidth / hardware-decompression figures (treat the headline peak as best-case).
  2. Epic Games. "World Partition in Unreal Engine." dev.epicgames.com. A 2D spatial-hash grid of streaming cells, default 256 m cell / 768 m loading range, streaming sources, and that it replaced the older World Composition workflow. (Per-engine defaults, not universal numbers.)
  3. StraySpark. "World Partition Deep Dive: Streaming, Data Layers, and HLOD for Massive Open Worlds." strayspark.studio. Cell size as a churn-versus-spike tradeoff.
  4. Epic Games. "Texture Streaming Overview." docs.unrealengine.com. Mip selection from the on-screen texel-to-pixel ratio and visibility; dropping a mip at a time when over the pool budget.
  5. Oracle. "An Introduction to the io_uring Asynchronous I/O Framework." blogs.oracle.com. The submission/completion rings and batching many reads through one io_uring_enter.
  6. tokio-rs/tokio, issue #2926 and the spawn_blocking docs. github.com/tokio-rs/tokio. tokio::fs runs blocking reads on a thread pool, not true async I/O, because most operating systems lack an async file API.
  7. Epic Games. "Level Streaming Hitching Guide." dev.epicgames.com. Long hitches on the render thread from creating uniform buffers and adding primitive scene infos when a level becomes visible: the integration cost, not the disk read.
  8. Unity Technologies. Addressables LoadSceneAsync. docs.unity3d.com. The load is async, but "the final part of scene loading requires operation on the main thread."
  9. Peter Leontev. "Level Streaming and Garbage Collection Optimization Tweaks in UE4." peterleontev.com. s.LevelStreamingActorsUpdateTimeLimit time-slicing of AddToWorld, the registration granularity, and the GC hitch after streaming out.
  10. GTAMods Wiki. "Resource Streaming." gtamods.com. The classic slot-based sector streamer: the five-state pipeline, the streaming-memory counter driving eviction, and proximity-ordered disk reads.
  11. Epic Games. "World Partition — Hierarchical Level of Detail." dev.epicgames.com. A baked proxy mesh and material standing in for unloaded grid cells to cut distant draw calls.
  12. J.M.P. van Waveren. "Software Virtual Textures" (2012). mrelusive.com; Andreas Neu. "Virtual Texturing" (arXiv:1005.3163, 2010). arxiv.org. The page-table indirection and the GPU-feedback residency path; the geometry-determined terrain path with no feedback pass is the older MegaTexture approach (Enemy Territory: Quake Wars).
  13. NVIDIA. "Accelerating Load Times for DirectX Games and Apps with GDeflate for DirectStorage." developer.nvidia.com. GPU decompression speeds bytes-to-VRAM (load time and pop-in), which is a different leg from the main-thread integration hitch.
