Asset Pipeline & Serialization
A .png and an .fbx are made for an artist, not for your engine at runtime. The asset pipeline is the offline machine that turns authoring files into blobs the engine can load fast, often in place. We build that machine in C++ and Rust: the cook step, a versioned binary format, stable references, a dependency graph, hot reload, and the alignment rules that make zero-copy loading actually work.
01 Why a pipeline
Source formats are built for authoring and interchange, not for runtime. A PNG must be decoded; a WAV must be parsed; an FBX or glTF carries editor metadata and float data that isn't laid out for the GPU. Unreal puts it plainly: it stores content "in particular formats which it uses internally, such as PNG for texture data or WAV for audio. However, this content needs to be converted to different formats for the various platforms"[2].
So an engine ships a second representation: cooked data, laid out for fast loading and the target hardware. The conversion also generates mip chains, transcodes textures to GPU block formats, and reorders data for the target platform. Decoding is one cost; getting the layout right is often the bigger one.
The design space splits along axes: text vs binary, parse vs zero-copy, a version tag vs schema evolution, GUIDs vs paths. The right choice depends on your platforms, your iteration speed, and whether data crosses a version boundary at runtime. This tutorial builds one coherent point in that space and names the tradeoffs at each axis.
02 Bake vs runtime load
The cook (or bake) is an offline transform from source to a runtime-ready, often platform-specific artifact: import → process → cook → package. Heavy work happens once, offline; the runtime does a light load and fixup[11]. The Bitsquid engine compiled source JSON into runtime data on a separate path, and "the engine never sees the generic data folder so it can't cheat"[3].
Cooking is transform plus layout; compression is a separate, optional packaging step (its own tutorial). A cooked blob can be uncompressed. And "cook once, run anywhere" fails for platform-specific cooks: texture block format, endianness, and alignment can differ per target, which is why Unity and Unreal both cook per build target[2]. Keep import (source → engine intermediate) distinct from cook (intermediate → shipped package).
03 Binary serialization
A cooked format is binary. Three fundamentals decide whether it loads correctly and quickly: byte order, alignment, and what you're allowed to put in it.
- Endianness. Pick and document a byte order. Most targets a game cares about are little-endian, which is why formats standardize on it: glTF mandates that "all buffer data... MUST use little endian byte order"[4]. "Little-endian everywhere" is still an over-claim (network order is big-endian, some hardware is BE), so the format must declare its order rather than assume.
- Alignment. A type with alignment N must sit at an address that's a multiple of N; the compiler pads between fields to satisfy it. Use fixed-width types (<cstdint>) and lock the layout with static_assert on size and alignment.
- POD only. To write a struct straight to disk it must be plain data, no virtual functions, because a vtable pointer is a runtime address that means nothing in a file[11].
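The padding rule can be worked through by hand. The sketch below is only an illustration, not a compiler API: `align_up` and `layout` are hypothetical helpers that apply the usual C-style rule (each field starts at the next multiple of its alignment, and the total size is rounded up to the strictest alignment) to a list of (size, alignment) pairs.

```rust
/// Round `offset` up to the next multiple of `align` (`align` must be a power of two).
fn align_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) & !(align - 1)
}

/// Hypothetical helper: compute C-style field offsets and the total size
/// for fields given as (size, alignment) pairs in declaration order.
fn layout(fields: &[(usize, usize)]) -> (Vec<usize>, usize) {
    let mut offsets = Vec::with_capacity(fields.len());
    let mut cursor = 0;
    let mut max_align = 1;
    for &(size, align) in fields {
        cursor = align_up(cursor, align); // padding goes in front of the field
        offsets.push(cursor);
        cursor += size;
        max_align = max_align.max(align);
    }
    // Tail padding: the struct size is a multiple of its strictest alignment.
    (offsets, align_up(cursor, max_align))
}
```

For fields of sizes 4, 4, 2, 4, 8 with natural alignment, `layout` returns offsets 0, 4, 8, 12, 16 and a total size of 24: the 2-byte field is followed by 2 bytes of padding so the next 4-byte field is aligned.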
Casting a byte buffer to a struct only works if the buffer satisfies the struct's alignment. On x86-64 an unaligned scalar load works but can cost 30 to 70 percent when it straddles a cache line; on ARM it can be a bus error (SIGBUS); and in C++ the act of reading through a misaligned pointer is undefined behavior regardless of the CPU[8]. When alignment isn't guaranteed, copy the bytes into a properly aligned object with memcpy (or std::bit_cast from a byte array in C++20) instead of casting the pointer. In Rust, the default representation (#[repr(Rust)]) leaves field order and padding unspecified; use #[repr(C)] to get a defined layout you can serialize.
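A minimal sketch of the copy-instead-of-cast approach, in Rust, assuming the format declares little-endian byte order. `read_u32_le` and `write_u32_le` are hypothetical helper names; the only library calls are `u32::from_le_bytes` and `u32::to_le_bytes` from the standard library.

```rust
/// Read a little-endian u32 starting at `offset`, or None if the buffer is too short.
/// The four bytes are copied out one by one, so no reference to a possibly
/// misaligned u32 is ever formed and the offset may be odd.
fn read_u32_le(buf: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let b = buf.get(offset..end)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Append a u32 in the format's declared byte order, whatever the host's order is.
fn write_u32_le(out: &mut Vec<u8>, value: u32) {
    out.extend_from_slice(&value.to_le_bytes());
}
```

Because the byte order is spelled out in the function rather than inherited from the host, the same file reads back identically on a big-endian machine.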
04 Versioning & migration
Data outlives the code that wrote it, so a format must evolve. Two valid strategies:
- Version tag + migration. The header carries a version; the loader switches on it and migrates old data up to the current in-memory form. Total control, but you write every migration.
- Schema evolution. Compatibility is structural. Protocol Buffers key fields by number, not name: the number "cannot be changed once your message type is in use," old binaries ignore unknown fields, and a deleted field's number must be reserved so it's never reused[7]. FlatBuffers uses per-table vtables so adding fields at the end and deprecating (never deleting or reordering) keeps forward and backward compatibility[6].
A version tag only detects which layout you have; you still need a migration path or a default-fill rule, or old files fail to load. And these are different mechanisms: a version tag plus migration converts old to new; schema evolution makes one layout tolerate added and removed fields without a migration step. Forward compatibility (old code reads new data) and backward compatibility (new code reads old data) are also distinct; state which you're claiming.
```cpp
#include <cstdint>

struct AssetHeader {          // fixed-width, no virtuals, safe to write as bytes
    uint32_t magic;           // sentinel: catch a wrong/corrupt file early
    uint32_t version;         // drives the migration switch
    uint16_t flags;           // 16-bit: a 2-byte pad follows so the next field is 4-aligned
    uint32_t payloadBytes;
    uint64_t contentHash;     // 8-byte aligned at offset 16
};
static_assert(sizeof(AssetHeader) == 24, "layout drifted");
static_assert(alignof(AssetHeader) == 8, "alignment drifted");

// Upgrade an older header in place; returns false for versions this build does not know.
bool migrate(AssetHeader& h) {
    switch (h.version) {
        case 1: h.flags = 0;  // v1 had no flags field
            [[fallthrough]];
        case 2: h.version = 3; return true;
        case 3: return true;
        default: return false; // unknown/newer: refuse
    }
}
```
```rust
use bytemuck::{Pod, Zeroable}; // needs bytemuck's "derive" feature

#[repr(C)] // defined field order; #[repr(Rust)] is unspecified
#[derive(Clone, Copy, Pod, Zeroable)]
struct AssetHeader {
    magic: u32,
    version: u32,
    flags: u16,
    _pad: u16, // explicit padding: derive(Pod) rejects structs with implicit padding bytes
    payload_bytes: u32,
    content_hash: u64, // 8-byte aligned at offset 16
}
const _: () = assert!(core::mem::size_of::<AssetHeader>() == 24);
const _: () = assert!(core::mem::align_of::<AssetHeader>() == 8);

// Upgrade an older header in place; returns false for versions this build does not know.
fn migrate(h: &mut AssetHeader) -> bool {
    match h.version {
        1 => { h.flags = 0; h.version = 3; true } // v1 had no flags field
        2 => { h.version = 3; true }
        3 => true,
        _ => false, // unknown/newer: refuse
    }
}
```
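The code above shows the version-tag strategy. For contrast, here is a rough sketch of the schema-evolution idea: fields are keyed by number, absent fields keep their defaults, and unknown numbers are skipped. The framing (one byte of field number, one byte of length, then the payload) and the `Material` type are assumptions made up for this illustration; this is not the Protocol Buffers or FlatBuffers wire format.

```rust
/// Current in-memory form; fields missing from an old file keep their defaults.
#[derive(Debug, Default, PartialEq)]
struct Material {
    roughness_pct: u8, // field number 1
    metallic_pct: u8,  // field number 2, added in a later revision
}

/// Toy framing (illustrative only): each record is
/// [field number: u8][length: u8][payload: `length` bytes].
fn decode(mut bytes: &[u8]) -> Option<Material> {
    let mut m = Material::default(); // default-fill for absent fields
    while !bytes.is_empty() {
        if bytes.len() < 2 {
            return None; // truncated record header
        }
        let (number, len) = (bytes[0], bytes[1] as usize);
        let payload = bytes.get(2..2 + len)?; // None if the payload is truncated
        match (number, len) {
            (1, 1) => m.roughness_pct = payload[0],
            (2, 1) => m.metallic_pct = payload[0],
            _ => {} // unknown field number or unexpected length: skip it
        }
        bytes = &bytes[2 + len..];
    }
    Some(m)
}
```

An old file holding only field 1 decodes with `metallic_pct` at its default (backward compatibility), and a file carrying an extra field 9 still decodes because the reader steps over it (forward compatibility). No migration function is involved in either case.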
05 References & dependencies
Assets point at other assets, and how you encode that reference decides whether moving a file breaks the game.
- GUID = stable identity. A global ID that survives move and rename. Unity stores one per asset in a sidecar .meta so it "can move or rename the asset without breaking anything"; lose the .meta and "any reference to that asset is broken"[9]. A path, by contrast, breaks the moment a file moves.
- Runtime handle ≠ GUID. At load, GUIDs resolve to compact runtime handles. The right shape is a generational handle (index + generation), so a freed slot reused by a new asset doesn't alias an old handle[10]. This is the same pattern as entity IDs in the ECS and the handles in Memory Allocators.
- Cooking builds a DAG: each asset points at the assets it references. Touch a source asset and it, plus every dependent, must re-cook.
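The re-cook rule in the last bullet amounts to a graph walk over reverse edges. A minimal sketch, assuming the cooker keeps a map from each asset to the assets that reference it; the `dependents` map, the `dirty_set` name, and the string asset names are hypothetical.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// `dependents[a]` lists the assets that reference `a` (reverse edges of the DAG).
/// Returns the edited asset plus everything that transitively depends on it.
fn dirty_set<'a>(
    dependents: &HashMap<&'a str, Vec<&'a str>>,
    edited: &'a str,
) -> HashSet<&'a str> {
    let mut dirty = HashSet::new();
    let mut queue = VecDeque::new();
    dirty.insert(edited);
    queue.push_back(edited);
    // Breadth-first walk; `insert` returns false for already-visited assets,
    // so shared dependents are queued only once.
    while let Some(asset) = queue.pop_front() {
        for &user in dependents.get(asset).into_iter().flatten() {
            if dirty.insert(user) {
                queue.push_back(user);
            }
        }
    }
    dirty
}
```

Editing a material used by two meshes that both feed one level marks four assets dirty (the material, both meshes, the level), while the texture the material reads from stays clean because the walk only goes downstream.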
Two distinctions are easy to blur. A GUID and a handle play opposite roles (persisted identity vs runtime index); they are not synonyms. And a cast that crashes on only one platform points to an alignment or layout problem, not to a memory shortage or to mmap being unavailable.
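To make the runtime side concrete, here is a small sketch of a generational handle table. `Handle`, `Slot`, and `AssetTable` are hypothetical names for this illustration, and a real engine would add a GUID-to-handle map on top.

```rust
/// Runtime reference: a slot index plus the generation it was issued for.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle { index: u32, generation: u32 }

struct Slot<T> { generation: u32, value: Option<T> }

/// Hypothetical minimal slot table with a free list.
struct AssetTable<T> { slots: Vec<Slot<T>>, free: Vec<u32> }

impl<T> AssetTable<T> {
    fn new() -> Self { AssetTable { slots: Vec::new(), free: Vec::new() } }

    fn insert(&mut self, value: T) -> Handle {
        if let Some(index) = self.free.pop() {
            // Reuse a freed slot; its generation was already bumped on removal.
            let slot = &mut self.slots[index as usize];
            slot.value = Some(value);
            Handle { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Handle { index: (self.slots.len() - 1) as u32, generation: 0 }
        }
    }

    fn remove(&mut self, h: Handle) -> Option<T> {
        let slot = self.slots.get_mut(h.index as usize)?;
        if slot.generation != h.generation { return None; } // stale handle
        let value = slot.value.take()?;
        slot.generation = slot.generation.wrapping_add(1); // invalidate old handles
        self.free.push(h.index);
        Some(value)
    }

    fn get(&self, h: Handle) -> Option<&T> {
        let slot = self.slots.get(h.index as usize)?;
        if slot.generation == h.generation { slot.value.as_ref() } else { None }
    }
}
```

After an asset is removed and its slot reused, the old handle has the same index but a different generation, so `get` returns `None` instead of silently handing back the new occupant.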
06 Packaging
Thousands of small files mean thousands of open() calls and seeks. A package concatenates cooked assets into a few archives plus an index (offset, size, and ID per entry), which cuts I/O overhead and lets the streamer pull contiguous data.
Unreal's .pak is "an archive file format... to store cooked content" with an index for its file system, and UE5's IoStore splits content (.ucas) from metadata (.utoc)[13]; Unity's AssetBundle is a platform-specific archive grouping assets[14]. Compression layers on here (per-entry or whole-archive) and is its own tutorial; the index plus contiguous layout is what makes the prioritized streaming from the File Streaming tutorial feasible.
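The archive-plus-index idea can be sketched in a few lines. This is an in-memory toy, not the .pak or AssetBundle format: `Entry`, `Package`, and `build_package` are hypothetical names, IDs are plain u64 values, and entries are packed back to back with no alignment padding or compression, which a real packager would likely add.

```rust
/// One index entry: which asset, and where its bytes sit in the archive.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Entry { id: u64, offset: u64, size: u64 }

/// Hypothetical in-memory package: cooked blobs back to back plus a sorted index.
struct Package { index: Vec<Entry>, data: Vec<u8> }

fn build_package(assets: &[(u64, &[u8])]) -> Package {
    let mut index = Vec::with_capacity(assets.len());
    let mut data = Vec::new();
    for &(id, bytes) in assets {
        // The entry records where this blob starts before it is appended.
        index.push(Entry { id, offset: data.len() as u64, size: bytes.len() as u64 });
        data.extend_from_slice(bytes);
    }
    index.sort_by_key(|e| e.id); // sorted by ID so lookup can binary-search
    Package { index, data }
}

impl Package {
    /// Borrow one asset's bytes out of the archive, or None if the ID is absent.
    fn get(&self, id: u64) -> Option<&[u8]> {
        let i = self.index.binary_search_by_key(&id, |e| e.id).ok()?;
        let e = self.index[i];
        let start = e.offset as usize;
        self.data.get(start..start + e.size as usize)
    }
}
```

A lookup is one binary search in the index and one slice of the archive, with no per-asset file open; a streamer could use the same offsets to issue a single contiguous read.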
07 Hot reload
The pipeline watches the source, re-cooks the changed asset and its dependents, and swaps the live copy without restarting. Bitsquid's tool sent a network message telling the engine to reload the changed file "nearly instantaneously"[3].
- Dangling references. If anything holds a raw pointer into the old asset's memory, the swap invalidates it. This is exactly why generational handles matter: a stale handle is caught on next access instead of dereferencing freed memory.
- In-flight uses. An asset can be mid-render or mid-job when the reload fires, so a naive free-and-replace is a data race. Real systems refcount or double-buffer and defer the free until no reader is active.
- Dependents re-cook too. Re-cooking only the touched file is not enough: editing a shader include must reprocess every material that uses it.
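One way to get the deferred free is plain reference counting. The sketch below assumes an asset is just a byte vector and uses the standard library's `Arc` and `RwLock`; `LiveAsset` and its methods are hypothetical names, and a production engine might prefer a lock-free swap or frame-delayed destruction instead.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical live slot for one asset. Readers clone the Arc and keep using
/// that snapshot; a reload swaps in a new Arc without touching the old bytes.
struct LiveAsset { current: RwLock<Arc<Vec<u8>>> }

impl LiveAsset {
    fn new(bytes: Vec<u8>) -> Self {
        LiveAsset { current: RwLock::new(Arc::new(bytes)) }
    }

    /// Take a snapshot to hold for the duration of a frame or job.
    fn acquire(&self) -> Arc<Vec<u8>> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// Publish re-cooked bytes. The old allocation is freed only when the
    /// last outstanding snapshot is dropped, never while a reader holds it.
    fn reload(&self, bytes: Vec<u8>) {
        *self.current.write().unwrap() = Arc::new(bytes);
    }
}
```

A job that called `acquire` before the reload keeps seeing the old bytes until it drops its snapshot; the next `acquire` sees the new ones.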
08 Zero-copy loading
The fastest load is no load: lay the cooked file out so the bytes are the in-memory image, then memory-map it and use it in place, with no parse and no per-object allocation. Cap'n Proto's whole premise is that "the encoding is appropriate both as a data interchange format and an in-memory representation," so you can mmap a file and "the OS won't even read in the parts that you don't access"[5]. FlatBuffers gives "access to serialized data without parsing/unpacking"[6]; Rust's rkyv makes the archived representation identical to the in-memory one[15].
Zero-copy requires the cooked layout to match the target's struct layout and padding, its alignment, and its endianness. Violate any one and the cast is wrong or undefined. The buffer itself must be aligned: mmap returns page-aligned memory (fine), but a Vec<u8> guarantees only byte alignment and malloc only the platform's fundamental alignment, so a plain read into either is not guaranteed to be aligned for an arbitrary type; rkyv ships an AlignedVec for exactly this. Because the layout is the target's layout, a blob cooked for one platform is not portable to another, which is the same per-platform-cook point from §2. The work didn't vanish; it moved to the cook, and you pay in layout rigidity. References become offsets or relative pointers, because a raw pointer's value changes every run[10], and are patched once at load or resolved on access[12].
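A hedged sketch of the in-place view with the alignment check made explicit. `CookedHeader`, `AlignedBuf`, and `view_header` are hypothetical names; `AlignedBuf` merely stands in for an mmap'd or otherwise aligned region, and the sketch assumes the file's byte order matches the host's. This is hand-rolled for illustration and is not how rkyv or FlatBuffers implement access.

```rust
use std::mem::{align_of, size_of};

/// Cooked header as it sits in the file. The explicit `_pad` field means the
/// struct has no implicit padding bytes.
#[repr(C)]
struct CookedHeader {
    magic: u32,
    version: u32,
    flags: u16,
    _pad: u16,
    payload_bytes: u32,
    content_hash: u64,
}

/// Backing store with 8-byte alignment, standing in for an aligned mapping.
#[repr(C, align(8))]
struct AlignedBuf([u8; 32]);

/// View the start of `bytes` as a header without copying.
/// Returns None if the buffer is too short or not aligned for the type.
fn view_header(bytes: &[u8]) -> Option<&CookedHeader> {
    if bytes.len() < size_of::<CookedHeader>() {
        return None;
    }
    if (bytes.as_ptr() as usize) % align_of::<CookedHeader>() != 0 {
        return None;
    }
    // SAFETY: length and alignment were checked above, the struct is
    // #[repr(C)] with only integer fields (every bit pattern is valid),
    // and the returned reference borrows from `bytes`.
    Some(unsafe { &*(bytes.as_ptr() as *const CookedHeader) })
}
```

The same bytes viewed from one byte further in fail the alignment check and return `None`, which is the safe outcome; without the check, the cast would be the undefined behavior described in the serialization section.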
09 Pitfalls
- #[repr(Rust)] field order is unspecified; use #[repr(C)].
10 What's next
Packages are the natural home for Data Compression, the next module: how LZ and entropy coding shrink them, and why decompression speed, not just ratio, decides load times. Then the renderer, which consumes the cooked meshes and textures this pipeline produces. The full path is on the series hub.
[1] Jason Gregory. Game Engine Architecture, "Resources and the File System." gameenginebook.com. Resource managers, GUIDs, the offline-tools-vs-runtime split.
[2] Epic Games. "Cooking Content in Unreal Engine." dev.epicgames.com. Internal PNG/WAV converted to platform formats; per-target cooked output.
[3] Niklas Frykholm (Bitsquid). "Our Tool Architecture." bitsquid.blogspot.com. JSON source compiled to runtime data on a separate path; network-message hot reload.
[4] The Khronos Group. glTF 2.0 Specification. registry.khronos.org. Buffer data must be little-endian; the GLB binary container and its chunk alignment.
[5] Kenton Varda. Cap'n Proto. capnproto.org. The encoding is also the in-memory representation; mmap and access in place with no parse step.
[6] Google. FlatBuffers documentation. flatbuffers.dev. Access serialized data without parsing; vtables for forward/backward schema compatibility.
[7] Google. Protocol Buffers, proto3 language guide. protobuf.dev. Field numbers are the wire identity and can't change; reserved on delete; unknown fields preserved.
[8] Quarkslab. "Unaligned accesses in C/C++: what, why, and solutions." blog.quarkslab.com. Misaligned access is UB in the C++ standard; the x86 penalty and the ARM fault; the memcpy fix.
[9] Unity Technologies. "Asset Metadata." docs.unity3d.com. The .meta sidecar GUID; move/rename safety; losing the meta breaks references.
[10] Noel Llopis. "Managing Data Relationships." gamesfromwithin.com. Pointers are unsafe to serialize; indices are safe; handles as (index, counter, type).
[11] "Delicious Data Baking." Game Developer. gamedeveloper.com. The offline bake, load-in-place blobs, offset references, and the POD-only constraint.
[12] Tom Hulton-Smith. "Load in Place data structures and Pointer Fixups." tomhulton.blogspot.com. Storing references as offsets and patching them at load in a single read.
[13] Epic Games. "Packaging Projects" / IoStore. dev.epicgames.com. The .pak archive and index; UE5 IoStore .ucas/.utoc split.
[14] Unity Technologies. "AssetBundles." docs.unity3d.com. A platform-specific archive grouping assets, built per target.
[15] David Koloski. rkyv. rkyv.org. Total zero-copy: the archived representation equals the in-memory layout; aligned access (AlignedVec).