Build a Game Engine · Resources

Asset Pipeline & Serialization

A .png and an .fbx are made for an artist, not for your engine at runtime. The asset pipeline is the offline machine that turns authoring files into blobs the engine can load fast, often in place. We build that machine in C++ and Rust: the cook, a versioned binary format, stable references, a dependency graph, hot reload, and the alignment rules that make zero-copy loading actually work.

Time: ~55 min · Level: Mid · Prereqs: the File Streaming tutorial (how bytes get off disk); Bit Shifting helps for the binary layout · Stack: C++ & Rust

01 Why a pipeline

Source formats are built for authoring and interchange, not for runtime. A PNG must be decoded; a WAV must be parsed; an FBX or glTF carries editor metadata and float data that isn't laid out for the GPU. Unreal puts it plainly: it stores content "in particular formats which it uses internally, such as PNG for texture data or WAV for audio. However, this content needs to be converted to different formats for the various platforms"[2].

So an engine ships a second representation: cooked data, laid out for fast loading and the target hardware. The conversion also generates mip chains, transcodes textures to GPU block formats, and reorders for the platform. Decode is one cost; layout is the bigger one.

There is no single right format

The design space splits along axes: text vs binary, parse vs zero-copy, a version tag vs schema evolution, GUIDs vs paths. The right choice depends on your platforms, your iteration speed, and whether data crosses a version boundary at runtime. This tutorial builds one coherent point in that space and names the tradeoffs at each axis.

02 Bake vs runtime load

The cook (or bake) is an offline transform from source to a runtime-ready, often platform-specific artifact: import → process → cook → package. Heavy work happens once, offline; the runtime does a light load and fixup[11]. The Bitsquid engine compiled source JSON into runtime data on a separate path, and "the engine never sees the generic data folder so it can't cheat"[3].

Cooked is not the same as compressed

Cooking is transform plus layout; compression is a separate, optional packaging step (its own tutorial). A cooked blob can be uncompressed. And "cook once, run anywhere" fails for platform-specific cooks: texture block format, endianness, and alignment can differ per target, which is why Unity and Unreal both cook per build target[2]. Keep import (source → engine intermediate) distinct from cook (intermediate → shipped package).


The source PNG is decoded, gets a mip chain, is transcoded to a GPU block format, written into a package at an offset, then loaded as a resident blob. Each stage transforms the representation. Hot reload re-cooks only the edited asset and swaps the result into the live scene, the payoff of §7.

03 Binary serialization

A cooked format is binary. Three fundamentals decide whether it loads correctly and quickly: byte order, alignment, and what you're allowed to put in it.
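Byte order is the easiest of the three to pin down in code. What follows is a minimal sketch, assuming the cooked format is little-endian on disk (the convention glTF requires of its buffer data[4]); `write_u32_le` and `read_u32_le` are illustrative names, not part of any library:

```cpp
#include <cassert>
#include <cstdint>

// Store a 32-bit value as four little-endian bytes, whatever the host byte order.
inline void write_u32_le(std::uint8_t* out, std::uint32_t v) {
    out[0] = static_cast<std::uint8_t>(v);
    out[1] = static_cast<std::uint8_t>(v >> 8);
    out[2] = static_cast<std::uint8_t>(v >> 16);
    out[3] = static_cast<std::uint8_t>(v >> 24);
}

// Rebuild the value with shifts. No pointer cast, so no alignment requirement.
inline std::uint32_t read_u32_le(const std::uint8_t* in) {
    return  static_cast<std::uint32_t>(in[0])
         | (static_cast<std::uint32_t>(in[1]) << 8)
         | (static_cast<std::uint32_t>(in[2]) << 16)
         | (static_cast<std::uint32_t>(in[3]) << 24);
}

// Usage: the low byte lands first, and the value survives the round trip.
inline void example_roundtrip() {
    std::uint8_t buf[4];
    write_u32_le(buf, 0x11223344u);
    assert(buf[0] == 0x44);
    assert(read_u32_le(buf) == 0x11223344u);
}
```

Shift-based access trades a little speed for portability. A format cooked per target can instead store the target's native order and skip the conversion at load.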

A misaligned cast is undefined behavior, not just slow

Casting a byte buffer to a struct only works if the buffer satisfies the struct's alignment. On x86-64 an unaligned scalar load works but can cost 30 to 70 percent when it straddles a cache line; on ARM it can be a bus error (SIGBUS); and in C++ the act of reading through a misaligned pointer is undefined behavior regardless of the CPU[8]. Use std::bit_cast or memcpy when alignment isn't guaranteed. In Rust, only #[repr(C)] gives a defined field order you can serialize; #[repr(Rust)] layout is unspecified.
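The memcpy route can be sketched as follows. This is a minimal example, assuming a trivially copyable header and a byte buffer with no alignment guarantee; `DemoHeader` and `read_pod` are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <type_traits>

struct DemoHeader {            // hypothetical 8-byte header, for illustration only
    std::uint32_t magic;
    std::uint32_t version;
};

// Copy a T out of `bytes` at `offset`. memcpy has no alignment requirement,
// so this is well-defined even when bytes + offset is misaligned for T.
template <class T>
std::optional<T> read_pod(const unsigned char* bytes, std::size_t size, std::size_t offset) {
    static_assert(std::is_trivially_copyable_v<T>, "only memcpy-safe types");
    assert(bytes != nullptr);
    if (offset > size || size - offset < sizeof(T)) return std::nullopt;  // bounds check
    T out;
    std::memcpy(&out, bytes + offset, sizeof(T));
    return out;
}
```

Calling `read_pod<DemoHeader>(buf, size, 1)` is fine even though offset 1 is misaligned for a 32-bit field. Compilers typically lower a small fixed-size memcpy to a plain load where the target allows it, so the safe form rarely costs more than the cast.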

Laid out byte by byte, the asset header contains one gap. The 16-bit flags field is followed by 2 bytes of padding so the next 32-bit field stays 4-aligned. Loading a v1 file runs a migration that fills the flags field v1 never had (v2 is just a version bump). Force pack(1) and that padding vanishes, leaving payloadBytes and the hash misaligned: any pointer or reference to them is fast-but-UB on x86 and can be a bus error on ARM.

04 Versioning & migration

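Data outlives the code that wrote it, so a format must evolve. Two valid strategies: a version tag plus a migration path, or schema evolution, where a single layout tolerates added and removed fields.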

Bumping the version doesn't make old data readable

A version tag only detects which layout you have; you still need a migration path or a default-fill rule, or old files fail to load. And these are different mechanisms: a version tag plus migration converts old to new; schema evolution makes one layout tolerate added and removed fields without a migration step. Forward compatibility (old code reads new data) and backward compatibility (new code reads old data) are also distinct; state which you're claiming.

A versioned binary header with a migration switch
// C++
#include <cstdint>
struct AssetHeader {              // fixed-width, no virtuals, safe to write as bytes
    uint32_t magic;                 // sentinel: catch a wrong/corrupt file early
    uint32_t version;               // drives the migration switch
    uint16_t flags;                 // 16-bit: a 2-byte pad follows so the next field is 4-aligned
    uint32_t payloadBytes;
    uint64_t contentHash;           // 8-byte aligned at offset 16
};
static_assert(sizeof(AssetHeader) == 24, "layout drifted");
static_assert(alignof(AssetHeader) == 8, "alignment drifted");

bool migrate(AssetHeader& h) {
    switch (h.version) {
        case 1: h.flags = 0;          // v1 had no flags field
                [[fallthrough]];
        case 2: h.version = 3; return true;
        case 3: return true;
        default: return false;            // unknown/newer: refuse
    }
}

// Rust
use bytemuck::{Pod, Zeroable};     // the derives need bytemuck's "derive" feature
#[repr(C)]                          // defined field order; #[repr(Rust)] is unspecified
#[derive(Clone, Copy, Pod, Zeroable)]
struct AssetHeader {
    magic: u32,
    version: u32,
    flags: u16,
    _pad: u16,                    // explicit pad: derive(Pod) rejects implicit padding bytes
    payload_bytes: u32,           // 4-aligned at offset 12
    content_hash: u64,            // 8-byte aligned at offset 16
}
const _: () = assert!(core::mem::size_of::<AssetHeader>() == 24);

fn migrate(h: &mut AssetHeader) -> bool {
    match h.version {
        1 => { h.flags = 0; h.version = 3; true }   // v1 had no flags field
        2 => { h.version = 3; true }
        3 => true,
        _ => false,                       // unknown/newer: refuse
    }
}
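
The switch above is the version-tag strategy. The other strategy, schema evolution, can be sketched with tagged fields: each field is stored as (id, length, payload), readers skip ids they don't know, and fields missing from the file keep their defaults. This is a toy illustration under assumed conventions (1-byte id, 1-byte length, host-order payload); `TextureMeta`, `parse_fields`, and the field ids are hypothetical, and real schema-evolving formats such as FlatBuffers or Protocol Buffers are considerably more involved:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct TextureMeta {               // hypothetical asset metadata
    std::uint32_t width  = 0;
    std::uint32_t height = 0;
    std::uint32_t mips   = 1;      // added later: older files omit it, the default applies
};

// Field ids are the stable identity; never reuse an id for a different meaning.
enum : std::uint8_t { kWidth = 1, kHeight = 2, kMips = 3 };

// Parse a stream of (id, length, payload) records. Returns false on truncation.
inline bool parse_fields(const std::vector<std::uint8_t>& in, TextureMeta& out) {
    std::size_t i = 0;
    while (i < in.size()) {
        if (in.size() - i < 2) return false;           // need id + length
        const std::uint8_t id  = in[i];
        const std::uint8_t len = in[i + 1];
        i += 2;
        if (in.size() - i < len) return false;         // payload truncated
        std::uint32_t v = 0;
        if (len == sizeof v) std::memcpy(&v, &in[i], sizeof v);
        switch (id) {
            case kWidth:  if (len == sizeof v) out.width  = v; break;
            case kHeight: if (len == sizeof v) out.height = v; break;
            case kMips:   if (len == sizeof v) out.mips   = v; break;
            default: break;                            // unknown field: skip it
        }
        i += len;
    }
    return true;
}
```

A reader that predates `kMips` skips that record, which is the forward-compatible direction. A current reader given an old file leaves `mips` at 1, which is the backward-compatible direction. No migration step runs in either case.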

05 References & dependencies

Assets point at other assets, and how you encode that reference decides whether moving a file breaks the game.
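
One way to keep references stable across a move or rename is to persist a GUID and resolve it through a registry. The sketch below assumes each asset receives an ID at import (the role Unity's .meta sidecar plays[9]); `AssetGuid`, `AssetRegistry`, and `Material` are hypothetical names, and the 64-bit ID is a simplification:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

using AssetGuid = std::uint64_t;   // assumption: 64-bit IDs; real GUIDs are often 128-bit

class AssetRegistry {
public:
    void add(AssetGuid id, std::string path) { paths_[id] = std::move(path); }

    // A rename or move touches one registry row; stored references are untouched.
    bool rename(AssetGuid id, std::string new_path) {
        const auto it = paths_.find(id);
        if (it == paths_.end()) return false;
        it->second = std::move(new_path);
        return true;
    }

    // Returns nullptr for an unknown GUID.
    const std::string* resolve(AssetGuid id) const {
        const auto it = paths_.find(id);
        return it == paths_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<AssetGuid, std::string> paths_;
};

struct Material {                  // hypothetical asset that references a texture
    AssetGuid albedoTexture;       // persisted as a GUID, never as a path
};
```

After a `rename`, a `Material` cooked earlier still resolves, because the value it stores never contained the path.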

Editing the shader include dirties the shader, the material, the mesh, and the prefab, everything downstream, while unrelated assets stay clean. That's incremental cooking: touch one file, re-cook only what depends on it. The graph must use a visited set, because asset references can form cycles.
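
The dirty-set propagation can be sketched as a breadth-first walk over reverse edges. This is a minimal version, assuming assets are identified by strings and the graph already stores, for each asset, the assets that depend on it; `DependentsGraph` and `collect_dirty` are illustrative names:

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Reverse edges: asset -> assets that depend on it (its dependents).
using DependentsGraph = std::unordered_map<std::string, std::vector<std::string>>;

// Breadth-first walk from the edited asset. The visited set doubles as the
// dirty set, and it is what stops a reference cycle from looping forever.
inline std::unordered_set<std::string> collect_dirty(const DependentsGraph& dependents,
                                                     const std::string& edited) {
    std::unordered_set<std::string> dirty{edited};
    std::queue<std::string> pending;
    pending.push(edited);
    while (!pending.empty()) {
        const std::string current = pending.front();
        pending.pop();
        const auto it = dependents.find(current);
        if (it == dependents.end()) continue;              // nothing depends on it
        for (const std::string& d : it->second) {
            if (dirty.insert(d).second) pending.push(d);   // enqueue on first visit only
        }
    }
    return dirty;
}
```

The returned set is what the cooker must rebuild; everything outside it can be reused from the previous cook. A real cooker would also order the rebuilds so that dependencies cook before their dependents, which this sketch leaves out.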

Two distinctions are easy to get wrong. A GUID and a handle are opposite roles, not synonyms: the GUID is the persisted identity, the handle is a runtime index. And a cast that crashes on only one platform is an alignment or layout problem, not a matter of memory or mmap availability.

06 Packaging

Thousands of small files mean thousands of open() calls and seeks. A package concatenates cooked assets into a few archives plus an index (offset, size, and ID per entry), which cuts I/O overhead and lets the streamer pull contiguous data.

Unreal's .pak is "an archive file format... to store cooked content" with an index for its file system, and UE5's IoStore splits content (.ucas) from metadata (.utoc)[13]; Unity's AssetBundle is a platform-specific archive grouping assets[14]. Compression layers on here (per-entry or whole-archive) and is its own tutorial; the index plus contiguous layout is what makes the prioritized streaming from the File Streaming tutorial feasible.
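
The index itself is small enough to sketch. The example below assumes a flat array of (ID, offset, size) rows sorted by ID, with each blob's start rounded up to a power-of-two alignment; `IndexEntry`, `build_index`, and `find_entry` are hypothetical and do not describe the .pak, IoStore, or AssetBundle formats:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

struct IndexEntry {            // one row of the package index (illustrative layout)
    std::uint64_t id;          // stable asset ID
    std::uint64_t offset;      // byte offset of the cooked blob inside the archive
    std::uint64_t size;        // blob size in bytes
};

// Lay blobs out back to back in input order, each start rounded up to
// `alignment` (a power of two), then return the index sorted by ID.
inline std::vector<IndexEntry> build_index(std::vector<IndexEntry> entries,
                                           std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::uint64_t cursor = 0;
    for (IndexEntry& e : entries) {
        cursor = (cursor + alignment - 1) & ~(alignment - 1);   // round up
        e.offset = cursor;
        cursor += e.size;
    }
    std::sort(entries.begin(), entries.end(),
              [](const IndexEntry& a, const IndexEntry& b) { return a.id < b.id; });
    return entries;
}

// Binary-search the sorted index: one lookup, then one contiguous read.
inline std::optional<IndexEntry> find_entry(const std::vector<IndexEntry>& index,
                                            std::uint64_t id) {
    const auto it = std::lower_bound(index.begin(), index.end(), id,
        [](const IndexEntry& e, std::uint64_t key) { return e.id < key; });
    if (it == index.end() || it->id != id) return std::nullopt;
    return *it;
}
```

Keeping the physical order separate from the sorted lookup order lets the cooker place assets that load together next to each other while lookups stay logarithmic.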

07 Hot reload

The pipeline watches the source, re-cooks the changed asset and its dependents, and swaps the live copy without restarting. Bitsquid's tool sent a network message telling the engine to reload the changed file "nearly instantaneously"[3].

The two things that make hot reload hard

Dangling references: if anything holds a raw pointer into the old asset's memory, the swap invalidates it. This is exactly why generational handles matter; a stale handle is caught on next access instead of dereferencing freed memory.

In-flight uses: an asset can be mid-render or mid-job when the reload fires, so a naive free-and-replace is a data race. Real systems refcount or double-buffer and defer the free until no reader is active.

And dependents must re-cook, not just the touched file: editing a shader include must reprocess every material that uses it.
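
A generational handle can be sketched in a few lines, following the (index, counter) shape described in [10]. The policy here is an assumption: a reload bumps the slot's generation, so handles issued before the swap stop resolving. Engines that want old handles to keep working would redirect them instead. `Handle`, `AssetTable`, and `reload` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct Handle {                    // what gameplay code stores instead of a pointer
    std::uint32_t index;
    std::uint32_t generation;
};

template <class T>
class AssetTable {
    struct Slot { T value; std::uint32_t generation; };
    std::vector<Slot> slots_;

public:
    Handle add(T value) {
        slots_.push_back(Slot{std::move(value), 0});
        return Handle{static_cast<std::uint32_t>(slots_.size() - 1), 0};
    }

    // Hot reload: replace the payload and bump the generation, which
    // invalidates every handle issued before the swap.
    Handle reload(Handle old, T value) {
        assert(get(old) != nullptr);           // precondition: `old` is still live
        Slot& s = slots_[old.index];
        s.value = std::move(value);
        ++s.generation;
        return Handle{old.index, s.generation};
    }

    // Returns nullptr for an out-of-range or stale handle. The pointer is
    // only valid until the next add(), so callers should not cache it.
    const T* get(Handle h) const {
        if (h.index >= slots_.size()) return nullptr;
        const Slot& s = slots_[h.index];
        return s.generation == h.generation ? &s.value : nullptr;
    }
};
```

This covers only the dangling-reference half of the problem. The in-flight half still needs refcounting or a deferred free, which the sketch omits.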

08 Zero-copy loading

The fastest load is no load: lay the cooked file out so the bytes are the in-memory image, then memory-map it and use it in place, with no parse and no per-object allocation. Cap'n Proto's whole premise is that "the encoding is appropriate both as a data interchange format and an in-memory representation," so you can mmap a file and "the OS won't even read in the parts that you don't access"[5]. FlatBuffers gives "access to serialized data without parsing/unpacking"[6]; Rust's rkyv makes the archived representation identical to the in-memory one[15].

Zero-copy is not free, and not unconditional

It requires the cooked layout to match the target's struct layout and padding, its alignment, and its endianness. Violate any one and the cast is wrong or undefined. The buffer must be aligned: mmap returns page-aligned memory (fine), but a plain read into a Vec<u8> or malloc region is not aligned for an arbitrary type, so rkyv ships an AlignedVec for exactly this. Because the layout is the target's layout, a blob cooked for one platform is not portable to another, the same per-platform-cook point from §2. The work didn't vanish; it moved to the cook, and you pay in layout rigidity. References become offsets or relative pointers (a raw pointer's value changes every run[10]), patched once at load or resolved on access[12].
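
Offset-based references and the alignment precondition can be sketched together. The layout below is hypothetical (a `MeshBlobHeader` at offset 0 that points at an index array by byte offset), and the reads use memcpy so the example stays well-defined wherever the blob is loaded; a true in-place view would cast instead, and only after `aligned_at` holds:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct MeshBlobHeader {            // hypothetical cooked layout, sits at blob offset 0
    std::uint32_t indexCount;
    std::uint32_t indicesOffset;   // bytes from the start of the blob, not a pointer
};

struct BlobView {                  // non-owning view of a loaded or mapped blob
    const unsigned char* base;
    std::size_t size;
};

// An in-place (cast) view of a T at `offset` is only legal when this holds.
template <class T>
bool aligned_at(BlobView blob, std::size_t offset) {
    return reinterpret_cast<std::uintptr_t>(blob.base + offset) % alignof(T) == 0;
}

// Resolve on access: follow the stored offset, bounds-check, then read.
inline bool read_index(BlobView blob, std::uint32_t i, std::uint32_t& out) {
    assert(blob.base != nullptr);
    MeshBlobHeader h;
    if (blob.size < sizeof h) return false;
    std::memcpy(&h, blob.base, sizeof h);
    if (i >= h.indexCount) return false;
    const std::uint64_t pos =
        std::uint64_t{h.indicesOffset} + std::uint64_t{i} * sizeof(out);
    if (pos + sizeof(out) > blob.size) return false;   // offset points outside the blob
    std::memcpy(&out, blob.base + pos, sizeof(out));
    return true;
}
```

Because the reference is an offset, the same bytes resolve correctly at any base address. Copy the blob somewhere else and `read_index` returns the same values, which a stored raw pointer could not do.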

09 Pitfalls

"Cooked means compressed": Cooking is transform + layout; compression is a separate packaging step.
Crash on ARM, fine on x86: A misaligned cast of a cooked blob: UB everywhere, a bus error on ARM.
Old saves won't load: Bumped the version but wrote no migration or default-fill rule.
Reference breaks on rename: Referencing by path instead of a stable GUID.
Stale asset after reload: Holding a raw index/pointer instead of a generational handle.
Cooker re-cooks everything: No dependency graph, or no visited-set so a reference cycle loops.
Rust struct serializes wrong: #[repr(Rust)] field order is unspecified; use #[repr(C)].
Zero-copy garbage on another platform: The layout is the target's; re-cook per platform (endian/align/format).

10 What's next

Packages are the natural home for Data Compression, the next module: how LZ and entropy coding shrink them, and why decompression speed, not just ratio, decides load times. Then the renderer, which consumes the cooked meshes and textures this pipeline produces. The full path is on the series hub.

  1. Jason Gregory. Game Engine Architecture, "Resources and the File System." gameenginebook.com. Resource managers, GUIDs, the offline-tools-vs-runtime split.
  2. Epic Games. "Cooking Content in Unreal Engine." dev.epicgames.com. Internal PNG/WAV converted to platform formats; per-target cooked output.
  3. Niklas Frykholm (Bitsquid). "Our Tool Architecture." bitsquid.blogspot.com. JSON source compiled to runtime data on a separate path; network-message hot reload.
  4. The Khronos Group. glTF 2.0 Specification. registry.khronos.org. Buffer data must be little-endian; the GLB binary container and its chunk alignment.
  5. Kenton Varda. Cap'n Proto. capnproto.org. The encoding is also the in-memory representation; mmap and access in place with no parse step.
  6. Google. FlatBuffers documentation. flatbuffers.dev. Access serialized data without parsing; vtables for forward/backward schema compatibility.
  7. Google. Protocol Buffers, proto3 language guide. protobuf.dev. Field numbers are the wire identity and can't change; reserved on delete; unknown fields preserved.
  8. Quarkslab. "Unaligned accesses in C/C++: what, why, and solutions." blog.quarkslab.com. Misaligned access is UB in the C++ standard; the x86 penalty and the ARM fault; the memcpy fix.
  9. Unity Technologies. "Asset Metadata." docs.unity3d.com. The .meta sidecar GUID; move/rename safety; losing the meta breaks references.
  10. Noel Llopis. "Managing Data Relationships." gamesfromwithin.com. Pointers are unsafe to serialize; indices are safe; handles as (index, counter, type).
  11. "Delicious Data Baking." Game Developer. gamedeveloper.com. The offline bake, load-in-place blobs, offset references, and the POD-only constraint.
  12. Tom Hulton-Smith. "Load in Place data structures and Pointer Fixups." tomhulton.blogspot.com. Storing references as offsets and patching them at load in a single read.
  13. Epic Games. "Packaging Projects" / IoStore. dev.epicgames.com. The .pak archive and index; UE5 IoStore .ucas/.utoc split.
  14. Unity Technologies. "AssetBundles." docs.unity3d.com. A platform-specific archive grouping assets, built per target.
  15. David Koloski. rkyv. rkyv.org. Total zero-copy: the archived representation equals the in-memory layout; aligned access (AlignedVec).
