Tutorial 14 · Engine Architecture

ECS from Scratch

The architecture that replaced deep OOP inheritance hierarchies in game engines. Entity is an ID. Component is data. System is a function that transforms matching components every tick. No virtual dispatch, no pointer chasing, no diamond inheritance. Flat arrays, cache-friendly iteration, and trivial parallelism. We build one from scratch, measure why it is fast, and trace the design through Overwatch, Unity DOTS, Bevy, Flecs, and EnTT.

Time: ~50 min
Level: Mid to senior
Prereqs: You can read C++ and Rust. Basic memory model awareness (cache lines, sequential vs random access). Comfortable with bitwise operations.
Hardware: None. A feel for cache hierarchy helps in sections 5 through 7.

01 Why ECS

Game objects in a shipping engine carry a variable set of behaviors: this one has a transform and a mesh, that one has a transform, a mesh, a rigid body, and an AI controller. The classical OOP approach models this with inheritance. Twenty years of shipped games demonstrated that deep inheritance hierarchies produce diamond problems, fat base classes, and cache-hostile memory layouts that cost real frame time at scale. ECS is the replacement.

The core proposition: separate identity from data from behavior. An entity is a lightweight ID. A component is a plain data struct attached to that ID. A system is a function that runs over all entities matching a component query. No inheritance. No virtual dispatch. Components live in flat, typed arrays. Systems iterate those arrays sequentially. The CPU prefetcher sees a predictable stride. The scheduler sees declared read/write sets and can parallelize automatically.

The results show up in frame time. Unity's DOTS benchmarks report iterating 100,000 entities with a simple Position+Velocity update in roughly 0.3 ms on a modern desktop CPU, versus 3+ ms for the equivalent MonoBehaviour approach^[4]. That is an order-of-magnitude improvement on identical logic, driven entirely by memory layout and dispatch cost.

What you'll have by the end

Working knowledge of both major ECS storage strategies (archetypes and sparse sets), when to pick each, and how to implement them. Generational indices for safe entity recycling. Query matching by bitset intersection. The structural change problem and command buffer pattern. System scheduling and automatic parallelism. And the case studies: Overwatch's gameplay ECS, Unity DOTS, Bevy's parallel executor, Flecs' relationship model, EnTT's sparse-set design, and Unreal's Mass Entity framework.

02 A short history

The component pattern predates the term "ECS" by over a decade. The timeline of the ideas that converged into the architecture shipping in engines today:

2002

Scott Bilas, "A Data-Driven Game Object System," GDC 2002. Built for Dungeon Siege at Gas Powered Games. Over 7,300 unique object types, 100,000+ placed objects across a continuous world. Bilas proposed assembling game objects from data-driven components instead of inheriting from a class hierarchy. No engineer required to create a new object type.^[1] This is the earliest widely cited talk on component-based game objects.

2007

Adam Martin, "Entity Systems are the future of MMOG development," T-Machine blog. A five-part blog series that named the pattern and argued for strict separation: entities hold no data, components hold no behavior, systems hold no state.^[2] Martin's taxonomy (entity as ID, component as data, system as logic) became the canonical definition the community adopted.

2017

Timothy Ford, "Overwatch Gameplay Architecture and Netcode," GDC 2017. Blizzard's Overwatch shipped on a custom ECS. Ford described how the ECS curtails complexity even as the team adds new heroes with radically different abilities. The deterministic simulation is built on the ECS tick model: systems run in a fixed order, each reading and writing declared component sets.^[3]

2017

Michele Caini releases EnTT. A header-only C++ ECS built on sparse sets rather than archetypes. Each component type gets its own sparse-set pool. Adding and removing components is O(1) with no table migration. Used in Minecraft (Bedrock Edition) by Mojang.^[7]

2018

Catherine West, RustConf 2018 closing keynote: "Using Rust for Game Development." Walked through the OOP-to-ECS transition in Rust, showing how the borrow checker makes traditional mutable-object-graph architectures painful and ECS natural. Widely credited with sparking the Rust gamedev ECS wave that produced Bevy, Hecs, and Legion.^[5]

2018

Unity announces DOTS (Data-Oriented Technology Stack). An archetype-based ECS integrated into the Unity editor. Chunk-allocated archetype tables, the Burst compiler for auto-vectorized system code, and the C# Job System for multi-threaded scheduling. Shipped iteratively from 2018 through the Entities 1.0 release in 2023.^[4]

2019

Sander Mertens releases Flecs v1.0. A C/C++ ECS with first-class entity relationships, query caching, and an archetype storage backend. Mertens' "Building an ECS" blog series^[8] is the most detailed public documentation of archetype-storage internals, covering table layout, edge graphs for archetype transitions, and query optimization.

2020

Bevy 0.1 released by Cart (Carter Anderson). A Rust game engine with an archetype-based ECS at its core. Bevy's scheduler automatically parallelizes systems based on declared read/write access to component types. The ECS design draws from prior Rust crates (Legion, Hecs) but integrates scheduling, resources, and change detection into one system.^[6]

2022

Unreal Engine 5.0 ships Mass Entity. An archetype-based ECS framework built by Epic's AI team for large crowd simulations. Chunk-based memory layout sized for 128-byte cache lines (anticipating next-gen hardware; most current x86 and ARM cores use 64-byte lines, with Apple silicon a notable 128-byte exception). Integrated with Unreal's existing actor/component model via Mass Entity traits.^[10]

03 The OOP problem

The classical game-object hierarchy starts reasonable: GameObject at the root, RenderableObject inherits from it, PhysicsObject inherits from it, Character inherits from both. By the time you have 200 object types across a shipped game, the hierarchy is 6 to 12 levels deep. The problems are structural, not cosmetic.

Diamond inheritance. A FlyingEnemy needs both Enemy (AI, health) and FlyingObject (flight model). Both inherit from PhysicsObject. C++ virtual inheritance "solves" this at the cost of extra indirection, vtable complexity, and a data layout that no one on the team can draw on a whiteboard.
Fat base classes. Every feature that "most objects need" migrates upward. GameObject accumulates a transform, a bounding box, a name, a layer mask, a tag, an enable flag, a serialization hook. Objects that need none of these (a trigger zone, a sound emitter) pay for all of them in memory and initialization cost.
Virtual dispatch overhead. A per-frame Update() call on 50,000 objects through a vtable means 50,000 indirect function calls. Each one is a potential branch misprediction (the CPU cannot predict the target of an indirect call through a pointer it hasn't seen recently). At roughly 15 to 20 cycles per misprediction on a modern out-of-order core, 50,000 mispredicted calls cost 0.75 to 1 million cycles, or roughly 0.25 to 0.33 ms at 3 GHz, spent on dispatch alone in the worst case.
Cache-hostile layout. Each object is heap-allocated. The general-purpose allocator behind new interleaves objects of different types in address space. Iterating all RenderableObject instances pointer-chases through a linked list or flat pointer array, loading one cache line per object. Most of that cache line is wasted on fields the current loop does not touch.

The "everything is a GameObject" model that Unity (pre-DOTS) and many custom engines used is a partial fix. It replaces inheritance with composition at the object level, but the components themselves are still polymorphic, heap-allocated, and pointer-chased. The iteration pattern (for each entity, fetch its component by type, call a virtual method on it) is fundamentally the same pointer chase.

04 Entities, Components, Systems

An ECS has three concepts and zero inheritance.

Entity: a lightweight ID. Typically a 32-bit or 64-bit integer split into an index (slot in an array) and a generation counter. The entity itself stores nothing. It is a key into the component tables.

Component: a plain data struct. No methods, no vtable, no inheritance. struct Position { float x, y, z; }; is a complete component. Components are stored in typed, contiguous arrays. The storage strategy (how those arrays are organized) is the subject of sections 5 and 6.

System: a function (or callable) that queries a set of component types and iterates all entities matching that query. A movement system declares "give me every entity with Position and Velocity" and runs position += velocity * dt for each one. Systems have no per-entity state. They read and write components; the ECS runtime provides the iterator.

This separation has three consequences that matter for performance:

Homogeneous arrays. All Position components are stored in one flat array. Iteration is a sequential scan. The CPU prefetcher sees a constant stride and loads ahead.
No virtual dispatch. A system is a single function pointer. It runs in a tight loop over flat data. No indirect call per entity.
Declarative access. Each system declares which component types it reads and which it writes. The scheduler can run two systems in parallel if their access sets don't conflict. This is mechanical, not hand-tuned.

// Minimal ECS usage pattern (pseudocode).
// Create entities
auto player = world.spawn();
world.add<Position>(player, {0, 0, 0});
world.add<Velocity>(player, {1, 0, 0});
world.add<Health>(player, {100, 100});

auto prop = world.spawn();
world.add<Position>(prop, {5, 0, 0});
world.add<StaticTag>(prop, {});

// Movement system: iterates entities with Position AND Velocity.
// The prop (no Velocity) is excluded automatically.
// dt is the frame delta time, a local captured by value from the enclosing scope.
world.system<Position, const Velocity>(
    [dt](auto& position, const auto& velocity) {
        position.x += velocity.dx * dt;
        position.y += velocity.dy * dt;
        position.z += velocity.dz * dt;
    }
);
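The pseudocode above leaves the world API abstract. As a minimal sketch of the same pattern that actually compiles, the toy below treats the entity ID as an index into one optional-valued column per component. TinyWorld and movementSystem are hypothetical names invented for this illustration, and the optional-per-slot storage is deliberately naive; it is neither of the two storage strategies covered in sections 5 and 6.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

struct Position { float x, y, z; };
struct Velocity { float dx, dy, dz; };

// Hypothetical toy world: the entity ID is an index into every column,
// and an empty optional means "this entity lacks that component".
struct TinyWorld {
    std::vector<std::optional<Position>> positions;
    std::vector<std::optional<Velocity>> velocities;

    // Creates an entity with no components and returns its ID.
    std::size_t spawn() {
        positions.emplace_back();
        velocities.emplace_back();
        return positions.size() - 1;
    }
};

// The "system": a free function applied to every entity that has both
// Position and Velocity. Entities missing either one are skipped.
inline void movementSystem(TinyWorld& world, float dt) {
    for (std::size_t e = 0; e < world.positions.size(); ++e) {
        if (!world.positions[e] || !world.velocities[e]) continue;
        Position& p = *world.positions[e];
        const Velocity& v = *world.velocities[e];
        p.x += v.dx * dt;
        p.y += v.dy * dt;
        p.z += v.dz * dt;
    }
}
```

Spawning a player with both components and a prop with only a Position, then calling movementSystem once, moves the player and leaves the prop untouched. The per-slot presence check is exactly the cost the real storage strategies are designed to remove.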

05 Storage strategy 1: Sparse sets

A sparse set maps entity IDs to component data using two arrays. The sparse array is indexed by entity ID and stores the index into a dense array. The dense array stores entity IDs (and, in parallel, component values) packed contiguously with no gaps.

All three core operations are O(1):

Has: check sparse[entity]. If the value is in range and dense[sparse[entity]] == entity, the entity has this component.
Add: set sparse[entity] = dense.length, push the entity ID onto dense, push the component value onto the parallel values array.
Remove: swap the entity's slot with the last element of dense and values, then pop. Update sparse for the swapped entity. O(1), no allocation.

EnTT^[7] uses one sparse set per component type. The trade-off: the sparse array is sized to the maximum entity ID, so it can consume significant memory if entity IDs are large. Paging the sparse array (allocating it in fixed-size pages on demand) mitigates this. The swap-and-pop removal does not preserve insertion order, which matters if you need deterministic iteration order across runs.

The sparse array is indexed by entity ID. The dense array is packed: no gaps, no holes. Removal swaps the target with the last element and pops, keeping the dense array contiguous in O(1). The cost is that iteration order changes on every removal. EnTT provides a sort() operation to restore order when needed (useful for render-order-dependent iteration).
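The paging mitigation mentioned above can be sketched as follows. This is an illustrative layout only: PagedSparse, kPageSize, and kNoIndex are names made up here, and the 4096-slot page size is an assumption, not a figure taken from EnTT.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

constexpr std::uint32_t kPageSize = 4096;        // assumed page size, in slots
constexpr std::uint32_t kNoIndex  = UINT32_MAX;  // marks "no dense index stored"

// Sparse index split into fixed-size pages that are allocated on first use,
// so one very large entity ID costs one page instead of a huge flat array.
class PagedSparse {
public:
    // Returns the dense index stored for entityId, or kNoIndex if none.
    std::uint32_t get(std::uint32_t entityId) const {
        const std::size_t page = entityId / kPageSize;
        if (page >= pages_.size() || !pages_[page]) return kNoIndex;
        return (*pages_[page])[entityId % kPageSize];
    }

    // Stores denseIndex for entityId, allocating the page if needed.
    void set(std::uint32_t entityId, std::uint32_t denseIndex) {
        const std::size_t page = entityId / kPageSize;
        if (page >= pages_.size()) pages_.resize(page + 1);
        if (!pages_[page]) {
            pages_[page] = std::make_unique<Page>();
            pages_[page]->fill(kNoIndex);
        }
        (*pages_[page])[entityId % kPageSize] = denseIndex;
    }

    // Number of pages that actually hold memory.
    std::size_t allocatedPages() const {
        std::size_t count = 0;
        for (const auto& page : pages_) if (page) ++count;
        return count;
    }

private:
    using Page = std::array<std::uint32_t, kPageSize>;
    std::vector<std::unique_ptr<Page>> pages_;
};
```

Storing a single entry for entity ID 1,000,000 allocates one 16 KiB page plus a small vector of null pointers, where a flat sparse array would need about 4 MB.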

Sparse set implementation (C++ first, then the Rust equivalent)

#include <cstdint>
#include <utility>
#include <vector>

template<typename T>
struct SparseSet {
    static constexpr uint32_t INVALID = UINT32_MAX;

    std::vector<uint32_t> sparse;   // entity ID -> dense index
    std::vector<uint32_t> dense;    // packed entity IDs
    std::vector<T>          values;  // component data, parallel to dense

    void ensure_sparse(uint32_t entityId) {
        if (entityId >= sparse.size())
            sparse.resize(entityId + 1, INVALID);
    }

    bool has(uint32_t entityId) const {
        return entityId < sparse.size()
            && sparse[entityId] != INVALID
            && dense[sparse[entityId]] == entityId;
    }

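    void add(uint32_t entityId, T value) {
        if (has(entityId)) {                          // already present: overwrite,
            values[sparse[entityId]] = std::move(value); // do not push a duplicate row
            return;
        }
        ensure_sparse(entityId);
        sparse[entityId] = static_cast<uint32_t>(dense.size());
        dense.push_back(entityId);
        values.push_back(std::move(value));
    }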

    void remove(uint32_t entityId) {
        if (!has(entityId)) return;
        auto idx  = sparse[entityId];
        auto last = static_cast<uint32_t>(dense.size() - 1);
        if (idx != last) {                            // swap with last
            dense[idx]  = dense[last];
            values[idx] = std::move(values[last]);
            sparse[dense[idx]] = idx;                // fix swapped entity's sparse entry
        }
        dense.pop_back();
        values.pop_back();
        sparse[entityId] = INVALID;
    }

    T& get(uint32_t entityId)       { return values[sparse[entityId]]; }
    const T& get(uint32_t entityId) const { return values[sparse[entityId]]; }
};

pub struct SparseSet<T> {
    sparse: Vec<Option<usize>>,  // entity ID -> dense index
    dense:  Vec<u32>,             // packed entity IDs
    values: Vec<T>,              // component data, parallel to dense
}

impl<T> SparseSet<T> {
    pub fn has(&self, entity_id: u32) -> bool {
        let eid = entity_id as usize;
        eid < self.sparse.len()
            && self.sparse[eid].is_some()
            && self.dense[self.sparse[eid].unwrap()] == entity_id
    }

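    pub fn add(&mut self, entity_id: u32, value: T) {
        let eid = entity_id as usize;
        if self.has(entity_id) {
            // Already present: overwrite in place instead of pushing a duplicate row.
            let idx = self.sparse[eid].unwrap();
            self.values[idx] = value;
            return;
        }
        if eid >= self.sparse.len() {
            self.sparse.resize_with(eid + 1, || None);
        }
        self.sparse[eid] = Some(self.dense.len());
        self.dense.push(entity_id);
        self.values.push(value);
    }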

    pub fn remove(&mut self, entity_id: u32) {
        if !self.has(entity_id) { return; }
        let idx  = self.sparse[entity_id as usize].unwrap();
        let last = self.dense.len() - 1;
        if idx != last {
            self.dense.swap(idx, last);
            self.values.swap(idx, last);
            let swapped = self.dense[idx] as usize;
            self.sparse[swapped] = Some(idx);
        }
        self.dense.pop();
        self.values.pop();
        self.sparse[entity_id as usize] = None;
    }
}

06 Storage strategy 2: Archetypes

An archetype groups entities by their exact component set. All entities with exactly {Position, Velocity} live in one table. All entities with {Position, Velocity, Health} live in another. Each table is a set of contiguous arrays, one per component column, plus an entity ID column. Iterating all entities with Position and Velocity means finding every archetype whose component set is a superset of {Position, Velocity} and scanning each matching table sequentially.

This is the storage model Unity DOTS^[4], Bevy^[6], Flecs^[8], and Unreal Mass Entity^[10] use. The core trade-off vs sparse sets: iteration is a pure sequential scan (the CPU prefetcher's best case), but adding or removing a component moves the entity's data from one archetype table to another. That move is the problem (section 8).
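To make the table layout concrete, here is a minimal sketch with two archetype tables written out by hand. A real archetype ECS builds tables at runtime from type-erased columns; the struct names MovingTable and MovingMortalTable and the integrate function are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Position { float x, y, z; };
struct Velocity { float dx, dy, dz; };
struct Health   { int current, max; };

// Archetype {Position, Velocity}: one column per component plus an entity
// ID column. Row i of every column belongs to the same entity.
struct MovingTable {
    std::vector<std::uint32_t> entities;
    std::vector<Position> positions;
    std::vector<Velocity> velocities;
};

// Archetype {Position, Velocity, Health}: a superset of the query
// "Position and Velocity", so the movement scan must visit it too.
struct MovingMortalTable {
    std::vector<std::uint32_t> entities;
    std::vector<Position> positions;
    std::vector<Velocity> velocities;
    std::vector<Health> healths;
};

// Sequential scan of one matching table. Works for any table type that
// exposes `entities`, `positions`, and `velocities` columns.
template <typename Table>
void integrate(Table& table, float dt) {
    for (std::size_t row = 0; row < table.entities.size(); ++row) {
        table.positions[row].x += table.velocities[row].dx * dt;
        table.positions[row].y += table.velocities[row].dy * dt;
        table.positions[row].z += table.velocities[row].dz * dt;
    }
}
```

A movement system runs integrate once per matching table. The Health column of the second table is never loaded, which is the point of keeping each component in its own column.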

07 Iteration and cache coherence

The performance argument for ECS reduces to one claim: iterating flat, typed arrays is faster than pointer-chasing through polymorphic objects. The gap is not algorithmic (both are O(n)); it is entirely in the constant factor, dominated by cache line utilization.

In an OOP hierarchy, each game object is heap-allocated. Iterating "all objects with a physics component" dereferences a pointer per object. Each pointer leads to a different address. The CPU loads a 64-byte cache line for each dereference; if the useful data is 12 bytes (a Position struct of three floats), the other 52 bytes of each line are wasted. Worse, successive objects are rarely adjacent in memory, so every dereference is a potential L1 miss served from L2 (roughly 4 ns on modern hardware), an L2 miss served from L3 (roughly 12 ns), or an L3 miss that goes all the way to DRAM (60 to 100+ ns)^[9].

In an archetype ECS, all Position values for entities in one archetype are packed in a contiguous Position[] array. Iterating it is a sequential scan. The hardware prefetcher detects the stride and loads ahead. Every byte of every cache line contains useful data. For a 12-byte Position struct (3 floats), roughly 5 positions fit per 64-byte cache line. At L1 hit latency, the amortized cost per entity is a fraction of a nanosecond. The per-access gap between pointer-chasing and sequential access can reach 50x to 200x when the pointer chase misses all the way to DRAM; the gap for a whole loop depends on working set size and cache pressure from other systems.
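The two layouts being compared can be written side by side. The sketch below is illustrative only: Mover, MoverColumns, and the 48-byte unrelated field are invented stand-ins, and the code makes no timing claim. It shows that both loops compute the same result while touching memory very differently.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Pointer-chasing layout: every object is its own heap allocation and
// carries a vtable pointer.
struct GameObject {
    virtual ~GameObject() = default;
    virtual void update(float dt) = 0;
};

struct Mover final : GameObject {
    float x = 0.0f;
    float vx = 0.0f;
    char unrelated[48] = {};   // stand-in for fields this loop never reads
    void update(float dt) override { x += vx * dt; }
};

// One pointer dereference and one indirect call per object.
inline void updateObjects(std::vector<std::unique_ptr<GameObject>>& objects, float dt) {
    for (auto& object : objects) object->update(dt);
}

// Flat layout: one contiguous array per field, no indirection.
struct MoverColumns {
    std::vector<float> x;
    std::vector<float> vx;
};

// Unit-stride loads and stores over two arrays; no call per element.
inline void updateColumns(MoverColumns& columns, float dt) {
    for (std::size_t i = 0; i < columns.x.size(); ++i)
        columns.x[i] += columns.vx[i] * dt;
}
```

In the first loop each 64-byte line fetched brings in the vtable pointer and the unrelated bytes along with the two floats the loop needs. In the second, a 64-byte line holds 16 consecutive x values and nothing else.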

[Interactive demo: cache hits and misses for the OOP (random access) and ECS (sequential access) sides over the same 64 entities.]

The OOP side accesses entities in random order (simulating heap-allocated objects scattered across memory); each access to a new cache line is a miss. The ECS side scans sequentially, reusing each line for several entities before advancing. Both visit all 64 entities exactly once, so the only difference is memory layout. A miss stalls that side while the line is fetched, so the sequential (ECS) side finishes first while the pointer-chasing (OOP) side is still grinding through misses. The per-miss stall is set to 8× a hit here so the clip stays watchable; a real DRAM miss runs roughly 50–200× an L1 hit, so the true gap is wider.

08 Component add/remove: the structural change problem

In archetype storage, adding a component to an entity means moving its data from the current archetype table to a different one (the archetype that has the old set plus the new component). Removing a component does the same in reverse. Each move copies every component value for that entity. If the entity has 8 components totaling 200 bytes, that is a 200-byte memcpy per structural change.
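A minimal sketch of one such move, assuming two hand-written tables for {Position, Velocity} and {Position, Velocity, Health}. The table types and the addHealth function are hypothetical. A real ECS would also update an entity-to-(table, row) index for both the moved entity and the entity that gets swapped into the vacated row; that bookkeeping is omitted here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Position { float x, y, z; };
struct Velocity { float dx, dy, dz; };
struct Health   { int current, max; };

struct MovingTable {            // archetype {Position, Velocity}
    std::vector<std::uint32_t> entities;
    std::vector<Position> positions;
    std::vector<Velocity> velocities;
};

struct MovingMortalTable {      // archetype {Position, Velocity, Health}
    std::vector<std::uint32_t> entities;
    std::vector<Position> positions;
    std::vector<Velocity> velocities;
    std::vector<Health> healths;
};

// Moves the entity at `row` of `from` into `to`, attaching `health`.
// Returns the entity's new row in `to`.
inline std::size_t addHealth(MovingTable& from, std::size_t row,
                             MovingMortalTable& to, Health health) {
    // 1. Copy every existing component value to the destination table.
    to.entities.push_back(from.entities[row]);
    to.positions.push_back(from.positions[row]);
    to.velocities.push_back(from.velocities[row]);
    to.healths.push_back(health);

    // 2. Swap-and-pop the source row so the source columns stay packed.
    const std::size_t last = from.entities.size() - 1;
    from.entities[row]   = from.entities[last];
    from.positions[row]  = from.positions[last];
    from.velocities[row] = from.velocities[last];
    from.entities.pop_back();
    from.positions.pop_back();
    from.velocities.pop_back();

    return to.entities.size() - 1;
}
```

Every column of the source table is touched twice (one copy out, one swap) and every column of the destination once, which is why the cost scales with the number and size of components on the entity.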

A single move is cheap. A thousand moves per frame (spawn 500 enemies, each with an add-component-on-spawn pattern) is not. The solutions:

Deferred commands. Systems record structural changes into a command buffer instead of executing them immediately. At the end of the system (or at an explicit sync point), the buffer replays all changes in batch. This avoids invalidating iterators mid-loop and enables batching moves by target archetype. Unity calls this an EntityCommandBuffer. Bevy uses Commands.
Archetype edge caching. When entity e moves from archetype A to archetype A+{Health}, the ECS caches the edge "A + Health = B". The next entity that adds Health to the same archetype A skips the archetype lookup and goes straight to B. Flecs^[8] stores these edges in a graph connecting archetypes.
Chunk allocation. Unity DOTS allocates archetype tables in fixed-size chunks (16 KiB). Each chunk holds as many entities as fit. Moving an entity out of a chunk leaves a hole that is filled by swapping in the last entity from the same chunk. This keeps chunks packed without a global compaction pass.
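The deferred-command idea from the list above can be sketched in a few lines. This is not Unity's EntityCommandBuffer or Bevy's Commands; World, CommandBuffer, and cullOdd are hypothetical names, and the world is reduced to a packed list of live entity IDs so the iterator-invalidation hazard is easy to see.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Toy world: just a packed list of live entity IDs.
struct World {
    std::vector<std::uint32_t> alive;

    void spawn(std::uint32_t id) { alive.push_back(id); }
    void despawn(std::uint32_t id) {
        alive.erase(std::remove(alive.begin(), alive.end(), id), alive.end());
    }
};

// Records structural changes as closures and replays them later.
class CommandBuffer {
public:
    void spawn(std::uint32_t id) {
        commands_.push_back([id](World& world) { world.spawn(id); });
    }
    void despawn(std::uint32_t id) {
        commands_.push_back([id](World& world) { world.despawn(id); });
    }
    // Sync point: apply every recorded change in order, then clear.
    void flush(World& world) {
        for (auto& command : commands_) command(world);
        commands_.clear();
    }

private:
    std::vector<std::function<void(World&)>> commands_;
};

// Example system: wants to despawn odd IDs while iterating `alive`.
// Calling world.despawn() here would invalidate the loop's iterators,
// so the change is recorded instead of applied.
inline void cullOdd(const World& world, CommandBuffer& commands) {
    for (std::uint32_t id : world.alive)
        if (id % 2 == 1) commands.despawn(id);
}
```

The world is unchanged until flush runs, so the system can take it by const reference. A production buffer would typically store compact command records rather than std::function objects and could sort them by target archetype before replaying.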

In sparse-set ECS (EnTT), structural changes are cheaper: adding a component inserts into a per-type sparse set (O(1)), removing swaps and pops (O(1)). No table migration. This is the primary advantage of sparse sets over archetypes for workloads with frequent component add/remove (particle systems, buff/debuff stacking, short-lived effects).

09 Queries and query caching

A query is the interface between a system and the storage. A query descriptor says: "give me every entity that has all of these component types and none of those." The ECS runtime resolves this into a set of archetype tables whose component sets satisfy the constraints.

The resolution step is a bitset intersection. Assign each component type a bit index. Each archetype stores a bitmask of its component types. The query "With(Position, Velocity), Without(Static)" becomes:

// Query: With(Position, Velocity), Without(Static)
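// Shift a 64-bit one: a plain int literal would overflow for bit indices >= 31.
uint64_t withMask    = (uint64_t{1} << POSITION_BIT) | (uint64_t{1} << VELOCITY_BIT);
uint64_t withoutMask = (uint64_t{1} << STATIC_BIT);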

for (auto& archetype : allArchetypes) {
    bool hasAll    = (archetype.mask & withMask) == withMask;
    bool hasNone   = (archetype.mask & withoutMask) == 0;
    if (hasAll && hasNone) {
        iterateArchetype(archetype);           // sequential scan of matching table
    }
}

This outer loop (over archetypes) is cheap: dozens to low hundreds of archetypes in a typical game, each checked by two bitwise ANDs and two comparisons. The inner loop (over entities in each matching archetype) is the sequential scan that does the real work.

Query caching avoids re-running the archetype match every frame. On first execution, the query finds all matching archetypes and stores pointers to them. When a new archetype is created (because some entity got a novel component combination), the ECS tests it against all cached queries and adds it where it matches. Flecs^[8] and Bevy^[6] both cache queries this way.
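A minimal sketch of that caching scheme, using plain bitmasks in place of real archetype tables. ArchetypeRegistry and CachedQuery are hypothetical names; this is not the Flecs or Bevy implementation, only the bookkeeping the paragraph describes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Mask = std::uint64_t;

struct CachedQuery {
    Mask with = 0;
    Mask without = 0;
    std::vector<std::size_t> matches;   // indices into ArchetypeRegistry::masks

    bool accepts(Mask archetype) const {
        return (archetype & with) == with && (archetype & without) == 0;
    }
};

struct ArchetypeRegistry {
    std::vector<Mask> masks;            // one component bitmask per archetype
    std::vector<CachedQuery> queries;

    // The full scan over archetypes happens once, at registration.
    std::size_t addQuery(Mask with, Mask without) {
        CachedQuery query;
        query.with = with;
        query.without = without;
        for (std::size_t i = 0; i < masks.size(); ++i)
            if (query.accepts(masks[i])) query.matches.push_back(i);
        queries.push_back(query);
        return queries.size() - 1;
    }

    // A new archetype is tested against every cached query exactly once.
    std::size_t addArchetype(Mask mask) {
        masks.push_back(mask);
        const std::size_t index = masks.size() - 1;
        for (auto& query : queries)
            if (query.accepts(mask)) query.matches.push_back(index);
        return index;
    }
};
```

Per frame, a system then walks queries[id].matches directly and never re-tests masks. The cost of matching is paid when archetypes are created, which happens far less often than iteration.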

10 Relationships and entity references

Pure ECS stores flat data. But games need structure: parent/child hierarchies (scene graph), targeting (missile locked onto a ship), inventory (item inside a container), socket attachment (weapon in hand). These are all relationships between entities.

The simplest approach: store a Parent component containing the parent's entity ID. This works for parent/child. Flecs^[8] generalizes this into first-class relationships: a component type can be parameterized by a target entity. (ChildOf, parent_entity) is a relationship pair that acts as a component. Entities with the same relationship pair land in the same archetype, so "find all children of entity X" is an archetype query.

The danger with entity references in components: the referenced entity may be destroyed. A Target component pointing to entity 42 is a dangling reference after entity 42 is freed. Generational indices (section 12) detect this at resolve time. Flecs additionally supports "on delete" hooks: when the target of a relationship is destroyed, the relationship component is automatically removed from all entities that reference it.
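The resolve-time check can be sketched without a full allocator. In the snippet below, the generations vector is a hypothetical stand-in for the allocator's slot table (one current generation per slot), and Target and resolve are names chosen for this illustration.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct Entity {
    std::uint32_t index;
    std::uint32_t generation;
};

// A component that refers to another entity.
struct Target {
    Entity entity;
};

// Returns the slot index if `ref` still names the entity it was created
// for, or nullopt if that entity has been destroyed since. This relies on
// the allocator bumping a slot's generation whenever the slot is freed.
inline std::optional<std::uint32_t> resolve(
        const std::vector<std::uint32_t>& generations, Entity ref) {
    if (ref.index >= generations.size()) return std::nullopt;
    if (generations[ref.index] != ref.generation) return std::nullopt;
    return ref.index;
}
```

A system holding a Target calls resolve before every use and treats nullopt as "target is gone": skip the entity, remove the Target component, or pick a new target.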

11 Scheduling and parallelism

Each system declares which component types it reads and which it writes. Two systems can run in parallel if they have no write-write or read-write conflict on the same component type. A system that reads Position and writes Velocity can run in parallel with a system that reads Health and writes Damage, because the component sets are disjoint.

The scheduler builds a dependency graph of systems. Edges encode conflicts: if system A writes Position and system B reads Position, B depends on A (or vice versa, depending on declared ordering). The scheduler topologically sorts this graph and dispatches independent systems to worker threads. Bevy's multi-threaded executor^[6] does this automatically each frame. The developer writes systems with declared access; the engine parallelizes them.

This only works because ECS access is declarative. In an OOP architecture, a method on a GameObject can touch any field on any other object through a pointer. The engine has no way to know what a method accesses without running it. In ECS, the query signature is the access declaration. The scheduler reads it statically.
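The conflict rule is mechanical enough to sketch. The code below is a simplification, not Bevy's executor: instead of a full dependency graph it uses greedy batching in declaration order, where a system joins the current batch only if it conflicts with nothing already in it. SystemAccess, conflicts, and buildBatches are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Mask = std::uint64_t;

// Declared access of one system, one bit per component type.
struct SystemAccess {
    Mask reads = 0;
    Mask writes = 0;
};

// Two systems conflict when either one writes a component type that the
// other reads or writes. Read-read overlap is not a conflict.
inline bool conflicts(const SystemAccess& a, const SystemAccess& b) {
    return (a.writes & (b.reads | b.writes)) != 0
        || (b.writes & (a.reads | a.writes)) != 0;
}

// Groups systems into batches. Systems inside one batch may run in
// parallel; batches run one after another. Because a system can only join
// the latest batch, conflicting systems keep their declaration order.
inline std::vector<std::vector<std::size_t>> buildBatches(
        const std::vector<SystemAccess>& systems) {
    std::vector<std::vector<std::size_t>> batches;
    for (std::size_t i = 0; i < systems.size(); ++i) {
        bool clash = batches.empty();   // no batch yet: must open one
        if (!clash) {
            for (std::size_t j : batches.back()) {
                if (conflicts(systems[i], systems[j])) { clash = true; break; }
            }
        }
        if (clash) batches.push_back(std::vector<std::size_t>{i});
        else       batches.back().push_back(i);
    }
    return batches;
}
```

With a movement system (reads Velocity, writes Position), a damage system (reads Damage, writes Health), and a render system (reads Position), the first two share a batch and the render system lands in a second batch because it reads what movement writes. A graph-based scheduler can find more parallelism than this greedy pass, at the cost of more bookkeeping.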

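// Bevy system declarations (Rust). The scheduler reads the type signature
// to determine access: Query<&Position> = read Position, Query<&mut Velocity> = write Velocity.
// Names follow older Bevy releases; newer ones rename some of them (delta_seconds became delta_secs).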

fn movement_system(
    mut query: Query<(&mut Position, &Velocity)>,
    time: Res<Time>,
) {
    for (mut position, velocity) in &mut query {
        position.x += velocity.dx * time.delta_seconds();
        position.y += velocity.dy * time.delta_seconds();
    }
}

fn damage_system(
    mut query: Query<&mut Health, With<DamageReceiver>>,
    mut damage_events: EventReader<DamageEvent>,   // reading events needs a mutable reader
) {
    // Reads DamageEvent, writes Health. No overlap with movement_system.
    // The scheduler runs both in parallel.
}

// Registration: the engine reads the function signatures at compile time.
app.add_systems(Update, (movement_system, damage_system));

12 Generational indices

Entity IDs must be recycled. A game that spawns and destroys thousands of projectiles per second will grow the slot array unboundedly unless IDs are reused, since per-ID storage (sparse arrays, component slots) scales with the high-water mark of the index. A monotonically increasing 32-bit index would also exhaust the space in days at sustained high spawn rates (2^32 IDs last about five days at 10,000 spawns per second). The standard solution: a generational index.

The entity allocator maintains an array of slots. Each slot has a generation counter and an alive flag. A free list tracks available slots. Allocating pops a slot off the free list and returns {index, generation}. Freeing increments the generation and pushes the slot back onto the free list. Any saved reference that holds the old generation will fail the generation check on resolve.

This prevents the ABA problem in entity references: slot 5 held enemy A (generation 0), enemy A was destroyed (generation bumped to 1), slot 5 was reused for projectile B (generation 1). A stale reference to {index: 5, generation: 0} correctly fails to resolve, even though slot 5 is alive again.

Generational index allocator

#include <cstdint>
#include <vector>

struct Entity {
    uint32_t index;
    uint32_t generation;
};

struct EntityAllocator {
    struct Slot { uint32_t generation; bool alive; };

    std::vector<Slot> slots;
    std::vector<uint32_t> freeList;

    Entity allocate() {
        uint32_t index;
        if (!freeList.empty()) {
            index = freeList.back();
            freeList.pop_back();
        } else {
            index = static_cast<uint32_t>(slots.size());
            slots.push_back({0, false});
        }
        slots[index].alive = true;
        return { index, slots[index].generation };
    }

    void free(Entity entity) {
        if (!isAlive(entity)) return;
        slots[entity.index].alive = false;
        slots[entity.index].generation++;           // invalidate stale refs
        freeList.push_back(entity.index);
    }

    bool isAlive(Entity entity) const {
        return entity.index < slots.size()
            && slots[entity.index].alive
            && slots[entity.index].generation == entity.generation;
    }
};

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct Entity {
    pub index: u32,
    pub generation: u32,
}

pub struct EntityAllocator {
    slots: Vec<(u32, bool)>,        // (generation, alive)
    free_list: Vec<u32>,
}

impl EntityAllocator {
    pub fn new() -> Self {
        Self { slots: Vec::new(), free_list: Vec::new() }
    }

    pub fn allocate(&mut self) -> Entity {
        let index = if let Some(idx) = self.free_list.pop() {
            idx
        } else {
            let idx = self.slots.len() as u32;
            self.slots.push((0, false));
            idx
        };
        self.slots[index as usize].1 = true;
        Entity { index, generation: self.slots[index as usize].0 }
    }

    pub fn free(&mut self, entity: Entity) {
        if !self.is_alive(entity) { return; }
        let slot = &mut self.slots[entity.index as usize];
        slot.1 = false;
        slot.0 = slot.0.wrapping_add(1);              // bump generation; wraps like the C++ uint32_t
        self.free_list.push(entity.index);
    }

    pub fn is_alive(&self, entity: Entity) -> bool {
        let idx = entity.index as usize;
        idx < self.slots.len()
            && self.slots[idx].1
            && self.slots[idx].0 == entity.generation
    }
}

What's intentionally missing

The allocator above is the minimal viable version. Production implementations add: packing index and generation into a single 64-bit integer (Bevy uses 32 bits index + 32 bits generation), tombstone detection for double-free, configurable generation width (some engines use 16-bit generation + 16-bit index for tighter packing), and atomic operations for thread-safe allocation.
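The first item on that list, packing both fields into one 64-bit value, is small enough to sketch. The bit layout here (generation in the high 32 bits, index in the low 32) is an assumption made for this example and is not claimed to match any particular engine.

```cpp
#include <cstdint>

struct Entity {
    std::uint32_t index;
    std::uint32_t generation;
};

// Packs an entity into one 64-bit handle: generation in the high half,
// index in the low half (assumed layout).
constexpr std::uint64_t pack(Entity entity) {
    return (static_cast<std::uint64_t>(entity.generation) << 32)
         | static_cast<std::uint64_t>(entity.index);
}

// Inverse of pack().
constexpr Entity unpack(std::uint64_t bits) {
    return Entity{
        static_cast<std::uint32_t>(bits & 0xFFFFFFFFu),
        static_cast<std::uint32_t>(bits >> 32)
    };
}
```

A packed handle compares, hashes, and copies as a single integer, and two handles to the same slot with different generations compare unequal without any extra code.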

13 Case studies

Overwatch (Blizzard, 2016)

Timothy Ford's GDC 2017 talk^[3] describes a custom ECS built for Overwatch. Systems run in a fixed tick order. Each system declares its component reads and writes. The ECS enables the deterministic simulation that powers Overwatch's netcode: given the same inputs, the same sequence of system ticks produces the same game state. Hero abilities that would be nightmares in a deep inheritance hierarchy (Genji's deflect interacting with every projectile type) are implemented as systems that query component combinations rather than as method overrides on a base Projectile class.

Unity DOTS

Unity's archetype ECS stores entities in 16 KiB chunks^[4]. Each chunk belongs to one archetype. Within a chunk, each component type gets its own contiguous array: all Position structs, then all Velocity structs, then all Health structs. The Burst compiler auto-vectorizes system loops over these arrays, emitting SIMD instructions without manual intrinsics. The C# Job System schedules jobs across worker threads based on declared component access, similar to Bevy's approach but in the C# / .NET runtime.
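A back-of-the-envelope estimate of how many entities fit in one such chunk, as a sketch. It assumes the chunk holds only an entity ID column plus one column per component and ignores the chunk header and alignment padding, so real capacities will be somewhat lower. rowsPerChunk and the 8-byte entity ID are assumptions for illustration.

```cpp
#include <cstddef>
#include <initializer_list>

constexpr std::size_t kChunkBytes = 16 * 1024;   // 16 KiB, as stated above

// Upper bound on rows per chunk: chunk size divided by the bytes one
// entity occupies across all columns. Header and padding are ignored.
constexpr std::size_t rowsPerChunk(std::size_t entityIdBytes,
                                   std::initializer_list<std::size_t> componentBytes) {
    std::size_t rowBytes = entityIdBytes;
    for (std::size_t bytes : componentBytes) rowBytes += bytes;
    return kChunkBytes / rowBytes;
}
```

For an archetype with a 12-byte Position, a 12-byte Velocity, and an 8-byte Health, plus an 8-byte entity ID, one row costs 40 bytes and the bound is 16384 / 40 = 409 entities per chunk. Adding a large component to an archetype shrinks that number for every entity in it.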

Bevy (Rust)

Bevy^[6] uses archetype storage with a multi-threaded system executor. Systems are plain Rust functions. Their parameter types encode the query: Query<(&Position, &mut Velocity)> means "read Position, write Velocity." The executor builds a dependency graph from these signatures and dispatches independent systems to a thread pool. Change detection is built in: Bevy records the tick at which each component was last written, so systems can query "give me only entities whose Health changed since last frame."

Flecs (C/C++)

Flecs^[8] is an archetype-based ECS with first-class relationships. The (ChildOf, parent) relationship pair acts as a component: entities with the same parent share an archetype, making "find all children of X" an archetype-table scan. Flecs caches query results and maintains an archetype graph where edges represent "add component C" or "remove component C" transitions, enabling O(1) archetype lookup on structural changes. Sander Mertens' "Building an ECS" blog series^[8] provides the most detailed public documentation of these internals.
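The edge cache described here can be sketched with bitmasks standing in for archetype tables. This is not the Flecs data structure; ArchetypeGraph, its members, and the slowLookups counter (present only to make the fast path observable) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Mask = std::uint64_t;

struct ArchetypeGraph {
    struct Node {
        Mask mask = 0;
        // Cached "add component" edges: component bit -> archetype index.
        std::unordered_map<unsigned, std::size_t> addEdges;
    };

    std::vector<Node> nodes;
    std::unordered_map<Mask, std::size_t> byMask;   // slow-path lookup table
    std::size_t slowLookups = 0;                    // instrumentation only

    // Slow path: find the archetype with this exact mask, or create it.
    std::size_t findOrCreate(Mask mask) {
        ++slowLookups;
        auto found = byMask.find(mask);
        if (found != byMask.end()) return found->second;
        nodes.emplace_back();
        nodes.back().mask = mask;
        byMask.emplace(mask, nodes.size() - 1);
        return nodes.size() - 1;
    }

    // Archetype reached from `from` by adding component `bit`.
    std::size_t withComponent(std::size_t from, unsigned bit) {
        auto cached = nodes[from].addEdges.find(bit);
        if (cached != nodes[from].addEdges.end()) return cached->second;

        const Mask targetMask = nodes[from].mask | (Mask{1} << bit);
        const std::size_t target = findOrCreate(targetMask);  // may grow `nodes`
        nodes[from].addEdges.emplace(bit, target);            // re-index after growth
        return target;
    }
};
```

The first entity that adds a component to a given archetype pays for the mask lookup and possibly a table creation. Every later entity making the same transition follows the cached edge. "Remove component" edges work the same way with the bit cleared instead of set.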

EnTT (C++)

EnTT^[7] is the primary example of a sparse-set ECS. Each component type has its own sparse set pool. No archetype tables, no table migration on add/remove. The trade-off: iteration over multiple component types requires intersecting multiple sparse sets (iterating the smallest set and looking up each entity in the others). Used in Minecraft Bedrock Edition. Michele Caini's "ECS Back and Forth" series documents the design decisions in detail.
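The multi-component iteration strategy described here (walk the smallest pool, probe the others) can be sketched as follows. Pool is a stripped-down sparse set and forEachWithBoth is a hypothetical helper; neither is EnTT's API.

```cpp
#include <cstdint>
#include <vector>

constexpr std::uint32_t kAbsent = UINT32_MAX;

// Stripped-down sparse set: just enough to demonstrate the intersection.
template <typename T>
struct Pool {
    std::vector<std::uint32_t> sparse;   // entity ID -> dense index
    std::vector<std::uint32_t> dense;    // packed entity IDs
    std::vector<T> values;               // parallel to dense

    bool has(std::uint32_t entity) const {
        return entity < sparse.size() && sparse[entity] != kAbsent;
    }
    void add(std::uint32_t entity, T value) {
        if (has(entity)) { values[sparse[entity]] = value; return; }
        if (entity >= sparse.size()) sparse.resize(entity + 1, kAbsent);
        sparse[entity] = static_cast<std::uint32_t>(dense.size());
        dense.push_back(entity);
        values.push_back(value);
    }
    T& get(std::uint32_t entity) { return values[sparse[entity]]; }
};

// Calls fn(entity, a, b) for every entity present in both pools.
// Drives the loop from the smaller pool so the number of probes is
// bounded by the smaller component count.
template <typename A, typename B, typename Fn>
void forEachWithBoth(Pool<A>& as, Pool<B>& bs, Fn fn) {
    if (as.dense.size() <= bs.dense.size()) {
        for (std::uint32_t entity : as.dense)
            if (bs.has(entity)) fn(entity, as.get(entity), bs.get(entity));
    } else {
        for (std::uint32_t entity : bs.dense)
            if (as.has(entity)) fn(entity, as.get(entity), bs.get(entity));
    }
}
```

The driving pool is scanned sequentially, but each probe into the other pool is an indexed lookup through its sparse array. That random access is the iteration cost archetype storage avoids, and the price sparse sets pay for cheap add and remove.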

Unreal Mass Entity

Mass Entity^[10] is Epic's archetype-based ECS framework, integrated into Unreal Engine 5. Built by the AI team for crowd simulation (the "Matrix Awakens" demo). Chunk-based allocation sized at 128 bytes per cache line, 1024 lines per chunk. Interoperates with Unreal's existing Actor/Component model via "Mass Entity traits" that bridge between the ECS world and traditional UObjects.

14 Pitfalls

Over-splitting components. Splitting Position into PositionX, PositionY, PositionZ (one component per field) maximizes SoA vectorization but creates three archetype columns where one suffices. Most systems read all three fields together; the extra indirection costs more than the SIMD benefit. Split only when profiling shows a system that reads one axis and ignores the others.
Archetype explosion. If entities carry many optional components and the combinations are diverse, the archetype count grows combinatorially. 20 optional components produce up to 2^20 possible archetypes. In practice, a few hundred archetypes cover the common cases. If the count grows past a few thousand, reconsider whether some of those "components" should be fields inside a larger component.
Structural change storms. A system that adds and removes components every frame (toggling a buff on and off) moves entities between archetypes every tick. Use a boolean field inside the component instead of adding/removing the component, or use command buffers to batch the changes.
Entity reference dangling. A component stores an entity ID referencing another entity. That entity is destroyed. The component now holds a dangling reference. Generational indices detect this at resolve time, but the system must handle the failure case (skip, remove the component, spawn a replacement).
System ordering bugs. System A writes a value that system B reads. If the scheduler runs B before A, B sees stale data. Component access analysis can keep conflicting systems from overlapping, but it cannot tell which one should run first. Declare explicit ordering constraints (.after(SystemA) in Bevy) whenever one system depends on another's output, including the case where both systems read and write different fields of the same component type.
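The dangling-reference pitfall above comes down to a check at resolve time. A minimal sketch, assuming a hypothetical `Entities` allocator and a hypothetical `Target` component that points at another entity:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Entity {
    index: u32,
    generation: u32,
}

/// Hypothetical allocator: one generation counter per slot, bumped on despawn.
struct Entities {
    generations: Vec<u32>,
    alive: Vec<bool>,
    free: Vec<u32>,
}

impl Entities {
    fn new() -> Self {
        Entities { generations: Vec::new(), alive: Vec::new(), free: Vec::new() }
    }

    fn spawn(&mut self) -> Entity {
        if let Some(index) = self.free.pop() {
            // Reuse the slot; its generation was already bumped by despawn.
            self.alive[index as usize] = true;
            Entity { index, generation: self.generations[index as usize] }
        } else {
            self.generations.push(0);
            self.alive.push(true);
            Entity { index: (self.generations.len() - 1) as u32, generation: 0 }
        }
    }

    fn despawn(&mut self, e: Entity) -> bool {
        if !self.is_alive(e) {
            return false; // stale handle or double despawn
        }
        let i = e.index as usize;
        self.alive[i] = false;
        self.generations[i] = self.generations[i].wrapping_add(1);
        self.free.push(e.index);
        true
    }

    fn is_alive(&self, e: Entity) -> bool {
        let i = e.index as usize;
        i < self.generations.len() && self.alive[i] && self.generations[i] == e.generation
    }
}

/// Hypothetical component that stores a reference to another entity.
struct Target(Entity);

/// Resolve-time check: yields the target only if that exact entity still exists.
fn resolve(entities: &Entities, target: &Target) -> Option<Entity> {
    if entities.is_alive(target.0) {
        Some(target.0)
    } else {
        None
    }
}
```

A system calling `resolve` then has to decide what `None` means for it: skip the entity, remove the `Target` component, or pick a new target. The generation check only detects the problem; it does not choose the policy.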

15What's next

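Change detection. Track which components were modified since the last frame. Bevy stores "added" and "changed" ticks alongside each component value; a system can filter its query to only entities whose Health changed, skipping the rest. The filter still compares a tick for every matching entity, so the scan itself stays O(n); what shrinks to O(modified) is the per-entity work the system performs, which is the part that matters for reactive systems.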
Serialization. Archetype tables are contiguous arrays of typed data. Serializing a game state is iterating each archetype and writing its arrays to disk. Deserialization reconstructs the tables. The regularity of the layout makes this simpler than serializing an arbitrary object graph.
Networking. ECS makes delta compression straightforward: for each component type, diff the current frame's array against the previous frame's array. Send only the changed entries. The deterministic system ordering that ECS encourages (Overwatch's approach) enables lockstep and rollback netcode patterns.
GPU-driven ECS. Store component arrays in GPU-visible buffers. Run systems as compute shaders. The flat-array layout maps directly to GPU memory models. Unity's DOTS and Unreal's Mass framework are steps in this direction on the data-layout side (Burst itself compiles for the CPU), though fully GPU-resident ECS is still experimental in most production engines as of 2026.
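Of the directions above, change detection is the easiest to prototype. The sketch below assumes a hypothetical `Tracked` column that records a tick per row on every write; it is not Bevy's API, and it ignores tick wraparound.

```rust
/// Hypothetical component column with one "last changed" tick per row.
struct Tracked<T> {
    values: Vec<T>,
    changed: Vec<u32>, // world tick at which each row was last written
}

impl<T> Tracked<T> {
    fn new() -> Self {
        Tracked { values: Vec::new(), changed: Vec::new() }
    }

    /// Append a row; a freshly added value counts as changed at `now`.
    fn push(&mut self, value: T, now: u32) {
        self.values.push(value);
        self.changed.push(now);
    }

    /// Every write goes through here so the tick is always updated.
    fn set(&mut self, row: usize, value: T, now: u32) {
        self.values[row] = value;
        self.changed[row] = now;
    }

    /// Rows written after the tick at which the calling system last ran.
    /// The scan visits every row; only the returned rows need further work.
    fn changed_since(&self, last_run: u32) -> Vec<usize> {
        (0..self.values.len())
            .filter(|&row| self.changed[row] > last_run)
            .collect()
    }
}
```

A system keeps its own `last_run` tick, calls `changed_since(last_run)` at the start of each run, and then stores the current world tick. Writes that bypass `set` would go unnoticed, which is why engines route mutable access through a wrapper.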

16Sources

Scott Bilas. "A Data-Driven Game Object System." GDC 2002, Gas Powered Games. gamedevs.org/uploads/data-driven-game-object-system.pdf. The earliest widely cited talk on assembling game objects from data-driven components. Built for Dungeon Siege (7,300+ object types).
Adam Martin. "Entity Systems are the future of MMOG development." T-Machine blog, Part 1: September 2007. t-machine.org. Five-part series that codified the entity-as-ID, component-as-data, system-as-logic taxonomy.
Timothy Ford. "Overwatch Gameplay Architecture and Netcode." GDC 2017, Blizzard Entertainment. gdcvault.com. Describes the custom ECS powering Overwatch's deterministic simulation, system ordering, and netcode.
Unity Technologies. "DOTS - Data-Oriented Technology Stack." unity.com/dots. Archetype-based ECS with Burst compiler and C# Job System. The Entities 1.0 package shipped in 2023.
Catherine West. "Using Rust For Game Development." RustConf 2018, Closing Keynote. kyren.github.io/2018/09/14/rustconf-talk.html. Walked through OOP-to-ECS in Rust; credited with catalyzing the Rust ECS ecosystem (Bevy, Hecs, Legion).
Carter Anderson et al. "Bevy Engine." bevyengine.org. Rust game engine with archetype ECS, automatic parallel system scheduling, and change detection. Source code at github.com/bevyengine/bevy.
Michele Caini (skypjack). "EnTT: Gaming meets modern C++." github.com/skypjack/entt. Sparse-set ECS. Used in Minecraft Bedrock Edition. Caini's "ECS Back and Forth" series: skypjack.github.io.
Sander Mertens. "Building an ECS" blog series and Flecs. ajmmertens.medium.com. Most detailed public documentation of archetype storage internals, edge graphs, and query caching. Flecs source: github.com/SanderMertens/flecs.
Jeff Dean. "Numbers Everyone Should Know." Originally from a Stanford CS295 talk, c. 2009; popularized by Jonas Bonér's gist. gist.github.com/jboner/2841832. The gist lists L1 cache reference at 0.5 ns, L2 at 7 ns, and main memory at 100 ns; updated tables for newer hardware put L1 near 1 ns and L2 near 4 ns. The canonical source for memory-hierarchy latency ballparks.
Epic Games. "Mass Entity in Unreal Engine." Unreal Engine 5 Documentation. dev.epicgames.com. Archetype-based ECS framework for crowd simulation, chunk-allocated at cache-line-aligned boundaries.
Louis Cox, Benjamin Williams, Jay Vickers, Davin Ward, Christopher Headleand. "Run-time Performance Comparison of Sparse-set and Archetype Entity-Component Systems." CGVC 2025. diglib.eg.org. Academic benchmark comparing sparse-set vs archetype ECS. Confirms archetypes excel at iteration; sparse sets at composition changes.