C++ — Data-Oriented Design (ECS & SoA)

C++ — Data-Oriented Design (ECS & SoA)

Data-oriented design (DoD) restructures code around data transformations rather than object hierarchies, matching hardware memory access patterns. Related: C++ — Memory and Cache Performance, C++ — High Performance Data Processing.


The Core Problem with OOP on Hot Paths

OOP is fine for APIs, plugin interfaces, and code called a handful of times per frame. It becomes a bottleneck on tight loops touching millions of objects because it forces Array of Structs (AoS) layout.

Three specific costs:

ProblemCauseEffect
Pointer chasingvector<unique_ptr<Entity>> — objects heap-scatteredCache miss on every e->update()
Object bloatLoading the full struct (vtable + all fields) when only 3 fields are needed68% of each cache line is waste
Virtual dispatchUnpredictable vtable targets across 10–50 entity typesBranch mispredictions, pipeline stalls

The hardware contract: CPUs fetch memory in 64-byte cache lines. If the 8 bytes you need are surrounded by 56 bytes of unrelated data, those 56 bytes still consume the cache line — and cost ~100 ns to load from RAM.


AoS vs SoA

Array of Structs (AoS) — the OOP default:

1
entities[]: [ {pos, vel, hp, name, vtbl}, {pos, vel, hp, name, vtbl}, ... ]

Each update loads the entire struct; only pos + vel + is_active are needed.

Struct of Arrays (SoA) — the DoD layout:

1
2
3
positions[]:    [ p0, p1, p2, p3, ... ]   ← every byte used
velocities[]:   [ v0, v1, v2, v3, ... ]   ← every byte used
active_flags[]: [  1,  1,  0,  1, ... ]   ← every byte used

Cache lines load only the data the system needs. The prefetcher sees linear access and preloads ahead. The compiler can auto-vectorise the inner loop.


Minimal ECS Implementation

Entity Component System (ECS) is SoA applied to game/simulation entities. Three concepts:

  • Entity — just an integer ID (uint32_t). No object, no vtable.
  • Components — plain structs with no inheritance, stored in flat arrays indexed by entity ID.
  • Systems — free functions that iterate component arrays and apply transformations.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
// ecs.hpp
struct Position    { float x = 0.f, y = 0.f, z = 0.f; };
struct Velocity    { float x = 0.f, y = 0.f, z = 0.f; };
struct Health      { int value = 100; };
struct ActiveFlag  { bool is_active = true; };

enum ComponentID : uint8_t {
    COMP_POSITION = 0, COMP_VELOCITY, COMP_HEALTH,
    COMP_ACTIVE_FLAG, COMP_COUNT
};
using ComponentMask = std::bitset<COMP_COUNT>;
using Entity = uint32_t;

class ECSWorld {
public:
    std::vector<ComponentMask> masks;
    std::vector<Position>      positions;
    std::vector<Velocity>      velocities;
    std::vector<Health>        healths;
    std::vector<ActiveFlag>    active_flags;
    size_t entity_count = 0;

    Entity create_entity() {
        Entity id = static_cast<Entity>(entity_count++);
        masks.emplace_back(); positions.emplace_back();
        velocities.emplace_back(); healths.emplace_back();
        active_flags.emplace_back();
        return id;
    }
    // add_position, add_velocity, ... (trivial setters)
    bool has_components(Entity e, ComponentMask required) const {
        return (masks[e] & required) == required;
    }
};

Movement system — no virtual calls, no pointer dereferences, trivially vectorisable:

1
2
3
4
5
6
7
8
9
10
11
12
13
void movement_system(ECSWorld& world, float dt) {
    Position*       __restrict pos = world.positions.data();
    const Velocity* __restrict vel = world.velocities.data();
    const ActiveFlag* __restrict act = world.active_flags.data();

    for (size_t i = 0; i < world.entity_count; ++i) {
        if (!world.has_components(i, required)) continue;
        if (!act[i].is_active) continue;
        pos[i].x += vel[i].x * dt;
        pos[i].y += vel[i].y * dt;
        pos[i].z += vel[i].z * dt;
    }
}

__restrict tells the compiler the arrays don’t alias, enabling SIMD auto-vectorisation.


Benchmark Results

Workload: 1,000,000 entities, physics integration loop (position += velocity * dt), 100 frames. Identical algorithm, different data layout only.

Ryzen 7 5800X, GCC 13.2, -O2 -march=native:

ApproachAvg frame (ms)Relative
OOP (vtable, AoS)8.341.00× baseline
ECS (SoA)1.475.67× faster

perf stat breakdown:

MetricOOPECS
Cache references18.2M4.1M
Cache misses5.4M (29.7%)0.12M (2.9%)
IPC0.713.12

IPC of 0.71 means the OOP CPU was idle ~70% of the time waiting for memory. ECS keeps all execution units fed.

Scaling behaviour: The gap widens with data size. At 10K entities everything fits in L3 and OOP is only 3.5× slower. At 10M entities (blowing past L3) ECS is faster.


When to Apply DoD

1
2
3
4
5
6
7
8
Is this code on a hot path?
├── NO  → use whatever is clearest (OOP is fine)
└── YES → how many items?
          ├── <1000 → probably doesn't matter
          └── >1000 → profile it
                      cache misses > 5%?
                      ├── NO  → you're fine
                      └── YES → data-oriented redesign

Good candidates: per-frame simulation loops, particle systems, physics engines, HFT order book updates touching millions of quotes. Not worth it for: UI code, configuration, plugin interfaces, anything called < 1000× per frame.


C++26 Reflection (Future Direction)

The boilerplate cost of manual ECS (explicit component IDs, add_* functions, bitmasks) will shrink significantly with P2996 static reflection (targeting C++26). The idea: write an AoS struct definition, compiler generates SoA storage automatically.

1
2
3
// HYPOTHETICAL C++26
struct PhysicsBundle { Position position; Velocity velocity; ActiveFlag active; };
auto storage = make_soa_storage<PhysicsBundle>(1'000'000);  // SoA under the hood

Libraries like EnTT and flecs already approximate this with template metaprogramming. See C++ — Modern Features Reference (C++20-23) for current C++20/23 compile-time tooling.


Trending Tags