C++ — Data-Oriented Design (ECS & SoA)
Data-oriented design (DoD) restructures code around data transformations rather than object hierarchies, matching hardware memory access patterns. Related: C++ — Memory and Cache Performance, C++ — High Performance Data Processing.
The Core Problem with OOP on Hot Paths
OOP is fine for APIs, plugin interfaces, and code called a handful of times per frame. It becomes a bottleneck in tight loops over millions of objects because it naturally leads to an Array of Structs (AoS) layout, often with every object allocated separately on the heap.
Three specific costs:
| Problem | Cause | Effect |
|---|---|---|
| Pointer chasing | vector<unique_ptr<Entity>> — objects heap-scattered | Cache miss on every e->update() |
| Object bloat | Loading the full struct (vtable + all fields) when only 3 fields are needed | 68% of each cache line is waste |
| Virtual dispatch | Unpredictable vtable targets across 10–50 entity types | Branch mispredictions, pipeline stalls |
The hardware contract: CPUs fetch memory in 64-byte cache lines (the usual size on x86-64 and most ARM cores). If the 8 bytes you need are surrounded by 56 bytes of unrelated data, those 56 bytes still occupy the cache line and use memory bandwidth, and a line that has to come from RAM costs roughly 100 ns to load.
AoS vs SoA
Array of Structs (AoS) — the OOP default:
```
entities[]: [ {pos, vel, hp, name, vtbl}, {pos, vel, hp, name, vtbl}, ... ]
```
Each update loads the entire struct; only pos + vel + is_active are needed.
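To put rough numbers on this, the sketch below models the entity with a hypothetical `EntityAoS` (a virtual destructor supplies the vtable pointer, and a fixed 32-byte name stands in for a string so the size doesn't depend on the standard library) and counts the 64-byte lines a sequential scan touches, compared with keeping only pos, vel and is_active in separate dense arrays. Struct sizes are ABI-dependent, so treat the results as an illustration rather than a measurement.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 64-byte cache lines, as on mainstream x86-64 CPUs.
constexpr std::size_t kCacheLine = 64;

struct Vec3 { float x, y, z; };

// Hypothetical AoS entity mirroring {pos, vel, hp, name, vtbl} plus an
// is_active flag. A fixed-size name keeps sizeof() independent of std::string.
struct EntityAoS {
    virtual ~EntityAoS() = default;  // forces a vtable pointer into every object
    Vec3 pos{};
    Vec3 vel{};
    int hp = 100;
    char name[32] = {};
    bool is_active = true;
};

// Bytes the movement loop actually reads per entity.
constexpr std::size_t kUsefulBytes = sizeof(Vec3) * 2 + sizeof(bool);

// Cache lines touched when scanning `count` contiguous elements of `stride`
// bytes, assuming the array starts on a line boundary.
constexpr std::size_t lines_touched(std::size_t count, std::size_t stride) {
    return (count * stride + kCacheLine - 1) / kCacheLine;
}

// AoS: the whole struct passes through the cache for every entity.
constexpr std::size_t aos_lines(std::size_t n) {
    return lines_touched(n, sizeof(EntityAoS));
}

// Dense alternative: positions, velocities and flags in three separate arrays.
constexpr std::size_t soa_lines(std::size_t n) {
    return 2 * lines_touched(n, sizeof(Vec3)) + lines_touched(n, sizeof(bool));
}
```

With the usual x86-64 Linux ABI, `sizeof(EntityAoS)` comes out at 72 bytes, so a 1M-entity scan touches about 1.1M lines in AoS form against about 0.39M for the three dense arrays.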
Struct of Arrays (SoA) — the DoD layout:
```
positions[]:    [ p0, p1, p2, p3, ... ]   ← every byte used
velocities[]:   [ v0, v1, v2, v3, ... ]   ← every byte used
active_flags[]: [ 1,  1,  0,  1,  ... ]   ← every byte used
```
Each cache line fetched contains only data the system uses. The hardware prefetcher recognises the linear access pattern and fetches ahead, and the compiler can auto-vectorise the inner loop.
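A minimal sketch of the two layouts for the `position += velocity * dt` loop, assuming a hypothetical particle that carries only position and velocity (`ParticleAoS`, `ParticlesSoA` and the `integrate_*` functions are illustrative names). Both functions compute the same result; only the memory layout differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AoS: one struct per particle, all fields interleaved in memory.
struct ParticleAoS {
    float x, y, z;
    float vx, vy, vz;
};

// SoA: one dense array per field, all indexed by particle number.
struct ParticlesSoA {
    std::vector<float> x, y, z, vx, vy, vz;

    void push_back(const ParticleAoS& p) {
        x.push_back(p.x);   y.push_back(p.y);   z.push_back(p.z);
        vx.push_back(p.vx); vy.push_back(p.vy); vz.push_back(p.vz);
    }
    std::size_t size() const { return x.size(); }
};

void integrate_aos(std::vector<ParticleAoS>& ps, float dt) {
    for (auto& p : ps) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}

// Every array is read with unit stride and the body has no branches.
void integrate_soa(ParticlesSoA& ps, float dt) {
    const std::size_t n = ps.size();
    assert(ps.vx.size() == n && ps.vy.size() == n && ps.vz.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        ps.x[i] += ps.vx[i] * dt;
        ps.y[i] += ps.vy[i] * dt;
        ps.z[i] += ps.vz[i] * dt;
    }
}
```

In `integrate_soa` each array is streamed with unit stride, which is the access pattern hardware prefetchers and auto-vectorisers handle best; `integrate_aos` reads the same bytes here only because this particle has no other fields.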
Minimal ECS Implementation
Entity Component System (ECS) is SoA applied to game/simulation entities. Three concepts:
- Entity: just an integer ID (uint32_t). No object, no vtable.
- Components: plain structs with no inheritance, stored in flat arrays indexed by entity ID.
- Systems: free functions that iterate component arrays and apply transformations.
```cpp
// ecs.hpp
#pragma once
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Components: plain data, no inheritance, no virtual functions.
struct Position   { float x = 0.f, y = 0.f, z = 0.f; };
struct Velocity   { float x = 0.f, y = 0.f, z = 0.f; };
struct Health     { int value = 100; };
struct ActiveFlag { bool is_active = true; };

// One bit per component type; an entity's mask records which components it has.
enum ComponentID : std::uint8_t {
    COMP_POSITION = 0, COMP_VELOCITY, COMP_HEALTH,
    COMP_ACTIVE_FLAG, COMP_COUNT
};
using ComponentMask = std::bitset<COMP_COUNT>;
using Entity = std::uint32_t;  // an entity is just an index into the arrays below

class ECSWorld {
public:
    // One dense array per component type, all indexed by entity ID (SoA).
    std::vector<ComponentMask> masks;
    std::vector<Position>      positions;
    std::vector<Velocity>      velocities;
    std::vector<Health>        healths;
    std::vector<ActiveFlag>    active_flags;
    std::size_t entity_count = 0;

    // Appends a default-initialised slot to every array; the mask starts empty.
    Entity create_entity() {
        Entity id = static_cast<Entity>(entity_count++);
        masks.emplace_back(); positions.emplace_back();
        velocities.emplace_back(); healths.emplace_back();
        active_flags.emplace_back();
        return id;
    }
    // add_position, add_velocity, ... (trivial setters)
    bool has_components(Entity e, ComponentMask required) const {
        return (masks[e] & required) == required;
    }
};
```
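Movement system: no virtual calls and no pointer chasing, only indexed passes over contiguous arrays. Whether the compiler actually vectorises it depends on how it handles the two per-entity branches, so check its vectorisation report (for GCC, -fopt-info-vec):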
```cpp
#include "ecs.hpp"

void movement_system(ECSWorld& world, float dt) {
    // Only entities that have both a Position and a Velocity are moved.
    ComponentMask required;
    required.set(COMP_POSITION).set(COMP_VELOCITY);

    Position* __restrict pos = world.positions.data();
    const Velocity* __restrict vel = world.velocities.data();
    const ActiveFlag* __restrict act = world.active_flags.data();
    for (std::size_t i = 0; i < world.entity_count; ++i) {
        if (!world.has_components(static_cast<Entity>(i), required)) continue;
        if (!act[i].is_active) continue;  // skip disabled entities
        pos[i].x += vel[i].x * dt;
        pos[i].y += vel[i].y * dt;
        pos[i].z += vel[i].z * dt;
    }
}
```
__restrict (a GCC/Clang/MSVC extension, not standard C++) promises the compiler that these arrays don't overlap, which removes one obstacle to SIMD auto-vectorisation.
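The per-entity `continue` branches are likely the main obstacles left in `movement_system`. One common workaround, swapped in here as an illustration rather than the page's own design, stores the active flag as a 0.0f/1.0f float and multiplies it into the time step so the loop body has no branch; the sketch also splits x, y and z into separate columns. `MotionColumns` and `movement_branchless` are hypothetical names, and the trick assumes finite velocities, since a NaN or infinite velocity multiplied by 0 is still NaN.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical columns: positions and velocities split per axis, plus an
// activity mask stored as 1.0f (active) or 0.0f (inactive) instead of bool.
struct MotionColumns {
    std::vector<float> px, py, pz;
    std::vector<float> vx, vy, vz;
    std::vector<float> active;
};

void movement_branchless(MotionColumns& m, float dt) {
    const std::size_t n = m.px.size();
    assert(m.vx.size() == n && m.active.size() == n);
    float* __restrict px = m.px.data();
    float* __restrict py = m.py.data();
    float* __restrict pz = m.pz.data();
    const float* __restrict vx = m.vx.data();
    const float* __restrict vy = m.vy.data();
    const float* __restrict vz = m.vz.data();
    const float* __restrict act = m.active.data();
    for (std::size_t i = 0; i < n; ++i) {
        const float step = dt * act[i];  // 0 for inactive entities, so they stay put
        px[i] += vx[i] * step;
        py[i] += vy[i] * step;
        pz[i] += vz[i] * step;
    }
}
```

The trade-off is that inactive entities are still loaded and written back; this pays off when most entities are active and the loop is bandwidth-bound anyway.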
Benchmark Results
Workload: 1,000,000 entities, physics integration loop (position += velocity * dt), 100 frames. Identical algorithm, different data layout only.
Ryzen 7 5800X, GCC 13.2, -O2 -march=native:
| Approach | Avg frame (ms) | Relative |
|---|---|---|
| OOP (vtable, AoS) | 8.34 | 1.00× baseline |
| ECS (SoA) | 1.47 | 5.67× faster |
perf stat breakdown:
| Metric | OOP | ECS |
|---|---|---|
| Cache references | 18.2M | 4.1M |
| Cache misses | 5.4M (29.7%) | 0.12M (2.9%) |
| IPC | 0.71 | 3.12 |
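An IPC of 0.71, against 3.12 for the ECS version on the same core, means the OOP loop spends most of its cycles stalled, largely waiting on memory (consistent with its ~30% cache miss rate). The ECS loop retires more than four times as many instructions per cycle.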
Scaling behaviour: The gap widens with data size. At 10K entities everything fits in L3 and OOP is only 3.5× slower. At 10M entities (blowing past L3) ECS is 6× faster.
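The benchmark source isn't included here. A minimal harness in the same spirit might look like the sketch below, assuming a single hypothetical `Mover` type behind a virtual `update()` for the OOP side and component-level arrays (as in ECSWorld) for the SoA side. `run_oop`, `run_soa` and `RunResult` are illustrative names; only the update loop is timed, and the checksum confirms both layouts compute the same positions.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <memory>
#include <random>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;
using Ms = std::chrono::duration<double, std::milli>;

struct Vec3 { float x = 0.f, y = 0.f, z = 0.f; };

// Hypothetical OOP baseline: heap-allocated entities updated through a vtable.
struct Entity {
    virtual ~Entity() = default;
    virtual void update(float dt) = 0;
    Vec3 pos, vel;
};

struct Mover : Entity {
    void update(float dt) override {
        pos.x += vel.x * dt;
        pos.y += vel.y * dt;
        pos.z += vel.z * dt;
    }
};

struct RunResult {
    double checksum;      // sum of final x positions, to compare the layouts
    double ms_per_frame;  // average time of the update loop only
};

// Deterministic (hypothetical) test data so both runs start identically.
inline Vec3 initial_velocity(std::size_t i) {
    return {static_cast<float>(i % 7), 1.0f, -0.5f};
}

RunResult run_oop(std::size_t n, int frames, float dt) {
    std::vector<std::unique_ptr<Entity>> entities;
    entities.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        auto e = std::make_unique<Mover>();
        e->vel = initial_velocity(i);
        entities.push_back(std::move(e));
    }
    // Fresh allocations tend to be nearly contiguous; shuffling the pointers
    // approximates the scattered heap of a long-running program.
    std::shuffle(entities.begin(), entities.end(), std::mt19937{42});

    const auto t0 = Clock::now();
    for (int f = 0; f < frames; ++f)
        for (auto& e : entities) e->update(dt);
    const Ms elapsed = Clock::now() - t0;

    double sum = 0.0;
    for (const auto& e : entities) sum += e->pos.x;
    return {sum, elapsed.count() / frames};
}

RunResult run_soa(std::size_t n, int frames, float dt) {
    std::vector<Vec3> pos(n), vel(n);
    for (std::size_t i = 0; i < n; ++i) vel[i] = initial_velocity(i);

    const auto t0 = Clock::now();
    for (int f = 0; f < frames; ++f)
        for (std::size_t i = 0; i < n; ++i) {
            pos[i].x += vel[i].x * dt;
            pos[i].y += vel[i].y * dt;
            pos[i].z += vel[i].z * dt;
        }
    const Ms elapsed = Clock::now() - t0;

    double sum = 0.0;
    for (const auto& p : pos) sum += p.x;
    return {sum, elapsed.count() / frames};
}
```

To approximate the workload above, call both functions with `(1'000'000, 100, 1.0f / 60.0f)` from an optimised build. With only one derived type the virtual call is perfectly predictable, so expect a smaller gap than a mix of 10–50 entity types would produce; absolute numbers will also depend on hardware, compiler flags and allocator.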
When to Apply DoD
```
Is this code on a hot path?
├── NO  → use whatever is clearest (OOP is fine)
└── YES → how many items?
    ├── <1000 → probably doesn't matter
    └── >1000 → profile it
                cache misses > 5%?
                ├── NO  → you're fine
                └── YES → data-oriented redesign
```
Good candidates: per-frame simulation loops, particle systems, physics engines, HFT order book updates touching millions of quotes. Not worth it for: UI code, configuration, plugin interfaces, anything called < 1000× per frame.
C++26 Reflection (Future Direction)
The boilerplate of a manual ECS (explicit component IDs, add_* functions, bitmasks) should shrink considerably with P2996 static reflection (targeting C++26). The idea: write an ordinary AoS-style struct definition and let library code use reflection to generate the matching SoA storage automatically.
```cpp
// HYPOTHETICAL C++26
struct PhysicsBundle { Position position; Velocity velocity; ActiveFlag active; };
auto storage = make_soa_storage<PhysicsBundle>(1'000'000); // SoA under the hood
```
Libraries like EnTT and flecs already approximate this with template metaprogramming. See C++ — Modern Features Reference (C++20-23) for current C++20/23 compile-time tooling.
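As a rough illustration of what today's template machinery can already do, the sketch below turns a list of component types into one `std::vector` column per type using a C++17 pack expansion. `SoAStorage`, `push_back` and `column` are hypothetical names, not EnTT's or flecs' API; real ECS libraries add entity recycling, richer storage schemes and queries on top of this core idea.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical helper: one std::vector per component type, all indexed by the
// same row number. Component types must be distinct (std::get by type).
template <class... Components>
class SoAStorage {
public:
    // Appends one row; each value goes to its own column. Returns the row index.
    std::size_t push_back(const Components&... values) {
        (std::get<std::vector<Components>>(columns_).push_back(values), ...);
        return size_++;
    }

    // Contiguous column for component type T.
    template <class T>
    std::vector<T>& column() { return std::get<std::vector<T>>(columns_); }

    std::size_t size() const { return size_; }

private:
    std::tuple<std::vector<Components>...> columns_;
    std::size_t size_ = 0;
};
```

A system then fetches the columns it needs once and loops over them by index, exactly as `movement_system` does with ECSWorld's hand-written arrays.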
Related Pages
- C++ — Memory and Cache Performance — struct padding, false sharing, cache line alignment
- C++ — High Performance Data Processing — mmap I/O, flat hash tables, NUMA pinning
- C++ — ULL Developer Skillset — hot path skill checklist for HFT/ULL roles
- C++ — Modern Features Reference (C++20-23) — current compile-time and SIMD tooling