intro page
marzer committed Jul 26, 2023
1 parent 0bdfb36 commit b8f76fa
Showing 6 changed files with 143 additions and 15 deletions.
Binary file added docs/images/author.jpg
6 changes: 4 additions & 2 deletions docs/pages.css
@@ -8,8 +8,10 @@
background-color: rgba(255,255,255,0.05);
}

#intro_motivation table:first-of-type td:nth-child(n+11),
#intro_motivation table:first-of-type td:nth-child(-n+5)
#intro_motivation_typical table:first-of-type td:nth-child(n+11),
#intro_motivation_typical table:first-of-type td:nth-child(-n+5),
#intro_motivation_soa table:first-of-type td:nth-child(n+9),
#intro_motivation_soa table:first-of-type td:nth-child(-n+4)
{
background-color: rgba(255,255,255,0.05);
}
136 changes: 129 additions & 7 deletions docs/pages/intro.md
@@ -4,12 +4,20 @@

@section intro_tldr TL;DR

This article is an introduction to, and overview of, **soagen** - a new Structure-Of-Arrays generator
and library for C++. Skip to @ref intro_getting_started if you already know all about SoA and want to give it a try!
The leading section of this page is an overview of what [Structure-of-Arrays (SoA)](https://en.wikipedia.org/wiki/AoS_and_SoA)
is, what problems it has, and what problems it solves. Following that is an overview of **soagen** - a new Structure-Of-Arrays
generator and library for C++.

@inline_success Skip to @ref intro_introducing_soagen if you already know all about SoA and want to go
straight to learning about **soagen** instead.

<!-- --------------------------------------------------------------------------------------------------------------- -->

@section intro_motivation Motivation

@subsection intro_motivation_typical Typical data layouts (Array of Structures)
<!-- --------------------------------------------------------------------------------------------------------------- -->

@subsection intro_motivation_typical Typical data layouts (Arrays of Structures)

Data records in a typical C++ program will be organized into `structs` and/or `classes`, one per 'object' unit type,
and multiples of these will be stored in arrays. For example, a program for managing employee records
@@ -18,8 +26,8 @@ might contain something akin to this:
```cpp
struct employee
{
std::string name;
unsigned long long id;
std::string name;
std::tuple<uint16_t, uint8_t, uint8_t> dob;
int salary;
void* tag;
@@ -33,14 +41,14 @@ And elsewhere in the program you'd almost certainly find this:
std::vector<employee> employees;
```

This paradigm is broadly called [Array of Structures](https://en.wikipedia.org/wiki/AoS_and_SoA) \(AoS\).
This paradigm is broadly called [Array-of-Structures](https://en.wikipedia.org/wiki/AoS_and_SoA) \(AoS\).
Stored this way, all the employee members are laid out sequentially in memory, so the `employees` array would look like this:

@m_class{m-block m-success}

<table>
<tr><th colspan="5"> employees[0] <th colspan="5"> employees[1] <th colspan="5"> employees[2]
<tr><td> `name` <td> `id` <td> `dob` <td> `salary` <td> `tag` <td> `name` <td> `id` <td> `dob` <td> `salary` <td> `tag` <td> `name` <td> `id` <td> `dob` <td> `salary` <td> `tag`
<tr><td> `id` <td> `name` <td> `dob` <td> `salary` <td> `tag` <td> `id` <td> `name` <td> `dob` <td> `salary` <td> `tag` <td> `id` <td> `name` <td> `dob` <td> `salary` <td> `tag`
</table>

When iterating over the array depicted above, the CPU's cache will constantly be mostly (or entirely) filled by only a
@@ -61,8 +69,122 @@ that the benefit of restructuring anything to solve this one specific problem wo

Let's have a look at a situation where the benefit of restructuring away from an explicit object layout is much greater.

@subsection intro_motivation_soa_layout Structure of Arrays
<!-- --------------------------------------------------------------------------------------------------------------- -->

@subsection intro_motivation_soa Structure-of-Arrays

Consider a game engine: game worlds are populated by entities, those entities have various characteristics (position,
orientation, mesh, id, name, et cetera). Imagine we had those encapsulated by an `entity` struct:

```cpp
struct entity
{
unsigned long long id;
std::string name;
vec3 pos;
mat3 rot;
// ... and so on
};
```

As before, let's have a look at what an array of these would look like in memory:

```cpp
std::vector<entity> entities;
```

<table>
<tr><th colspan="4"> entities[0] <th colspan="4"> entities[1] <th colspan="4"> entities[2]
<tr><td> `id` <td> `name` <td> `pos` <td> `rot` <td> `id` <td> `name` <td> `pos` <td> `rot` <td> `id` <td> `name` <td> `pos` <td> `rot`
</table>

Game worlds feature many thousands of entities. Rendering these can be expensive so great effort goes into ensuring
the engine does not render anything unnecessarily. One technique for eliminating entities from the render list is to
cull those that do not intersect with the camera's [view frustum](https://en.wikipedia.org/wiki/Viewing_frustum). This
necessitates iterating over all the entities in the game world (assisted by a bounding volume hierarchy or other
acceleration structure). Unlike the `employee` application above, taking constant cache-line hits during this iteration
by reading in entire `entity` structs is potentially devastating to our frame time!

Ideally we want to _only_ read in the parts of the entities we care about (e.g. `entity::pos`),
without dragging anything else into the cache.

Enter: [Structure-of-Arrays](https://en.wikipedia.org/wiki/AoS_and_SoA).

If we restructured our entities to instead be represented as a series of parallel arrays, we could iterate over just
the positions:

```cpp
struct entities
{
std::vector<unsigned long long> ids;
std::vector<std::string> names;
std::vector<vec3> positions;
std::vector<mat3> rotations;
// ... and so on
};
```

By structuring our data this way, we've lost the explicit `entity` class for representing one single game world entity,
and have instead transitioned to an implicit object model where an entity is described by all the elements sharing the
same index in the parallel `std::vectors`. As well as being faster when iterating over a single member, this also has another
neat side-effect: no structs means no padding between struct members, so memory usage is often lower!
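
To make the single-member iteration concrete, here is a minimal sketch of a loop that only ever touches `positions`. The `vec3` definition and the `count_within()` helper are assumptions made up for illustration; neither appears in the original text.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for vec3; the real type is not shown in this article.
struct vec3
{
    float x, y, z;
};

// The naive SoA container from above, trimmed to three columns.
struct entities
{
    std::vector<unsigned long long> ids;
    std::vector<std::string> names;
    std::vector<vec3> positions;
};

// Hypothetical helper: counts the entities within `radius` of `centre`.
// Only `positions` is read, so ids and names are never pulled into the cache.
inline std::size_t count_within(const entities& e, vec3 centre, float radius)
{
    const float r2 = radius * radius;
    std::size_t count = 0;
    for (const vec3& p : e.positions)
    {
        const float dx = p.x - centre.x;
        const float dy = p.y - centre.y;
        const float dz = p.z - centre.z;
        if (dx * dx + dy * dy + dz * dz <= r2)
            ++count;
    }
    return count;
}
```

The same loop over a `std::vector<entity>` would step over every `id`, `name` and `rot` on its way to each `pos`.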

There are some obvious problems with this naive SoA implementation, though.

<!-- --------------------------------------------------------------------------------------------------------------- -->

@subsubsection intro_motivation_soa_multiple_allocation Problem #1: Multiple allocations

Each member array requires its own heap allocation. Sure, we could come up with a custom `allocator` that works from
the same internal buffer, but that's not for the faint of heart.

@inline_remark <i>No doubt there's some solution to this in `std::pmr`. I'll leave that as an exercise for the reader.</i>
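
For the curious, one possible take on that exercise is sketched below, assuming C++17 and a standard library that ships `<memory_resource>`. Both columns draw from a single `std::pmr::monotonic_buffer_resource`. The `counting_resource` class, the fixed up-front capacity and the 64 bytes of slack are all assumptions for illustration, and the approach only holds while the columns stay within the reserved capacity.

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical stand-in for vec3; the real type is not shown in this article.
struct vec3
{
    float x, y, z;
};

// Illustrative upstream resource that counts how often it is asked for memory.
class counting_resource final : public std::pmr::memory_resource
{
  public:
    std::size_t allocations = 0;

  private:
    void* do_allocate(std::size_t bytes, std::size_t align) override
    {
        ++allocations;
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }

    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override
    {
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }

    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override
    {
        return this == &other;
    }
};

// Two columns sharing one pool. Members are declared in this order so the
// vectors are destroyed before the pool they allocate from.
struct entities
{
    std::pmr::monotonic_buffer_resource pool;
    std::pmr::vector<unsigned long long> ids;
    std::pmr::vector<vec3> positions;

    entities(std::size_t capacity, std::pmr::memory_resource* upstream)
        // Size the first pool chunk for both columns, plus slack for alignment.
        : pool(capacity * (sizeof(unsigned long long) + sizeof(vec3)) + 64, upstream),
          ids(&pool),
          positions(&pool)
    {
        ids.reserve(capacity);
        positions.reserve(capacity);
    }
};
```

Growing past `capacity` makes the pool fetch further chunks and abandons the old storage until the pool is destroyed, so this is a sketch of the idea rather than a general-purpose container.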

@subsubsection intro_motivation_soa_manual_sync Problem #2: Manual Synchronization

Now that we have multiple arrays, we must ensure that they are all updated in unison. Forgetting to do this at any point
will mean the parallel members are no longer in sync, likely causing disastrous and hilarious effects.
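
A common mitigation is to hide the arrays and funnel every mutation through member functions that touch all columns at once. The sketch below is a hand-written illustration under that assumption, with a hypothetical `vec3` and member names of my own choosing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for vec3; the real type is not shown in this article.
struct vec3
{
    float x, y, z;
};

class entities
{
    std::vector<unsigned long long> ids_;
    std::vector<std::string> names_;
    std::vector<vec3> positions_;

  public:
    std::size_t size() const noexcept { return ids_.size(); }

    // Appends one entity by pushing onto every column.
    void push_back(unsigned long long id, std::string name, vec3 pos)
    {
        ids_.push_back(id);
        names_.push_back(std::move(name));
        positions_.push_back(pos);
    }

    // Removes one entity by erasing the same index from every column.
    void erase(std::size_t index)
    {
        assert(index < size());
        const auto i = static_cast<std::ptrdiff_t>(index);
        ids_.erase(ids_.begin() + i);
        names_.erase(names_.begin() + i);
        positions_.erase(positions_.begin() + i);
    }

    // Read-only column access, so callers cannot resize one column alone.
    const std::vector<unsigned long long>& ids() const noexcept { return ids_; }
    const std::vector<std::string>& names() const noexcept { return names_; }
    const std::vector<vec3>& positions() const noexcept { return positions_; }
};
```

Every new column means editing every one of these functions by hand, and the sketch is not exception-safe: if a later `push_back` throws, the earlier columns have already grown.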

@subsubsection intro_motivation_soa_iterators_weakly_typed Problem #3: Identities are weakly-typed

We no longer have the benefit of the strongly-typed `std::vector<entity>::iterator` (or even just the ability to take
a pointer, `entity*`), so any time we need to store an association between a specific entity and some other thing,
it needs to be done via an index (e.g. `size_t`). This is much more error-prone; it's all too easy to
accidentally treat an index into one collection as an index into another.

Using a 'strong type' library (like
[this one](https://github.com/rollbear/strong_type)) or using an `enum class` can help here, but that still means
you need `static_casts` (or some other conversion function) everywhere, which is _very annoying_.
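
Here is a minimal sketch of the `enum class` variant, to show both the benefit and the annoyance. The `entity_index` and `mesh_index` names are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical strongly-typed indices: same underlying type, distinct types.
enum class entity_index : std::size_t {};
enum class mesh_index : std::size_t {};

// Neither a raw size_t nor an index into another collection converts implicitly.
static_assert(!std::is_convertible<std::size_t, entity_index>::value, "raw indices must be cast");
static_assert(!std::is_convertible<mesh_index, entity_index>::value, "indices must not mix");

struct entities
{
    std::vector<std::string> names;

    // The accessor only accepts entity_index, but has to cast internally.
    const std::string& name(entity_index i) const
    {
        return names[static_cast<std::size_t>(i)];
    }
};
```

Passing a `mesh_index` to `entities::name()` now fails to compile, which is the point. The cost shows up at every call site that starts from a loop counter: `e.name(static_cast<entity_index>(i))`.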

@subsubsection intro_motivation_soa_struct_mode_hard Problem #4: AoS-style access is cumbersome

There will always be situations where you need to treat your SoA data as if it were AoS. Debug printing, for example.
Code that needs to work in AoS mode is now littered with `operator[]` invocations, making it uglier and more error-prone.
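
One way to tame this is a small 'row view' struct of references that bundles the elements at one index back into something entity-shaped. The `const_row` type and the `describe()` function below are hypothetical names used for illustration, as is the `vec3` definition.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for vec3; the real type is not shown in this article.
struct vec3
{
    float x, y, z;
};

struct entities
{
    std::vector<unsigned long long> ids;
    std::vector<std::string> names;
    std::vector<vec3> positions;

    // Hypothetical read-only view of one entity: a bundle of references.
    struct const_row
    {
        const unsigned long long& id;
        const std::string& name;
        const vec3& pos;
    };

    // All the operator[] calls live here instead of at every use site.
    const_row row(std::size_t i) const
    {
        return const_row{ ids[i], names[i], positions[i] };
    }
};

// Debug printing can now be written as if the data were AoS.
inline std::string describe(const entities::const_row& r)
{
    std::ostringstream out;
    out << r.id << " '" << r.name << "' at (" << r.pos.x << ", " << r.pos.y << ", " << r.pos.z << ")";
    return out.str();
}
```

A row is only valid while the underlying vectors are not reallocated, and the view has to be kept up to date by hand whenever a column is added.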

@subsubsection intro_motivation_soa_names Problem #5: Elegance or Names; pick one

There are a number of solutions for the problems described above floating around the internet. All of them boil down to
one of two approaches:

1. Most/all of the problems above are solved using a nice template syntax and some specialization tricks, but you lose
the names of members and instead need to fall back to the incredibly unfriendly `std::tuple`-like `entities.get<0>()`
(because there's no way to create named members in C++ programmatically without any sort of reflection).
2. You get names, yay! But now your codebase is filled with macros, boo. Solutions based on this approach also typically
create every type in a fully bespoke manner so there's very little re-use - longer compile times, larger binaries.
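
To give a feel for the first approach, here is a stripped-down sketch of a template-based SoA container, assuming C++17 for the fold expression. The `soa` class is written for this example only and is not taken from any particular library.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical generic SoA container: one std::vector per column type.
template <typename... Ts>
class soa
{
    static_assert(sizeof...(Ts) > 0, "at least one column is required");

    std::tuple<std::vector<Ts>...> columns_;

    // Pushes the I-th value onto the I-th column, for every column.
    template <std::size_t... Is>
    void push_back_impl(std::index_sequence<Is...>, Ts&... values)
    {
        (std::get<Is>(columns_).push_back(std::move(values)), ...);
    }

  public:
    // Synchronized append: every column grows together.
    void push_back(Ts... values)
    {
        push_back_impl(std::index_sequence_for<Ts...>{}, values...);
    }

    // Columns are addressed by position because they have no names.
    template <std::size_t I>
    auto& get() noexcept
    {
        return std::get<I>(columns_);
    }

    std::size_t size() const noexcept
    {
        return std::get<0>(columns_).size();
    }
};

// Column 0 is the id and column 1 is the name, but only this comment says so.
using entities = soa<unsigned long long, std::string>;
```

Synchronization comes for free, but a call site like `e.get<1>()[i]` says nothing about what column 1 holds.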

<!-- --------------------------------------------------------------------------------------------------------------- -->

@section intro_introducing_soagen Introducing soagen

todo

<!-- --------------------------------------------------------------------------------------------------------------- -->

\[div class="m-block m-badge m-primary"\]
\[img src="author.jpg" alt="The Author"\]
\[h3\]About the author\[/h3\]
\[p\]
I'm <a href="https://github.com/marzer">Mark</a>. I write code. Some of it is alright. Almost all of it is C++.
You might know me as the <a href="https://github.com/marzer/tomlplusplus">toml++</a> guy.
\[/p\]
\[/div\]
6 changes: 5 additions & 1 deletion docs/poxy.toml
@@ -14,7 +14,11 @@ changelog = true
#logo = 'images/logo.svg'
#favicon = 'images/favicon.ico'
theme = 'light'
extra_files = ['images/badge-gitter.svg', 'images/badge-sponsor.svg']
extra_files = [
'images/badge-gitter.svg',
'images/badge-sponsor.svg',
'images/author.jpg',
]
stylesheets = ['pages.css']

[warnings]
8 changes: 4 additions & 4 deletions examples/employees.hpp
@@ -163,7 +163,7 @@ namespace soagen::detail
/* id */ make_column<unsigned long long, param_type<unsigned long long>, 32>,
/* date_of_birth */ make_column<std::tuple<uint16_t, uint8_t, uint8_t>>,
/* salary */ make_column<int>,
/* tag */ make_column<int*>>;
/* tag */ make_column<void*>>;
};

template <>
@@ -205,7 +205,7 @@ namespace soagen::examples
/// unsigned long long id;
/// std::tuple<uint16_t, uint8_t, uint8_t> date_of_birth;
/// int salary;
/// int* tag;
/// void* tag;
/// };
/// @endcode
///
@@ -1095,14 +1095,14 @@ namespace soagen::examples

/// @brief Returns a pointer to the elements in column [4]: tag.
SOAGEN_ALIGNED_COLUMN(4)
constexpr int** tag() noexcept
constexpr void** tag() noexcept
{
return column<4>();
}

/// @brief Returns a pointer to the elements in column [4]: tag.
SOAGEN_ALIGNED_COLUMN(4)
constexpr std::add_const_t<int*>* tag() const noexcept
constexpr std::add_const_t<void*>* tag() const noexcept
{
return column<4>();
}
2 changes: 1 addition & 1 deletion examples/employees.natvis
@@ -37,7 +37,7 @@

<Intrinsic
Name="get_4"
Expression="reinterpret_cast&lt;int**&gt;(table_.alloc_.columns[4])"
Expression="reinterpret_cast&lt;void**&gt;(table_.alloc_.columns[4])"
/>

<DisplayString>{{ size={size()} }}</DisplayString>
