Flatdata supports the following primitive types:
- bool - boolean data type
- i8 - signed 8-bit wide type
- u8 - unsigned 8-bit wide type
- i16 - signed 16-bit wide type
- u16 - unsigned 16-bit wide type
- i32 - signed 32-bit wide type
- u32 - unsigned 32-bit wide type
- i64 - signed 64-bit wide type
- u64 - unsigned 64-bit wide type
Flatdata supports defining constants of basic types:
const <type> <name> = <value>;
Flatdata supports enumerations over basic types. Each enumeration value is assigned either automatically (previous value + 1, starting with 0) or manually.
Each enumeration is defined as follows:
enum <name> : <type> [ : bits ] {
<value name> [= value],
...
}
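The assignment rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the generator's code: the function name `assign_enum_values` and the input format are hypothetical, and the sketch assumes that an automatic value following a manual one continues from that manual value.

```python
from typing import Optional


def assign_enum_values(entries: list[tuple[str, Optional[int]]]) -> dict[str, int]:
    """Assign a value to every enumeration entry.

    An entry with an explicit value keeps it. An entry without one gets
    the previous value + 1, and the very first entry defaults to 0.
    """
    assigned: dict[str, int] = {}
    previous = -1  # so that the first implicit value becomes 0
    for name, explicit in entries:
        value = explicit if explicit is not None else previous + 1
        assigned[name] = value
        previous = value
    return assigned


# Hypothetical enumeration: A, B, C = 10, D
values = assign_enum_values([("A", None), ("B", None), ("C", 10), ("D", None)])
print(values)
```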
Flatdata will auto-generate names for all missing values as UNKNOWN_VALUE_{X} or UNKNOWN_VALUE_MINUS_{X}, e.g. if no value for 5 is specified, and 5 can be represented in the specified number of bits, the generator will generate UNKNOWN_VALUE_5 = 5. The main reason for this behaviour is that reading from files is inherently untrustworthy: while the value is not mentioned in the schema, nothing prevents a malicious entity from writing it.
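As a rough illustration of this naming scheme, the following Python sketch enumerates every representable value that the schema leaves unnamed. The function `unknown_value_names` is hypothetical, and it assumes a two's complement range for signed types and that the `MINUS` form carries the absolute value of a negative number.

```python
def unknown_value_names(named: dict[str, int], bits: int, signed: bool = False) -> dict[str, int]:
    """Return generated names for all values representable in `bits` bits
    that are not already named in `named`."""
    if signed:
        lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lowest, highest = 0, (1 << bits) - 1

    taken = set(named.values())
    generated: dict[str, int] = {}
    for value in range(lowest, highest + 1):
        if value in taken:
            continue
        if value < 0:
            generated[f"UNKNOWN_VALUE_MINUS_{-value}"] = value
        else:
            generated[f"UNKNOWN_VALUE_{value}"] = value
    return generated


# A 3-bit unsigned enumeration that names every value except 5.
named = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4, "F": 6, "G": 7}
print(unknown_value_names(named, bits=3))
```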
The following restrictions for values are checked:
- No duplicate values
- Values must fit into the underlying type
- Most possible values should be listed/named (>=50% +- 256), e.g. a u16 should have at least 2^16 values
Flatdata structure definition syntax resembles known alternatives, albeit with notable differences:
- Backward compatibility: the flatdata format does not have backward compatibility support built in. It is not meant to be used directly in communication protocols; there are well-known libraries that are well suited for that purpose. Flatdata is a create-once, read-extensively storage library.
- Bit fields: unlike some other formats, flatdata supports bit fields natively in a platform-independent fashion.
Each structure is defined as follows:
struct <Name> {
<field> : <type> : <width>;
...
}
<type> can be either a basic type or an enumeration.
Example:
struct Structure {
field1 : u32 : 29;
field2 : u8 : 2;
}
Every structure field specifies the type used to represent it in the target language and its size in bits. Flatdata takes care of packing and aligning the structures correctly as well as accessing them efficiently.
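To make the idea of bit-level packing concrete, here is a minimal Python sketch that stores the two fields of `Structure` (29 and 2 bits) back to back. The helpers `pack_fields` and `unpack_field` are hypothetical, and the layout (fields placed consecutively, least significant bit first, little-endian bytes) is an assumption made for illustration, not a statement about flatdata's binary format.

```python
def pack_fields(fields: list[tuple[int, int]]) -> bytes:
    """Pack (value, width_in_bits) pairs consecutively into bytes."""
    accumulator = 0
    offset = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit into {width} bits")
        accumulator |= value << offset
        offset += width
    size_in_bytes = (offset + 7) // 8  # round up to whole bytes
    return accumulator.to_bytes(size_in_bytes, "little")


def unpack_field(data: bytes, offset: int, width: int) -> int:
    """Read an unsigned field of `width` bits starting at bit `offset`."""
    accumulator = int.from_bytes(data, "little")
    return (accumulator >> offset) & ((1 << width) - 1)


# field1 : u32 : 29 and field2 : u8 : 2 occupy 31 bits, i.e. 4 bytes.
packed = pack_fields([(123456789, 29), (3, 2)])
field1 = unpack_field(packed, offset=0, width=29)
field2 = unpack_field(packed, offset=29, width=2)
print(len(packed), field1, field2)
```

Storing the same two fields as a plain u32 and u8 would take 5 bytes before any alignment padding, which is the kind of saving explicit bit widths are meant to enable.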
A flatdata archive is the entry point to the data. Archives are the smallest data units which can be opened or written to disk. An archive's schema is saved along with the data and is checked when the archive is opened. If the schema does not match expectations, the archive cannot be opened.
Archives are defined as follows:
archive <name> {
<resource> : <type>;
...
}
For example:
archive ExampleArchive {
single_structure : StructureType;
vector_of_stuff : vector< StructureType >;
what_an_archive_without_a_map : multivector< 40, StructureA, StructureB, StructureC >;
strings_forever : raw_data;
lets_get_some_structure : archive OtherArchive;
}
Archive resources can be one of the following types:
- T - a single structure of the given type.
- vector< T > - a vector of structures of a given type.
- multivector< IndexSize, T1, T2, ... > - a heterogeneous associative container for storing multiple properties for a single entity. Allows efficient storage of data whose properties are sparsely assigned to each item. Think of it as a multimap of variants. IndexSize is the number of bits used for indexing the entities. An index addresses the start offset of a variant in the data.
- raw_data - uninterpreted raw data. Useful for storing arrays of non-numeric data like strings referenced from structures.
- archive ArchiveName - an archive resource. Archive resources allow large archives to be structured better, while also acting as a namespace and grouping optionality semantics. The referenced archive type has to be defined.
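The "multimap of variants" idea behind multivector can be modelled in memory as an index of start offsets plus one flat list of variants. The Python sketch below is purely conceptual: `MultiVectorModel`, `StructureA` and `StructureB` are hypothetical names, and nothing here reflects the on-disk layout or any generated API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class StructureA:
    value: int


@dataclass
class StructureB:
    value: int


Variant = Union[StructureA, StructureB]


class MultiVectorModel:
    """Conceptual model: item i owns data[index[i]:index[i + 1]]."""

    def __init__(self) -> None:
        self._index = [0]  # start offset of each item, plus one end marker
        self._data: list[Variant] = []

    def append(self, properties: list[Variant]) -> None:
        """Add one item with zero or more properties of mixed types."""
        self._data.extend(properties)
        self._index.append(len(self._data))

    def __len__(self) -> int:
        return len(self._index) - 1

    def __getitem__(self, i: int) -> list[Variant]:
        return self._data[self._index[i]:self._index[i + 1]]


mv = MultiVectorModel()
mv.append([StructureA(1), StructureB(2)])  # item 0 has two properties
mv.append([])                              # item 1 has none
mv.append([StructureB(3)])                 # item 2 has one
print(len(mv), mv[0], mv[1], mv[2])
```

Items without properties cost only an index entry in this model, which is why the layout suits sparsely assigned data.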
Flatdata schema supports C++-style comments. Comments located before structures/archives or their members will be available in generated code. Example:
/// A single secret. Might be important
struct Secret { importance : u64 : 64; }
/**
* Very important archive
*/
archive TheBookOfSecrets {
// More important secret
secret1 : Secret;
// Less important secret
secret2 : Secret;
}
Decorations declare additional properties of the entities they are applied to. The decorations currently supported are described below. Note that not all target languages provide full support for all decorations. For example, the dot generator uses decorations to group archive resources and create reference edges, while other generators mostly support only @optional.
Nonetheless, decorations are first-class citizens of the schema and are therefore also validated when an archive is opened.
@const( <name> ) can be added to fields of a structure to indicate in which locations a constant can appear, e.g.:
const u32 MY_CONST = 10;
struct MyStruct {
@const( MY_CONST )
my_value : u32 : 16;
}
Note: If a constant is not referenced anywhere, flatdata assumes that it is a global constant and includes it in the schema of every resource of every archive.
@optional can be applied to resources. If a resource is optional and missing, the archive can still be opened successfully. A resource of any type can be optional. Example:
archive Archive {
@optional
resource: vector< SomeStructure >;
}
@optional( <name> ) can be added to a field to mark a special constant value of the field as a sentinel value, making the whole field optional. This special value is considered the none value of the field. Many language backends will use native optional data structures for such fields instead of the underlying integer type.
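How a backend might map such a sentinel to a native optional type can be sketched as follows. The constant `NO_SPEED_LIMIT` and both helper functions are hypothetical and only illustrate the sentinel idea; actual generated accessors may look different.

```python
from typing import Optional

# Hypothetical constant that a schema would reference via @optional( NO_SPEED_LIMIT ).
NO_SPEED_LIMIT = 0xFFFF


def decode_optional(raw: int, sentinel: int) -> Optional[int]:
    """Map the stored integer to None when it equals the sentinel."""
    return None if raw == sentinel else raw


def encode_optional(value: Optional[int], sentinel: int) -> int:
    """Map None to the sentinel; reject real values that collide with it."""
    if value is None:
        return sentinel
    if value == sentinel:
        raise ValueError("value collides with the sentinel")
    return value


print(decode_optional(50, NO_SPEED_LIMIT))
print(decode_optional(NO_SPEED_LIMIT, NO_SPEED_LIMIT))
```

One consequence of this design is that the sentinel itself can no longer be stored as a regular value of the field.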
@explicit_reference declares an explicit reference from one resource's property to another resource. This is a very common type of referencing in flatdata and can be seen as a "foreign key", with the exception that consistency of the key is not enforced.
It is possible to define an explicit reference whose target is in a different archive, as long as that target is defined.
Example:
struct Person {
name : u64 : 64;
first_child : u64 : 64;
}
archive Archive {
@explicit_reference( Person.name, names )
@explicit_reference( Person.first_child, children )
people: vector< Person >;
children: vector< Child >;
names: raw_data;
}
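Because the consistency of such references is not enforced, a writer or a test suite may want to check them itself. The Python sketch below does this for the example above. It assumes that `name` is a byte offset into `names` and `first_child` is an index into `children`; the function `dangling_references` and the dictionary representation of `Person` are hypothetical.

```python
def dangling_references(people: list[dict], children_count: int, names_size: int) -> list[tuple[int, str]]:
    """Return (person index, field name) for every reference that points
    outside of its target resource."""
    problems = []
    for i, person in enumerate(people):
        if person["name"] >= names_size:
            problems.append((i, "name"))
        if person["first_child"] >= children_count:
            problems.append((i, "first_child"))
    return problems


people = [
    {"name": 0, "first_child": 1},   # both references are in range
    {"name": 40, "first_child": 7},  # both references are out of range
]
print(dangling_references(people, children_count=2, names_size=16))
```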
Sometimes it is useful to split a structure's fields into multiple resources (for example, to promote data locality when binary search is done extensively on a particular field). @bound_implicitly declares that such resources are grouped implicitly and therefore represent a single entity. The decoration also gives the entity a name:
@bound_implicitly( transactions: keys, transaction_data )
archive Archive {
keys: vector< Key >;
transaction_data : vector< Transaction >;
}
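The access pattern this split is meant to support can be sketched in Python: binary search runs over the compact `keys` vector only, and the matching index is then used to read from `transaction_data`. The sketch assumes that both vectors have the same length, that element i of each belongs to the same entity, and that `keys` is sorted; `find_transaction` is a hypothetical helper.

```python
from bisect import bisect_left
from typing import Optional


def find_transaction(keys: list[int], transaction_data: list[str], key: int) -> Optional[str]:
    """Binary-search the sorted keys, then read the payload at the same index."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return transaction_data[i]
    return None


keys = [3, 8, 15, 42]
transaction_data = ["tx-a", "tx-b", "tx-c", "tx-d"]
print(find_transaction(keys, transaction_data, 15))
print(find_transaction(keys, transaction_data, 9))
```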
Resources and decorations can reference other entities declared in the schema. Types can be specified either with a fully-qualified path or with a local path, for example:
namespace N {
struct T {
...
}
archive Archive {
// Local path
resource: vector< T >;
// Fully-qualified path
another_resource: vector< .N.T >;
}
}
Local paths must be available in the current namespace.
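One way to picture the two path styles is the small Python sketch below. It is a simplified assumption rather than the generator's actual resolution logic: a path with a leading dot is treated as fully-qualified, and any other path is looked up in the current namespace only. The function `resolve` and the set of declared names are hypothetical.

```python
def resolve(path: str, current_namespace: str, declared: set[str]) -> str:
    """Turn a type path into a fully-qualified name, or raise if it is unknown."""
    if path.startswith("."):
        qualified = path  # already fully-qualified
    else:
        qualified = f"{current_namespace}.{path}"  # local to the current namespace
    if qualified not in declared:
        raise KeyError(f"unknown type: {path}")
    return qualified


declared = {".N.T", ".N.Archive"}
print(resolve("T", ".N", declared))
print(resolve(".N.T", ".N", declared))
```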
When flattening a data model into a flatdata schema one often encounters a pattern of storing index ranges as members of consecutive vector items:
struct Node {
...
first_edge_ref : u32 : 32;
}
struct Edge {
...
}
archive Archive {
// the last element is a sentinel
@explicit_reference( Node.first_edge_ref, edges )
nodes : vector< Node >;
edges : vector< Edge >;
}
In this case the edges of a node i are then retrieved as
edges.slice(nodes[i].first_edge_ref..nodes[i + 1].first_edge_ref)
Additionally, the last element of the nodes vector is usually a sentinel (only used to retrieve first_edge_ref).
To simplify this, flatdata offers the @range(name_of_range_attribute) annotation:
struct Node {
...
@range(edges_range)
first_edge_ref : u32 : 32;
}
This will have two effects:
- Adding an edges_range attribute exposing range(nodes[i].first_edge_ref, nodes[i + 1].first_edge_ref)
- Hiding the sentinel in views (it still needs to be populated first, though)
Retrieving all edges is now as easy as this:
edges.slice(nodes[i].edges_range)
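The same pattern can be modelled in plain Python to show what the range attribute computes. In this sketch the node vector is reduced to a list of `first_edge_ref` values whose last entry is the sentinel. The helpers `edges_range`, `node_count` and `edges_of` are hypothetical stand-ins for what generated code would provide.

```python
def edges_range(first_edge_refs: list[int], i: int) -> range:
    """Range of edge indices belonging to node i."""
    return range(first_edge_refs[i], first_edge_refs[i + 1])


def node_count(first_edge_refs: list[int]) -> int:
    """Number of real nodes; the trailing sentinel is hidden."""
    return len(first_edge_refs) - 1


def edges_of(first_edge_refs: list[int], edges: list[str], i: int) -> list[str]:
    """Slice the edges vector with the range of node i."""
    r = edges_range(first_edge_refs, i)
    return edges[r.start:r.stop]


# Three nodes plus a sentinel whose value equals the total number of edges.
first_edge_refs = [0, 2, 2, 5]
edges = ["a", "b", "c", "d", "e"]

for i in range(node_count(first_edge_refs)):
    print(i, edges_of(first_edge_refs, edges, i))
```

Node 1 in this example has an empty range, which shows that nodes without edges need no special handling.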