Update the docs a bit
halgari committed May 9, 2024
1 parent 2aafe9e commit a657f4b
Showing 4 changed files with 118 additions and 101 deletions.
97 changes: 0 additions & 97 deletions docs/BenchmarkLog.md

This file was deleted.

10 changes: 6 additions & 4 deletions docs/GettingStarted.md
@@ -16,10 +16,12 @@ accidentally instantiated).
 ```csharp
 public static class File
 {
-    public static readonly Attribute<ulong> Hash = new("Test.Model.File/Hash", isIndexed: true);
-    public static readonly Attribute<ulong> Size = new("Test.Model.File/Size");
-    public static readonly Attribute<string> Name = new("Test.Model.File/Name", noHistory: true);
-    public static readonly Attribute<EntityId> ModId = new"Test.Model.File/ModId", cardinality: Cardinality.Many);
+    private const string Namespace = "Test.Model.File";
+
+    public static readonly ScalarAttribute<ulong> Hash = new(Namespace, nameof(Hash), isIndexed: true);
+    public static readonly ScalarAttribute<ulong> Size = new(Namespace, nameof(Size));
+    public static readonly ScalarAttribute<string> Name = new(Namespace, nameof(Name), noHistory: true);
+    public static readonly ReferenceAttribute<EntityId> ModId = new(Namespace, nameof(ModId), cardinality: Cardinality.Many);
 
     public class Model(ITransaction tx) : AEntity(tx)
     {
111 changes: 111 additions & 0 deletions docs/ValueFormat.md
@@ -0,0 +1,111 @@
---
hide:
- toc
---

## Value Format

Originally, MnemonicDB was developed using out-of-band value type serialization. That is to say, each attribute had a C# type
attached to it, which was used at read time to determine the format of the value. This method was simple, but it had several
side effects. For one, it was impossible to read the data without access to that C# class. Because RocksDB compares keys at
startup when recovering from a crash, this could leave the database unreadable. RocksDB could not start without a comparator,
but the comparator could not work until RocksDB had started and the comparator could read all the possible value types. The
internals of MnemonicDB were therefore rewritten to use a self-describing format, in which every value carries its own type tag.

### Value Tags
Every value in the system is prefixed by a one-byte type identifier (a tag). The tag determines the value's size, type, and
serialization format. At the time of writing only 18 of the 256 possible tags are in use, so there is plenty of room for
expansion. The tags are defined in the `ValueTags` enum, which currently contains the following values:

```csharp
public enum ValueTags : byte
{
/// <summary>
/// Null value, no data
/// </summary>
Null = 0,
/// <summary>
/// Unsigned 8-bit integer
/// </summary>
UInt8 = 1,
/// <summary>
/// Unsigned 16-bit integer
/// </summary>
UInt16 = 2,
/// <summary>
/// Unsigned 32-bit integer
/// </summary>
UInt32 = 3,
/// <summary>
/// Unsigned 64-bit integer
/// </summary>
UInt64 = 4,
/// <summary>
/// Unsigned 128-bit integer
/// </summary>
UInt128 = 5,
/// <summary>
/// Signed 16-bit integer
/// </summary>
Int16 = 6,
/// <summary>
/// Signed 32-bit integer
/// </summary>
Int32 = 7,
/// <summary>
/// Signed 64-bit integer
/// </summary>
Int64 = 8,
/// <summary>
/// Signed 128-bit integer
/// </summary>
Int128 = 9,
/// <summary>
/// 32-bit floating point number
/// </summary>
Float32 = 10,
/// <summary>
/// 64-bit floating point number (double)
/// </summary>
Float64 = 11,
/// <summary>
/// ASCII string, case-sensitive
/// </summary>
Ascii = 12,
/// <summary>
/// UTF-8 string, case-sensitive
/// </summary>
Utf8 = 13,
/// <summary>
/// UTF-8 string, case-insensitive
/// </summary>
Utf8Insensitive = 14,
/// <summary>
/// Inline binary data
/// </summary>
Blob = 15,

/// <summary>
/// A blob sorted by its xxHash64 hash, whose data may be stored in a separate location
/// so as not to degrade the performance of the key storage
/// </summary>
HashedBlob = 16,

/// <summary>
/// A reference to another entity
/// </summary>
Reference = 17,
}
```
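Because the tag alone determines how many bytes follow it, the mapping from tag to payload size can be written as one switch. The sketch below is hypothetical and is not MnemonicDB's API. The helper name `FixedSize`, the use of `-1` for variable-length tags, and the 8-byte size for `Reference` are all assumptions. The tag numbers mirror the `ValueTags` enum above.

```csharp
using System;

Console.WriteLine(FixedSize(4));  // 8  (UInt64)
Console.WriteLine(FixedSize(13)); // -1 (Utf8, length comes from the key)

// Hypothetical helper: payload size in bytes implied by a tag byte.
// Returns -1 for variable-length tags, whose size is the rest of the key.
static int FixedSize(byte tag) => tag switch
{
    0 => 0,                  // Null: no payload
    1 => 1,                  // UInt8
    2 or 6 => 2,             // UInt16, Int16
    3 or 7 or 10 => 4,       // UInt32, Int32, Float32
    4 or 8 or 11 or 17 => 8, // UInt64, Int64, Float64, Reference (8-byte id is an assumption)
    5 or 9 => 16,            // UInt128, Int128
    12 or 13 or 14 or 15 or 16 => -1, // Ascii, Utf8, Utf8Insensitive, Blob, HashedBlob (treated as variable here)
    _ => throw new ArgumentOutOfRangeException(nameof(tag), $"Unknown value tag {tag}")
};
```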

Many of these types have a fixed size, so the tag alone tells a reader how many bytes follow. Because the format is so simple,
fixed-size values such as integers can be "serialized" with a simple pointer dereference. More complex values, such as strings,
must be run through a text encoder/decoder. None of the variable-sized values store an encoded length, because RocksDB already
tracks the length of every key. Each key can therefore be read as a 16-byte header, followed by a one-byte `ValueTag`, followed
by the value, which takes up the remainder of the key.
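The key layout just described can be sketched as follows. A key is built as header + tag + payload, then split back apart, and the variable-length value's size is taken from the key length alone. This is an illustration under stated assumptions, not MnemonicDB's code. `BuildKey`, `ReadUInt64` and `ReadUtf8` are hypothetical names. The 16-byte header is treated as opaque zero bytes. Integers are assumed to be little-endian and are read in place with `BinaryPrimitives` rather than an unsafe pointer dereference.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

const int HeaderSize = 16; // the 16-byte key header, treated as opaque bytes in this sketch
const byte UInt64Tag = 4;  // ValueTags.UInt64
const byte Utf8Tag = 13;   // ValueTags.Utf8

var header = new byte[HeaderSize]; // placeholder header (all zeros)

var sizeBytes = new byte[8];
BinaryPrimitives.WriteUInt64LittleEndian(sizeBytes, 1024UL);
byte[] sizeKey = BuildKey(header, UInt64Tag, sizeBytes);
byte[] nameKey = BuildKey(header, Utf8Tag, Encoding.UTF8.GetBytes("readme.txt"));

Console.WriteLine(ReadUInt64(sizeKey)); // 1024
Console.WriteLine(ReadUtf8(nameKey));   // readme.txt

// Hypothetical helper: header, then one tag byte, then the raw payload.
// No length prefix is written; the payload simply runs to the end of the key.
byte[] BuildKey(ReadOnlySpan<byte> hdr, byte tag, ReadOnlySpan<byte> payload)
{
    if (hdr.Length != HeaderSize) throw new ArgumentException("Header must be 16 bytes", nameof(hdr));
    var key = new byte[HeaderSize + 1 + payload.Length];
    hdr.CopyTo(key);
    key[HeaderSize] = tag;
    payload.CopyTo(key.AsSpan(HeaderSize + 1));
    return key;
}

// Fixed-size value: read the 8 bytes in place (little-endian assumed).
ulong ReadUInt64(ReadOnlySpan<byte> key)
{
    if (key[HeaderSize] != UInt64Tag) throw new InvalidOperationException("Not a UInt64 value");
    var payload = key.Slice(HeaderSize + 1);
    if (payload.Length != sizeof(ulong)) throw new InvalidOperationException("Bad UInt64 length");
    return BinaryPrimitives.ReadUInt64LittleEndian(payload);
}

// Variable-size value: its length is simply whatever remains of the key.
string ReadUtf8(ReadOnlySpan<byte> key)
{
    if (key[HeaderSize] != Utf8Tag) throw new InvalidOperationException("Not a UTF-8 value");
    return Encoding.UTF8.GetString(key.Slice(HeaderSize + 1));
}
```

No length prefix is stored for the string, but `ReadUtf8` still recovers it exactly, because the slice after the tag byte ends where the key ends.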

### Comparator Simplicity
Because the key format is so simple, a RocksDB comparator for this data can be written in a few hundred lines of code. In
MnemonicDB the comparator is a completely static method, with no virtual dispatch on the hot path. In the future it would be
fairly simple to move the comparator code into a C++ DLL to squeeze out more performance, but this is currently a low priority.
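The following is a minimal sketch of a static, switch-based comparator for the value portion of a key (a tag byte followed by its payload), with no virtual dispatch. It is not MnemonicDB's implementation. The name `CompareValues`, the ordering rules, and the little-endian byte order are assumptions. The rules are: tag first, then numeric for integers, bytewise for ASCII, UTF-8 and blobs, and ordinal case-insensitive for `Utf8Insensitive`. Header comparison and most tags are left out for brevity.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

byte[] one = { 4, 1, 0, 0, 0, 0, 0, 0, 0 };         // UInt64 1 (little-endian)
byte[] twoFiftySix = { 4, 0, 1, 0, 0, 0, 0, 0, 0 }; // UInt64 256
byte[] nullValue = { 0 };                            // Null, no payload

// Bytewise, 'one' would sort after 'twoFiftySix'; decoding restores numeric order.
Console.WriteLine(CompareValues(one, twoFiftySix) < 0); // True
Console.WriteLine(CompareValues(nullValue, one) < 0);   // True: tag 0 sorts before tag 4

// Hypothetical comparator: one static method that switches on the tag; no virtual calls.
static int CompareValues(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
{
    // Different types: order by tag so all values of one type are grouped together.
    if (a[0] != b[0]) return a[0].CompareTo(b[0]);

    ReadOnlySpan<byte> pa = a.Slice(1), pb = b.Slice(1);
    switch (a[0])
    {
        case 0: // Null: no payload, so all nulls are equal
            return 0;
        case 4: // UInt64: decode and compare numerically (little-endian assumed)
            return BinaryPrimitives.ReadUInt64LittleEndian(pa)
                .CompareTo(BinaryPrimitives.ReadUInt64LittleEndian(pb));
        case 8: // Int64: signed numeric comparison
            return BinaryPrimitives.ReadInt64LittleEndian(pa)
                .CompareTo(BinaryPrimitives.ReadInt64LittleEndian(pb));
        case 12: // Ascii
        case 13: // Utf8: bytewise order equals code point order
        case 15: // Blob
            return pa.SequenceCompareTo(pb);
        case 14: // Utf8Insensitive: decodes to strings for brevity (allocates)
            return string.Compare(Encoding.UTF8.GetString(pa), Encoding.UTF8.GetString(pb),
                StringComparison.OrdinalIgnoreCase);
        default:
            throw new NotSupportedException($"Tag {a[0]} is not handled in this sketch");
    }
}
```

The usage lines show why a comparator must decode fixed-size integers rather than compare raw bytes. In little-endian form, the least significant byte comes first, so bytewise order does not match numeric order.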
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -49,5 +49,6 @@ theme:
 nav:
 - Home: index.md
 - Index Format: IndexFormat.md
+- Value Format: ValueFormat.md
 - Schema Changes: SchemaChanges.md
