
Releases: meilisearch/heed

v0.20.5 🛁

18 Aug 17:25
4a01acb

heed

What's Changed

  • fix function docs (clippy warnings) by @antonilol in #273
  • fix custom_key_cmp_wrapper being able to unwind to C code (ub) by @antonilol in #275
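Letting a Rust panic unwind into C code is undefined behavior, which is the problem #275 addresses. Below is a minimal sketch of the general technique, assuming a comparator with the same shape as `Comparator::compare`. It runs the user code under `std::panic::catch_unwind` and aborts the process rather than letting the panic cross the FFI boundary. `guarded_compare` and `bytewise` are hypothetical names and are not heed's actual wrapper.

```rust
use std::cmp::Ordering;
use std::panic::{self, AssertUnwindSafe};
use std::process;

/// Hypothetical comparator signature, mirroring `fn compare(a: &[u8], b: &[u8]) -> Ordering`.
type CompareFn = fn(&[u8], &[u8]) -> Ordering;

/// Runs `cmp` and converts the result to the C convention (<0, 0, >0).
/// If `cmp` panics, the process aborts instead of unwinding into C code.
fn guarded_compare(cmp: CompareFn, a: &[u8], b: &[u8]) -> i32 {
    match panic::catch_unwind(AssertUnwindSafe(|| cmp(a, b))) {
        Ok(Ordering::Less) => -1,
        Ok(Ordering::Equal) => 0,
        Ok(Ordering::Greater) => 1,
        // Never let the panic escape across the FFI boundary.
        Err(_) => process::abort(),
    }
}

/// A plain bytewise comparator, used here only to exercise the guard.
fn bytewise(a: &[u8], b: &[u8]) -> Ordering {
    a.cmp(b)
}
```

Aborting is harsh, but it is the only well-defined option once a panic has reached a boundary that C code cannot unwind through.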

New Contributors

v0.20.4 🛁

12 Aug 17:43
52be23f

heed

What's Changed

New Contributors

v0.20.3 🛁

03 Jul 08:06
b9b2385

heed

What's Changed

v0.20.2 🛁

31 May 09:40
769c757

heed

What's Changed

  • Introduce the longer-keys feature, which sets -DMDB_MAXKEYSIZE=0, by @tpunder in #263
  • Bump the internal LMDB version to v0.9.33 by @Kerollmops in #264

New Contributors

v0.20.1 🛁

16 May 15:37
2c90779

heed

What's Changed

New Contributors

v0.20.0 🛁

26 Apr 10:59
aa7bf8a

heed

Heed is a fully typed LMDB wrapper with minimum overhead. It is also the most maintained Rust wrapper on top of LMDB and is used by meilisearch/meilisearch. LMDB is a memory-mapped key-value store that has been battle-tested for a long time.

This release closes more than 80 PRs and issues ✨, all aimed at improving the safety of the library and its ease of use. We now even have a great cookbook to help library users perform tricky or advanced operations with Heed and LMDB.

Fix a Soundness Issue with the sync-read-txn Feature

We removed the unsound sync-read-txn feature, which made RoTxn: Sync even though that is not safe. We replaced it with the read-txn-no-tls feature, which makes RoTxn: Send so that it can be used from different threads behind a Mutex.

We apologize for this and are discussing with the RustSec advisory team how best to advise people not to use this unsound feature.
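Here is a stdlib-only sketch of what "Send but not Sync" means in practice. It uses a hypothetical `ReadTxn` type, not heed's `RoTxn`. Such a value can be moved to another thread. Several threads can share it only through a `Mutex`, which gives access to one thread at a time. `PhantomData<Cell<()>>` is just one common way to opt out of `Sync`, and is an assumption of this sketch.

```rust
use std::cell::Cell;
use std::marker::PhantomData;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical read transaction. `PhantomData<Cell<()>>` makes the type
/// `Send` but not `Sync`.
struct ReadTxn {
    snapshot: Vec<u32>,
    _not_sync: PhantomData<Cell<()>>,
}

impl ReadTxn {
    fn new(snapshot: Vec<u32>) -> Self {
        ReadTxn { snapshot, _not_sync: PhantomData }
    }

    fn get(&self, index: usize) -> Option<u32> {
        self.snapshot.get(index).copied()
    }
}

/// Shares one transaction between several threads through a `Mutex`.
/// `Arc<Mutex<T>>` only requires `T: Send`, so this compiles even though
/// `ReadTxn` is not `Sync`.
fn read_from_threads(txn: ReadTxn, indexes: Vec<usize>) -> Vec<Option<u32>> {
    let shared = Arc::new(Mutex::new(txn));
    let handles: Vec<_> = indexes
        .into_iter()
        .map(|i| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let guard = shared.lock().unwrap();
                guard.get(i)
            })
        })
        .collect();
    // Join in spawn order so the results line up with `indexes`.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```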

Opening an Environment is Now Unsafe

Thanks to @hinto-janai, opening an environment is now unsafe. There has been a lot of discussion about memory mapping and its safety, so I decided to follow the general consensus on the matter. EnvOpenOptions::open() is now unsafe, and its documentation includes a safety section explaining why.

Support a Lot More LMDB Features

We exposed nearly every LMDB feature: DUPSORT, INTEGER_KEY/DUP, REVERSE_KEY/DUP... You can also iterate over duplicate items or skip them; that's your choice.

You can also use the new Database::put_with_flags and <Iterator>::put_current_with_flags methods, which support the NO_DUP_DATA, NO_OVERWRITE, APPEND, and APPEND_DUP flags and let you append keys or duplicate data faster.
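The semantics of two of these flags can be modeled with an ordinary `BTreeMap`. The sketch below is hypothetical and works in memory only; `PutFlag`, `PutError`, and `put_with_flag` are illustrative names, not heed's API. `NoOverwrite` refuses to replace an existing key. `Append` only accepts a key that sorts after the current last key, which is what makes bulk loading of already-sorted keys fast. LMDB reports `MDB_KEYEXIST` in both failure cases, and `PutError::KeyExist` mirrors that here.

```rust
use std::collections::BTreeMap;

/// Hypothetical subset of the put flags, modeled in memory.
#[derive(Clone, Copy)]
enum PutFlag {
    NoOverwrite,
    Append,
}

#[derive(Debug, PartialEq)]
enum PutError {
    /// Mirrors LMDB's MDB_KEYEXIST error code.
    KeyExist,
}

fn put_with_flag(
    db: &mut BTreeMap<Vec<u8>, Vec<u8>>,
    flag: PutFlag,
    key: &[u8],
    value: &[u8],
) -> Result<(), PutError> {
    match flag {
        PutFlag::NoOverwrite => {
            // Only insert if the key is not already present.
            if db.contains_key(key) {
                return Err(PutError::KeyExist);
            }
        }
        PutFlag::Append => {
            // Appending requires the key to sort after every existing key.
            if let Some((last, _)) = db.iter().next_back() {
                if key <= last.as_slice() {
                    return Err(PutError::KeyExist);
                }
            }
        }
    }
    db.insert(key.to_vec(), value.to_vec());
    Ok(())
}
```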

We improved the iterators' put_current and put_current_reserved methods so that they accept flags and the codec used to serialize the data, which makes custom encoding operations on databases easier.

Support Custom Key Comparison Function

Thanks to @xiaoyawei, you can now use LMDB's custom key comparison functions instead of relying only on the default lexicographic comparison. You can read more about this feature in the LMDB source code.

use std::cmp::Ordering;
use std::str;

use heed::types::{Str, Unit};
use heed_traits::Comparator;

enum StringAsIntCmp {}

impl Comparator for StringAsIntCmp {
    fn compare(a: &[u8], b: &[u8]) -> Ordering {
        let a: i32 = str::from_utf8(a).unwrap().parse().unwrap();
        let b: i32 = str::from_utf8(b).unwrap().parse().unwrap();
        a.cmp(&b)
    }
}

let mut wtxn = env.write_txn()?;
let db = env.database_options().types::<Str, Unit>().key_comparator::<StringAsIntCmp>().create(&mut wtxn)?;

db.put(&mut wtxn, "-1000", &())?;
db.put(&mut wtxn, "-100", &())?;
db.put(&mut wtxn, "100", &())?;

let mut iter = db.iter(&wtxn)?;
assert_eq!(iter.next().transpose()?, Some(("-1000", ())));
assert_eq!(iter.next().transpose()?, Some(("-100", ())));
assert_eq!(iter.next().transpose()?, Some(("100", ())));
assert_eq!(iter.next().transpose()?, None);
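The comparator above matters because of how the default ordering works. With LMDB's default bytewise ordering, "-100" sorts before "-1000" because it is a prefix of it, so the keys would not come back in numeric order. The stdlib-only sketch below sorts the same three keys both ways. It reuses the parsing logic of StringAsIntCmp as a plain function and does not depend on heed.

```rust
use std::cmp::Ordering;
use std::str;

/// Same logic as `StringAsIntCmp::compare`: parse both keys as `i32`.
fn string_as_int_cmp(a: &[u8], b: &[u8]) -> Ordering {
    let a: i32 = str::from_utf8(a).unwrap().parse().unwrap();
    let b: i32 = str::from_utf8(b).unwrap().parse().unwrap();
    a.cmp(&b)
}

/// Sorts keys the way a plain byte comparison would.
fn sort_bytewise(mut keys: Vec<&str>) -> Vec<&str> {
    keys.sort_by(|a, b| a.as_bytes().cmp(b.as_bytes()));
    keys
}

/// Sorts keys with the integer-parsing comparator.
fn sort_as_ints(mut keys: Vec<&str>) -> Vec<&str> {
    keys.sort_by(|a, b| string_as_int_cmp(a.as_bytes(), b.as_bytes()));
    keys
}
```

With the keys "100", "-1000", and "-100", the bytewise sort yields "-100", "-1000", "100", while the custom comparator yields the numeric order asserted in the example above.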

Simplify our Internal Processes

We now have our own up-to-date lmdb-master-sys crate. It contains the bindgen-generated bindings to the LMDB library, and heed is plugged directly into it.

It will now be easier for Meilisearch to bump the engine's LMDB version. We previously used a fork of Mozilla's outdated lmdb-rkv-sys crate, but it was cumbersome to bump three repositories, i.e., our fork, meilisearch/lmdb-rs, and finally heed.

Now we can make all the changes in the heed repository to bump the LMDB version 🎉

Use it with Apple's App Sandbox

Thanks to @GregoryConrad, we now have a posix-sem feature. It allows iOS and macOS builds to comply with Apple's App Sandbox (necessary for distribution in the App Store) and may also bring speed improvements thanks to POSIX semaphores.

Simplify the Number-Typed Database

You can now declare a heed Database with a number as the key or the value straightforwardly. Just specify the endianness of it, and that's it.

use heed::byteorder::BE;
use heed::types::*;
use heed::Database;

type BEI64 = I64<BE>;

let mut wtxn = env.write_txn()?;
let db: Database<BEI64, Unit> = env.create_database(&mut wtxn, Some("big-endian-iter"))?;

db.put(&mut wtxn, &0, &())?;
db.put(&mut wtxn, &68, &())?;
db.put(&mut wtxn, &35, &())?;
db.put(&mut wtxn, &42, &())?;

wtxn.commit()?;
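Specifying the endianness matters because LMDB compares keys byte by byte by default. The small stdlib sketch below is independent of heed. It shows that big-endian encoded integers sort in numeric order while little-endian ones do not. There is one caveat: this holds for non-negative values. Plain two's-complement big-endian bytes place negative signed integers after the positive ones.

```rust
/// Sorts numbers by the bytes of their big-endian encoding, as a bytewise key comparison would.
fn sort_by_be_bytes(mut values: Vec<u64>) -> Vec<u64> {
    values.sort_by_key(|v| v.to_be_bytes());
    values
}

/// Same, using the little-endian encoding.
fn sort_by_le_bytes(mut values: Vec<u64>) -> Vec<u64> {
    values.sort_by_key(|v| v.to_le_bytes());
    values
}

/// Signed values: big-endian two's complement puts negatives after positives.
fn sort_i64_by_be_bytes(mut values: Vec<i64>) -> Vec<i64> {
    values.sort_by_key(|v| v.to_be_bytes());
    values
}
```

For example, with the little-endian encoding 256 sorts before 1 because its first byte is 0, whereas the big-endian encoding keeps 1, 42, 256 in numeric order.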

Know the Stats of your Database

@irevoire added some new Env methods to get the size of a database:

  • Env::map_size returns the size of the original memory map.
  • Env::real_disk_size returns the size on disk as seen by the file system.
  • Env::non_free_pages_size returns the size of the non-free pages of the current transaction.
  • @quake added the unsafe Env::resize method to resize the environment.
  • You can use the Database::stat method to get detailed information about the internal B-tree pages of a database.

You can also get the number of entries in a database in a snap: instead of calling .iter().count() internally, we now ask LMDB directly for this count.
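As a rough illustration of what page-level statistics give you, here is a sketch that derives a byte count from a B-tree stat record. The `BTreeStat` struct is hypothetical. Its fields mirror the page size and page counts that LMDB's `MDB_stat` structure reports (`ms_psize`, `ms_branch_pages`, `ms_leaf_pages`, `ms_overflow_pages`, `ms_entries`). The formula approximates the space a database's pages occupy and is not heed's implementation of any of the methods above.

```rust
/// Hypothetical mirror of some fields reported by LMDB's `MDB_stat`.
struct BTreeStat {
    page_size: u32,
    branch_pages: usize,
    leaf_pages: usize,
    overflow_pages: usize,
    entries: usize,
}

impl BTreeStat {
    /// Total number of pages used by this B-tree.
    fn total_pages(&self) -> usize {
        self.branch_pages + self.leaf_pages + self.overflow_pages
    }

    /// Approximate number of bytes occupied by those pages.
    fn used_bytes(&self) -> u64 {
        self.total_pages() as u64 * u64::from(self.page_size)
    }

    /// Average number of entries per leaf page (0 for an empty tree).
    fn entries_per_leaf(&self) -> f64 {
        if self.leaf_pages == 0 {
            0.0
        } else {
            self.entries as f64 / self.leaf_pages as f64
        }
    }
}
```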

Reduce the Number of Copies to Write into your Database

Sometimes it is possible to write directly into your database without first serializing your data into an intermediary buffer. This is the case for many data structures, such as RoaringBitmaps.

use heed::byteorder::BE;
use heed::types::*;
use roaring::RoaringBitmap;

type BEI32 = I32<BE>;

let mut wtxn = env.write_txn()?;
let db = env.create_database::<BEI32, ByteSlice>(&mut wtxn, Some("number-string"))?;

let bitmap = RoaringBitmap::from_iter([1, 2, 3, 4]);
// Instead of serializing the data into a buffer, as you know its length,
// you can directly write the data into the LMDB value reserved space.
db.put_reserved(&mut wtxn, &42, bitmap.serialize_size(), |reserved| {
    bitmap.serialize_into(reserved)
})?;
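The reserve-then-fill pattern can be sketched without LMDB. The caller states the exact length, receives a writer over a zeroed buffer of that size, and serializes straight into it. `put_reserved` below is a hypothetical in-memory stand-in that uses a `HashMap` instead of a database and a `Cursor` over the buffer instead of heed's reserved space. The exact-length check is an assumption of this sketch.

```rust
use std::collections::HashMap;
use std::io::{self, Cursor, Write};

/// Hypothetical in-memory stand-in for `Database::put_reserved`: reserves
/// `len` bytes for `key` and lets `write` fill them in place.
fn put_reserved<F>(store: &mut HashMap<u32, Vec<u8>>, key: u32, len: usize, write: F) -> io::Result<()>
where
    F: FnOnce(&mut Cursor<&mut [u8]>) -> io::Result<()>,
{
    let mut buffer = vec![0u8; len];
    let mut cursor = Cursor::new(buffer.as_mut_slice());
    // Writing past `len` bytes fails with `ErrorKind::WriteZero`.
    write(&mut cursor)?;
    // This sketch also requires the closure to fill the space exactly.
    if cursor.position() as usize != len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "reserved space not fully written"));
    }
    store.insert(key, buffer);
    Ok(())
}

/// Serializes integers as big-endian `u32`s, standing in for `serialize_into`.
fn serialize_u32s<W: Write>(values: &[u32], mut writer: W) -> io::Result<()> {
    for v in values {
        writer.write_all(&v.to_be_bytes())?;
    }
    Ok(())
}
```

A call such as `put_reserved(&mut store, 42, 16, |reserved| serialize_u32s(&[1, 2, 3, 4], reserved))` writes the 16 bytes in place without an intermediate `Vec`.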

Better Error Handling and Debugging

Return Expressive Errors when Encoding and Decoding Data

Support for custom encoding/decoding errors has been added. Weren't you frustrated when heed triggered an error in one of the encoding/decoding traits and you could not understand why? This is no longer an issue, as the BytesEncode/BytesDecode traits can return a BoxedError that keeps the information and lets you decide what to do with it.
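Here is a sketch of the idea. It assumes a boxed error of the shape `Box<dyn std::error::Error + Send + Sync>`; check heed_traits for the exact definition. The `decode_be_u32` function and the `InvalidLength` error are hypothetical and standalone. Instead of a generic failure, they return an error that says exactly why decoding failed.

```rust
use std::error::Error;
use std::fmt;

/// Boxed error type, modeled here as `Box<dyn Error + Send + Sync>`.
type BoxedError = Box<dyn Error + Send + Sync + 'static>;

/// Hypothetical error carrying what went wrong while decoding.
#[derive(Debug)]
struct InvalidLength {
    expected: usize,
    found: usize,
}

impl fmt::Display for InvalidLength {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "expected {} bytes but found {}", self.expected, self.found)
    }
}

impl Error for InvalidLength {}

/// Decodes a big-endian `u32`, keeping the failure reason in the error.
fn decode_be_u32(bytes: &[u8]) -> Result<u32, BoxedError> {
    let array: [u8; 4] = bytes
        .try_into()
        .map_err(|_| InvalidLength { expected: 4, found: bytes.len() })?;
    Ok(u32::from_be_bytes(array))
}
```

The `?` operator boxes `InvalidLength` automatically, and the caller can print it or downcast it back with `err.downcast_ref::<InvalidLength>()`.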

Safer Environment Opening

We introduced the BadOpenOptions heed error for when an environment is already opened in the same program but you try to open it again with different options. This behavior will also be improved in v0.20.0 to simplify the usage of the library and make it more faithful to LMDB's behavior around the map size.
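One way to picture this check is a process-wide registry keyed by path. The sketch below is hypothetical: the `OpenOptions` fields and the `Registry` type are illustrative, not heed's internals. Reopening a path with identical options succeeds. Reopening it with different options returns a BadOpenOptions-style error that carries the options already in use.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical subset of environment options.
#[derive(Clone, Debug, PartialEq)]
struct OpenOptions {
    map_size: usize,
    max_dbs: u32,
}

#[derive(Debug, PartialEq)]
enum OpenError {
    /// The path is already open with different options.
    BadOpenOptions { existing: OpenOptions },
}

#[derive(Default)]
struct Registry {
    opened: HashMap<PathBuf, OpenOptions>,
}

impl Registry {
    fn open(&mut self, path: &Path, options: OpenOptions) -> Result<(), OpenError> {
        match self.opened.get(path) {
            // Same path, different options: refuse.
            Some(existing) if *existing != options => {
                Err(OpenError::BadOpenOptions { existing: existing.clone() })
            }
            // Same path, same options: reuse the already opened environment.
            Some(_) => Ok(()),
            // First time this path is opened: remember its options.
            None => {
                self.opened.insert(path.to_path_buf(), options);
                Ok(())
            }
        }
    }
}
```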

Implement Debug for most Structs

A lot more types implement the Debug trait. It will be easier to embed an Env, a Database, or even an iterator in a struct that already implements Debug.

Document Every Public Type

Thanks to @AureliaDolo and @darnuria, the documentation now has much better coverage, with examples for nearly everything that could look complex. In addition, @wackbyte improved the general documentation and the quality of its sentences.

Always use the Vendored Version of LMDB

The principle of least astonishment applies to user interface and software design. It proposes that a system component should behave how most users expect it to behave. The behavior should not astonish or surprise users.

Since the early days of heed, it automatically linked against the liblmdb library already installed on the system. We saw a lot of strange issues, non-reproducible on our side, and later discovered that heed was using Arch Linux's system LMDB instead of the vendored one!

This is no longer an issue, as we removed this behavior from build.rs. The vendored version is always used, so we never link against an unknown version of LMDB.

Simplify Transaction Usage

Make it Possible to use Read-Only LMDB Environments

Thanks to @darnuria again: read-only transactions sometimes need to be committed to make databases usable globally in the program. We now have tests that open databases and commit them in read-only environments. However, this change is subtle: we must commit the transaction to make a just-opened database global rather than local to that transaction.

let rtxn = env.read_txn()?;
let db = env.open_poly_database(&rtxn, Some("my-database"))?;
rtxn.commit()?;
// We can store and use `db` here as long as the database is alive.

This detail raised an issue in heed. It is currently not safe to use a Database. We must redefine how we open and create databases to make them safe. The new API should be released for v0.20.0.

In this release, the RwTxn::abort method no longer returns a heed::Result, as aborting a transaction cannot fail in LMDB. The Result was introduced back when we supported MDBX.

Merge Two Lifetimes

We simplified the signature of the RoTxn and RwTxn types by removing one lifetime and keeping a single one. The new signature only has a single 'p lifetime: either the environment lifetime or the parent transaction's lifetime. The simplification ...


v0.20.0-alpha.6 🛁

07 Nov 09:29
8bfdf3b
Pre-release

An even safer LMDB wrapper

Heed is a fully typed LMDB wrapper with minimum overhead. It is also the most maintained Rust wrapper on top of LMDB and is used by meilisearch/meilisearch. LMDB is a memory-mapped key-value store battle-tested for a long time.

Fix a soundness issue with the sync-read-txn feature

We removed the unsound sync-read-txn feature, which made RoTxn: Sync even though that is not safe. We replaced it with the read-txn-no-tls feature, which makes RoTxn: Send so that it can be used from different threads behind a Mutex.

We apologize for this and are discussing with the RustSec advisory team how best to advise people not to use this unsound feature.

Support a lot more LMDB features

We exposed nearly every LMDB feature: DUPSORT, INTEGER_KEY/DUP, REVERSE_KEY/DUP... You can also iterate over duplicate items or skip them; that's your choice.

You can also use the new Database::put_with_flags and <Iterator>::put_current_with_flags methods, which support the NO_DUP_DATA, NO_OVERWRITE, APPEND, and APPEND_DUP flags and let you append keys or duplicate data faster.

Support custom key comparison functions

Thanks to @xiaoyawei, you can now use LMDB's custom key comparison functions instead of relying only on the default lexicographic comparison. You can read more about this feature in the LMDB source code.

use std::cmp::Ordering;
use std::str;

use heed::types::{Str, Unit};
use heed_traits::Comparator;

enum StringAsIntCmp {}

impl Comparator for StringAsIntCmp {
    fn compare(a: &[u8], b: &[u8]) -> Ordering {
        let a: i32 = str::from_utf8(a).unwrap().parse().unwrap();
        let b: i32 = str::from_utf8(b).unwrap().parse().unwrap();
        a.cmp(&b)
    }
}

let mut wtxn = env.write_txn()?;
let db = env.database_options().types::<Str, Unit>().key_comparator::<StringAsIntCmp>().create(&mut wtxn)?;

db.put(&mut wtxn, "-1000", &())?;
db.put(&mut wtxn, "-100", &())?;
db.put(&mut wtxn, "100", &())?;

let mut iter = db.iter(&wtxn)?;
assert_eq!(iter.next().transpose()?, Some(("-1000", ())));
assert_eq!(iter.next().transpose()?, Some(("-100", ())));
assert_eq!(iter.next().transpose()?, Some(("100", ())));
assert_eq!(iter.next().transpose()?, None);

Simplify our internal processes

We now have our own up-to-date lmdb-master-sys crate. It contains the bindgen-generated bindings to the LMDB library, and heed is plugged directly into it.

It will now be easier for Meilisearch to bump the engine's LMDB version. We previously used a fork of Mozilla's outdated lmdb-rkv-sys crate, but it was cumbersome to bump three repositories, i.e., our fork, meilisearch/lmdb-rs, and finally heed.

Now we can make all the changes in the heed repository to bump the LMDB version 🎉

Use it with Apple's App Sandbox

Thanks to @GregoryConrad, we now have a posix-sem feature. It allows iOS and macOS builds to comply with Apple's App Sandbox (necessary for distribution in the App Store) and may also bring speed improvements thanks to POSIX semaphores.

Simplify the number-typed database

You can now declare a heed Database with a number as the key or the value in a straightforward way. Just specify the endianness of it, and that's it.

use heed::byteorder::BE;
use heed::types::*;
use heed::Database;

type BEI64 = I64<BE>;

let mut wtxn = env.write_txn()?;
let db: Database<BEI64, Unit> = env.create_database(&mut wtxn, Some("big-endian-iter"))?;

db.put(&mut wtxn, &0, &())?;
db.put(&mut wtxn, &68, &())?;
db.put(&mut wtxn, &35, &())?;
db.put(&mut wtxn, &42, &())?;

wtxn.commit()?;

Know the size of your database

@irevoire added some new Env methods to get the size of a database:

  • Env::map_size returns the size of the original memory map.
  • Env::real_disk_size returns the size on disk as seen by the file system.
  • Env::non_free_pages_size returns the size of the non-free pages of the current transaction.
  • @quake added the unsafe Env::resize method to resize the environment.

You can also get the number of entries in a database in a snap: instead of calling .iter().count() internally, we now ask LMDB directly for this count.

Reduce the number of copies to write into your database

Sometimes it is possible to write directly into your database without first serializing your data into an intermediary buffer. This is the case for many data structures, such as RoaringBitmaps.

use heed::byteorder::BE;
use heed::types::*;
use roaring::RoaringBitmap;

type BEI32 = I32<BE>;

let mut wtxn = env.write_txn()?;
let db = env.create_database::<BEI32, ByteSlice>(&mut wtxn, Some("number-string"))?;

let bitmap = RoaringBitmap::from_iter([1, 2, 3, 4]);
// Instead of serializing the data into a buffer, as you know the length of it,
// you can directly write the data into the LMDB value reserved space.
db.put_reserved(&mut wtxn, &42, bitmap.serialize_size(), |reserved| {
    bitmap.serialize_into(reserved)
})?;

Replace zerocopy with the more popular bytemuck

The new version of heed uses bytemuck instead of zerocopy. The bytemuck library seems much easier to contribute to, and it is much more popular (710k downloads per month compared to 109k). It also brings a better API, at least for heed: it reports which kind of problem happened when a cast fails, which should simplify some codecs.

Better error handling and debugging

Return expressive errors when en/decoding data

Support for custom encoding/decoding errors has been added. Weren't you frustrated when heed triggered an error in one of the encoding/decoding traits and you could not understand why? This is no longer an issue, as the BytesEncode/BytesDecode traits can return a BoxedError that can be displayed.

Safer environment opening

We introduced the BadOpenOptions heed error for when an environment is already opened in the same program but you try to open it again with different options. This behavior will also be improved in v0.20.0 to simplify the usage of the library and make it more faithful to LMDB's behavior around the map size.

Implement Debug for most structs

A lot more types implement the Debug trait. It will be easier to embed an Env, a Database, or even an iterator in a struct that already implements Debug.

Document everything public

Thanks to @AureliaDolo and @darnuria, the documentation now has much better coverage, with examples for nearly everything that could look complex.

Always use the vendored version of LMDB

The principle of least astonishment applies to user interface and software design. It proposes that a system component should behave how most users expect it to behave. The behavior should not astonish or surprise users.

Since the early days of heed, it automatically linked against the liblmdb library already installed on the system. We saw a lot of strange issues, non-reproducible on our side, and later discovered that heed was using Arch Linux's system LMDB instead of the vendored one!

This is no longer an issue, as we removed this behavior from build.rs. The vendored version is always used, so we never link against an unknown version of LMDB.

Simplify transactions usage

Make it possible to use read-only LMDB environments

Thanks to @darnuria again: read-only transactions sometimes need to be committed to make databases usable globally in the program. We now have tests ensuring that we can open and commit databases in read-only environments. However, this change is subtle: we must commit the transaction to make a just-opened database global rather than local to that transaction.

let rtxn = env.read_txn()?;
let db = env.open_poly_database(&rtxn, Some("my-database"))?;
rtxn.commit()?;
// We can store and use `db` here as long as the database is alive.

This detail raised an issue in heed. It is currently not safe to use a Database. We must redefine how we open and create databases to make them safe. The new API should be released for v0.20.0.

In this release, the RwTxn::abort method no longer returns a heed::Result, as aborting a transaction cannot fail in LMDB. The Result was introduced back when we supported MDBX.

Merge two lifetimes

We simplified the signature of the RoTxn and RwTxn types by removing one lifetime and keeping a single one. The new signature only has a single 'p lifetime: either the environment lifetime or the parent transaction's lifetime. The simplification was possible because the environment must already outlive the parent transaction.

// Previous signature
struct RwTxn<'env, 'parent, T = ()>;

// New signature
struct RwTxn<'p>;
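Here is a reduced model of why one lifetime is enough. A top-level transaction borrows the environment for 'p. A nested transaction borrows its parent for a shorter lifetime and reborrows the environment for that same shorter lifetime. The `Env` and `RwTxn` types below are hypothetical stand-ins, not heed's definitions.

```rust
/// Hypothetical environment.
struct Env {
    name: String,
}

/// Hypothetical write transaction with a single lifetime `'p`:
/// either the environment's borrow or the parent transaction's borrow.
struct RwTxn<'p> {
    env: &'p Env,
    depth: usize,
}

impl Env {
    fn write_txn(&self) -> RwTxn<'_> {
        RwTxn { env: self, depth: 0 }
    }
}

impl<'p> RwTxn<'p> {
    /// The child borrows `self` mutably, so the parent is unusable until the
    /// child is dropped. Because the environment outlives the parent,
    /// `&'p Env` simply shortens to the child's lifetime.
    fn nested_txn(&mut self) -> RwTxn<'_> {
        RwTxn { env: self.env, depth: self.depth + 1 }
    }

    fn env_name(&self) -> &str {
        &self.env.name
    }
}
```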

Replace the generic parameter with a runtime check

We also removed the type parameter of transactions. It was first introduced to prevent using a transaction opened on one environment with another one. Unfortunately, as the T type was optional, it wasn't used much. We decided that a runtime check would be better and added a bunch of assert_eq! checks to make sure that transactions and environments aren't mixed.
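A runtime check like this can be sketched as follows. Every environment gets an identifier, transactions and database handles remember it, and each operation asserts that they match. The identifiers, the `check_txn` method, and the types are hypothetical. The text above only says that assert_eq! checks were added.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical source of unique environment identifiers.
static NEXT_ENV_ID: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-ins: each handle remembers the environment it came from.
struct Env {
    id: usize,
}

struct RoTxn {
    env_id: usize,
}

struct Database {
    env_id: usize,
}

impl Env {
    fn new() -> Self {
        Env { id: NEXT_ENV_ID.fetch_add(1, Ordering::Relaxed) }
    }

    fn read_txn(&self) -> RoTxn {
        RoTxn { env_id: self.id }
    }

    fn database(&self) -> Database {
        Database { env_id: self.id }
    }
}

impl Database {
    /// Runtime replacement for the old type parameter: panics if the
    /// transaction was opened on a different environment.
    fn check_txn(&self, txn: &RoTxn) {
        assert_eq!(
            self.env_id, txn.env_id,
            "the transaction and the database come from different environments"
        );
    }
}
```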

We no longer use nested transactions when opening databases

The previous version of heed used nested transactions when opening or creating databases. It did so to simplify intern...


v0.20.0-alpha.4 🛁

23 Aug 13:02
02030e3
Pre-release

An even safer LMDB wrapper

Heed is a fully typed LMDB wrapper with minimum overhead. It is also the most maintained Rust wrapper on top of LMDB and is used by meilisearch/meilisearch. LMDB is a memory-mapped key-value store battle-tested for a long time.

Fix a soundness issue with the sync-read-txn feature

We removed the unsound sync-read-txn feature, which made RoTxn: Sync even though that is not safe. We replaced it with the read-txn-no-tls feature, which makes RoTxn: Send so that it can be used from different threads behind a Mutex.

We apologize for this and are discussing with the RustSec advisory team how best to advise people not to use this unsound feature.

Support a lot more LMDB features

We exposed nearly every LMDB feature: DUPSORT, INTEGER_KEY/DUP, REVERSE_KEY/DUP... You can also iterate over duplicate items or skip them; that's your choice.

You can also use the new Database::put_with_flags and <Iterator>::put_current_with_flags methods, which support the NO_DUP_DATA, NO_OVERWRITE, APPEND, and APPEND_DUP flags and let you append keys or duplicate data faster.

Simplify our internal processes

We now have our own up-to-date lmdb-master-sys crate. It contains the bindgen-generated bindings to the LMDB library, and heed is plugged directly into it.

It will now be easier for Meilisearch to bump the engine's LMDB version. We previously used a fork of Mozilla's outdated lmdb-rkv-sys crate, but it was cumbersome to bump three repositories, i.e., our fork, meilisearch/lmdb-rs, and finally heed.

Now we can make all the changes in the heed repository to bump the LMDB version 🎉

Use it with Apple's App Sandbox

Thanks to @GregoryConrad, we now have a posix-sem feature. It allows iOS and macOS builds to comply with Apple's App Sandbox (necessary for distribution in the App Store) and may also bring speed improvements thanks to POSIX semaphores.

Simplify the number-typed database

You can now declare a heed Database with a number as the key or the value in a straightforward way. Just specify the endianness of it, and that's it.

use heed::byteorder::BE;
use heed::types::*;
use heed::Database;

type BEI64 = I64<BE>;

let mut wtxn = env.write_txn()?;
let db: Database<BEI64, Unit> = env.create_database(&mut wtxn, Some("big-endian-iter"))?;

db.put(&mut wtxn, &0, &())?;
db.put(&mut wtxn, &68, &())?;
db.put(&mut wtxn, &35, &())?;
db.put(&mut wtxn, &42, &())?;

wtxn.commit()?;

Know the size of your database

@irevoire added some new Env methods to get the size of a database:

  • Env::map_size returns the size of the original memory map.
  • Env::real_disk_size returns the size on disk as seen by the file system.
  • Env::non_free_pages_size returns the size of the non-free pages of the current transaction.
  • @quake added the unsafe Env::resize method to resize the environment.

You can also get the number of entries in a database in a snap: instead of calling .iter().count() internally, we now ask LMDB directly for this count.

Reduce the number of copies to write into your database

Sometimes it is possible to write directly into your database without first serializing your data into an intermediary buffer. This is the case for many data structures, such as RoaringBitmaps.

use heed::byteorder::BE;
use heed::types::*;
use roaring::RoaringBitmap;

type BEI32 = I32<BE>;

let mut wtxn = env.write_txn()?;
let db = env.create_database::<BEI32, ByteSlice>(&mut wtxn, Some("number-string"))?;

let bitmap = RoaringBitmap::from_iter([1, 2, 3, 4]);
// Instead of serializing the data into a buffer, as you know the length of it,
// you can directly write the data into the LMDB value reserved space.
db.put_reserved(&mut wtxn, &42, bitmap.serialize_size(), |reserved| {
    bitmap.serialize_into(reserved)
})?;

Replace zerocopy with the more popular bytemuck

The new version of heed uses bytemuck instead of zerocopy. The bytemuck library seems much easier to contribute to, and it is much more popular (710k downloads per month compared to 109k). It also brings a better API, at least for heed: it reports which kind of problem happened when a cast fails, which should simplify some codecs.

Better error handling and debugging

Return expressive errors when en/decoding data

Support for custom encoding/decoding errors has been added. Weren't you frustrated when heed triggered an error in one of the encoding/decoding traits and you could not understand why? This is no longer an issue, as the BytesEncode/BytesDecode traits can return a BoxedError that can be displayed.

Safer environment opening

We introduced the BadOpenOptions heed error for when an environment is already opened in the same program but you try to open it again with different options. This behavior will also be improved in v0.20.0 to simplify the usage of the library and make it more faithful to LMDB's behavior around the map size.

Implement Debug for most structs

A lot more types implement the Debug trait. It will be easier to embed an Env, a Database, or even an iterator in a struct that already implements Debug.

Document everything public

Thanks to @AureliaDolo and @darnuria, the documentation now has much better coverage, with examples for nearly everything that could look complex.

Always use the vendored version of LMDB

The principle of least astonishment applies to user interface and software design. It proposes that a system component should behave how most users expect it to behave. The behavior should not astonish or surprise users.

Since the early days of heed, it automatically linked against the liblmdb library already installed on the system. We saw a lot of strange issues, non-reproducible on our side, and later discovered that heed was using Arch Linux's system LMDB instead of the vendored one!

This is no longer an issue, as we removed this behavior from build.rs. The vendored version is always used, so we never link against an unknown version of LMDB.

Simplify transactions usage

Make it possible to use read-only LMDB environments

Thanks to @darnuria again: read-only transactions sometimes need to be committed to make databases usable globally in the program. We now have tests ensuring that we can open and commit databases in read-only environments. However, this change is subtle: we must commit the transaction to make a just-opened database global rather than local to that transaction.

let rtxn = env.read_txn()?;
let db = env.open_poly_database(&rtxn, Some("my-database"))?;
rtxn.commit()?;
// We can store and use `db` here as long as the database is alive.

This detail raised an issue in heed. It is currently not safe to use a Database. We must redefine how we open and create databases to make them safe. The new API should be released for v0.20.0.

In this release, the RwTxn::abort method no longer returns a heed::Result, as aborting a transaction cannot fail in LMDB. The Result was introduced back when we supported MDBX.

Merge two lifetimes

We simplified the signature of the RoTxn and RwTxn types by removing one lifetime and keeping a single one. The new signature only has a single 'p lifetime: either the environment lifetime or the parent transaction's lifetime. The simplification was possible because the environment must already outlive the parent transaction.

// Previous signature
struct RwTxn<'env, 'parent, T = ()>;

// New signature
struct RwTxn<'p>;

Replace the generic parameter with a runtime check

We also removed the type parameter of transactions. It was first introduced to prevent using a transaction opened on one environment with another one. Unfortunately, as the T type was optional, it wasn't used much. We decided that a runtime check would be better and added a bunch of assert_eq! checks to make sure that transactions and environments aren't mixed.

We no longer use nested transactions when opening databases

The previous version of heed used nested transactions when opening or creating databases, to simplify internal methods. Unfortunately, LMDB has a limitation: nested transactions cannot be used with the MDB_WRITEMAP option.

It is now possible to use LMDB with MDB_WRITEMAP and open databases freely 😊

v0.20.0-alpha.0 🛁

11 Jan 16:31
50f4b89
Pre-release

An even safer LMDB wrapper

Heed is a fully typed LMDB wrapper with minimum overhead. It is also the most maintained Rust wrapper on top of LMDB and is used by meilisearch/meilisearch. LMDB is a memory-mapped key-value store battle-tested for a long time.

Simplify our internal processes

We now have our own up-to-date lmdb-master-sys crate. It contains the bindgen-generated bindings to the LMDB library, and heed is plugged directly into it.

It will now be easier for Meilisearch to bump the engine's LMDB version. We previously used a fork of Mozilla's outdated lmdb-rkv-sys crate, but it was cumbersome to bump three repositories, i.e., our fork, meilisearch/lmdb-rs, and finally heed.

Now we can make all the changes in the heed repository to bump the LMDB version 🎉

Use it with Apple's App Sandbox

Thanks to @GregoryConrad, we now have a posix-sem feature. It allows iOS and macOS builds to comply with Apple's App Sandbox (necessary for distribution in the App Store), in addition to possible speed improvements brought by POSIX semaphores.

Simplify the number-typed database

You will now be able to declare a heed Database with a number as the key or the value in a straightforward way. Just specify the endianness of it, and that's it.

use heed::byteorder::BE;
use heed::types::*;
use heed::Database;

type BEI64 = I64<BE>;

let mut wtxn = env.write_txn()?;
let db: Database<BEI64, Unit> = env.create_database(&mut wtxn, Some("big-endian-iter"))?;

db.put(&mut wtxn, &0, &())?;
db.put(&mut wtxn, &68, &())?;
db.put(&mut wtxn, &35, &())?;
db.put(&mut wtxn, &42, &())?;

wtxn.commit()?;

Know the size of your database

@irevoire added some new Env methods to get the size of a database:

  • Env::map_size returns the size of the original memory map.
  • Env::real_disk_size returns the size on disk as seen by the file system.
  • Env::non_free_pages_size returns the size of the non-free pages of the current transaction.

You can also get the number of entries in a database in a snap: instead of calling .iter().count() internally, we now ask LMDB directly for this count.

Reduce the number of copies to write into your database

Sometimes it is possible to write directly into your database without first serializing your data into an intermediary buffer. This is the case for many data structures, such as RoaringBitmaps.

use heed::byteorder::BE;
use heed::types::*;
use roaring::RoaringBitmap;

type BEI32 = I32<BE>;

let mut wtxn = env.write_txn()?;
let db = env.create_database::<BEI32, ByteSlice>(&mut wtxn, Some("number-string"))?;

let bitmap = RoaringBitmap::from_iter([1, 2, 3, 4]);
// Instead of serializing the data into a buffer, as you know the length of it,
// you can directly write the data into the LMDB value reserved space.
db.put_reserved(&mut wtxn, &42, bitmap.serialize_size(), |reserved| {
    bitmap.serialize_into(reserved)
})?;

Replace zerocopy with the more popular bytemuck

The new version of heed uses bytemuck instead of zerocopy. The bytemuck library seems much easier to contribute to, and it is much more popular (710k downloads per month compared to 109k). It also brings a better API, at least for heed: it reports which kind of problem happened when a cast fails, which should simplify some codecs.

Better error handling and debugging

Return expressive errors when en/decoding data

Support for custom encoding/decoding errors has been added. Weren't you frustrated when heed triggered an error in one of the encoding/decoding traits and you were unable to understand the reason? This is no longer an issue, as the BytesEncode/BytesDecode traits can return a BoxedError that can be displayed.

Safer environment opening

We introduced the BadOpenOptions heed error for when an environment is already opened in the same program but you try to open it again with different options. This behavior will also be improved in v0.20.0 to simplify the usage of the library and make it more faithful to LMDB's behavior around the map size.

Implement Debug for most structs

A lot more types implement the Debug trait. It will be easier to embed an Env or a Database in a struct that already implements Debug.

Always use the vendored version of LMDB

The principle of least astonishment applies to user interface and software design. It proposes that a system component should behave how most users expect it to behave. The behavior should not astonish or surprise users.

Since the early days, heed automatically linked against the liblmdb library already installed on the system. We saw a lot of strange issues, non-reproducible on our side, and later discovered that heed was using Arch Linux's system LMDB instead of the vendored one!

This is no longer an issue, as we removed this behavior from build.rs. The vendored version is always used, so we no longer use an unknown version of LMDB.

Simplify transactions usage

Remove the commit and abort methods of the read-only transaction

The read-only transaction is immutable and cannot make changes, so it needs neither a commit nor an abort method. We removed those two methods from the RoTxn type.

However, this change is subtle. Making opened (not created) databases global is no longer possible without committing a transaction, so it can only be done with a write transaction. We must commit to make a just-opened database global rather than local.

let wtxn = env.write_txn()?;
let db = env.open_poly_database(&wtxn, Some("my-database"))?;
wtxn.commit()?;
// We can store and use `db` here as long as the database is alive.

This detail raised an issue in heed. It is currently not safe to use a Database. We must redefine how we open and create databases to make them safe. The new API should be released for v0.20.0.

Note that in this release, the RwTxn::abort method no longer returns a heed::Result, as aborting a transaction cannot fail in LMDB.

Merge two lifetimes

We simplified the signature of the RoTxn and RwTxn types by removing one lifetime and keeping a single one. The new signature only has a single 'p lifetime: either the environment lifetime or the parent transaction's lifetime. The simplification was possible because the environment must already outlive the parent transaction.

// Previous signature
struct RwTxn<'env, 'parent, T = ()>;

// New signature
struct RwTxn<'p>;

Replace the generic parameter with a runtime check

We also removed the type parameter of transactions. It was first introduced to prevent using a transaction opened on one environment with another one. Unfortunately, as the T type was optional, it wasn't used much. We decided that a runtime check would be better and added a bunch of assert_eq! checks to make sure that transactions and environments aren't mixed.

Don't use a nested transaction when opening databases

The previous version of heed used nested transactions when opening or creating databases, to simplify internal methods. Unfortunately, LMDB has a limitation: nested transactions cannot be used with the MDB_WRITEMAP option.

It is now possible to use LMDB with MDB_WRITEMAP and open databases freely 😊

An LMDB bump and some unsafe methods

28 Jun 14:41
6c0b957

In this PR, we are:

  • breaking the library by properly specifying what is unsafe: the del_current, put_current, and append iterator methods are now unsafe because you must know what you are doing when you use them. You must not retain any reference into the database when calling them. You can read more in the pull request.
  • bumping LMDB to the latest version (v0.9.70) by using a fork of Mozilla's repository. However, we will not be able to publish this on crates.io for now.