RFC: Serialize and Deserialize #8185
jerrykingxyz started this conversation in RFC
Feature Name: Rspack support serialize and deserialize
Start Date: 2024/10/21
Summary
Design a serialization and deserialization implementation for rspack
Motivation
Persistent caching requires writing data from memory to storage media such as disk, which requires the data structures in rspack to support serialization and deserialization.
Glossary
Converter
: In this page, "converter" refers specifically to a family of traits. Implementing these traits makes Serialize/Deserialize operations possible.
Cacheable
: A data structure that can be serialized and deserialized is called cacheable.
ArchivedStruct
: The intermediate structure generated by rkyv. It sits between bytes and the user structure and can be easily converted to bytes.
StructResolver
: The object generated by rkyv to convert the user structure into the corresponding ArchivedStruct.
User Guide
API
The entry points for serialize and deserialize are both pure functions, with signatures like:
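As a minimal, std-only sketch of what such context-aware entry points might look like: the names `to_bytes`, `Context`, `CacheError`, `Module` and the `Cacheable` trait below are assumptions made for illustration, and the byte format is invented. This is not the rspack_cacheable API, which is built on rkyv.

```rust
use std::sync::Arc;

// Hypothetical global object that lives in the context, not in the cache.
#[derive(Debug, PartialEq)]
pub struct CompilerOption {
    pub mode: String,
}

// Hypothetical context handed to both entry points.
pub struct Context {
    pub options: Arc<CompilerOption>,
}

#[derive(Debug, PartialEq)]
pub enum CacheError {
    UnexpectedEof,
    InvalidUtf8,
}

// Hypothetical stand-in for "cacheable": a type that can write itself to
// bytes and rebuild itself from bytes, with access to the context.
pub trait Cacheable: Sized {
    fn serialize(&self, ctx: &Context, out: &mut Vec<u8>) -> Result<(), CacheError>;
    fn deserialize(bytes: &[u8], ctx: &Context) -> Result<Self, CacheError>;
}

// Pure entry points: no global state, everything comes in through arguments.
pub fn to_bytes<T: Cacheable>(value: &T, ctx: &Context) -> Result<Vec<u8>, CacheError> {
    let mut out = Vec::new();
    value.serialize(ctx, &mut out)?;
    Ok(out)
}

pub fn from_bytes<T: Cacheable>(bytes: &[u8], ctx: &Context) -> Result<T, CacheError> {
    T::deserialize(bytes, ctx)
}

#[derive(Debug, PartialEq)]
pub struct Module {
    pub name: String,
    pub options: Arc<CompilerOption>,
}

impl Cacheable for Module {
    fn serialize(&self, _ctx: &Context, out: &mut Vec<u8>) -> Result<(), CacheError> {
        // Only `name` is written (length prefix + bytes); `options` is ignored.
        out.extend_from_slice(&(self.name.len() as u32).to_le_bytes());
        out.extend_from_slice(self.name.as_bytes());
        Ok(())
    }

    fn deserialize(bytes: &[u8], ctx: &Context) -> Result<Self, CacheError> {
        if bytes.len() < 4 {
            return Err(CacheError::UnexpectedEof);
        }
        let len = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
        let raw = bytes.get(4..4 + len).ok_or(CacheError::UnexpectedEof)?;
        let name = std::str::from_utf8(raw)
            .map_err(|_| CacheError::InvalidUtf8)?
            .to_owned();
        // The global object is cloned from the context instead of being read back.
        Ok(Module { name, options: Arc::clone(&ctx.options) })
    }
}
```

In this sketch, a `Module` restored with `from_bytes(&bytes, &ctx)` shares the same `Arc<CompilerOption>` allocation as the context, because nothing about the options was ever written to the byte buffer.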
Note that both functions accept a parameter named context, which can be accessed during serialize and deserialize. A typical example is a global object such as Arc<CompilerOption>: the serialize operation can ignore it, and the deserialize operation can clone it directly from the context.
Data Cacheable
Automatic Implementation
To use the above API, the target struct must be cacheable. Adding #[cacheable] to it automatically implements the default converter.
For unsupported data structures, you can specify a converter by adding #[cacheable(with=...)].
Use the AsInnerConverter trait for conversion. OnceCell, Arc, etc. already implement AsInnerConverter.
Use the AsMapConverter trait for conversion. HashMap, IndexMap, DashMap, etc. already implement AsMapConverter.
Use the AsRefStrConverter trait for conversion. Cow<'static, str>, Arc, etc. already implement AsRefStrConverter.
Use the AsStringConverter trait for conversion. PathBuf, etc. already implement AsStringConverter.
Use the AsVecConverter trait for conversion. HashSet, Vec, IndexSet, etc. already implement AsVecConverter.
Use the AsConverter trait for conversion.
Convert Cow.
Convert the cacheable data reference. Note that since Inline does not implement the deserialize method, it cannot be used with from_bytes.
In theory, AsPreset needs to support all third-party crates used in rspack. Here are some examples of combinations:
Manual Implementation
In addition to using #[cacheable] for automatic implementation, you can also customize Serialize and Deserialize.
The simplest way to customize is to implement a converter trait. Currently AsInner, AsRefStr, AsString, etc. all provide corresponding converter traits, which can be used together with #[cacheable(with=...)].
You can also use #[cacheable(with=...)] to replace the default converter with a specific converter.
For complex data structures, Serialize and Deserialize can be implemented directly with the underlying Serialize/Deserialize crate. For details, refer to the implementation of AsPreset, AsRefStr, etc.
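To illustrate the "implement a converter trait" route, here is a minimal std-only sketch. The trait shape (a `to_string`/`from_str` pair), the `ConvertError` type and the `encode`/`decode` helpers are assumptions made for illustration; the actual AsStringConverter trait in rspack_cacheable may be defined differently.

```rust
use std::path::PathBuf;

// Hypothetical error type; the real crate's error types may differ.
#[derive(Debug, PartialEq)]
pub struct ConvertError(pub String);

// Assumed shape of a string-based converter trait: the type says how to turn
// itself into a String and how to rebuild itself from a &str.
pub trait AsStringConverter: Sized {
    fn to_string(&self) -> Result<String, ConvertError>;
    fn from_str(s: &str) -> Result<Self, ConvertError>;
}

impl AsStringConverter for PathBuf {
    fn to_string(&self) -> Result<String, ConvertError> {
        // Paths are not guaranteed to be UTF-8, so this conversion can fail.
        self.to_str()
            .map(|s| s.to_owned())
            .ok_or_else(|| ConvertError("path is not valid UTF-8".to_owned()))
    }

    fn from_str(s: &str) -> Result<Self, ConvertError> {
        Ok(PathBuf::from(s))
    }
}

// Hypothetical helpers showing how generic code can serialize any type that
// implements the converter, without knowing the concrete type.
pub fn encode<T: AsStringConverter>(value: &T) -> Result<Vec<u8>, ConvertError> {
    Ok(value.to_string()?.into_bytes())
}

pub fn decode<T: AsStringConverter>(bytes: &[u8]) -> Result<T, ConvertError> {
    let s = std::str::from_utf8(bytes).map_err(|e| ConvertError(e.to_string()))?;
    T::from_str(s)
}
```

The point of the pattern is that the type only describes its conversion to and from a simpler representation, while the generic layer handles the bytes.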
Macro API
#[cacheable] supports the following configuration when used on a struct or enum:
#[cacheable(with=...)]
: Configure the default converter.
#[cacheable(hashable)]
: Make the generated ArchiveData structure support Hash, so that it can be used in structures such as HashMap and HashSet.
#[cacheable(crate=...)]
: Configure the rspack_cacheable crate path used after macro expansion. The default value is rspack_cacheable.
#[cacheable(as=...)]
: Configure the struct that deserialize converts to. Its memory layout must match the current struct.
#[cacheable] supports the following configuration when used on fields:
#[cacheable(with=...)]
: Configure the converter separately for the current field.
#[cacheable(omit_bounds)]
: Omit the trait bounds in the impl generated for the current field. This is generally used when the current field contains a recursive structure.
Trait Cacheable
For dyn Trait, add #[cacheable_dyn] to both the trait definition and each trait implementation to make it cacheable. The corresponding struct also needs to be cacheable.
Detailed design
There are many Serialize/Deserialize crates in Rust. Based on third-party benchmark data, we have tentatively chosen rkyv. Because the API is designed independently of the underlying crate, switching to another crate later would have little impact on the API.
Rkyv
A brief introduction to how Rkyv works helps in understanding the implementation that follows. Rkyv has several core traits:
Serialize
: Convert the user Struct to a StructResolver. It can return an Error.
Resolve
: Use the StructResolver returned from Serialize to generate the ArchivedStruct.
Deserialize
: Convert the ArchivedStruct back to the Struct.
Together they form a chain: Serialize turns a Struct into a StructResolver, Resolve turns the StructResolver into an ArchivedStruct, and Deserialize turns the ArchivedStruct back into a Struct. Rkyv itself implements the conversion between ArchivedStruct and bytes.
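As a rough mental model of the three stages, the following std-only toy may help. It is not rkyv's real trait set and not the output of rkyv's macros: `User`, `UserResolver`, `ArchivedUser` and the method names are hypothetical, and the "archived" form here is an ordinary struct rather than a zero-copy byte layout.

```rust
// Toy model only: these types and methods are hypothetical and are not
// rkyv's real traits or generated code.

/// The user structure.
#[derive(Debug, PartialEq)]
pub struct User {
    pub id: u32,
    pub name: String,
}

/// Resolver: what serialize learned that is needed to build the archived
/// form (here, where the name bytes were written).
pub struct UserResolver {
    name_pos: u32,
}

/// Archived form: fixed-size fields plus a position into the byte buffer.
#[derive(Debug, PartialEq)]
pub struct ArchivedUser {
    pub id: u32,
    pub name_pos: u32,
    pub name_len: u32,
}

impl User {
    /// "Serialize": write variable-size data to the buffer and return a
    /// resolver. This is the step that can fail.
    pub fn serialize(&self, buf: &mut Vec<u8>) -> Result<UserResolver, String> {
        if buf.len() + self.name.len() > u32::MAX as usize {
            return Err("buffer too large".to_owned());
        }
        let name_pos = buf.len() as u32;
        buf.extend_from_slice(self.name.as_bytes());
        Ok(UserResolver { name_pos })
    }

    /// "Resolve": combine the value and its resolver into the archived form.
    pub fn resolve(&self, resolver: UserResolver) -> ArchivedUser {
        ArchivedUser {
            id: self.id,
            name_pos: resolver.name_pos,
            name_len: self.name.len() as u32,
        }
    }
}

impl ArchivedUser {
    /// "Deserialize": rebuild the user structure from the archived form.
    pub fn deserialize(&self, buf: &[u8]) -> Result<User, String> {
        let start = self.name_pos as usize;
        let end = start + self.name_len as usize;
        let raw = buf
            .get(start..end)
            .ok_or_else(|| "name out of bounds".to_owned())?;
        let name = String::from_utf8(raw.to_vec()).map_err(|e| e.to_string())?;
        Ok(User { id: self.id, name })
    }
}
```

The resolver exists because the archived form needs information, such as buffer positions, that is only known after the variable-size data has been written.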
Here is a simple example to show the code after expanding rkyv:
API
Context
Rkyv provides Sharing and Pooling, which support pointer sharing during Serialize and Deserialize. With these, the Context feature is easy to implement. Here is an example of using Serialize Sharing to implement the Context function:
CustomError
Rkyv lets the Error type be specified in the function signature, so we only need to declare the corresponding Serializer and Deserializer types.
Cacheable
Rkyv supports the use of macros to quickly implement related functions. We need to expose the rkyv crate through rspack_cacheable to facilitate the use of the correct rkyv lib after macro expansion.
Macro type only API
We only need to expand it into the corresponding rkyv macros.
For #[cacheable(with=...)], we should implement rkyv::{Archive, Serialize, Deserialize} as a proxy.
Macro field only API
Rkyv supports configuring a specific converter directly on a field using the with attribute.
Rkyv also supports using omit_bounds directly on fields, but the bounds must additionally be configured on the type for the program to work correctly. For the detailed implementation, refer to this example.
Cacheable_dyn
The #[cacheable_dyn] implementation is modeled on rkyv_dyn. However, rkyv_dyn has not yet released a 0.8.x version, and rkyv_dyn 0.7.x cannot be used with rkyv 0.8.x, so we provide our own implementation based on it.
Serialize
#[cacheable_dyn] is easy to implement for serialize: just add a supertrait when defining the current trait.
Deserialize
For deserialize, #[cacheable_dyn] requires the inventory crate for global collection. The implementation steps are roughly as follows:
1. Add __dyn_id to the trait to distinguish different implementations, and write it to the meta information of the serialized data.
2. Implement the DeserializeDyn trait for ArchivedStruct to define the deserialize methods and register them globally.
3. Use the __dyn_id in the meta information to find the corresponding deserialize implementation.
Preset Converter
In addition to the basic functions mentioned above, we also need to implement a series of preset converters for use in third-party crates and manual implementation scenarios. The implementation of this part mainly relies on rkyv::with. Below is a simple implementation example using AsPreset.
Refs
https://github.com/djkoloski/rust_serialization_benchmark
https://docs.rs/rkyv/0.8.8/rkyv/index.html
https://docs.rs/rkyv/0.8.8/rkyv/with/index.html