Asset System Rewrite #7138

TechnoPorg · 2024-03-10T14:30:17Z

Enhancement Summary

For all kinds of data, from projects to plugins to samples, LMMS currently manages most of its saving and loading of that data through the DataFile class. DataFile does not offer its own built-in serialization methods, but rather relies on individual classes to write to and read from a separate DataFile that then gets serialized as XML.

In short, the DataFile workflow was something that grew, rather than being engineered. From what I gather, most parts of DataFile are written in a way that is inflexible and outdated; I don't think some of that code has seen the light of day in years 🥲. LMMS as a whole would benefit from a more modular, extensible, and flexible asset handling system to replace it.

Justification

This will allow us to do several things much more easily:

Add support for saving and loading new file types like audio file formats, plugin settings, etc. Most of DataFile is hard-coded with enums and fixed lists of file extensions, and the code does not lend itself well to changes to those. A system where the asset importer is decoupled from the asset types themselves would make it far easier to do this.
Remove Qt from core. DataFile is heavily based on QDomDocument, which I don't think was ever designed to be used for the way DataFile uses it, and removing it would be beneficial to the goal of Plan for the core: Removing Qt, Decoupling, and Modernization #6472.
More easily change the serialization method backing project files. Currently, we use XML, but there has also been interest expressed in a JSON format (JSON project format #3981), which would be nigh on impossible to accomplish without a total rewrite of DataFile anyways. More on this point will come later.
Fix bugs and write unit tests. Broadly speaking, I think modernized, better engineered code is always easier to debug and fix problems in, but in this specific case, I do think that specifically, the removal of Qt and decoupling of individual parts would make life better. (Windows) Allow international characters in file names and paths #1191 is an example of a bug that would be much easier to fix with a new system. As I've been saying ad nauseam, the DataFile API is pretty convoluted, and when I tried to write tests for it in Add unit tests for DataFile #7121, I found it was not easy to try and write comprehensive tests, which makes it hard to guard the system against regressions.

Mockup

I've been thinking about how the new system would need to be designed to address the above points, and this is what I've come up with:

`Asset`

All types that can be considered "project data objects" will inherit from the Asset base class. The Asset class will have away of indicating what type of asset it is, and a specific way of binding properties that are to be serialized, with an API something like this:

// The name of this type of asset. Replaces the DataFile type enum with something more flexible.
// Might return "Sample" or "MIDI"
std::string assetType() const;
static std::string assetType() const;

std::vector<std::string> getPropertyList() const; // return names of exposed members.
template<typename T>
T getProperty(std::string property_name) const;
template<typename T>
void setProperty(T &new_value);

Basically, it would be similar in concept to Blender's RNA system, or Godot's Object class.

`AssetLoader`

Asset loading will be handled by the AssetLoader (maybe AssetReader, but I don't like that as much) abstract class, with an API something like this at minimum:

bool handlesFile(std::string file_path) virtual const; // whether or not this asset loader is capable of loading a specific file.
Asset import(std::string file_path) virtual; // imports the file
Asset load(std::string file_path) virtual; // loads an already imported file

I see AssetLoaders being implemented in a hierarchical, modular way. The base asset loader used by LMMS would be AssetLoaderDefault, which would be a wrapper around several AssetCategoryLoaders, which load specific file types. There will also be an AssetLoaderText base class which loads serialized properties of previously saved assets. This diagram illustrates it best:

`AssetWriter`

Asset saving will be handled by the AssetWriter class. There will be an AssetWriterDefault which takes the exposed properties and stores them in a structured format (probably text), and will recursively serialize all Asset properties of an Asset. This could be further specialized into something like an AssetTextWriterXML / AssetTextWriterJSON class if desired.
Example API:

void save(Asset asset, std::string file_path, bool compressed = true) virtual; // Writes an asset to the given location on disk.

Here's a diagram broadly illustrating how the saving of a song might work in practice:

Both this and AssetLoader would ideally be based on std::filesystem.

In conclusion, this would be a major change and represent a significant amount of effort, but I believe it is something that needs to happen if LMMS is to move forward.
I would very much appreciate feedback and debate on the idea, because I am under no illusions that the design I have laid out here is the best possible one, and it could be made even better with input from other, more experienced developers.

Thank you!

The text was updated successfully, but these errors were encountered:

Rossmaxx · 2024-03-10T15:17:37Z

I think this might also help with #6909

Spekular · 2024-03-10T15:22:14Z

Just looking through this quickly, the first thing I'm inclined to ask is what the deal with import vs load is; why does a file need to be imported and then loaded? Also, in what way exactly do you see AssetLoader and AssetWriter being "based on std::filesystem"? I think it's important that serialization/deserialization is a separate concern from IO.

As for the names, AssetCategoryLoader sounds like it loads an AssetCategory, and AssetLoaderText seems like it would load Text assets (especially since you also propose AssetLoaderMP3 and AssetLoaderOGG).

TechnoPorg · 2024-03-10T16:47:44Z

why does a file need to be imported and then loaded?

I had been thinking it would be so that import would add LMMS-specific metadata to a file the first time it's brought into the engine, perhaps with specified import options like number of channels for a sample, and then after that, load would just be able to read without extra processing.

Also, in what way exactly do you see AssetLoader and AssetWriter being "based on std::filesystem"? I think it's important that serialization/deserialization is a separate concern from IO.

Yeah, good point. I meant it in that this new asset system would use std::filesystem rather than Qt for IO. Do you think it would be best to decouple the IO from the serialization by putting them in separate classes?

As for the names, AssetCategoryLoader sounds like it loads an AssetCategory, and AssetLoaderText seems like it would load Text assets (especially since you also propose AssetLoaderMP3 and AssetLoaderOGG).

How about CategoryLoader for the former, and TextAssetLoader for the latter? I agree that keeping AssetLoaderXYZ as convention for specific file types is a good idea.

DomClark · 2024-03-21T23:59:05Z

For all kinds of data, from projects to plugins to samples, LMMS currently manages most of its saving and loading of that data through the DataFile class.

I don't think this is true. DataFile deals with project data, which may be an entire project, or a small part of a project. Examples of project parts include presets (a single instrument), clipboard and drag data (collections of clips), undo history entries (the old version of whatever was modified), etc. If by "plugins" you mean presets (I'm not sure what it means to "save" a plugin otherwise), it makes sense to use the same class, as the data in a preset is the same as that of an instrument in a project. Samples are loaded using SampleLoader.

DataFile does not offer its own built-in serialization methods, but rather relies on individual classes to write to and read from a separate DataFile that then gets serialized as XML.

Surely leaving serialisation to individual classes is the right approach? The idea here is to have an extensible system that doesn't require DataFile to take on too many responsibilities. I'm not sure what you mean by "separate" here - the DataFile instance that is loaded or saved is the same one that project objects are asked to deserialise or serialise themselves to.

Add support for saving and loading new file types like audio file formats, plugin settings, etc.

Supported audio file formats are already dynamically detected in SampleDecoder::supportedAudioTypes, which currently requests them from libsndfile (and adds DrumSynth as we have first-party support for that). First-party plugin settings I have mentioned above, but third-party plugin preset extensions can already be registered by the appropriate wrapper plugin.

Remove Qt from core. DataFile is heavily based on QDomDocument, which I don't think was ever designed to be used for the way DataFile uses it

Ways to reduce the core Qt dependency are always good. I too dislike that DataFile inherits from QDomDocument - it would be much better containing it instead - but I think that's poor OO programming in general, and not a misuse of QDomDocument specifically. Also, we still of course need to support XML in the core somehow. I think which XML library we choose and how DataFile interacts with that library are different, albeit related, questions.

More easily change the serialization method backing project files.

This is a reasonable point. As DataFile inherits from QDomDocument, it is very much locked in to an XML format. I already had in mind a rewrite of the serialisation API that abstracted both XML and JSON structures into a single interface that can be passed to project objects in place of QDomDocument. I haven't started on this at all, though.

michaelgregorius · 2024-03-22T21:35:50Z

Please note that there is also pull request #6866 by @consolegrl which intends to improve the structure of the upgrade routines with regards to DataFile, etc. I don't know what's the state of that PR though. It has merge conflicts and might have gone stale.

@TechnoPorg, how do you want to deal with loading existing files?

anytizer · 2024-05-16T07:13:01Z

Another idea for the default samples and presets shipping with LMMS:

I think of having one lump of sqlite database that contains current snapshot of ogg, xpf and xiz, and .ds data. There are roughly 1400 tiny files scattered in places. The current assets files compose around 15mb.

A rough database would be:

CREATE TABLE "files" (
	"file_id"	TEXT NOT NULL,
	"file_source"	TEXT NOT NULL, --  which folder being scanned?
	"file_version"	TEXT NOT NULL, --  file version
	"file_type"	TEXT NOT NULL, --  one of ogg, xpf, xiz or .ds
	"file_category"	TEXT NOT NULL, --  serves as instrument category?
	"file_filename"	TEXT NOT NULL, --  physical file name
	"file_disksize"	TEXT NOT NULL, --  disk space
	"file_sha256"	TEXT NOT NULL, --  hash
	"file_content"	BLOB NOT NULL, --  binary data of the asset
	PRIMARY KEY("file_id")
);

Every time a resource is required, it would query the database and cache it optionally.
Which is pretty much as fast as accessing the files directly.

Introducing an embedded database is about ~1.5mb overhead with sqlite3.h. But It could help a lot in enhancing directory browser, assets navigation, searches, configurations, svgs, themes and many more by putting them in a single file.
It would also help in mapping keys in the piano roll etc as per user preferences / culture.

The only drawback is 15mb (aka, the final assets as sqlite) to be under version control. (I have already connected to the sqlite with native c++ to test it, and might help if in case we want to migrate to using database, including current assets.)

TechnoPorg added the enhancement label Mar 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Asset System Rewrite #7138

Asset System Rewrite #7138

TechnoPorg commented Mar 10, 2024

Rossmaxx commented Mar 10, 2024

Spekular commented Mar 10, 2024

TechnoPorg commented Mar 10, 2024

DomClark commented Mar 21, 2024

michaelgregorius commented Mar 22, 2024

anytizer commented May 16, 2024

Asset System Rewrite #7138

Asset System Rewrite #7138

Comments

TechnoPorg commented Mar 10, 2024

Enhancement Summary

Justification

Mockup

Asset

AssetLoader

AssetWriter

Rossmaxx commented Mar 10, 2024

Spekular commented Mar 10, 2024

TechnoPorg commented Mar 10, 2024

DomClark commented Mar 21, 2024

michaelgregorius commented Mar 22, 2024

anytizer commented May 16, 2024

`Asset`

`AssetLoader`

`AssetWriter`