Possible optimisation on the `TreeMapper` #583

Baptouuuu · 2024-11-30T15:49:34Z

Context

As discussed IRL we want this package to be more performant. Either by reducing the memory footprint or reducing the number of steps of the whole process.

This issue is about the latter.

Findings

Note

Bear in mind that I'm new to the internals of this package and may not have all the implications of the mving parts mentionned below.

The mapping process is as follows:

a signature and a raw value is passed to the TypeTreeMapper
TypeTreeMapper calls the TypeParser with the signature as argument
TypeParser recursively builds a Type tree
TypeTreeMapper calls the RootNodeBuilder with the Type tree and raw value wrapped in a Shell
RootNodeBuilder recursively calls the underlying builders to find the correct builder for the Type
TypeTreeMapper returns the built data or throws on error

Graphically this looks something like that:

graph TB
    Input([Input]) --> internal --> Output([Output]);
    
    subgraph internal
    TreeMapper(TreeMapper)
    TreeMapper<-->TypeParser;
    TreeMapper <--> NodeBuilder(NodeBuilder)
    TypeParser-->|Recursion|TypeParser;
    NodeBuilder-->|Recursion|NodeBuilder
    end

Having two separate recursive calls is a good thing as it allows the TypeParser to stay independent.

However the second recursion takes extra steps to find the correct builder. For example the CasterNodeBuilder loops over multiple builders to find the one associated to the Shell Type.

When calling the TypeTreeMapper only once for a given signature this may be acceptable in order to keep the code structure simple.

But if one calls TypeTreeMapper multiple times to map multiple sources with the same signature (for example streaming data from a database), this becomes time consuming. In this case the recursion to associate the builder to the correct type is done N times (N being the number of calls to TypeTreeMapper).

In essence this is an O(n) problem when it could a O(1).

Solution

The mapping process could be as follows:

a signature and a raw value is passed to the TypeTreeMapper
TypeTreeMapper calls the TypeParser with the signature as argument
TypeParser recursively builds a Type tree
TypeTreeMapper recursively walks the Type tree to attach the corresponding NodeBuilder on each node (let's call it NodeBuilderTree for now)
TypeTreeMapper calls NodeBuilderTree with the raw value
NodeBuilderTree returns the built data or throws

Graphically this looks something like that:

graph TB
    Input([Input]) --> internal --> Output([Output]);
    
    subgraph internal
    TreeMapper(TreeMapper)
    TreeMapper<-->TypeParser;
    TreeMapper <--> NodeBuilderTree(NodeBuilderTree)
    TypeParser-->|Recursion|TypeParser;
    NodeBuilderTree-->|Recursion|NodeBuilderTree
    TreeMapper-->NodeBuilder(NodeBuilder)
    end

In order to accomplish this the NodeBuilder service in the Container needs to be refactored. Instead of having a single service NodeBuilder where the object is an object tree of builders, each builder could be it's own service. Then to find the builder for any given Type it would be a call to Container->get($type::class) (this is pseudo code).

But for this to be really useful the parsing and the builder tree creation must be split from when the builder is called.

It would go from (pseudo code):

foreach ($generator as $rawValue) {
    $builtData = $mapper->map('signature', $rawValue);
}

To:

$map = $mapper->forType('signature');

foreach ($generator as $rawValue) {
    $builtData = $map($rawValue);
}

Final thoughts

At first glance this could be relatively simple as it's mainly moving code around.

But taking into account all the Settings may require to flip some logic on its head, conditions done at the highest level would be pushed down the lowest ones. It may result in a big refactoring.

The other benefit of this approach would be that compiling the builder would be simpler. The process would be:

Parse the signature to build the type tree
Generate a PHP file that create the builder tree object

On the second call it only needs to find the correct PHP file for the given signature (via a hash of the signature for example). The signature parsing is no longer needed and the builder tree matches the expected input.

What do you think ?

The text was updated successfully, but these errors were encountered:

mastir · 2024-12-19T22:13:29Z

Hey @Baptouuuu this is great idea. Node mapping is hard related to the configuration and probably contain pointers to functions registered in configuration so not so easy serializable and any changes to configuration will break mapping compatibility. As for me it sounds more like goal for version 2.0 milestone

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible optimisation on the `TreeMapper` #583

Possible optimisation on the `TreeMapper` #583

Baptouuuu commented Nov 30, 2024

mastir commented Dec 19, 2024

Possible optimisation on the TreeMapper #583

Possible optimisation on the TreeMapper #583

Comments

Baptouuuu commented Nov 30, 2024

Context

Findings

Solution

Final thoughts

mastir commented Dec 19, 2024

Possible optimisation on the `TreeMapper` #583

Possible optimisation on the `TreeMapper` #583