Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible optimisation on the TreeMapper #583

Open
Baptouuuu opened this issue Nov 30, 2024 · 1 comment
Open

Possible optimisation on the TreeMapper #583

Baptouuuu opened this issue Nov 30, 2024 · 1 comment

Comments

@Baptouuuu
Copy link

Context

As discussed IRL we want this package to be more performant. Either by reducing the memory footprint or reducing the number of steps of the whole process.

This issue is about the latter.

Findings

Note

Bear in mind that I'm new to the internals of this package and may not have all the implications of the mving parts mentionned below.

The mapping process is as follows:

  • a signature and a raw value is passed to the TypeTreeMapper
  • TypeTreeMapper calls the TypeParser with the signature as argument
  • TypeParser recursively builds a Type tree
  • TypeTreeMapper calls the RootNodeBuilder with the Type tree and raw value wrapped in a Shell
  • RootNodeBuilder recursively calls the underlying builders to find the correct builder for the Type
  • TypeTreeMapper returns the built data or throws on error

Graphically this looks something like that:

graph TB
    Input([Input]) --> internal --> Output([Output]);
    
    subgraph internal
    TreeMapper(TreeMapper)
    TreeMapper<-->TypeParser;
    TreeMapper <--> NodeBuilder(NodeBuilder)
    TypeParser-->|Recursion|TypeParser;
    NodeBuilder-->|Recursion|NodeBuilder
    end
Loading

Having two separate recursive calls is a good thing as it allows the TypeParser to stay independent.

However the second recursion takes extra steps to find the correct builder. For example the CasterNodeBuilder loops over multiple builders to find the one associated to the Shell Type.

When calling the TypeTreeMapper only once for a given signature this may be acceptable in order to keep the code structure simple.

But if one calls TypeTreeMapper multiple times to map multiple sources with the same signature (for example streaming data from a database), this becomes time consuming. In this case the recursion to associate the builder to the correct type is done N times (N being the number of calls to TypeTreeMapper).

In essence this is an O(n) problem when it could a O(1).

Solution

The mapping process could be as follows:

  • a signature and a raw value is passed to the TypeTreeMapper
  • TypeTreeMapper calls the TypeParser with the signature as argument
  • TypeParser recursively builds a Type tree
  • TypeTreeMapper recursively walks the Type tree to attach the corresponding NodeBuilder on each node (let's call it NodeBuilderTree for now)
  • TypeTreeMapper calls NodeBuilderTree with the raw value
  • NodeBuilderTree returns the built data or throws

Graphically this looks something like that:

graph TB
    Input([Input]) --> internal --> Output([Output]);
    
    subgraph internal
    TreeMapper(TreeMapper)
    TreeMapper<-->TypeParser;
    TreeMapper <--> NodeBuilderTree(NodeBuilderTree)
    TypeParser-->|Recursion|TypeParser;
    NodeBuilderTree-->|Recursion|NodeBuilderTree
    TreeMapper-->NodeBuilder(NodeBuilder)
    end
Loading

In order to accomplish this the NodeBuilder service in the Container needs to be refactored. Instead of having a single service NodeBuilder where the object is an object tree of builders, each builder could be it's own service. Then to find the builder for any given Type it would be a call to Container->get($type::class) (this is pseudo code).

But for this to be really useful the parsing and the builder tree creation must be split from when the builder is called.

It would go from (pseudo code):

foreach ($generator as $rawValue) {
    $builtData = $mapper->map('signature', $rawValue);
}

To:

$map = $mapper->forType('signature');

foreach ($generator as $rawValue) {
    $builtData = $map($rawValue);
}

Final thoughts

At first glance this could be relatively simple as it's mainly moving code around.

But taking into account all the Settings may require to flip some logic on its head, conditions done at the highest level would be pushed down the lowest ones. It may result in a big refactoring.

The other benefit of this approach would be that compiling the builder would be simpler. The process would be:

  • Parse the signature to build the type tree
  • Generate a PHP file that create the builder tree object

On the second call it only needs to find the correct PHP file for the given signature (via a hash of the signature for example). The signature parsing is no longer needed and the builder tree matches the expected input.

What do you think ?

@mastir
Copy link

mastir commented Dec 19, 2024

Hey @Baptouuuu this is great idea. Node mapping is hard related to the configuration and probably contain pointers to functions registered in configuration so not so easy serializable and any changes to configuration will break mapping compatibility. As for me it sounds more like goal for version 2.0 milestone

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants