Multi-expression objects and fixed functional forms #193

MilesCranmer · 2023-04-11T15:21:16Z

Right now you can fix a functional form by writing a custom objective function that splits an expression tree into subexpressions. However, this is not an elegant approach.

I think a better solution is to have a multi-expression object:

struct MultiTree{T<:Number}
    trees::Dict{Symbol,Node{T}}
end
MultiTree(; kwargs...) = MultiTree(Dict([Symbol(k) => v for (k, v) in kwargs]))

Then, for example, if I wish to find the functional form:

SymbolicRegression.@form F = f(x1, x2, x3) + g(x4, x5) / f(x1, x2, x6)

this would result in the following MultiTree:

operators = Options(; binary_operators=[+, *, /, -], unary_operators=[cos, sin], form=F) 
# Note new argument "form"

x1, x2, x3 = [Node(;feature=i) for i=1:3]

example_multi_tree = MultiTree(f=cos(x1) * 3.2, g=x2*x2 - 0.1)

The evaluation would then be:

function eval_multi_tree(mtree::MultiTree, X::Matrix{T}, options) where T
    outputs = Dict{Symbol,Vector{T}}()
    for (k, tree) in mtree.trees
        features = options.form.features[k]
        outputs[k] = eval_tree_array(tree, X[features, :], options)
    end
    return options.form(outputs)
end

For the evolution: at each mutation, perhaps one random tree of the multi-tree would be mutated or crossed-over. There would be an assumption that all individuals have the same multi-tree structure.

@ChrisRackauckas @AlCap23 @johanbluecreek what do you think of this? It could be interesting to use to evolve SINDy bases.

Hey @marcovirgolin, what do you think of something like this with multiple trees in a single individual? This is sort of similar to what Eureqa used to do.

The text was updated successfully, but these errors were encountered:

MilesCranmer · 2023-04-11T15:22:38Z

This could also be a nice way to get vector expressions:

SymbolicRegression.@form F = [f1(x1, x2, x3), f2(x1, x2, x3), f3(x1, x2, x3)]

marcovirgolin · 2023-04-11T17:59:48Z

I have done something like that in the past.
You have a good catch of mutating a single tree at the time: as the search space is larger with multi-trees, of course less likely to get multiple lucky simultaneous mutations.

Makes sense to assume all solutions have same number of trees if you know this a priori, but that needs not always need to be the case. E.g., you might want to do SR for interpretable dimensionality reduction from R^d (d original features) to R^k (k latent features), where you want k to be small to have fewer trees to interpret, but large enough to get good reconstruction accuracy.

MilesCranmer · 2023-08-28T07:39:21Z

I have been thinking more about how this could work. Programmatically I think the easiest way forward is as follows.

Create a way to "freeze" a set of nodes in a tree. Those nodes cannot be mutated or used in crossovers.
- Option 1: a new custom tree type FrozenNode <: AbstractNode (Create AbstractNode super type SymbolicML/DynamicExpressions.jl#53)
- Option 2: maybe just storing custom metadata in Node{T,schema}?
You would use a regular Node type to store all expression information. For two subexpressions f and g, you would have root.l storing f, and root.r storing g. Since root would be labeled as frozen, it would not be reduced.
- However, you could still crossover the contents of f with the contents of g. Just you couldn't crossover root itself.
Rewrite functions like eval_tree_array, string_tree, node_to_symbolic to work with a custom functional form.

It will still take a bit of work to get this to work. But I think this is the easiest way to embed it in SymbolicRegression.jl without breaking anything for existing functionality.

eelregit · 2024-01-04T05:46:35Z

Found this issue after submitting a quite similar feature request for composite PySR regressor in MilesCranmer/PySR#514

For the evolution: at each mutation, perhaps one random tree of the multi-tree would be mutated or crossed-over. There would be an assumption that all individuals have the same multi-tree structure.

I wonder when different trees are degenerate, would correlated changes (mutating all sub-trees together) actually help to optimize faster, even though with a larger search space?

Create a way to "freeze" a set of nodes in a tree.

Very interesting approach. If I understand correctly, would it be easy for this to enforce, say the final sub-tree is at most a bivariate function (i.e., $k\leq2$ in @marcovirgolin 's comment), with one variable being a constrained expression of the other sub-trees ($f+g/f$ in an above example)?

MilesCranmer added the feature (importance: mid) Mid-importance feature label Apr 11, 2023

MilesCranmer self-assigned this Aug 28, 2023

MilesCranmer mentioned this issue Dec 26, 2023

[Feature]: Add priori of equation form MilesCranmer/PySR#494

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multi-expression objects and fixed functional forms #193

Multi-expression objects and fixed functional forms #193

MilesCranmer commented Apr 11, 2023 •

edited

MilesCranmer commented Apr 11, 2023

marcovirgolin commented Apr 11, 2023

MilesCranmer commented Aug 28, 2023 •

edited

eelregit commented Jan 4, 2024 •

edited

Multi-expression objects and fixed functional forms #193

Multi-expression objects and fixed functional forms #193

Comments

MilesCranmer commented Apr 11, 2023 • edited

MilesCranmer commented Apr 11, 2023

marcovirgolin commented Apr 11, 2023

MilesCranmer commented Aug 28, 2023 • edited

eelregit commented Jan 4, 2024 • edited

MilesCranmer commented Apr 11, 2023 •

edited

MilesCranmer commented Aug 28, 2023 •

edited

eelregit commented Jan 4, 2024 •

edited