Replies: 9 comments 7 replies
-
Hi Nico, Thanks for reaching out! For beginner tutorials, you could try the online tutorial here: https://colab.research.google.com/github/MilesCranmer/PySR/blob/master/examples/pysr_demo.ipynb
There are no convergence tests available (sometimes the model might look like it has converged, but then find a new branch of the evolutionary tree and continue from there). However, there are some ways you can trigger an early stop; see the "Stopping Criteria" section of the API reference page: https://astroautomata.com/PySR/api/#stopping-criteria
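For instance, a sketch of how the stopping-criteria parameters can be combined (the exact values here are illustrative; see the API reference above for the full set of options): `early_stop_condition` accepts a numeric loss threshold or a Julia-syntax condition string, and `timeout_in_seconds` caps the total wall-clock time.

```python
from pysr import PySRRegressor

# Sketch: stop early once an equation with loss < 1e-6 and complexity < 10
# is found, or after one hour of search, whichever comes first.
model = PySRRegressor(
    early_stop_condition="stop_if(loss, complexity) = loss < 1e-6 && complexity < 10",
    timeout_in_seconds=60 * 60,
)
```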
Not by default, although you can set up your own selection strategies. After the search, the Pareto front is stored in model.equations_, which is a pandas DataFrame with columns for the loss, complexity, and the equation. For example, to implement AIC, you could do this as follows:
equations = model.equations_
import re
# Regex matching numeric constants inside the equation strings:
number_matching_pattern = r"(?<![a-zA-Z0-9_.])[+-]?(\d+\.\d+|\.\d+|\d+\.|\d+)(?:[eE][-+]?\d+)?"
# Count the number of constants in each equation:
equations["number_constants"] = [len(re.findall(number_matching_pattern, eq)) for eq in equations["equation"]]
# Compute the log-likelihood (for example):
equations["log_like"] = -equations["loss"] * len(X)
# Compute AIC:
equations["aic"] = 2 * equations["number_constants"] - 2 * equations["log_like"]
# Find the row with the best (lowest) AIC:
best_row = equations["aic"].argmin()
# Use it in different contexts:
model.sympy(index=best_row)       # SymPy version
model.latex(index=best_row)       # LaTeX version
model.predict(X, index=best_row)  # Make predictions with this equation on some data `X`
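As an aside, the small-sample corrected AICc just adds a penalty term to the AIC above, with k the number of constants and n the number of data points. A minimal helper in plain Python (the function name `aicc` is just for illustration):

```python
def aicc(aic: float, k: int, n: int) -> float:
    """Small-sample corrected AIC: AICc = AIC + 2k(k+1) / (n - k - 1)."""
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# For example, with AIC = 10.0, k = 2 constants and n = 40 observations:
print(aicc(10.0, 2, 40))  # 10.0 + 12/37 ≈ 10.32
```

The correction vanishes as n grows, so for large datasets AIC and AICc rank equations almost identically.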
-
Hi Miles,
Thanks!!
Wonder if you would know tutorials with real data using docker and PySR?
Ex: importing a file, selecting predictive and response var, setting up the
model, run it, selecting best equations, ...
Cheers!
Nico
-
I gave a talk+tutorial here: https://www.youtube.com/watch?v=q6tjKXmhiMs, although it also ventures into some deep learning material. The accompanying tutorial code is here: https://github.com/MilesCranmer/pysr_tutorial, which uses Docker.
-
Fantastic, I will give it a try ASAP!!
Thanks a lot for the help, highly appreciated!
Cheers,
Nico
On Fri, Apr 21, 2023, at 7:29 p.m., Miles Cranmer wrote:
> I don't know what predictive and response variables are, but I assume predictive = X and response = y.
-
Hi Miles,
I think I made it work properly! However, I wonder how to avoid the search space getting stuck. With Eureqa, I repeated each run 10x, then kept the best formula based on AICc.
I also wonder how to improve the processing time, taking into account that I might have ~1000 observations and 5000 predictive variables.
To start, I used a simpler dataset with only 40 predictive variables and the following parameters:
from pysr import PySRRegressor
default_pysr_params = dict(populations=80, model_selection="best")
model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "*", "-", "/"],
    unary_operators=["exp", "inv(x) = 1/x"],
    extra_sympy_mappings={"inv": lambda x: 1 / x},
    **default_pysr_params,
)
model.fit(X, Y)
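The repeated-runs workflow from Eureqa can be mimicked by pooling the Pareto fronts of several searches and keeping the best loss at each complexity. A sketch in plain Python (the helper name `pool_pareto_fronts` is hypothetical; here each front is a plain list of `(complexity, loss, equation)` tuples, which in practice would be extracted from a separate run's `model.equations_`):

```python
def pool_pareto_fronts(fronts):
    """Keep the lowest-loss equation at each complexity across several runs."""
    best = {}
    for front in fronts:
        for complexity, loss, equation in front:
            if complexity not in best or loss < best[complexity][0]:
                best[complexity] = (loss, equation)
    return sorted((c, l, e) for c, (l, e) in best.items())

# Two toy fronts standing in for two independent searches:
run_1 = [(1, 0.9, "x0"), (3, 0.5, "x0 + x1")]
run_2 = [(1, 0.8, "x1"), (5, 0.1, "x0 * x1 + x2")]
print(pool_pareto_fronts([run_1, run_2]))
# [(1, 0.8, 'x1'), (3, 0.5, 'x0 + x1'), (5, 0.1, 'x0 * x1 + x2')]
```

The pooled front can then be fed into the same AIC/AICc selection as before.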
Thanks for your time and advice!!
Cheers,
Nico
-
Hi Miles,
Thanks a lot! Awesome 👌
Is there a way to get R2 for each equation?
I got really cool results for my dataset that were confirmed by other analyses :)
Nico
On Mon, Apr 24, 2023, at 10:46 a.m., Miles Cranmer wrote:
> Maybe have a look at https://astroautomata.com/PySR/tuning/ as well as the API reference page? There are some other discussions on improving performance which might be useful too.
-
Hi Miles,
Thanks!! It was not clear whether PySR automatically splits the dataset into training and testing sets.
If not, I should generate the split myself and then use y_pred with my testing dataframe, which would have the same number of columns as X.
Cheers!!
Nico
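Generating the split beforehand is straightforward. A minimal sketch in plain Python (the helper name `train_test_split_indices` is just for illustration; scikit-learn's `train_test_split` does the same job):

```python
import random

def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into train/test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_fraction)
    return idx[n_test:], idx[:n_test]

train_idx, test_idx = train_test_split_indices(100, test_fraction=0.2)
print(len(train_idx), len(test_idx))  # 80 20
```

Only the training rows would then be passed to model.fit; the held-out rows are kept for evaluating the discovered equations.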
On Tue, Apr 25, 2023, at 7:48 a.m., Miles Cranmer wrote:
> You can do this with sklearn:
> import sklearn.metrics
> equation_index = 10  # choose an equation
> y_pred = model.predict(X, equation_index)
> r2 = sklearn.metrics.r2_score(y, y_pred)
-
Hi Miles,
Hope all's good!
I would have 2 questions for you:
1/ When I change from niterations=100 to 200, the processing time goes from ~20 min to >1 day... Is that normal?
2/ When I compare R2 (with the best equation based on score) on the training data vs. the testing data, I observe a huge difference (e.g. 90% vs 2%). I thought SR was robust to overfitting?
Cheers,
Nico
from pysr import PySRRegressor
default_pysr_params = dict(populations=100, model_selection="best")
model = PySRRegressor(
    niterations=100,
    binary_operators=["+", "*", "-", "/"],
    unary_operators=["exp", "inv(x) = 1/x"],
    extra_sympy_mappings={"inv": lambda x: 1 / x},
    **default_pysr_params,
)
On Tue, Apr 25, 2023, at 7:48 p.m., Miles Cranmer wrote:
> PySR does not do this split. You should only give your training data to PySR.
-
The beginner materials are super awesome! I have really enjoyed the tutorials so far. I am trying to understand how PySR can do multiple-equation regression, for example a 2D pendulum, a falling sliding ladder, etc. I may be misunderstanding, but there should be two equations of motion, x(t) and y(t), for example. Yet all the examples I find seem to only ever solve for one equation. Maybe I am not looking at the right examples? If these examples do not exist but there is some code somewhere showing how to implement them, I would be happy to write the tutorial to make a 2D code example.
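One possible route (a sketch, assuming PySR's scikit-learn-style multi-output support: if y has shape (n_samples, n_targets), a separate search is run per target column, so x(t) and y(t) can be stacked as two columns):

```python
import numpy as np
from pysr import PySRRegressor

t = np.linspace(0, 10, 100).reshape(-1, 1)                # input: time
y = np.column_stack([np.cos(t[:, 0]), np.sin(t[:, 0])])   # targets: x(t), y(t)

model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "*"],
    unary_operators=["cos", "sin"],
)
# One search per target column; for multiple outputs, model.equations_
# should then hold one Pareto front per output.
model.fit(t, y)
```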
-
Hi Miles,
Hope all's good! I would love to test PySR despite my pretty poor skills in Python/Julia (I work much more in R, but I should definitely change that). I was a Eureqa user and I really loved the interface/tool. I also got pretty cool results with Eureqa that were confirmed later with new data.
I would like to test PySR with my dataset (taxa abundance and gene frequency) to predict toxicity. I am trying to follow the PySR docs with Docker, but I am not sure how to define the predictive variables and the response variable. Is there any tutorial (for beginners) that you know of?
I can see how to set up the model and the search space. Not sure if there is a convergence or any search parameters to tell the model to stop searching (a bit what there was with Eureqa and convergence)?
Is there a way to select (after the search) the simplest model based on AIC for example?
Sorry if these are very naive questions!
Thanks for your help,
Nico