Understanding accuracy of expressions #464

Open
zhuyi-bjut opened this issue Nov 14, 2023 · 5 comments

@zhuyi-bjut

Hello!
In my recent research, I have been using PySR for some symbolic regression tasks. I found that PySR's loss is sometimes even smaller than an ANN's. How can I explain this "magic" of PySR? Why can the results from low-dimensional expressions be better than those from high-dimensional networks?
Thanks!

@MilesCranmer
Owner

Hi @prozhuyi,

Thanks for this. Yes, I also find that symbolic expressions sometimes beat neural nets on specific problems. It really comes down to priors over the space of functions. When you train a neural net, there is an implicit prior that the function will be smooth, along with other properties.

Symbolic regression imposes a different prior over the space of functions. Sometimes this prior is superior to the neural-net prior, especially if the operators you are using form an efficient basis for describing your field.

cheers,
Miles
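
As a minimal sketch of how the operator set shapes that prior (the toy data, operator choices, and settings below are illustrative assumptions, not taken from this thread), restricting PySR to a small operator basis that matches the target keeps the search space compact:

```python
import numpy as np
from pysr import PySRRegressor

# Illustrative toy data: a short symbolic target with a non-smooth term,
# easy to write with the right operators but awkward for a smooth
# neural-net prior.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(500, 2))
y = np.abs(X[:, 0]) - 0.5 * X[:, 1]

# The chosen operators define the "basis" the search works in.
model = PySRRegressor(
    binary_operators=["+", "-", "*"],
    unary_operators=["abs"],
    niterations=40,  # illustrative setting
)
model.fit(X, y)

# Inspect the discovered Pareto front of equations.
print(model.equations_[["complexity", "loss", "equation"]])
```

Because `abs` is in the basis here, a very short expression can match the target exactly, which is precisely the situation where the symbolic prior wins.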

@MilesCranmer MilesCranmer changed the title About the effect of pysr Understanding accuracy of expressions Nov 14, 2023
@zhuyi-bjut
Author

I think I understand now! Thank you for your answer!

@zhuyi-bjut
Author

Hello Miles @MilesCranmer,

I recently had another question, about the "score" given by PySR. How is this "score" obtained? Is it computed by this step?

```python
if lastMSE is None:
    cur_score = 0.0
else:
    if curMSE > 0.0:
        # TODO Move this to more obvious function/file.
        cur_score = -np.log(curMSE / lastMSE) / (curComplexity - lastComplexity)
    else:
        cur_score = np.inf
```

And what is its significance?
Thanks again!

@MilesCranmer
Owner

Yes, that is the score. It is basically a heuristic that looks for sharp decreases in loss as complexity increases (a traditional criterion for the "best" equation in SR). There are more details on this in the PySR paper: https://arxiv.org/abs/2305.01582
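
As a purely hypothetical worked example of that heuristic (the complexity/MSE numbers below are made up), the same formula can be applied along a Pareto front of equations:

```python
import numpy as np

# Hypothetical Pareto front: (complexity, MSE) for successively larger equations.
front = [(1, 2.50), (3, 2.40), (5, 0.30), (8, 0.28)]

last_complexity, last_mse = None, None
for complexity, mse in front:
    if last_mse is None:
        score = 0.0
    elif mse > 0.0:
        # Large score = large drop in log-loss per unit of added complexity.
        score = -np.log(mse / last_mse) / (complexity - last_complexity)
    else:
        score = np.inf
    print(f"complexity={complexity:>2d}  mse={mse:.2f}  score={score:.3f}")
    last_complexity, last_mse = complexity, mse
```

Here the complexity-5 equation gets by far the largest score, since that is where the loss drops sharply, so it is the one this heuristic would flag as "best".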

@tanweer-mahdi
Contributor

Hi @MilesCranmer,

This is a very interesting discussion. Just to elaborate on your answer a little (and correct me if I am wrong):

The ANN assumes a prior over the space of smooth functions (among other properties), whereas symbolic regression also allows non-smooth functions, which can sometimes be a more suitable prior for a particular problem.

Is the above statement correct?
