Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluating trees with features not in dataset #133

Open
Jgmedina95 opened this issue Sep 14, 2022 · 1 comment
Open

Evaluating trees with features not in dataset #133

Jgmedina95 opened this issue Sep 14, 2022 · 1 comment

Comments

@Jgmedina95
Copy link

Hi, im working on a process that uses the equation returned in the Pareto Frontier.
I was playing around with it and found the following:
Setting a tree like:
image
and evaluating with a dataset with 6 features, gives no problem, as expected.
image

but if i create a tree that uses more features than in the original dataset like:
image

I expected an error, but the function eval_tree_array gave an output and completion ==true.
image
I've realized this will not impact what I'm doing as trees made by the program will never (right?) have more features than the dataset, but I suppose its interesting enough of a bug to share.

@MilesCranmer
Copy link
Owner

Hi @Jgmedina95,

Thanks for raising this! So this should be because @inbounds is used throughout the evaluation to have faster compute. I make an in-bounds assumption since trees made during the search should never have different features than available in the dataset. However, as you raise, I don't think it hurts to check before running the evaluation, so maybe we could do that. (it could be that the user could save their search state, then re-run with a different dataset, and be surprised when trees access undefined memory?)

I just looked at the code, and it seems like sometimes @inbounds is actually not used, such as:

return (cX[tree.feature, :], true)

Whereas, @inbounds is used in the functions which fuse operators together, like here:

x_l = op_l(val_ll, cX[feature_lr, j])::T

So perhaps for some trees, this would raise an error, and sometimes it would not. Will think more whether we should add a bounds check.

Best,
Miles

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants