Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_data.default() to fall back on backup procedure in case of malformed model.frame() output #624

Closed
ngreifer opened this issue Aug 29, 2022 · 1 comment
Labels
Bug 🐛 Something isn't working

Comments

@ngreifer
Copy link

Hello,

In get_data.default(), if the call to model.frame() fails, a backup procedure is enacted, i.e.,

insight/R/get_data.R

Lines 62 to 73 in 878b3ae

if (is.null(mf)) {
mf <- tryCatch(
{
dat <- .recover_data_from_environment(x)
vars <- find_variables(x, flatten = TRUE, verbose = FALSE)
dat[, intersect(vars, colnames(dat)), drop = FALSE]
},
error = function(x) {
NULL
}
)
}

However, this only gets triggered when mf returns NULL, i.e., when model.frame() returns an error. In some cases, model.frame() may return a malformed data frame, e.g., one with zero rows, which is then missed by the backup procedure.

As an example, consider running get_data() on the output of estimatr::iv_robust(), which, due to a bug, returns a malformed dataset when model.frame() is called on the output (as of estimatr v1.0.0):

#Using lalonde dataset in MatchIt
fit <- estimatr::iv_robust(re78 ~ treat + age + race | educ + age + race,
                           data = MatchIt::lalonde)

#Malformed output; 0 rows
model.frame(fit)
#> Warning in Ops.factor(treat + age, race): '+' not meaningful for factors
#> Warning in Ops.factor(educ + age, race): '+' not meaningful for factors
#> [1] re78                                  
#> [2] treat + age + race | educ + age + race
#> <0 rows> (or 0-length row.names)

#Returns NULL with warnings above
dat <- insight::get_data(fit)
#> Warning in Ops.factor(treat + age, race): '+' not meaningful for factors

#> Warning in Ops.factor(treat + age, race): '+' not meaningful for factors
#> Warning: Could not get model data.

#Manually running backup produces correct output
dat <- insight:::.recover_data_from_environment(fit)
vars <- insight::find_variables(fit, flatten = TRUE, verbose = FALSE)
dat <- dat[, intersect(vars, colnames(dat)), drop = FALSE]
str(dat)
#> 'data.frame':    614 obs. of  5 variables:
#>  $ re78 : num  9930 3596 24909 7506 290 ...
#>  $ treat: int  1 1 1 1 1 1 1 1 1 1 ...
#>  $ age  : int  37 22 30 27 33 22 23 32 22 33 ...
#>  $ race : Factor w/ 3 levels "black","hispan",..: 1 2 1 1 1 1 1 1 1 3 ...
#>  $ educ : int  11 9 12 11 8 9 12 11 16 12 ...

Created on 2022-08-29 with reprex v2.0.2

I have already submitted an issue with the estimatr team, but a fix might be more useful in general.

@strengejacke strengejacke added the Bug 🐛 Something isn't working label Aug 30, 2022
@ngreifer
Copy link
Author

FYI I still get warnings when running get_data() because the call to model.frame() produces a warning even though its output is not returned to the user (i.e., the backup is used). That is, we have

Attempt model.frame() -> warning thrown and 0 row df returned -> attempt backup -> succeed

The problem is that the user still sees a warning due to the call to model.frame(), even though that warning is not relevant to them because the backup succeeded in returning the data. Might it be possible to hold onto warnings from the model.frame() attempt and only throw them if the backup does not succeed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug 🐛 Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants