Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dynamic error handling within eia_data() from metadata layer #17

Merged
merged 20 commits into from
Nov 16, 2023
Merged

Conversation

mghoff
Copy link
Collaborator

@mghoff mghoff commented Nov 11, 2023

No description provided.

Copy link
Member

@leonawicz leonawicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminded me we should have some unit tests that make use of all possible frequency values, so we'll probably have to test against at least a couple different data sources for which different frequencies can be requested. But my biggest question is the one about the algo for requesting metadata within a data request.

R/data.R Outdated
if(!is.null(end)) .end_check(end, freq, md$Frequency, md_end, md_start)
if(!is.null(sort)) .sort_check(sort)
if(!is.null(length)) .lng_check(length)
if(!is.null(offset)) .ofs_check(offset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment about lines 83-105: What do you think about migrating all the instances of this if(!is.null(x)) stuff into the functions that check or handle the rest of the respective operations? Seems like since we have all these other internal helper functions handling cases and returning something (or nothing), they could also handle the NULL case themselves as well. Then outer functions like .eia_data_url() and .eia_data_check() can be further simplified.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I will make this happen. Good catch.

R/data.R Outdated
@@ -58,6 +58,8 @@ eia_data <- function(dir, data = NULL, facets = NULL,
}

.eia_data <- function(dir, data, facets, freq, start, end, sort, length, offset, tidy, key){
md <- eia_metadata(dir, TRUE, TRUE, key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that every API request becomes a minimum of two API requests?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, yes. I'm not sure how else having truly dynamic validation of input values could be accomplished - unless we precompiled within the package itself (maybe within data-raw/...?) all the available options for every API data endpoint.

Copy link
Member

@leonawicz leonawicz Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention was to only check as part of error handling itself, like catching an exception and handling it as needed.

Meaning, data request is made. Then, if data is returned in some expected manner, do nothing but return the data. This would be the 99% common use case and only ever makes one request. Otherwise, handle the exception when it occurs (whether that is a try catch around an actual error, or checking for some message or lack of data in the response).

So a second API request, for metadata, would only be made at the end of the parent data request function if needed based on the result.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, completely dynamic- typical exception handling around the data call, rather than an automatic preemptive request.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at it, I'm thinking I could just re-order the metadata call to come after the data call...

Let me first commit and push the is.null() checks to the helper functions, and then I'll play around with error handling on error/warning, rather than preemptively.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other thing to realize, is that it will only make two API calls on the first function call to that endpoint. After that initial call, that endpoint and its metadata will be cached, so only one API call is made from then on within the context of that API endpoint (dir entry).

Copy link
Member

@leonawicz leonawicz Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. That's an excellent benefit of the memoization.

Another option is to make available a check_metadata = FALSE (default) parameter to eia_data(), ideally after tidy and before key, as perhaps the second least likely parameter to be passed an argument by the user.

Then, if they want to run a call with a more robust check that (potentially) makes another API request, they can set check_metadata = TRUE. Think of it like a sort of debug mode, though I wouldn't use that name. Then within the code, the metadata request and the subsequent checks could all be wrapped in an if(check_metadata).

It's just another option; if it's not worth the trouble to have the metadata request and checks be fully dynamic/only run as needed, then having the extra checking be optional but off by default is good.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no - I like this. Good suggestion.

…rom metadata check; `sort` made more robust where `order` may now handle a list of varying length respective of `cols` length; messaging made more consistent with API error messaging
@mghoff
Copy link
Collaborator Author

mghoff commented Nov 16, 2023

All this unit testing work has made me really re-think the user experience and the in-package (non-API) error handling I've written. I've modified this fairly heavily; e.g. removing sort, length and offset from metadata checks and more.

One thing I'm still concerned with, and would appreciate your input on, is with the start and end arguments. The API doesn't actually care if the format of start matches the format of freq. For example, if freq == "annual" and start == "2020-06", then what would be returned without my error handling would be all of 2020. As such, I'm forcing the user to be more exacting with their input choices.

Further, I am still planning to find a API data endpoint offering hourly granularity so I can add testing around that - just haven't gotten to it yet.

@leonawicz
Copy link
Member

That's a good point about the start and freq formats. There is nothing inherently wrong with imposing more formality, requiring matching format, in this area simply because the API itself is more loose in its requirement around this. But of course it's also fine to only reproduce the API's behavior. I could go either way.

You can always indicate in the documentation something in the details about how if both arguments are given and the one is a higher temporal resolution than the other, it's resolution is reduced to match that of freq and that this is the API behavior.

@leonawicz leonawicz merged commit 2139d22 into master Nov 16, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants