-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add verbose logging option to ProviderDataIngester
#4068
Add verbose logging option to ProviderDataIngester
#4068
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
catalog/dags/providers/provider_api_scripts/provider_data_ingester.py
Outdated
Show resolved
Hide resolved
catalog/dags/providers/provider_api_scripts/provider_data_ingester.py
Outdated
Show resolved
Hide resolved
Thanks for you feedback @sarayourfriend! I pushed some commits and I'm running the linter locally now but I gotta go now. I will hopefully push the linting changes tonight. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much @nicoepp! I've left a few suggestions to clarify the (admittedly confusing 😄) distinction between terms like "record"/"batch item" in the logs. In addition, I think we should add at least one more verbose log to log the full result of get_batch_data
.
Since that is quite the verbose log, I've also asked that we not enable this logging by default locally, treating this more as an optional development tool. (CC @AetherUnbound to give you a chance to disagree with me if that's not what you had in mind! 😄)
catalog/dags/providers/provider_api_scripts/provider_data_ingester.py
Outdated
Show resolved
Hide resolved
catalog/dags/providers/provider_api_scripts/provider_data_ingester.py
Outdated
Show resolved
Hide resolved
catalog/dags/providers/provider_api_scripts/provider_data_ingester.py
Outdated
Show resolved
Hide resolved
catalog/dags/providers/provider_api_scripts/provider_data_ingester.py
Outdated
Show resolved
Hide resolved
catalog/dags/providers/provider_api_scripts/provider_data_ingester.py
Outdated
Show resolved
Hide resolved
I'm playing around with the idea to log the full result of I modified my code now to just log the first 5 and the last 5 dictionaries in each list returned by |
Absolutely perfect, even just the first five is probably fine for our needs! |
Here is the full log file (from the screenshot) for better reference: |
Ok. Thanks for the feedback. I'm only logging the first 5 dictionaries of the batch for now. I also pushed a commit making the verbose logging always controlled by the Airflow variable. No more checking the "ENVIRONMENT" variable while deciding to log or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Just a few very small comments and then we should ship this 💯
catalog/dags/providers/provider_api_scripts/provider_data_ingester.py
Outdated
Show resolved
Hide resolved
catalog/dags/providers/provider_api_scripts/provider_data_ingester.py
Outdated
Show resolved
Hide resolved
I've also just noticed the catalog test failures -- that's because we are mocking all the results of MockVariable.get.side_effect = [
20, # ingestion_limit
{}, # skipped_ingestion_errors
+ [], # should_verbose_log
] |
Based on the low urgency of this PR, the following reviewers are being gently reminded to review this PR: @krysal Excluding weekend1 days, this PR was ready for review 6 day(s) ago. PRs labelled with low urgency are expected to be reviewed within 5 weekday(s)2. @nicoepp, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nico, I'm converting this to a draft. Please let us know when this is ready for review. Thanks!
catalog/dags/providers/provider_api_scripts/provider_data_ingester.py
Outdated
Show resolved
Hide resolved
catalog/dags/providers/provider_api_scripts/provider_data_ingester.py
Outdated
Show resolved
Hide resolved
To log for a certain DAG set SHOULD_VERBOSE_LOG to a list containing that dag id e.g. ["smk_workflow"]
Co-authored-by: sarayourfriend <[email protected]>
Co-authored-by: sarayourfriend <[email protected]>
Co-authored-by: Krystle Salazar <[email protected]>
and make amount of lines logged configurable
306cef1
to
342edce
Compare
I've made a few changes to this PR but I think it's ready to undraft:
|
@AetherUnbound just a heads up some of the tests are failing. It looks like maybe in some cases we're expecting the logging to always be a single parsable JSON document? I haven't looked deeply at it, but just a guess based on the changes you described and the test logs. I can look more into this if you'd like help sorting this PR out, just let me know. |
Ahh, thanks for the heads up @sarayourfriend - I'll fix the tests tomorrow! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I still think it would be nice for there to be some way of disabling this locally with the variables, but that's absolutely not a blocker. Something we can adjust later if it does end up becoming necessary :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
To log for a certain DAG set SHOULD_VERBOSE_LOG to a list containing that dag id.
For example, set the SHOULD_VERBOSE_LOG Airflow variable to: ["smk_workflow"]
Fixes
Fixes #1420 by @AetherUnbound
Description
Testing Instructions
Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin