time_hour discrepancy with weather.rda #1

Closed
dmcmurchy opened this issue Jan 22, 2024 · 5 comments

@dmcmurchy

I'm not sure whether this is an issue with the original data source or with the version stored here, but the weather data's time_hour column appears to be in UTC, 8 hours ahead of Pacific Standard Time: hour 0 shows up as 08:00:00.

Weather data

year month hour time_hour
2022 1 0 2022-01-01 08:00:00
2022 1 1 2022-01-01 09:00:00
2022 1 2 2022-01-01 10:00:00
2022 1 3 2022-01-01 11:00:00
2022 1 4 2022-01-01 12:00:00
2022 1 5 2022-01-01 13:00:00
2022 1 6 2022-01-01 14:00:00
2022 1 7 2022-01-01 15:00:00
2022 1 8 2022-01-01 16:00:00
2022 1 9 2022-01-01 17:00:00

The flights data, by contrast, uses local time.

Flights data

year month hour time_hour
2022 1 23 2022-01-01 23:00:00
2022 1 22 2022-01-01 22:00:00
2022 1 23 2022-01-01 23:00:00
2022 1 23 2022-01-01 23:00:00
2022 1 23 2022-01-01 23:00:00
2022 1 23 2022-01-01 23:00:00
2022 1 0 2022-01-01 00:00:00
2022 1 22 2022-01-01 22:00:00
2022 1 23 2022-01-01 23:00:00
2022 1 23 2022-01-01 23:00:00

If you merge these two sources on the time_hour column, the rows won't line up: going by the values above, a flight would be paired with weather recorded 8 hours earlier in local time, which gives potentially misleading results.

Based on what I could find on the Bureau of Transportation website, all time values are local.
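As a rough check of the shift, here is a minimal Python sketch using only the standard library. It assumes the weather time_hour strings are UTC and that the airport is in the America/Los_Angeles zone; the helper name `weather_to_local` and the sample rows (copied from the weather table above) are just for illustration and are not part of the package.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: the weather stations in this data set are on Pacific time.
PACIFIC = ZoneInfo("America/Los_Angeles")


def weather_to_local(time_hour: str) -> datetime:
    """Interpret a weather time_hour string as UTC and return naive Pacific local time."""
    utc = datetime.strptime(time_hour, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return utc.astimezone(PACIFIC).replace(tzinfo=None)


# (hour, time_hour) pairs taken from the weather rows shown above.
weather_rows = [
    (0, "2022-01-01 08:00:00"),
    (1, "2022-01-01 09:00:00"),
    (9, "2022-01-01 17:00:00"),
]

for hour, time_hour in weather_rows:
    local = weather_to_local(time_hour)
    # After conversion, the local hour should agree with the hour column.
    print(hour, time_hour, "->", local, "match" if local.hour == hour else "MISMATCH")
```

If every row reports a match, converting the weather time_hour to local time before merging should put both tables on the same clock.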

@ismayc
Member

ismayc commented Dec 26, 2024

Thanks for flagging! I'm investigating this further.

@ismayc
Member

ismayc commented Dec 26, 2024

@dmcmurchy After digging further into this, it does seem that a tz argument is missing in the anyflights package I used when it pulls down the weather data. For example, I'm currently on Mountain time, and my values appear to be an hour off from where they should be (see below). I've created a PR into {anyflights} with what I think is a fix for this here.

[Image: Screenshot 2024-12-26 at 2 11 32 PM]
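To illustrate why a missing time zone argument can shift values, here is a small Python analogy (not the anyflights code, which is R). It renders one instant in two explicitly named zones; the function name `to_local_hour` is hypothetical. When no zone is passed, the result would depend on the machine running the code, which is the kind of one-hour difference I'd expect between Mountain and Pacific.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One fixed instant: 2022-01-01 08:00:00 UTC, as epoch seconds.
ts = datetime(2022, 1, 1, 8, tzinfo=timezone.utc).timestamp()


def to_local_hour(epoch_seconds: float, tz_name: str) -> datetime:
    """Render an instant as naive wall-clock time in an explicitly named zone."""
    return datetime.fromtimestamp(epoch_seconds, tz=ZoneInfo(tz_name)).replace(tzinfo=None)


pacific = to_local_hour(ts, "America/Los_Angeles")
mountain = to_local_hour(ts, "America/Denver")

# Same instant, different wall-clock values depending on the zone used.
print("Pacific: ", pacific)
print("Mountain:", mountain)
print("Difference:", mountain - pacific)
```

Passing the zone explicitly makes the output the same no matter where the code is run.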

@ismayc
Member

ismayc commented Dec 26, 2024

Please reopen and tag me if you see the same issue again. Apologies for the very long delay!

@dmcmurchy
Author

I had forgotten I submitted this issue; glad it wasn't something I was doing on my end. I had to go back and see what I was doing with this data: I was recreating the CSV data used in the Analyzing Flight Delays and Cancellations project on DataCamp, using the pyreadr Python package and the .rda files on your GitHub.

Do you know if you'll be updating the related data files you have on GitHub? If not, I'll just put a note in the related notebook.

Thanks

@ismayc
Member

ismayc commented Dec 26, 2024

Ah, good deal! I'll plan to update the data whenever I get a response on my anyflights request. I can reach out to DataCamp on getting things updated there too if needed.
