Recommended alternative? #28

Open
iandoug opened this issue Dec 30, 2020 · 8 comments
Comments

@iandoug

iandoug commented Dec 30, 2020

Hi

Anyone able to recommend an alternative data feed?

Thanks, Ian

@iandoug iandoug changed the title Recommended alternative& Recommended alternative? Dec 30, 2020
@rvneil

rvneil commented Dec 31, 2020

https://covid.ourworldindata.org/data/owid-covid-data.csv
has everything in one file. I guess I'm going to have to switch to this source, as this repo seems to be completely dead.

https://github.com/CSSEGISandData has some data split out by province/state (for Canada, Australia, China, etc), which I didn't want. I suppose I could code something to come up with totals for those countries.
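The per-country totals mentioned above could be computed along these lines. This is a minimal sketch, assuming a wide layout with `Province/State` and `Country/Region` columns followed by one column per date; the sample rows and numbers are made up for illustration, and `country_totals` is a hypothetical helper, not part of any of the repos discussed here.

```python
import csv
import io
from collections import defaultdict

# Illustrative sample only: the numbers are invented and the column
# layout is an assumption about the province/state files.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/1/21,1/2/21
Alberta,Canada,53.9,-116.5,100,110
Ontario,Canada,51.2,-85.3,200,230
,France,46.2,2.2,50,55
"""

def country_totals(csv_text):
    """Sum every date column across all rows sharing a Country/Region."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fixed = {"Province/State", "Country/Region", "Lat", "Long"}
    date_cols = [c for c in reader.fieldnames if c not in fixed]
    totals = defaultdict(lambda: dict.fromkeys(date_cols, 0))
    for row in reader:
        for col in date_cols:
            # Treat an empty cell as zero rather than failing on int("").
            totals[row["Country/Region"]][col] += int(row[col] or 0)
    return dict(totals)

totals = country_totals(SAMPLE)
print(totals["Canada"])  # {'1/1/21': 300, '1/2/21': 340}
```

Countries that already appear as a single row pass through unchanged, so the same function can be applied to the whole file.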

@iandoug
Author

iandoug commented Jan 1, 2021

I think I'm going to use this source..
https://github.com/datasets/covid-19

in particular, probably https://datahub.io/core/covid-19/r/countries-aggregated.csv since I don't need province/state data..
Link on https://datahub.io/core/covid-19

I had issues with the Johns Hopkins data before, which is why I switched to this repo. The source I'm suggesting claims to have cleaned up the messy bits in the JH data.

Need to update my loader program and deal with possible country name issues today.

Cheers, Ian

@kallewoof

FWIW, I made a (very simple) converter, https://github.com/kallewoof/covid19-csv-converter, between the old format (Johns Hopkins, IIRC) and this one. I will probably add another mode for the covid.ourworldindata.org variant soon, since this one also seems to have gone under.

@iandoug
Author

iandoug commented Jan 1, 2021

> I think I'm going to use this source..
> https://github.com/datasets/covid-19

Looks like even after their clean-up, there are still strange bumps in the data. Guess I will just have to live with it.

Cheers, Ian

@iandoug iandoug closed this as completed Jan 1, 2021
@kallewoof

@iandoug I'm not super happy with the owid dataset, so I am probably going to switch to the datasets one. Could you work around the strange bumps by using this dataset and append only the missing data?
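The "append only the missing data" idea could look roughly like this. It is a sketch under stated assumptions: both data sets have already been loaded into dictionaries keyed by `(country, date)`, the country names have been reconciled, and `append_missing` is a hypothetical helper name. The figures are invented.

```python
def append_missing(base, other):
    """Return base plus only those (country, date) rows that base lacks.

    Rows already present in base are never overwritten, so its history
    stays untouched and other only fills the gaps.
    """
    merged = dict(base)
    for key, cases in other.items():
        merged.setdefault(key, cases)
    return merged

# Made-up figures: base stops on the 13th, other continues to the 14th.
base = {
    ("Exampleland", "2020-12-12"): 100,
    ("Exampleland", "2020-12-13"): 120,
}
other = {
    ("Exampleland", "2020-12-13"): 999,  # conflicting value, ignored
    ("Exampleland", "2020-12-14"): 140,  # missing from base, appended
}

merged = append_missing(base, other)
for key in sorted(merged):
    print(key, merged[key])
```

Keeping the existing rows as the authority means any odd values in the second source only affect dates the first source never covered.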

@iandoug
Author

iandoug commented Jan 2, 2021

> @iandoug I'm not super happy with the owid dataset, so I am probably going to switch to the datasets one. Could you work around the strange bumps by using this dataset and append only the missing data?

Mmmnnn.. that's an idea I didn't think of.

I'm a bit reluctant though, because THIS repo used an end-of-day around midnight GMT (or maybe 2am, I never could figure it out; I fetched at 4am GMT), while datasets/Johns Hopkins uses (I think) midnight Eastern Standard Time as its cut-off point. So "cases on 2020-xx-yy" is going to differ between the two sets, making a merge tricky.

I see "datasets" has not updated since yesterday, and several closed tickets on their repo about it NOT updating in the past, so that's a bit worrying in terms of reliability. I switched from JH data long ago because they had so many issues and kept changing their file layouts etc.

Regarding the bumps, given the number of sites using datasets data, you'd think they would have sorted it out by now. :-(

Let me ponder your idea a bit more.

I had to map these country names between this repo, datasets, and my own names. Your fix list may or may not be similar.

    "Korea, South"  :  South Korea  (annoying, that one)
    Burma  :  Myanmar
    Cabo Verde  :  Cape Verde
    China  :  Mainland China
    Congo (Brazzaville)  :  Congo
    Congo (Kinshasa)  :  DRC
    Cote d'Ivoire  :  Cote d’Ivoire
    Eswatini  :  eSwatini
    Holy See  :  Vatican
    Kazakhstan  :  Kazakstan
    Kyrgyzstan  :  Kyrgystan
    Taiwan*  :  Taiwan
    US  :  United States
    West Bank and Gaza  :  Palestine
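A lookup table is one simple way to apply a list like this in a loader. The sketch below assumes the left-hand column is the name as it appears in the source data and the right-hand column is the name wanted locally; the thread does not state the direction explicitly. The spellings are copied from the list as written, and `normalize_country` is a hypothetical helper.

```python
# Source name -> local name, copied from the list above.
# The mapping direction is an assumption.
NAME_FIXES = {
    "Korea, South": "South Korea",
    "Burma": "Myanmar",
    "Cabo Verde": "Cape Verde",
    "China": "Mainland China",
    "Congo (Brazzaville)": "Congo",
    "Congo (Kinshasa)": "DRC",
    "Cote d'Ivoire": "Cote d\u2019Ivoire",  # straight vs. curly apostrophe
    "Eswatini": "eSwatini",
    "Holy See": "Vatican",
    "Kazakhstan": "Kazakstan",
    "Kyrgyzstan": "Kyrgystan",
    "Taiwan*": "Taiwan",
    "US": "United States",
    "West Bank and Gaza": "Palestine",
}

def normalize_country(name):
    """Map a source country name to the local one; unknown names pass through."""
    cleaned = name.strip()
    return NAME_FIXES.get(cleaned, cleaned)

print(normalize_country("Korea, South"))  # South Korea
print(normalize_country("Japan"))         # Japan
```

Because "Korea, South" contains a comma, the source file has to be read with a real CSV parser rather than a plain split on commas.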

Cheers, Ian

@iandoug iandoug reopened this Jan 2, 2021
@kallewoof

kallewoof commented Jan 9, 2021

Hi Ian,

Yeah, I think I followed in your exact footsteps. It's still a rough proof of concept, but I have a tool to convert between these here: https://github.com/kallewoof/csvman

To get the github.com/datasets/covid-19.git data set into the ulklc format, clone the above, then:

g++ -O3 -std=c++11 parser/*.cpp *.cpp -o compile
./compile formulas/covid-19/gds.cmf GDSDIR/time-series-19-covid-combined.csv -f formulas/covid-19/ulklc.cmf result.csv

It's still a WIP but yeah, it supports fixing names and such manually. I've got part of the ones you listed but will add the others.

Also, not sure what you mean by the dates being 1 off -- are the actual dates in the file showing for one day earlier/later depending on the set??

Edit: I don't see several of the country name differences that you are listing (e.g. both this repo and the datasets/covid-19 one use "China", "Kyrgyzstan", "Kazakhstan", ...).

@iandoug
Author

iandoug commented Jan 9, 2021

> Also, not sure what you mean by the dates being 1 off -- are the actual dates in the file showing for one day earlier/later depending on the set??

It depends on when countries release their figures, and when the various sites process the numbers, e.g.

site 1: day ends at midnight GMT
site 2: day ends at 6am GMT

so figures released at 2am GMT are going to land on different days in each data set.
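The effect of the two cut-offs can be shown with a small sketch. The 0:00 and 6:00 GMT day ends are the hypothetical sites from the example above, not the documented behaviour of any real feed, and `reporting_date` is an illustrative helper.

```python
from datetime import datetime, timedelta, timezone

def reporting_date(released_at, day_end_hour_utc):
    """Date a figure is booked under when a site's day ends at the given UTC hour."""
    # Shifting back by the cut-off hour moves anything released before
    # the cut-off into the previous calendar day.
    return (released_at - timedelta(hours=day_end_hour_utc)).date()

# A figure released at 2am GMT on 9 January 2021.
released = datetime(2021, 1, 9, 2, 0, tzinfo=timezone.utc)

site1 = reporting_date(released, 0)  # day ends at midnight GMT
site2 = reporting_date(released, 6)  # day ends at 6am GMT

print(site1)  # 2021-01-09
print(site2)  # 2021-01-08
```

The same release is booked under two different dates, which is why a row-by-row merge keyed on date does not line up cleanly between such sources.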

The datasets data is a mess around 13-14 December because the Turkey figure is wrong. I did raise it as an issue, but it looks like a can't fix/won't fix because that's what they get from JH. Which is exactly the kind of reason I stopped using JH in the first place.

What also bothers me is the huge discrepancy between their numbers and Worldometer, e.g. yesterday WoM had 89,343,183 and datasets 88,860,500, about half a million less. There used to be a difference of around 10-40k before, which I accepted as end-of-day differences.

Still hoping ulklc will resurface.

Cheers, Ian
