
Storing data in a database #98

Open
ExpDev07 opened this issue Mar 19, 2020 · 15 comments
Labels
enhancement (New feature or request), performance (Issue related to performance and optimizations)

Comments

@ExpDev07
Owner

Right now the data is just stored in a cache. Would it perhaps be better to sync the data to an actual MySQL database? It would allow for fast querying.

@ExpDev07 ExpDev07 added the enhancement (New feature or request) label Mar 19, 2020
@Kilo59
Collaborator

Kilo59 commented Mar 21, 2020

From what I have seen you guys talking about on other issues, it seems like the format of the data is not totally stable.
If that's the case, wouldn't it be easier to use a NoSQL database than an RDBMS?

@ExpDev07
Owner Author

Yes, that would probably be better. MongoDB or the like.

@Kilo59
Collaborator

Kilo59 commented Mar 22, 2020

Happy to help with this as well.

@focus1691

This would be good too, because the API is currently broken for me and others.

@ExpDev07
Owner Author

@traderjosh can you explain in more detail how it’s broken for you? JHU (our data provider) made some pretty drastic changes lately which have caused the API’s outputs to change (notably the ID indexing, and provinces no longer being present for the USA).

@focus1691

@ExpDev07 it works now; it was the CORS header field missing from the API. The recovered field seems to be missing now, though. Is that gone forever? And you're right, some countries don't have an ID.

I think a database would help here in case of upstream website changes. MongoDB would be a good fit because the data can change unexpectedly.

@ExpDev07
Owner Author

ExpDev07 commented Mar 25, 2020

For JHU, yes, the recovery stats are gone forever unless they decide to bring them back. I’m going to see if I can find some other reputable sources that offer them and add them to the API.

I believe their reasoning was that no reputable sources were providing accurate recovery numbers, so they just decided to remove it.

@focus1691

Ok, that's not an issue. Do you want me to help set up MongoDB? I can write the boilerplate and you can integrate an account for it.

@ExpDev07
Owner Author

It would be awesome if you can start drafting a PR for it. It needs to be compatible with our service provider system (see “app.services.locations” module). But I think MongoDB will be perfect for it. I’m thinking we periodically sync the DB with data retrieved from the data sources.
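The periodic-sync idea could look roughly like the sketch below. It upserts each location keyed by its source ID, so repeated syncs update documents in place instead of duplicating them. The `FakeCollection` stand-in, the document shape, and the function names are my assumptions for illustration, not the project's actual service-provider API; with real pymongo you would pass a `MongoClient` collection instead.

```python
# Hypothetical sketch of "periodically sync the DB with data retrieved
# from the data sources". Names and document shape are assumptions.
import time
from typing import Iterable

class FakeCollection:
    """Tiny in-memory stand-in exposing pymongo's replace_one(upsert=True)."""
    def __init__(self):
        self.docs = {}

    def replace_one(self, filter, replacement, upsert=False):
        key = filter["_id"]
        if key in self.docs or upsert:
            self.docs[key] = replacement

def sync_locations(collection, locations: Iterable[dict]) -> None:
    # Upsert each location keyed by its source-specific id, so a re-sync
    # overwrites the old document rather than appending a new one.
    for loc in locations:
        doc = {**loc, "synced_at": time.time()}
        collection.replace_one({"_id": loc["id"]}, doc, upsert=True)

# With real pymongo this would be e.g.
#   collection = MongoClient(...).coronavirus.jhu
jhu = FakeCollection()
sync_locations(jhu, [{"id": 1, "country": "Norway", "confirmed": 4641}])
sync_locations(jhu, [{"id": 1, "country": "Norway", "confirmed": 4863}])
print(jhu.docs[1]["confirmed"])  # latest value wins: 4863
```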

@focus1691

And it should be in Python? Not my speciality, but I could do some research.

@ExpDev07
Owner Author

Yeah, feel like that would be best.

@Kilo59
Collaborator

Kilo59 commented Mar 25, 2020

And it should be in Python? Not my speciality but I could do some research.

https://api.mongodb.com/python/current/

@Kilo59 Kilo59 added this to the Performance Optimizations milestone Mar 26, 2020
@Kilo59 Kilo59 added the performance (Issue related to performance and optimizations) label Mar 29, 2020
@Kilo59
Collaborator

Kilo59 commented Mar 29, 2020

Perhaps we should use Mongo to store and update the normalized data?
We can keep the data in a format that is easy to translate into our various responses.

  1. Collections for each source
  2. Documents for the countries/locations
  3. Background tasks to refresh the sources according to how frequently they are each updated.
  4. Continue to use caching to minimize database reads.
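Points 3 and 4 above could be sketched as a per-source cache that only re-fetches when its copy is older than that source's refresh interval. The source names, intervals, and class shape here are illustrative assumptions, not the project's implementation.

```python
# Sketch of "refresh each source on its own schedule, cache in between".
# Intervals and source names are assumptions for illustration.
import time

REFRESH_INTERVALS = {"jhu": 3600, "csbs": 1800}  # seconds, assumed

class SourceCache:
    def __init__(self, name, fetch, interval):
        self.name, self.fetch, self.interval = name, fetch, interval
        self.data, self.fetched_at = None, None

    def get(self):
        # Re-fetch only when there is no cached copy, or it has expired.
        stale = (self.fetched_at is None
                 or time.monotonic() - self.fetched_at > self.interval)
        if stale:
            self.data = self.fetch()
            self.fetched_at = time.monotonic()
        return self.data

calls = {"n": 0}
def fetch_jhu():
    calls["n"] += 1  # pretend this hits the upstream source
    return [{"country": "US", "confirmed": 1}]

jhu = SourceCache("jhu", fetch_jhu, REFRESH_INTERVALS["jhu"])
jhu.get()
jhu.get()
print(calls["n"])  # fetched once; the second read is served from cache
```

In a real deployment the refresh would run as a background task (e.g. a scheduler) rather than lazily on read, but the staleness check is the same.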

@cyenyxe

cyenyxe commented Mar 31, 2020

From what I have seen in the code so far, both the JHU and CSBS location models derive from the Location class, with CSBS having some additional fields. This kind of inheritance relationship should be easy to represent (and query) in an RDBMS.

Splitting the data into multiple collections in Mongo wouldn't really add much value unless you want to support historical records in multiple formats, which is horrible to query anyway if they aren't backwards compatible.
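For what it's worth, the inheritance relationship described above maps onto a classic single-table layout: one `locations` table with a discriminator column and nullable columns for the CSBS-only fields. The column names below are guesses based on this discussion, not the project's actual schema; sqlite stands in for whatever RDBMS would be used.

```python
# Illustrative single-table-inheritance layout for Location / CSBS.
# Column names are assumptions; sqlite3 stands in for the real RDBMS.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE locations (
        id        INTEGER PRIMARY KEY,
        source    TEXT NOT NULL,   -- discriminator: 'jhu' or 'csbs'
        country   TEXT NOT NULL,
        confirmed INTEGER,
        county    TEXT,            -- CSBS-only; NULL for JHU rows
        state     TEXT             -- CSBS-only; NULL for JHU rows
    )
""")
db.execute("INSERT INTO locations VALUES (1, 'jhu',  'US', 100, NULL,  NULL)")
db.execute("INSERT INTO locations VALUES (2, 'csbs', 'US', 40,  'King', 'WA')")

# Querying across both subtypes is a plain WHERE clause, no joins needed.
rows = db.execute(
    "SELECT id, source FROM locations WHERE country = 'US' ORDER BY id"
).fetchall()
print(rows)  # [(1, 'jhu'), (2, 'csbs')]
```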

@Kilo59
Collaborator

Kilo59 commented Apr 25, 2020

We are deployed with gunicorn, which runs multiple worker processes (4), each with its own cache.
@cyenyxe once we are storing the data in any database (RDBMS, Mongo, Redis, etc.), the workers can use it like a shared cache. Then they don't have to independently rebuild their own separate caches every hour.
It also adds resiliency when one of the dependent services times out or encounters some kind of error, which is often the problem that causes this API to go down.
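The shared-cache idea above can be sketched as a cache-aside pattern: every worker checks one shared store before hitting the upstream source, so the source is fetched once instead of once per worker. The dict here stands in for Redis or a database; all names are illustrative.

```python
# Sketch of a shared cache across gunicorn workers (cache-aside).
# `shared_store` stands in for Redis/Mongo; names are illustrative.
shared_store = {}
upstream_calls = {"n": 0}

def fetch_from_source():
    upstream_calls["n"] += 1  # pretend this calls JHU/CSBS
    return {"confirmed": 123}

def get_data():
    # Cache-aside: read the shared store first, fall back to the source
    # and populate the store so the other workers can reuse the result.
    if "latest" not in shared_store:
        shared_store["latest"] = fetch_from_source()
    return shared_store["latest"]

for _ in range(4):  # four gunicorn workers handling requests
    get_data()
print(upstream_calls["n"])  # one upstream fetch serves all four workers
```

With a real shared store this also gives the resiliency mentioned above: if the upstream source times out, workers can keep serving the last stored copy.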

Related reading

https://realpython.com/python-memcache-efficient-caching/
https://redis.io/topics/lru-cache

Created a new issue for using a shared cache #304

4 participants