This repository has been archived by the owner on Nov 12, 2024. It is now read-only.

docs: understanding locations #554

Open
chapmanjacobd opened this issue Sep 21, 2022 · 1 comment

Comments

chapmanjacobd commented Sep 21, 2022

Good day,

I'm trying to understand the context of place_id in various files. I know that place_id is just an identifier, but I have run into some puzzling things. Before diving into my questions, I'll start light by stating my beliefs about the data and how it is joined together. If any of these beliefs are incorrect, please correct them:

  • google-research/open-covid-19-data was started before this repo (May 21, 2020 vs. July 23, 2020):
$ curl -sS https://api.github.com/repos/GoogleCloudPlatform/covid-19-open-data | grep created_at
  "created_at": "2020-07-23T23:43:51Z",
$ curl -sS https://api.github.com/repos/google-research/open-covid-19-data | grep created_at
  "created_at": "2020-05-21T03:35:01Z",
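The belief above rests on comparing the two created_at values. Both are UTC timestamps in the same fixed-width YYYY-MM-DDTHH:MM:SSZ layout, so plain string ordering matches chronological ordering and no date parsing is needed. A small sketch, with the values copied from the API output above (the variable names are illustrative only):

```shell
# created_at values copied from the GitHub API output above.
this_repo="2020-07-23T23:43:51Z"       # GoogleCloudPlatform/covid-19-open-data
research_repo="2020-05-21T03:35:01Z"   # google-research/open-covid-19-data

# Same fixed-width UTC format, so a byte-wise (C locale) string sort is also
# a chronological sort; the first line after sorting is the older timestamp.
older=$(printf '%s\n%s\n' "$this_repo" "$research_repo" | LC_ALL=C sort | head -n 1)

if [ "$older" = "$research_repo" ]; then
  echo "google-research/open-covid-19-data is older"
else
  echo "GoogleCloudPlatform/covid-19-open-data is older"
fi
```

This shortcut only holds while both strings share the exact same format and time zone; timestamps with offsets or fractional seconds would need real parsing.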

How does mobility.csv relate to Global_Mobility_Report.csv?

They appear to describe exactly the same thing, but they seem to be entirely different data products:

sqlite-utils memory Global_Mobility_Report.csv "select count(distinct place_id) from t1"
[{"count(distinct place_id)": 13249}]

sqlite-utils memory mobility.csv "select count(distinct location_key) from t1"
[{"count(distinct location_key)": 7351}]

The same mismatch shows up when comparing against aggregated.csv:

xsv select place_id aggregated.csv | sort --unique > aggregated_place_ids.csv
xsv select place_id Global_Mobility_Report.csv | sort --unique > Global_Mobility_Report_place_ids.csv

combine aggregated_place_ids.csv not Global_Mobility_Report_place_ids.csv  | count
14283
combine Global_Mobility_Report_place_ids.csv not aggregated_place_ids.csv  | count
5913
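combine comes from moreutils. Where it isn't installed, the same two set differences can be computed with comm from coreutils, which compares two sorted files line by line. A minimal sketch, assuming the extracted ID lists look like the output of the xsv/sort steps above; the place IDs and temporary file names here are invented for illustration:

```shell
# Hypothetical sample IDs standing in for the real extracted place_id lists.
tmp=$(mktemp -d)
printf 'place_id\nChIJ_A\nChIJ_B\nChIJ_C\n' > "$tmp/aggregated_place_ids.csv"
printf 'place_id\nChIJ_B\nChIJ_C\nChIJ_D\nChIJ_E\n' > "$tmp/Global_Mobility_Report_place_ids.csv"

# comm requires both inputs sorted with the same collation, so sort under the C locale.
LC_ALL=C sort -u "$tmp/aggregated_place_ids.csv" > "$tmp/a.sorted"
LC_ALL=C sort -u "$tmp/Global_Mobility_Report_place_ids.csv" > "$tmp/g.sorted"

# -23 keeps lines found only in the first file; -13 keeps lines found only in the second.
only_in_aggregated=$(LC_ALL=C comm -23 "$tmp/a.sorted" "$tmp/g.sorted" | wc -l | tr -d ' ')
only_in_mobility=$(LC_ALL=C comm -13 "$tmp/a.sorted" "$tmp/g.sorted" | wc -l | tr -d ' ')

echo "only in aggregated: $only_in_aggregated"
echo "only in mobility report: $only_in_mobility"
rm -r "$tmp"
```

The place_id header line is present in both lists, so comm puts it in the shared column and it is not counted on either side.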

chapmanjacobd (Author) commented

After reading through more of the code, I think I understand it now:

https://github.com/GoogleCloudPlatform/covid-19-open-data/blob/e2f6c1c0840fa1dc301ed798f6a624781b453c19/src/pipelines/mobility/google_mobility.py
https://github.com/GoogleCloudPlatform/covid-19-open-data/blob/15e2bdd4b1c7a523a74f42b3ada89f3686dbc882/src/pipelines/mobility/config.yaml

"Global_Mobility_Report.csv" is a source dataset; it is joined with other data via knowledge_graph.csv to produce "mobility.csv" and "aggregated.csv".
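A join like this can also explain the count mismatch from the first comment: rows whose place_id has no mapping drop out of an inner join, and several place_ids can collapse onto one location_key. The sketch below uses coreutils join on toy comma-separated data. The two-column place_id,location_key mapping, the file names, and all values are hypothetical; they do not reflect the real schema of knowledge_graph.csv or the pipeline's actual code, and cut/join do not handle quoted CSV fields:

```shell
tmp=$(mktemp -d)
# Hypothetical mobility rows: place_id,value (not the real column layout).
printf 'ChIJ_A,12\nChIJ_B,-5\nChIJ_C,3\nChIJ_Z,7\n' > "$tmp/mobility_rows.csv"
# Hypothetical mapping: place_id,location_key; two place_ids share one key.
printf 'ChIJ_A,US_CA\nChIJ_B,US_NY\nChIJ_C,US_NY\n' > "$tmp/kg.csv"

# join needs both inputs sorted on the join field (field 1) with the same collation.
LC_ALL=C sort -t, -k1,1 "$tmp/mobility_rows.csv" > "$tmp/rows.sorted"
LC_ALL=C sort -t, -k1,1 "$tmp/kg.csv" > "$tmp/kg.sorted"

# Inner join on place_id; each output line is place_id,value,location_key.
# ChIJ_Z has no mapping, so it does not appear in the output.
LC_ALL=C join -t, -1 1 -2 1 "$tmp/rows.sorted" "$tmp/kg.sorted" > "$tmp/joined.csv"
cat "$tmp/joined.csv"

distinct_place_ids=$(cut -d, -f1 "$tmp/mobility_rows.csv" | sort -u | wc -l | tr -d ' ')
distinct_location_keys=$(cut -d, -f3 "$tmp/joined.csv" | sort -u | wc -l | tr -d ' ')
echo "distinct place_id: $distinct_place_ids, distinct location_key: $distinct_location_keys"
rm -r "$tmp"
```

In this toy case, four distinct place_ids shrink to two distinct location_keys, one through a missing mapping and one through a many-to-one mapping. Whether the real pipeline loses IDs for the same reasons would have to be checked against google_mobility.py.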
