COVID-19 Open Data

This repository contains datasets of daily time-series data related to COVID-19 for 50+ countries around the world. The data is at the spatial resolution of states/provinces for most regions and at county/municipality resolution for Argentina, Brazil, Chile, Colombia, Czech Republic, Mexico, Netherlands, Peru, United Kingdom, and USA. All regions are assigned a unique location key, which resolves discrepancies between ISO / NUTS / FIPS codes, etc. The different aggregation levels are:

0: Country
1: Province, state, or local equivalent
2: Municipality, county, or local equivalent
3: Locality which may not follow strict hierarchical order, such as "city" or "nursing homes in X location"

There are multiple types of data:

Outcome data Y(i,t), such as cases, deaths, tests, for regions i and time t
Static covariate data X(i), such as population size, GDP, latitude/ longitude
Dynamic covariate data X(i,t), such as mobility, weather
Dynamic interventional data A(i,t), such as government lockdowns

The data is drawn from multiple sources, as listed below, and stored in separate csv / json files, which can be easily merged due to the use of consistent geographic (and temporal) keys.

Table	Keys¹	Content	URL	Source²
Main	`[key][date]`	Flat table with records from (almost) all other tables joined by `date` and/or `key`	main.csv	All tables below
Index	`[key]`	Various names and codes, useful for joining with other datasets	index.csv, index.json	Wikidata, DataCommons
Demographics	`[key]`	Various (current³) population statistics	demographics.csv, demographics.json	Wikidata, DataCommons
Economy	`[key]`	Various (current³) economic indicators	economy.csv, economy.json	Wikidata, DataCommons
Epidemiology	`[key][date]`	COVID-19 cases, deaths, recoveries and tests	epidemiology.csv, epidemiology.json	Various²
Geography	`[key]`	Geographical information about the region	geography.csv, geography.json	Wikidata
Health	`[key]`	Health indicators for the region	health.csv, health.json	Wikidata, WorldBank
Hospitalizations	`[key][date]`	Information related to patients of COVID-19 and hospitals	hospitalizations.csv, hospitalizations.json	Various²
Mobility	`[key][date]`	Various metrics related to the movement of people	mobility.csv, mobility.json	Google
Government Response	`[key][date]`	Government interventions and their relative stringency	oxford-government-response.csv, oxford-government-response.json	University of Oxford
Weather	`[key][date]`	Dated meteorological information for each region	weather.csv, weather.json	NOAA
WorldBank	`[key]`	Latest record for each indicator from WorldBank for all reporting countries	worldbank.csv, worldbank.json	WorldBank
WorldPop	`[key]`	Demographics data extracted from WorldPop	worldpop.csv, worldpop.json	WorldPop
By Age	`[key][date]`	Epidemiology and hospitalizations data stratified by age	by-age.csv, by-age.json	Various²
By Sex	`[key][date]`	Epidemiology and hospitalizations data stratified by sex	by-sex.csv, by-sex.json	Various²

¹ key is a unique string for the specific geographical region built from a combination of codes such as ISO 3166, NUTS, FIPS and other local equivalents.
² Refer to the data sources for specifics about each data source and the associated terms of use.
³ Datasets without a date column contain the most recently reported information for each datapoint to date.

For more information about how to use these files see the section about using the data, and for more details about each dataset see the section about understanding the data.

Why another dataset?

There are many other public COVID-19 datasets. However, we believe this dataset is unique in the way that it merges multiple global sources, at a fine spatial resolution, using a consistent set of region keys. We hope this will make it easier for researchers to use. We are also very transparent about the data sources, and the code for ingesting and merging the data is easy to understand and modify.

Explore the data


A simple visualization tool was built to explore the Open COVID-19 datasets, the Open COVID-19 Explorer:	If you want to see interactive charts with a unique UX, don't miss what @Mahks built using the Open COVID-19 dataset:	You can also check out the great work of @quixote79, a MapBox-powered interactive map site:
Experience clean, clear graphs with smooth animations thanks to the work of @jmullo:	Become an armchair epidemiologist with the COVID-19 timeline simulation tool built by @LeviticusMB:	Whether you want an interactive map, compare stats or look at charts, @saadmas has you covered with a COVID-19 Daily Tracking site:
Compare per-million data at Omnimodel thanks to @OmarJay1:	Look at responsive, comprehensive charts thanks to the work of @davidjohnstone:	Reproduction Live lets you track COVID-19 outbreaks in your region and visualise the spread of the virus over time:

Use the data

The data is available as CSV and JSON files, which are published in Github Pages so they can be served directly to Javascript applications without the need of a proxy to set the correct headers for CORS and content type. Even if you only want the CSV files, using the URL served by Github Pages is preferred in order to avoid caching issues and potential, future breaking changes.

For the purpose of making the data as easy to use as possible, there is a main table which contains the columns of all other tables joined by key and date. However, performance-wise, it may be better to download the data separately and join the tables locally.

Each region has its own version of the main table, so you can pull all the data for a specific region using a single endpoint, the URL for each region is:

Data for key in CSV format: https://storage.googleapis.com/covid19-open-data/v2/${key}/main.csv
Data for key in JSON format: https://storage.googleapis.com/covid19-open-data/v2/${key}/main.json

Each table has a full version as well as subsets with only the last 30, 14, 7 and 1 days of data. The full version is accessible at the URL described in the table above. The subsets can be found by appending the number of days to the path. For example, the subsets of the main table are available at the following locations:

Full version: https://storage.googleapis.com/covid19-open-data/v2/main.csv
Latest: https://storage.googleapis.com/covid19-open-data/v2/latest/main.csv

Note that the latest version contains the last non-null record for each key, whereas all others contain the last N days of data (all of which could be null for some keys). All of the above listed tables have a corresponding JSON version; simply replace csv with json in the link.

If you are trying to use this data alongside your own datasets, then you can use the Index table to get access to the ISO 3166 / NUTS / FIPS code, although administrative subdivisions are not consistent among all reporting regions. For example, for the intra-country reporting, some EU countries use NUTS2, others NUTS3 and many ISO 3166-2 codes.

You can find several examples in the examples subfolder with code showcasing how to load and analyze the data for several programming environments. If you want the short version, here are a few snippets to get started.

BigQuery

This dataset is part of the BigQuery Public Datasets Program, so you may use BigQuery to run SQL queries directly from the online query editor.

Google Colab

You can use Google Colab if you want to run your analysis without having to install anything in your computer, simply go to this URL: https://colab.research.google.com/github/open-covid-19/data.

Google Sheets

You can import the data directly into Google Sheets, as long as you stay within the size limits. For instance, the following formula loads the latest epidemiology data into the current sheet:

=IMPORTDATA("https://storage.googleapis.com/covid19-open-data/v2/latest/epidemiology.csv")

Note that Google Sheets has a size limitation, so only data from the latest subfolder can be imported automatically. To work around that, simply download the file and import it via the File menu.

R

If you prefer R, then this is all you need to do to load the epidemiology data:

data <- read.csv("https://storage.googleapis.com/covid19-open-data/v2/main.csv")

Python

In Python, you need to have the package pandas installed to get started:

import pandas
data = pandas.read_csv("https://storage.googleapis.com/covid19-open-data/v2/main.csv")

jQuery

Loading the JSON file using jQuery can be done directly from the output folder, this code snippet loads the epidemiology table into the data variable:

$.getJSON("https://storage.googleapis.com/covid19-open-data/v2/epidemiology.json", data => { ... }

Powershell

You can also use Powershell to get the latest data for a country directly from the command line, for example to query the latest epidemiology data for Australia:

Invoke-WebRequest 'https://storage.googleapis.com/covid19-open-data/v2/latest/epidemiology.csv' | ConvertFrom-Csv | `
    where key -eq 'AU' | select date,total_confirmed,total_deceased,total_recovered

Understand the data

Make sure that you are using the URL linked at the table above and not the raw GitHub file, the latter is subject to change at any moment in non-compatible ways, and due to the configuration of GitHub's raw file server you may run into potential caching issues.

Missing values will be represented as nulls, whereas zeroes are used when a true value of zero is reported.

Main

Flat table with records from all other tables joined by key and date. See below for information about all the different tables and columns. Tables not included in the main table are:

WorldBank: A subset of individual indicators are added as columns to other tables instead; for example, the health table.
WorldPop: Age and sex demographics breakdowns are normalized and added to the demographics table instead.
By Age: Age breakdowns of epidemiology and hospitalization data are normalized and added to the by-age-normalized table instead (TABLE TO BE ADDED).
By Sex: Sex breakdowns of epidemiology and hospitalization data are normalized and added to the by-sex-normalized table instead (TABLE TO BE ADDED).

Index

Non-temporal data related to countries and regions. It includes keys, codes and names for each region, which is helpful for displaying purposes or when merging with other data:

Name	Type	Description	Example
key	`string`	Unique string identifying the region	US_CA_06001
wikidata	`string`	Wikidata ID corresponding to this key	Q107146
datacommons	`string`	DataCommons ID corresponding to this key	geoId/06001
country_code	`string`	ISO 3166-1 alphanumeric 2-letter code of the country	US
country_name	`string`	American English name of the country, subject to change	United States of America
subregion1_code	`string`	(Optional) ISO 3166-2 or NUTS 2/3 code of the subregion	CA
subregion1_name	`string`	(Optional) American English name of the subregion, subject to change	California
subregion2_code	`string`	(Optional) FIPS code of the county (or local equivalent)	06001
subregion2_name	`string`	(Optional) American English name of the county (or local equivalent), subject to change	Alameda County
3166-1-alpha-2	`string`	ISO 3166-1 alphanumeric 2-letter code of the country	US
3166-1-alpha-3	`string`	ISO 3166-1 alphanumeric 3-letter code of the country	USA
aggregation_level	`integer` `[0-2]`	Level at which data is aggregated, i.e. country, state/province or county level	2

Demographics

Information related to the population demographics for each region:

Name	Type	Description	Example
key	`string`	Unique string identifying the region	KR
population	`integer`	Total count of humans	51606633
population_male	`integer`	Total count of males	25846211
population_female	`integer`	Total count of females	25760422
rural_population	`integer`	Population in a rural area	9568386
urban_population	`integer`	Population in an urban area	42038247
largest_city_population	`integer`	Population in the largest city of the region	9963497
clustered_population	`integer`	Population in urban agglomerations of more than 1 million	25893097
population_density	`double` `[persons per squared kilometer]`	Population per squared kilometer of land area	529.3585
human_development_index	`double` `[0-1]`	Composite index of life expectancy, education, and per capita income indicators	0.903
population_age_`${lower}`_`${upper}`	`integer`	Estimated population between the ages of `${lower}` and `${upper}`, both inclusive	42038247

Economy

Information related to the economic development for each region:

Name	Name	Description	Example
key	`string`	Unique string identifying the region	CN_HB
gdp	`integer` `[USD]`	Gross domestic product; monetary value of all finished goods and services	24450604878
gdp_per_capita	`integer` `[USD]`	Gross domestic product divided by total population	1148
human_capital_index	`double` `[0-1]`	Mobilization of the economic and professional potential of citizens	0.765

Epidemiology

Information related to the COVID-19 infections for each date-region pair:

Name	Type	Description	Example
date	`string`	ISO 8601 date (YYYY-MM-DD) of the datapoint	2020-03-30
key	`string`	Unique string identifying the region	CN_HB
new_confirmed¹	`integer`	Count of new cases confirmed after positive test on this date	34
new_deceased¹	`integer`	Count of new deaths from a positive COVID-19 case on this date	2
new_recovered¹	`integer`	Count of new recoveries from a positive COVID-19 case on this date	13
new_tested²	`integer`	Count of new COVID-19 tests performed on this date	13
total_confirmed³	`integer`	Cumulative sum of cases confirmed after positive test to date	6447
total_deceased³	`integer`	Cumulative sum of deaths from a positive COVID-19 case to date	133
total_tested^2,3	`integer`	Cumulative sum of COVID-19 tests performed to date	133

¹Values can be negative, typically indicating a correction or an adjustment in the way they were measured. For example, a case might have been incorrectly flagged as recovered one date so it will be subtracted from the following date.
²When the reporting authority makes a distinction between PCR and antibody testing, only PCR tests are reported here.
³Total count will not always amount to the sum of daily counts, because many authorities make changes to criteria for counting cases, but not always make adjustments to the data. There is also potential missing data. All of that makes the total counts drift away from the sum of all daily counts over time, which is why the cumulative values, if reported, are kept in a separate column.

Geography

Information related to the geography for each region:

Name	Type	Description	Example
key	`string`	Unique string identifying the region	CN_HB
latitude	`double`	Floating point representing the geographic coordinate	30.9756
longitude	`double`	Floating point representing the geographic coordinate	112.2707
elevation	`integer` `[meters]`	Elevation above the sea level	875
area	`integer` [squared kilometers]	Area encompassing this region	3729
rural_area	`integer` [squared kilometers]	Area encompassing rural land in this region	3729
urban_area	`integer` [squared kilometers]	Area encompassing urban land this region	3729

Health

Health related indicators for each region:

Name	Type	Description	Example
key	`string`	Unique string identifying the region	BN
life_expectancy	`double` `[years]`	Average years that an individual is expected to live	75.722
smoking_prevalence	`double` `[%]`	Percentage of smokers in population	16.9
diabetes_prevalence	`double` `[%]`	Percentage of persons with diabetes in population	13.3
infant_mortality_rate	`double`	Infant mortality rate (per 1,000 live births)	9.8
adult_male_mortality_rate	`double`	Mortality rate, adult, male (per 1,000 male adults)	143.719
adult_female_mortality_rate	`double`	Mortality rate, adult, female (per 1,000 male adults)	98.803
pollution_mortality_rate	`double`	Mortality rate attributed to household and ambient air pollution, age-standardized (per 100,000 population)	13.3
comorbidity_mortality_rate	`double` `[%]`	Mortality from cardiovascular disease, cancer, diabetes or cardiorespiratory disease between exact ages 30 and 70	16.6
hospital_beds	`double`	Hospital beds (per 1,000 people)	2.7
nurses	`double`	Nurses and midwives (per 1,000 people)	5.8974
physicians	`double`	Physicians (per 1,000 people)	1.609
health_expenditure	`double` `[USD]`	Health expenditure per capita	671.4115
out_of_pocket_health_expenditure	`double` `[USD]`	Out-of-pocket health expenditure per capita	34.756348

Note that the majority of the health indicators are only available at the country level.

Hospitalizations

Information related to patients of COVID-19 and hospitals:

Name	Type	Description	Example
key	`string`	Unique string identifying the region	CN_HB
new_hospitalized*	`integer`	Count of new cases hospitalized after positive test on this date	34
new_intensive_care*	`integer`	Count of new cases admitted into ICU after a positive COVID-19 test on this date	2
new_ventilator*	`integer`	Count of new COVID-19 positive cases which require a ventilator on this date	13
total_hospitalized**	`integer`	Cumulative sum of cases hospitalized after positive test to date	6447
total_intensive_care**	`integer`	Cumulative sum of cases admitted into ICU after a positive COVID-19 test to date	133
total_ventilator**	`integer`	Cumulative sum of COVID-19 positive cases which require a ventilator to date	133
current_hospitalized**	`integer`	Count of current (active) cases hospitalized after positive test to date	34
current_intensive_care**	`integer`	Count of current (active) cases admitted into ICU after a positive COVID-19 test to date	2
current_ventilator**	`integer`	Count of current (active) COVID-19 positive cases which require a ventilator to date	13

*Values can be negative, typically indicating a correction or an adjustment in the way they were measured. For example, a case might have been incorrectly flagged as recovered one date so it will be subtracted from the following date.

**Total count will not always amount to the sum of daily counts, because many authorities make changes to criteria for counting cases, but not always make adjustments to the data. There is also potential missing data. All of that makes the total counts drift away from the sum of all daily counts over time, which is why the cumulative values, if reported, are kept in a separate column.

Mobility

Google's Mobility Reports are presented in CSV form joined with our known location keys as mobility.csv with the following columns:

Name	Type	Description	Example
date	`string`	ISO 8601 date (YYYY-MM-DD) of the datapoint	2020-03-30
key	`string`	Unique string identifying the region	US_CA
mobility_transit_stations	`double` `[%]`	Percentage change in visits to transit station locations compared to baseline	-15
mobility_retail_and_recreation	`double` `[%]`	Percentage change in visits to retail and recreation locations compared to baseline	-15
mobility_grocery_and_pharmacy	`double` `[%]`	Percentage change in visits to grocery and pharmacy locations compared to baseline	-15
mobility_parks	`double` `[%]`	Percentage change in visits to park locations compared to baseline	-15
mobility_residential	`double` `[%]`	Percentage change in visits to residential locations compared to baseline	-15
mobility_workplaces	`double` `[%]`	Percentage change in visits to workplace locations compared to baseline	-15

These Community Mobility Reports aim to provide insights into what has changed in response to policies aimed at combating COVID-19. The reports chart movement trends over time by geography, across different categories of places.

Link to data: https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv
Help Center: https://support.google.com/covid19-mobility
Terms: In order to download or use the data or reports, you must agree to the Google Terms of Service

Government Response

Summary of a government's response to the events, including a stringency index, collected from University of Oxford:

Name	Type	Description	Example
date	`string`	ISO 8601 date (YYYY-MM-DD) of the datapoint	2020-03-30
key	`string`	Unique string identifying the region	US_CA
school_closing	`integer` `[0-3]`	Schools are closed	2
workplace_closing	`integer` `[0-3]`	Workplaces are closed	2
cancel_public_events	`integer` `[0-3]`	Public events have been cancelled	2
restrictions_on_gatherings	`integer` `[0-3]`	Gatherings of non-household members are restricted	2
public_transport_closing	`integer` `[0-3]`	Public transport is not operational	0
stay_at_home_requirements	`integer` `[0-3]`	Self-quarantine at home is mandated for everyone	0
restrictions_on_internal_movement	`integer` `[0-3]`	Travel within country is restricted	1
international_travel_controls	`integer` `[0-3]`	International travel is restricted	3
income_support	`integer` `[USD]`	Value of fiscal stimuli, including spending or tax cuts	20449287023
debt_relief	`integer` `[0-3]`	Debt/contract relief for households	0
fiscal_measures	`integer` `[USD]`	Value of fiscal stimuli, including spending or tax cuts	20449287023
international_support	`integer` `[USD]`	Giving international support to other countries	274000000
public_information_campaigns	`integer` `[0-2]`	Government has launched public information campaigns	1
testing_policy	`integer` `[0-3]`	Country-wide COVID-19 testing policy	1
contact_tracing	`integer` `[0-2]`	Country-wide contact tracing policy	1
emergency_investment_in_healthcare	`integer` `[USD]`	Emergency funding allocated to healthcare	500000
investment_in_vaccines	`integer` `[USD]`	Emergency funding allocated to vaccine research	100000
stringency_index	`double` `[0-100]`	Overall stringency index	71.43

For more information about each field and how the overall stringency index is computed, see the Oxford COVID-19 government response tracker.

Weather

Daily weather information from nearest station reported by NOAA:

Name	Type	Description	Example
date	`string`	ISO 8601 date (YYYY-MM-DD) of the datapoint	2020-03-30
key	`string`	Unique string identifying the region	US_CA
noaa_station*	`string`	Identifier for the weather station	USC00206080
noaa_distance*	`double` `[kilometers]`	Distance between the location coordinates and the weather station	28.693
average_temperature	`double` `[celsius]`	Recorded hourly average temperature	11.2
minimum_temperature	`double` `[celsius]`	Recorded hourly minimum temperature	1.7
maximum_temperature	`double` `[celsius]`	Recorded hourly maximum temperature	19.4
rainfall	`double` `[millimeters]`	Rainfall during the entire day	51.0
snowfall	`double` `[millimeters]`	Snowfall during the entire day	0.0

*The reported weather station refers to the nearest station which provides temperature measurements, but rainfall and snowfall may come from a different nearby weather station. In all cases, only weather stations which are at most 300km from the location coordinates are considered.

WorldBank

Most recent value for each indicator of the WorldBank Database.

Name	Type	Description	Example
key	`string`	Unique string identifying the region	ES
`${indicator}`	`double`	Value of the indicator corresponding to this column, column name is indicator code	0

Refer to the WorldBank documentation for more details, or refer to the worldbank_indicators.csv file for a short description of each indicator. Each column uses the indicator code as its name, and the rows are filled with the values for the corresponding key.

Note that WorldBank data is only available at the country level and it's not included in the main table. If no values are reported by WorldBank for the country since 2015, the row value will be null.

WorldPop

Demographics data extracted from WorldPop, estimating total number of people per region broken down by age and sex groupings.

Name	Type	Description	Example
key	`string`	Unique string identifying the region	ES
`${sex}`_`${age_bin}`	`double`	Total number of people categorized as `${sex}` (`m` or `f`) in age bin `${age_bin}`	1334716

Refer to the WorldPop documentation for more details. This data is normalized into buckets that are consistent with other tables and added into the demographics table; it is kept as a separate table to preserve access to the original data without any modification beyond aggregation by regional unit.

By Age

Epidemiology and hospitalizations data stratified by age:

Name	Type	Description	Example
date	`string`	ISO 8601 date (YYYY-MM-DD) of the datapoint	2020-03-30
key	`string`	Unique string identifying the region	FR
`${statistic}`_age_bin_`${index}`	`integer`	Value of `${statistic}` for age bin `${index}`	139
age_bin_`${index}`	`integer`	Range for the age values inside of bin `${index}`, both ends inclusive	30-39

Values in this table are stratified versions of the columns available in the epidemiology and hospitalizatons tables. Each row contains up to 10 distinct bins, for example: {new_deceased_age_00: 1, new_deceased_age_01: 45, ... , new_deceased_age_09: 32}.

Each row may have different bins, depending on the data source. This table tries to capture the raw data with as much fidelity as possible up to 10 bins. The range of each bin is encoded into the age_bin_${index} variable, for example: {age_bin_00: 0-9, age_bin_01: 10-19, age_bin_02: 20-29, ... , age_bin_09: 90-}.

Several things worth noting about this table:

This table contains very sparse data, with very few combinations of regions and variables available.
Records without a known age bin are discarded, so the sum of all ages may not necessary amount to the variable from the corresponding table.
The upper and lower range of the range are inclusive values. For example, range 0-9 includes individuals with age zero up to (and including) 9.
A row may have less than 10 bins, but never more than 10. For example: {age_bin_00: 0-49, age_bin_01: 50-, age_bin_02: null, ...}

By Sex

Epidemiology and hospitalizations data stratified by sex:

Name	Type	Description	Example
date	`string`	ISO 8601 date (YYYY-MM-DD) of the datapoint	2020-03-30
key	`string`	Unique string identifying the region	FR
`${statistic}_sex_male`	`integer`	Value of `${statistic}` for male individuals	87
`${statistic}_sex_female`	`integer`	Value of `${statistic}` for female individuals	68

Values in this table are stratified versions of the columns available in the epidemiology and hospitalizatons tables. Each row contains each variable with either _male or _female suffix: {new_deceased_male: 45, new_deceased_female: 32, new_tested_male: 45, new_tested_female: 32, ...}.

Several things worth noting about this table:

This table contains very sparse data, with very few combinations of regions and variables available.
Records without a known sex are discarded, so the sum of all ages may not necessary amount to the variable from the corresponding table.

Notes about the data

For countries where both country-level and subregion-level data is available, the entry which has a null value for the subregion level columns in the index table indicates upper-level aggregation. For example, if a data point has values {country_code: US, subregion1_code: CA, subregion2_code: null, ...} then that record will have data aggregated at the subregion1 (i.e. state/province) level. If subregion1_codewere null, then it would be data aggregated at the country level.

Another way to tell the level of aggregation is the aggregation_level of the index table, see the schema documentation for more details about how to interpret it.

Please note that, sometimes, the country-level data and the region-level data come from different sources so adding up all region-level values may not equal exactly to the reported country-level value. See the data loading tutorial for more information.

There is also a notices.csv file which is manually updated with quirks about the data. The goal is to be able to query by key and date, to get a list of applicable notices to the requested subset.

Licensing

The output data files are published under the CC BY-SA license. All data is subject to the terms of agreement individual to each data source, refer to the sources of data table for more details. All other code and assets are published under the Apache License 2.0.

Sources of data

All data in this repository is retrieved automatically. When possible, data is retrieved directly from the relevant authorities, like a country's ministry of health.

Data	Source	License and Terms of Use
Metadata	Wikipedia	CC BY-SA
Demographics	Wikidata	CC0
Demographics	DataCommons	Attribution required
Demographics	WorldBank	CC BY 4.0
Demographics	WorldPop	CC BY 4.0
Economy	Wikidata	CC0
Economy	DataCommons	Attribution required
Economy	WorldBank	CC BY 4.0
Geography	Wikidata	CC0
Geography	WorldBank	CC BY 4.0
Health	Wikidata	CC0
Health	WorldBank	CC BY 4.0
Weather	NOAA	Attribution required, non-commercial use
Google Mobility data	https://www.google.com/covid19/mobility/	Google Terms of Service
Government response data	Oxford COVID-19 government response tracker	CC BY 4.0
Country-level data	ECDC	Attribution required
Country-level data	Our World in Data	CC BY 4.0
Afghanistan	HDX	CC BY-SA
Argentina	Datos Argentina	Public domain
Australia	https://covid-19-au.com/	Attribution required, educational and academic research purposes
Austria	COVID19 EU Data	MIT
Bangladesh	HDX	CC BY-SA
Bolivia	Latin America Covid-19 Data Repository	CC BY-SA
Brazil	Brazil Ministério da Saúde	Creative Commons Atribuição
Brazil (Rio de Janeiro)	http://www.data.rio/	Dados abertos
Brazil (Ceará)	https://saude.ce.gov.br	Dados abertos
Canada	Department of Health Canada	Attribution required
Chile	Ministerio de Ciencia de Chile	Terms of use
China	DXY COVID-19 dataset	MIT
Colombia	Datos Abiertos Colombia	Attribution required
Costa Rica	Latin America Covid-19 Data Repository	CC BY-SA
Cuba	Latin America Covid-19 Data Repository	CC BY-SA
Czech Republic	Ministry of Health of the Czech Republic	Open Data
Democratic Republic of Congo	HDX	CC BY-SA
Ecuador	Latin America Covid-19 Data Repository	CC BY-SA
El Salvador	Latin America Covid-19 Data Repository	CC BY-SA
Finland	Finnish institute for health and welfare	CC BY 4.0
France	data.gouv.fr	Open License 2.0
Germany	https://github.com/jgehrcke/covid-19-germany-gae	MIT
Guatemala	Latin America Covid-19 Data Repository	CC BY-SA
Haiti	HDX	CC BY-SA
Honduras	Latin America Covid-19 Data Repository	CC BY-SA
India	Wikipedia	CC BY-SA
India	Covid 19 India Organisation	CC BY-SA
Indonesia	https://catchmeup.id/covid-19	Permission required
Italy	Italy's Department of Civil Protection	CC BY 4.0
Iraq	HDX	CC BY-SA
Japan	https://github.com/swsoyee/2019-ncov-japan	MIT
Libya	HDX	CC BY-SA
Luxembourg	data.public.lu	CC0
Malaysia	Wikipedia	CC BY-SA
Mexico	https://github.com/mexicovid19/Mexico-datos	MIT
Mexico	Secretaría de Salud Mexico	Attribution Required
Netherlands	RIVM	Public Domain
Nicaragua	Latin America Covid-19 Data Repository	CC BY-SA
Norway	COVID19 EU Data	MIT
Norway	COVID19 EU Data	MIT
Pakistan	Wikipedia	CC BY-SA
Panama	Latin America Covid-19 Data Repository	CC BY-SA
Paraguay	Latin America Covid-19 Data Repository	CC BY-SA
Peru	Datos Abiertos Peru	ODC BY
Philippines	Philippines Department of Health	Attribution required
Poland	COVID19 EU Data	MIT
Portugal	COVID-19: Portugal	MIT
Romania	https://github.com/adrianp/covid19romania	CC0
Russia	https://стопкоронавирус.рф	CC BY-SA
Slovenia	https://www.gov.si	CC BY-SA
South Africa	Data Science for Social Impact research group, the University of Pretoria	CC BY-SA
South Korea	Wikipedia	CC BY-SA
Spain	Government Authority	Attribution required
Sudan	HDX	CC BY-SA
Sweden	Public Health Agency of Sweden
Switzerland	OpenZH data	CC 4.0
United Kingdom	https://github.com/tomwhite/covid-19-uk-data	The Unlicense
Uruguay	Latin America Covid-19 Data Repository	CC BY-SA
USA	NYT COVID Dataset	Attribution required, non-commercial use
USA	COVID Tracking Project	CC BY-NC 4.0
USA (New York)	New York City Health Department	Public Domain
USA (Texas)	Texas Department of State Health Services	Attribution required, non-commercial use
Venezuela	HDX	CC BY-SA
Venezuela	Latin America Covid-19 Data Repository	CC BY-SA

Running the data extraction pipeline

To update the contents of the output folder, first install the dependencies:

pip install -r requirements.txt

Then run the following script from the source folder to update all datasets:

cd src
python update.py

See the source documentation for more technical details.

Contribute

If you spot an error in the data, or there's a country you would like to include, the best way to contribute to this project is by helping maintain the data on the relevant Wikipedia article. Not only can that data be parsed automatically by this project, but it will also help inform millions of others that receive their information from Wikipedia.

For code contributions, take a look at the source directory for more information.

If you do something cool with the data (e.g., visualization or analysis), please let us know!

Acknowledgments

The following persons have made significant contributions to this project:

Oscar Wahltinez
Kevin Murphy
Michael Brenner
Matt Lee
Anthony Erlinger
Mayank Daswani
Pranali Yawalkar

Recommended citation

Please use the following when citing this project as a source of data:

@article{Wahltinez2020,
  author = "Oscar Wahltinez and Kevin Murphy and Michael Brenner and Matt Lee and Anthony Erlinger and Mayank Daswani and Pranali Yawalkar",
  year = 2020,
  title = "COVID-Open-Data: curating a fine-grained, global-scale COVID-19 data repository",
  note = "Work in progress",
  url = {https://github.com/open-covid-19/data/blob/main/README.md}
}

Name		Name	Last commit message	Last commit date
Latest commit History 102 Commits
.github/workflows		.github/workflows
docs		docs
examples		examples
output		output
src		src
.gitignore		.gitignore
CC-BY-SA		CC-BY-SA
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md

License

themonk911/data

Folders and files

Latest commit

History

Repository files navigation