Phase I: A Preliminary Analysis of the Covid-19 Pandemic,
the Phenomena, and What It Means
The Week of Monday April 6, 2020
Dr. Joseph Robertson
Mato Ohitika Analytics LLC
Friday April 10, 2020
Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease was first identified in 2019 in Wuhan, Central China, and has since spread globally, resulting in the 2019–20 coronavirus pandemic. After the virus reached the United States in January 2020, the number of cases there, as in other countries outside of China, began to skyrocket.
The consortium of organizations concerned first with the outbreak, and then with a full-blown pandemic, has been providing the public with a live map portal for outbreak cases, data, and other demographic information. Mato Ohitika Analytics LLC has been following this pandemic with great interest and concern due to the complete disruption of our everyday lives, employment, and financial stability.
It is a rare occurrence to have a live dataset available for stakeholders to study and better understand the nature of the pandemic. This exploratory process represents Mato Ohitika Analytics LLC’s contribution to the data science procedures that can be applied to this type of data, which is spatial and temporal in nature.
The steps of this exploratory analysis are the beginning of a broader study of the data structures, and of the information they contain, that could be more useful to our citizens and to tribal, local, and state governments. The current analysis is focused on the United States at the county level: because the virus is spreading so fast in this country, a comprehensive review in real time is paramount.
The Data Sovereignty Initiative, created by Dr. Joseph Robertson of Mato Ohitika Analytics, is a framework to leverage data as an act of sovereignty, whether for American Indian tribes or for concerned citizens of our country. Citizen science is the concept that drives this analysis: to teach the public what we know and don’t know about the coronavirus, how it is spreading, and what might happen next. Data sovereignty is the act of democratizing data for the community’s benefit.
Phase I of this Analysis will Focus on:
Exploratory Analysis Synopsis
This preliminary analysis is focused on two windows of the time series data: January 22, 2020 to March 30, 2020, and January 22, 2020 to April 6, 2020. As this analysis continues, the next step will be to pivot toward ongoing exploratory mapping of the data coming in daily.
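As a minimal sketch of how the two date windows might be cut from a daily case table, the Python below reshapes a small wide-format extract into long records and filters them by date. The column layout (identifying columns followed by one cumulative-count column per day, labeled M/D/YY), the region names, and every value are assumptions made up for illustration; they are not taken from either data source, and the names `to_long` and `window` are hypothetical helpers.

```python
import csv
import io
from datetime import date, datetime

# Toy extract in an ASSUMED wide layout: identifying columns followed by one
# column per day (M/D/YY) holding cumulative counts. All values are made up.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,3/29/20,3/30/20,4/5/20,4/6/20
Region A,Country X,0.0,0.0,0,10,12,30,35
Region B,Country X,1.0,1.0,1,4,4,9,11
"""

ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def parse_day(label):
    """Parse an M/D/YY column header into a date."""
    return datetime.strptime(label, "%m/%d/%y").date()


def to_long(csv_text):
    """Reshape wide rows into (region, country, day, cumulative) records."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            records.append(
                (row["Province/State"], row["Country/Region"],
                 parse_day(column), int(value))
            )
    return records


def window(records, start, end):
    """Keep records whose day falls in the inclusive range [start, end]."""
    return [r for r in records if start <= r[2] <= end]


START = date(2020, 1, 22)
records = to_long(SAMPLE_CSV)
first_window = window(records, START, date(2020, 3, 30))
second_window = window(records, START, date(2020, 4, 6))
print(len(first_window), len(second_window))
```

Keeping the records in long form (one row per place per day) makes it straightforward to rerun the same filter as new daily columns arrive.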
The first step in this process is to examine the data using an exploratory data analysis (EDA) to assess the strengths and weaknesses of the data structure itself. So what does this mean in terms of the numbers, and where is this data located?
There are two dataset sources that are maintaining daily data:
1. The datasets are hosted by the Johns Hopkins University Center for Systems Science and Engineering on GitHub:
This data is updated every day.
In terms of numbers, this data set has the following properties:
This data has been divided by province/region and by country. Thus, the data has spatiotemporal properties that allow it to be visualized using GIS software and statistical modeling. On March 22, 2020, the datasets underwent a systematic change to create US county aggregate data and international aggregates separately for easier study.
2. USA Facts is also maintaining a current database with the methodology: “This interactive feature aggregates data from the Centers for Disease Control and Prevention (CDC), state- and local-level public health agencies. County-level data is confirmed by referencing state and local agencies directly.” The data is updated at:
The USA Facts data has the following properties:
Over the past month, the Covid-19 pandemic has grown from approximately four thousand cases in the second week of March 2020 to nearly five hundred thousand cases as of April 10, 2020. This is a staggering number, given that this analysis began around the start of that period. After working extensively with both data sources, the decision was made to focus primarily on the United States data at the county level. There is a reason for this decision:
The next release of this analysis will provide the preliminary results of this mapping and describe how the data was structured to accommodate the rapidly moving case counts in the Covid-19 pandemic.
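To put the growth cited earlier in perspective (approximately four thousand cases rising to nearly five hundred thousand), the rough calculation below estimates a growth factor and an implied doubling time. It is only a sketch: it assumes constant exponential growth, uses the rounded figures from the text, and takes the elapsed time to be about 30 days, which is an assumption rather than a figure stated in this analysis. The function name `doubling_time` is hypothetical.

```python
import math


def doubling_time(start_cases, end_cases, days):
    """Doubling time in days, ASSUMING constant exponential growth."""
    if start_cases <= 0 or end_cases <= start_cases or days <= 0:
        raise ValueError("need 0 < start_cases < end_cases and days > 0")
    return days / math.log2(end_cases / start_cases)


# Rounded figures from the text; the 30-day span is an assumption.
growth_factor = 500_000 / 4_000
estimate = doubling_time(4_000, 500_000, 30)
print(f"growth factor: {growth_factor:.0f}x, doubling time: {estimate:.1f} days")
```

Under these assumptions the case count grew by a factor of 125, which corresponds to doubling roughly every four to five days. Real epidemic growth is not constant, so this is a summary figure rather than a forecast.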
Last Updated Friday April 17, 2020