Covid 19

Mato Ohitika Analytics LLC

Joseph C. Robertson PhD

Data Science Solutions. Statistical Consulting. Machine Learning and Artificial Intelligence

Research & Development.


Corona Virus (Covid-19)

Analysis Landing Page



Phase I: A Preliminary Analysis of the Covid-19 Pandemic, the Phenomena, and What it Means


The Week of Monday April 6, 2020

Dr. Joseph Robertson

Mato Ohitika Analytics LLC

Friday April 10, 2020

Abstract

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease was first identified in 2019 in Wuhan, Central China, and has since spread globally, resulting in the 2019–20 coronavirus pandemic. After the disease reached the United States in January 2020, the number of cases there, as in other countries outside of China, began to skyrocket.


The consortium of organizations concerned first with the outbreak, and then with a full-blown pandemic, has been providing the public with a live map portal for outbreak cases, data, and other demographic information. Mato Ohitika Analytics LLC has been following this pandemic with great interest and concern due to the complete disruption of our everyday lives, employment, and financial stability.


It is rare to have a live dataset available for stakeholders to study in order to better understand the nature of a pandemic. This exploratory process represents Mato Ohitika Analytics LLC’s contribution to the data science procedures that can be used to examine this type of data, which is spatial and temporal in nature.


The steps of this exploratory analysis are the beginning of a broader study of the data structures and the information they contain, which could be made more useful to our citizens and to tribal, local, and state governments. The current analysis focuses on the United States at the county level: because the virus is spreading so fast in this country, a comprehensive review in real time is paramount.


The Data Sovereignty Initiative created by Dr. Joseph Robertson of Mato Ohitika Analytics is a framework to leverage data as an act of sovereignty, whether for American Indian tribes or for concerned citizens of our country. Citizen science is the concept that drives this analysis: to teach the public what we know and don’t know about the coronavirus, how it is spreading, and possible predictions of what might happen next. Data sovereignty is the act of democratizing data for the community’s benefit.


Phase I of this Analysis will Focus on:


  1. Understanding the nature of the Covid-19 data structure (i.e., count data for confirmed cases, deaths, and recoveries).
  2. Recognizing that the reported data is spatiotemporal in nature and that the counts could possibly indicate a point process, though the lack of covariates in the publicly available data sets makes this problematic.
  3. Determining how to map this over time using GIS (not just mapping the cumulative counts every day by country or region).


Exploratory Analysis Synopsis


This preliminary analysis is focused on the time series data from January 22, 2020 to March 30, 2020 and also from January 22, 2020 to April 6, 2020. As this analysis continues, the next steps will be to pivot and maintain more exploratory mapping of the data coming in daily.


The first step in this process is to examine the data using an exploratory data analysis (EDA) to assess the strengths and weaknesses of the data structure itself. So what does this mean in terms of the numbers, and where is this data located?


Data Sources


Two sources maintain daily datasets:


1. The first set of datasets is hosted on GitHub by the Johns Hopkins University Center for Systems Science and Engineering:


https://github.com/CSSEGISandData/COVID-19


This data is updated every day.


In terms of numbers, this data set has the following properties:


  • There are counts of the number of confirmed cases of the Covid-19 virus.
  • There are counts of the number of deaths related to the Covid-19 virus.
  • This data contains a geographic coordinate for each location with counts. The coordinates are in decimal degrees, with a latitude and longitude associated with each count recorded.
  • Each location has an ordered temporal component, or time series, which contains a start and end date for the Covid-19 cases that have been recorded so far.
  • This data set includes globally recorded data from all countries that have reported Covid-19 cases.


The data is divided by province/region and by country. Thus, it has spatiotemporal properties that allow it to be visualized and analyzed using GIS software and statistical modeling. On March 22, 2020, the datasets underwent a systematic change to create US county aggregate data and international aggregates separately for easier study.
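The structure described above can be pictured with a minimal sketch. It assumes a wide CSV layout in which each row is a location with latitude and longitude columns followed by one column per date. The column names (`Province/State`, `Country/Region`, `Lat`, `Long`) and the sample values are illustrative assumptions, not an excerpt from the repository files, so the headers of the actual files should be checked before relying on them.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in a wide layout: one row per location, one column per date.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Exampleland,10.5,-20.25,0,1,3
North,Otherland,45.0,90.0,2,2,5
"""

# Assumed identifier columns; every other column is treated as a date.
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")

def wide_to_long(text):
    """Turn a wide time-series table into one record per (location, date)."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            records.append({
                "province": row["Province/State"],
                "country": row["Country/Region"],
                "lat": float(row["Lat"]),
                "lon": float(row["Long"]),
                "date": datetime.strptime(column, "%m/%d/%y").date(),
                "count": int(value),
            })
    return records

records = wide_to_long(SAMPLE)
print(len(records), records[0]["date"], records[-1]["count"])
```

A long layout of this kind, with one record per location and date, tends to be easier to join to GIS layers and to filter by time window than the wide layout.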


2. USA Facts also maintains a current database and describes its methodology as follows: “This interactive feature aggregates data from the Centers for Disease Control and Prevention (CDC), state- and local-level public health agencies. County-level data is confirmed by referencing state and local agencies directly.” The data is updated at:


https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/


The USA Facts data has the following properties:


  • This data covers the United States at the county level only.
  • There are counts of the number of confirmed cases of the Covid-19 virus.
  • There are counts of the number of deaths related to the Covid-19 virus.
  • Each location has an ordered temporal component, or time series, which contains a start and end date for the Covid-19 cases that have been recorded so far.
  • This data set also contains unallocated state data that has not been assigned to any county and thus can only be part of a statewide aggregate.
  • The Diamond Princess is also included in the California state data but is not attributed to any particular county.
  • The data is in the form of cumulative sums.
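Because the counts are cumulative, daily new counts have to be recovered by differencing. The sketch below illustrates one way to do that and to roll county series up to a state total that includes the unallocated row. The area names and numbers are invented for illustration, and the handling of negative differences is an assumption about what an analyst might want, not a description of how the source treats revisions.

```python
def daily_new_counts(cumulative):
    """Difference a cumulative series into per-day new counts.

    The first day's value is kept as-is. Negative differences (possible if
    earlier totals are revised or reassigned) are preserved so they can be
    inspected rather than silently hidden.
    """
    daily = []
    previous = 0
    for total in cumulative:
        daily.append(total - previous)
        previous = total
    return daily

# Hypothetical cumulative confirmed counts for two counties plus an
# unallocated state-level row; names and numbers are made up.
cumulative_by_area = {
    "County A": [0, 1, 1, 4, 9],
    "County B": [0, 0, 2, 2, 3],
    "Statewide Unallocated": [0, 0, 1, 0, 0],
}

daily_by_area = {area: daily_new_counts(series)
                 for area, series in cumulative_by_area.items()}

# The state total per day includes the unallocated row, since those cases
# belong to the state but to no particular county.
state_daily = [sum(day) for day in zip(*daily_by_area.values())]
print(state_daily)
```

Keeping the unallocated row separate until the final aggregation step means county-level maps are not distorted by cases that cannot be placed, while state totals remain complete.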



Discussion (4/10/2020)


During the past month, the Covid-19 Pandemic has grown from approximately four thousand cases in the second week of March 2020 to nearly five hundred thousand cases as of April 10, 2020. This is a staggering increase, and it has unfolded while this analysis was getting under way. After working extensively with both data sources, the decision was made to focus primarily on the United States data at the county level. There are several reasons for this decision:


  1. The data is count data with spatial and temporal (time) attributes.
  2. We need a carefully constructed time series of what is happening at the county level.
  3. Research is needed on using a space-time cube to examine the geographic trends in the spread of the virus.
  4. An emerging hot spot analysis in real time can show the confirmed cases and deaths as the virus moves rapidly from state to state.
  5. These analyses can help inform decision making.
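One way to picture a space-time cube is as a grid of bins indexed by location and time step, each holding a count. The sketch below builds such a structure from hypothetical (county, day, count) records and computes the Mann-Kendall S statistic for each county's series, a rank-based trend measure that is commonly paired with hot spot statistics. It is an illustration under these assumptions, not the method used in this analysis: the county identifiers and counts are invented, and a full emerging hot spot analysis would also need a spatial neighborhood statistic, which is omitted here.

```python
from collections import defaultdict

def build_cube(records, n_steps):
    """Arrange (location, time_step, count) records into location -> list of bins."""
    cube = defaultdict(lambda: [0] * n_steps)
    for location, step, count in records:
        cube[location][step] += count
    return dict(cube)

def mann_kendall_s(series):
    """Mann-Kendall S: increasing pairs minus decreasing pairs (ties count 0)."""
    s = 0
    for i in range(len(series) - 1):
        for j in range(i + 1, len(series)):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    return s

# Hypothetical daily new confirmed cases; identifiers and numbers are made up.
records = [
    ("county_001", 0, 1), ("county_001", 1, 2),
    ("county_001", 2, 4), ("county_001", 3, 7),
    ("county_002", 0, 5), ("county_002", 1, 3),
    ("county_002", 2, 3), ("county_002", 3, 1),
]

cube = build_cube(records, n_steps=4)
trends = {location: mann_kendall_s(bins) for location, bins in cube.items()}
print(trends)
```

A positive S suggests counts are rising over the time steps for that location and a negative S suggests they are falling. Judging whether a trend is statistically significant would require the variance of S as well, which this sketch does not compute.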


The next release of this analysis will provide the preliminary results of this mapping and describe how the data was structured to accommodate the rapidly moving cases in the Covid-19 Pandemic.



Next: Phase II: A Look at Where the Covid-19 Data is and What it Means



Updated Friday April 10, 2020

Phone:      (605) 691-2248


Location:  Sioux Falls, South Dakota USA


Email:         info@bravebearanalytics.com

Mato Ohitika Analytics LLC


Specializing in American Indian and

Tribal Government Data Science Solutions

including Machine Learning and

Artificial Intelligence Research and Development

Copyright (2017-2020) Mato Ohitika Analytics LLC

All Images and Logos are Trademarks of

Mato Ohitika Analytics LLC

All Rights Reserved