Covid 19 Phase IV

Mato Ohitika Analytics LLC's

Corona Virus (Covid-19) Research


Phase IV: Emerging Hot Spot and Trends Mapping Preview


The Week of Monday May 4, 2020

Dr. Joseph Robertson

Mato Ohitika Analytics LLC

Tuesday May 5, 2020



The final phase of this analysis is the construction of the Space Time Cube (STC) to study the emerging trends of the Covid-19 pandemic in US counties over 100 days of data collection. Phase III contained an extensive analysis of how to create the data structures that allow for many different ways to look at Covid-19 from a spatial temporal perspective.

Theoretical Space Time Cube
Courtesy of ESRI, All Rights Reserved.


Recall that an STC (above) is a series of bins that each contain a measurement value of some type. In the simplest case, like the one shown above, the best-case scenario is to have a "perfect cube" to work with, so that the contemporary theory on spatial points, patterns, and relationships behaves as well as possible.


To understand this process, we must first define a critical component that underlies it: the concept of spatial autocorrelation.


O'Sullivan & Unwin (2010) write "Quite simply, spatial autocorrelation is a complicated name for the obvious fact that data from locations near one another in space are more likely to be similar than data from locations remote from one another.


There are three general possibilities: positive autocorrelation, negative autocorrelation, and noncorrelation or zero autocorrelation. Positive autocorrelation is the most commonly observed case and refers to situations where nearby observations are likely to be similar to one another. Negative autocorrelation is much less common and occurs when observations from nearby locations are likely to be different from one another.


Zero autocorrelation is the case where no spatial effect is discernible and observations seem to vary randomly through space. It is important to be clear about the difference between negative and zero autocorrelation, as students frequently confuse the two.


Describing and modeling patterns of variation across a study region, effectively describing the autocorrelation structure, is of primary importance in spatial analysis. Again, in general terms, spatial variation is of two kinds: first- and second-order.


First-order spatial variation occurs when observations across a study region vary from place to place due to changes in the underlying properties of the local environment. For example, the rates of incidence of crime might vary spatially simply because of variations in the population density, such that they increase near the center of a large city.


In contrast, second-order variation is due to interaction effects between observations, such as the occurrence of crime in an area making it more likely that there will be crimes surrounding that area, perhaps in the shape of local ‘‘hotspots’’ in the vicinity of bars and clubs or near local street drug markets.


In practice, it is difficult to distinguish between first- and second-order effects, but it is often necessary to model both when developing statistical methods for handling spatial data".


"The essential idea of any approach to autocorrelation is to assess how similar or different attribute values at geographic locations are relative to how spatially close or distant are the associated locations. In broad terms, it is easy to see how we can assess similarity in attribute values using some simple calculation based on the difference in the attribute values. The real research question is how to incorporate spatial proximity into a measure of autocorrelation."


Source: O’Sullivan, D., & Unwin, D. J. (2010). Geographic Information Analysis (2nd ed.). Hoboken, NJ: John Wiley & Sons, Inc.
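To make positive and negative autocorrelation concrete, here is a minimal sketch of a global Moran's I calculation in base R. Moran's I is one common way to fold spatial proximity into a single autocorrelation measure; it is used here purely as an illustration. The four locations in a row, their values, and the function name morans_i are all hypothetical and are not taken from the Covid-19 data.

```r
# Global Moran's I for a toy example (illustrative values, not Covid-19 data).
# Assumption: four locations arranged in a row; neighbors are adjacent positions.
morans_i <- function(x, W) {
  n   <- length(x)
  z   <- x - mean(x)             # deviations from the mean
  s0  <- sum(W)                  # sum of all weights
  num <- sum(W * outer(z, z))    # sum over i, j of w_ij * z_i * z_j
  (n / s0) * num / sum(z^2)
}

# Binary adjacency for locations 1-2-3-4 in a row
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)

clustered   <- c(1, 2, 9, 10)    # similar values sit next to each other
alternating <- c(1, 10, 1, 10)   # neighbors differ as much as possible

morans_i(clustered, W)     # positive: nearby values are alike
morans_i(alternating, W)   # -1: nearby values are systematically different
```

Values near zero would indicate no discernible spatial effect, which is the third case described in the quote.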


There are two other things we need in order to look at spatial autocorrelation before we look at a test case from the current Covid-19 dataset.


First, we need to define what we mean by neighbors. We will examine one of the hot spots on the southern tip of Florida. The primary question here is: what defines a neighbor? If the grid is a lattice of unit squares, like the one in the figure below, then one square (not on the edges) shares a border with each of the four squares directly above, below, left, and right of it, thus forming a Rook's case of shared borders.


Similarly, if we include not only the shared borders but also the shared vertices, then we form a Queen’s case.
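The difference between the two cases is easy to check on a small lattice. The sketch below uses base R and a hypothetical 3 x 3 grid of unit squares (not the county data): two cells are Rook's case neighbors if they are exactly one step apart along a row or column, and Queen's case neighbors if they are at most one step apart in both directions.

```r
# Count Rook's and Queen's case neighbors on a 3 x 3 lattice of unit squares.
grid <- expand.grid(row = 1:3, col = 1:3)      # cell 5 is the center (row 2, col 2)

dr <- abs(outer(grid$row, grid$row, "-"))      # row offsets between all cell pairs
dc <- abs(outer(grid$col, grid$col, "-"))      # column offsets between all cell pairs

rook  <- (dr + dc) == 1        # shared edge only
queen <- pmax(dr, dc) == 1     # shared edge or shared vertex

# Neighbor counts laid out in the shape of the lattice
matrix(rowSums(rook), 3, 3)    # center cell has 4 neighbors, corners have 2
matrix(rowSums(queen), 3, 3)   # center cell has 8 neighbors, corners have 3
```

Real county polygons are irregular, so the counts are not this tidy, but the rule is the same: Queen's case always finds at least as many neighbors as Rook's case.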

 

If you examine the Florida example of the Rook's and Queen's cases in the figure, it becomes obvious that if we are trying to identify spatial relationships between US counties, simply using a distance-based neighbor approach, such as defining an arbitrary radius of x miles, or using a Rook's case only, will not capture all of the relationships we are interested in, at least not in the most precise way possible.

 

Six Southern Florida Counties
Current Southern Florida Covid-19 Hot Spot
Defining an Adjacency Matrix
Neighborhood Example of Rook and Queen Case

If you examine all of the 3,142 counties and county equivalents in the United States (the counties targeted in this analysis), you will see that many counties are not bounded by simple geometry. In fact, some counties may have one or two neighbors and others seven or eight in a Rook's case alone. Thus, to account for this, the most sensible strategy for studying the Covid-19 data at the US county level was to use a Queen's case of shared borders to maximize the number of neighbors used in creating the space time bins.


O'Sullivan & Unwin (2010) continue: "In the measurement of autocorrelation, we need to capture the spatial relationship between all pairs of locations, and this is done using a spatial weights or spatial structure matrix generally denoted W. In the first row of the matrix, we record the spatial relationship between the first location and every other location in the map in turn, so that the value in the first row, second column of the matrix represents the relationship between the first and second locations in the map."


More generally, the element in row i, column j of the weights matrix, denoted Wij, represents the relationship between location i and location j, so that each Wij is dependent on the spatial relationship between locations i and j. The Wij will have a value of 1 if two locations are adjacent and a 0 if they are not. This is what we refer to as adjacency.
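As a minimal sketch of this definition, the base R snippet below builds a binary adjacency matrix W from a list of neighbors. The five regions A through E and their links are hypothetical and chosen only for illustration; region E is an island with no links.

```r
# Build a binary adjacency matrix W from a neighbor list.
# The five regions and their links are hypothetical, for illustration only.
nb <- list(A = c("B", "C"),
           B = c("A", "C", "D"),
           C = c("A", "B", "D"),
           D = c("B", "C"),
           E = character(0))     # an island with no links

ids <- names(nb)
W <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
for (i in ids) W[i, nb[[i]]] <- 1    # W[i, j] = 1 when i and j are adjacent

W
isSymmetric(W)   # adjacency is mutual, so W equals its transpose
rowSums(W)       # number of neighbors per region
```

The row sums are the neighbor counts for each region, which is the same quantity tabulated for the US counties later on.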


The figures below demonstrate how this process works using a simple example.


Adjacency Matrix W
Example of Shared Borders Using a Contiguity Matrix

The following R code calculates the Queen's case contiguity of the US counties included in this analysis as a way of studying how many neighbors each county has nationwide.


# Covid-19 Analysis
# May 2020
# Mato Ohitika Analytics LLC
# Dr. Joseph Robertson
# Spatial Analysis of Lattice Data / Spatial Autocorrelation


library(spdep) # poly2nb(), card(), and other neighbor tools; also loads sp
library(rgdal) # functions for spatial data input/output


# Import a STC shapefile as a map object that was specially designed for the STC Construction


ngp.spdf <- readOGR(dsn = "C:\\data\\2019_us_county_STC_ID.shp")


#Examine the Structure

str(ngp.spdf)
names(ngp.spdf)

plot(ngp.spdf)


##### Contiguity: build a neighbor object based on Queen's case contiguity

# This is accomplished using the poly2nb() function. With 3,142 counties, the result will be big!

# Passing row.names uses the county FIPS code (GEOID) as the region ID

ngp.nb.queen <- poly2nb(ngp.spdf, queen = TRUE, row.names = ngp.spdf$GEOID)
ngp.nb.queen

str(ngp.nb.queen)
summary(ngp.nb.queen)
table(card(ngp.nb.queen)) # card() returns a vector with the number of neighbors for each county


Here are the results. The numeric IDs are the county FIPS codes, which makes it easy to check and audit the results:

  1. We have three regions with no links which are the Hawaiian Islands.
  2. The seventeen regions with one link are mostly townships considered a county that are within the boundaries of a greater county.
  3. There is one county that shares 14 neighbors which is San Juan county in Utah! Wow, you can't even make that up.


Finally, it is helpful to know that the mean number of links in the US is about 6 neighbors when calculating a Queen's case for the emerging hot spot analysis. I used a more conservative minimum of 4 neighbors in the current map series, but future map series will reflect the intelligence gained through this analysis. A histogram of the distribution of the Queen's case neighbors has also been included and, not surprisingly, it looks roughly normal!


US Counties Queen's Cases Results
San Juan County with 16 Contiguity Neighbors
16 Neighbors in Queen's case
Histogram of Adjacency Neighbors in US Counties


The Space Time Cube is Now Live, Let's Take a Look!


The construction of the STC happens in ArcGIS Pro. I felt the need to lay all of this groundwork so that the cube is easy to use once the data structures have been built. This essentially falls under the 80/20 rule: most statisticians spend 80% of their time cleaning and preparing the data and 20% actually analyzing it. This was no small feat.


The STC functions in ArcGIS Pro are very powerful tools, provided you have the know-how and the spatial statistics background to understand what is happening at every step. All of the phases of this Covid-19 project were designed to address how this entire process unfolds: from examining the stream of data coming in, to building the data structure through EDA, and finally using this information to build a spatial temporal construct good enough to answer some questions about the Covid-19 pandemic, like "I know six weeks isn't a lot of time normally, but how can I visually understand what has happened over this time?"


These are also the questions I set out to understand, since it has become very apparent that we have switched from fearing the uncertainty of what the Covid-19 virus will do to me and my family to fearing the uncertainty of what science will tell us that we do not want to hear.


So this is how I speak on Covid-19: through the lens of science, providing information to the best of my ability to inform the public about my findings and my interpretations of them. It is imperative to always be transparent in these processes, since our scientists in the government have bowed out in favor of magical thinking.


Having said that, let's look at the magic of the space time cube!


The first thing we do is write a master shapefile of the newly created data structure; this example is from the data collected through May 5, 2020. This process can be completely automated using a Jupyter notebook inside ArcGIS Pro, but since I have streamlined nearly all of it, only a few clicks in the UI are needed. Take a look at this gallery to see the steps of how the cube is constructed:


Space Time Cube Construction
First, we write a master shapefile of the confirmed cases and deaths.
Second, we need to write a second shapefile into a projected coordinate system. This is the only way the STC function will work. In this case, I chose NAD 1983 2011 Contiguous USA Albers. As you can see, the master county ID file I created is already projected this way.
This verifies that the time cube with all of the row normalized data was built successfully. Notice the confirmed cases and deaths are what they were reported to be. Each of these bins can now be queried by state or county.
Now observe that we have an entire set of pre-made bins for each county, represented by the point at the centroid of each county. The next step is to use the Create Space Time Cube From Defined Locations tool, which creates a relationship with each county polygon rather than using the points themselves. Essentially, the county becomes the bin.
The next step was to link the IDs from the county level polygons and the master row normalized point data. In this case the STC .nc file is structured by bin type, temporal aggregation, statistic, case, and finally date. This also uses the end time option for time step alignment, which starts at the end of the series and goes backwards.
After the process is finished, the results for the time cube are reported. In this case, we have 21 time steps at 5-day intervals counting back from May 5, 2020. The overall trends for confirmed cases and deaths are still increasing. The temporal bias is a product of having an unequal time step; placing the bias at the beginning is optimal since almost all counties were at zero around January 22, 2020. This now allows for an exploratory analysis based on the given parameters.
This Emerging Hot Spot result uses the original parameters of this analysis: the maximum confirmed cases, spatial neighbor binning, and a 5-day span. The result covers one neighborhood time step, so it looks exclusively at the 5 days before the end date. In this case we are looking at hot spot bins dating back to April 30, 2020.
This Mann-Kendall Trend Test result uses the same original parameters: the maximum confirmed cases, spatial neighbor binning, and a 5-day span. In this case we are looking at trends by county across 21 time steps, 5 days apart. Clearly, there is nothing but upward trends throughout the entire US.
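For readers who want to see what a Mann-Kendall trend test does with a single county's series, here is a minimal base R sketch. It is not the ArcGIS Pro implementation: the function name mann_kendall and the case counts are hypothetical, and the variance formula assumes no tied values, which real county series with long runs of zeros would violate.

```r
# Mann-Kendall trend test for one time series (no correction for ties).
mann_kendall <- function(x) {
  n <- length(x)
  d <- outer(x, x, function(a, b) b - a)    # d[i, j] = x[j] - x[i]
  s <- sum(sign(d[upper.tri(d)]))           # S: sum over i < j of sign(x[j] - x[i])
  var_s <- n * (n - 1) * (2 * n + 5) / 18   # variance of S, assuming no ties
  z <- if (s > 0) (s - 1) / sqrt(var_s) else if (s < 0) (s + 1) / sqrt(var_s) else 0
  c(S = s, z = z, p = 2 * pnorm(-abs(z)))   # two-sided p-value
}

# Hypothetical case counts over 8 time steps (illustrative only)
cases <- c(0, 1, 3, 10, 25, 60, 110, 180)
res <- mann_kendall(cases)
res
```

Because every later value is larger than every earlier one, S reaches its maximum of 28 for 8 time steps, and the positive z-score indicates an upward trend.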


The Construction of the Space Time Cube is Now Complete, Now What?

Discussion (5/6/2020)


Some things in life are never really finished. The journey to complete this exploratory analysis took many weeks of going over data sources and parsing through details that required great precision to understand what is truly going on. Let's take a look now at what the map gallery on the next pages will look like in context. Remember, context is everything.

 

In spatial statistics, there are two fundamental kinds of measures at work: global and local. Since I have deemed this to be an exploratory process, we are not bound by strict rules of interpretation, at least not yet. But the fact remains that, even though these maps were constructed using rather conservative parameters, the trends and hot spot maps in the following pages show a continued upward trajectory not only in specific cities and their surrounding counties but in many rural areas as well.
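A global measure summarizes the whole map with one number, while a local measure gives every location its own score. As a sketch of the local side, the base R code below computes Getis-Ord Gi* z-scores, the statistic that hot spot analysis is generally built on. The six locations in a row, their values, and the function name gi_star are hypothetical; this is not the ArcGIS Pro implementation and it ignores the time dimension entirely.

```r
# Getis-Ord Gi* z-scores for a toy row of six locations (illustrative values).
gi_star <- function(x, W) {
  n     <- length(x)
  xbar  <- mean(x)
  s     <- sqrt(sum(x^2) / n - xbar^2)       # population standard deviation
  wsum  <- rowSums(W)                        # sum of weights for each location
  w2sum <- rowSums(W^2)                      # sum of squared weights
  num   <- as.vector(W %*% x) - xbar * wsum  # local sum minus its expected value
  den   <- s * sqrt((n * w2sum - wsum^2) / (n - 1))
  num / den
}

# Neighbors are adjacent positions in the row
W <- matrix(0, 6, 6)
W[cbind(1:5, 2:6)] <- 1
W <- W + t(W)
diag(W) <- 1      # Gi* counts each location as part of its own neighborhood

x <- c(1, 1, 1, 1, 9, 10)   # high values concentrated at one end
z <- gi_star(x, W)
round(z, 2)
```

The largest z-score lands on the last location, where two high values sit side by side, while the low end of the row gets negative scores.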


The next steps will begin constructing additional maps and time series that address the state by state trajectories.  In the meantime, please have a look at what the Space Time Cube looks like in 3-space.

STC of Confirmed Cases in 3D
This is a 3D visualization of a spatial temporal process. As you can see, time is the third dimension of each bin. The bar plot underneath represents the sum of cases from Miami-Dade County. As the bins rise, the top bin represents the most recent time step. In this case, it represents the last 5 days of confirmed case reporting.
This is a more refined look at the space time bins near Miami-Dade County as we zoom in closer.
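The stacked bins in these views correspond to the 5-day time steps counted back from the end date. The base R sketch below reproduces that arithmetic, assuming the series runs from January 22 through May 5, 2020; it does not model how ArcGIS Pro decides which bin a boundary date falls into.

```r
# End-time alignment: count 5-day time steps backward from the last date.
# Assumption: the series runs from January 22 through May 5, 2020.
first_day <- as.Date("2020-01-22")
last_day  <- as.Date("2020-05-05")
step_days <- 5

span    <- as.numeric(last_day - first_day)     # days between first and last date
n_steps <- ceiling(span / step_days)            # bins needed to cover the span
edges   <- last_day - step_days * (n_steps:0)   # bin boundaries, oldest first

n_steps                           # 21 time steps
edges[c(1, 2)]                    # boundaries of the oldest bin
edges[c(n_steps, n_steps + 1)]    # boundaries of the most recent bin
```

The oldest bin reaches back to January 21, one day before the first observation, so the incomplete time step falls at the beginning of the series, while the most recent bin runs from April 30 to May 5.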

Next: Phase V: The Maps Series Meta & Designs

Last Updated Thursday May 7, 2020
