Joseph C. Robertson PhD

Data Science Solutions. Statistical Consulting. Machine Learning and Artificial Intelligence

Research & Development.

Proud to be an ESRI Partner

Mato Ohitika Analytics LLC

Corona Virus (Covid-19)

Analysis Landing Page 4

Phase IV: Emerging Hot Spot and Trends Mapping Preview

The Week of Monday May 4, 2020

Dr. Joseph Robertson

Mato Ohitika Analytics LLC

Tuesday May 5, 2020

The final phase of this analysis is the construction of the Space Time Cube (STC) to study the emerging trends of the Covid-19 pandemic in US counties over 100 days of data collection. Phase III contained an extensive analysis of how to create the data structures that allow for many different ways to look at Covid-19 from a spatial-temporal perspective.

Recall that an STC (above right) is a series of bins, each of which contains a measurement value of some type. In the simplest cases, like the one shown above, the best-case scenario is to have a "perfect cube" to work with, so that all of the contemporary theory on spatial points, patterns, and relationships behaves in the best way possible.
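To make the idea of bins concrete, here is a minimal sketch in base R, not the ArcGIS Pro construction itself. It assumes a hypothetical long-format table with one row per county per day; the case counts are made up for illustration. Each bin is one location at one time step, and a "perfect cube" is one with no empty bins.

```r
# Hypothetical long-format records: one row per county per day.
# The FIPS codes are only labels here and the case counts are invented.
records <- data.frame(
  fips  = c("12086", "12086", "12086", "12011", "12011"),
  day   = c(1, 2, 3, 1, 3),
  cases = c(10, 14, 21, 4, 9)
)

# Each bin holds the measurement for one location at one time step.
# Combinations with no record become NA (an empty bin).
cube <- tapply(records$cases, list(records$fips, records$day), sum)

cube

# A "perfect cube" has no empty bins; this toy example does not qualify
# because county 12011 has no record for day 2.
anyNA(cube)
```

In this sketch the empty bin would have to be filled or handled explicitly before the cube could be treated as complete.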

To understand this process, we must first define a critical component of the underlying process: the concept of spatial autocorrelation.

O'Sullivan & Unwin (2010) write: "Quite simply, spatial autocorrelation is a complicated name for the obvious fact that data from locations near one another in space are more likely to be similar than data from locations remote from one another.

There are three general possibilities: positive autocorrelation, negative autocorrelation, and noncorrelation or zero autocorrelation. Positive autocorrelation is the most commonly observed case and refers to situations where nearby observations are likely to be similar to one another. Negative autocorrelation is much less common and occurs when observations from nearby locations are likely to be different from one another.

Zero autocorrelation is the case where no spatial effect is discernible and observations seem to vary randomly through space. It is important to be clear about the difference between negative and zero autocorrelation, as students frequently confuse the two.

Describing and modeling patterns of variation across a study region, effectively describing the autocorrelation structure, is of primary importance in spatial analysis. Again, in general terms, spatial variation is of two kinds: first- and second-order.

First-order spatial variation occurs when observations across a study region vary from place to place due to changes in the underlying properties of the local environment. For example, the rates of incidence of crime might vary spatially simply because of variations in the population density, such that they increase near the center of a large city.

In contrast, second-order variation is due to interaction effects between observations, such as the occurrence of crime in an area making it more likely that there will be crimes surrounding that area, perhaps in the shape of local ‘hotspots’ in the vicinity of bars and clubs or near local street drug markets.

In practice, it is difficult to distinguish between first- and second-order effects, but it is often necessary to model both when developing statistical methods for handling spatial data."

"The essential idea of any approach to autocorrelation is to assess how similar or different attribute values at geographic locations are relative to how spatially close or distant are the associated locations. In broad terms, it is easy to see how we can assess similarity in attribute values using some simple calculation based on the difference in the attribute values. The real research question is how to incorporate spatial proximity into a measure of autocorrelation."

Source: O’Sullivan, D., & Unwin, D. J. (2010). Geographic Information Analysis (2nd ed.). Hoboken, NJ: John Wiley & Sons, Inc.

There are two other things we need in order to look at spatial autocorrelation before we look at a test case from the current Covid-19 dataset.

First, we need to define what we mean by neighbors. Let's look to the right and examine one of the hot spots on the southern tip of Florida. The primary question here is: what defines a neighbor? If the grid is a lattice of unit squares, as in the figure below (right), then a square that is not on the edge shares a border with each of the four squares beside it, forming the Rook's case of shared borders.

Similarly, if we include not only the shared borders but also the shared vertices, then we form the Queen’s case of shared borders, in which an interior square has eight neighbors.
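The difference between the two cases can be sketched on a small lattice in base R. The helper name lattice_neighbors is hypothetical and written only for this illustration; it is not part of spdep or of the county analysis.

```r
# Hypothetical helper: neighbours of cell (row, col) on an nrow x ncol
# lattice of unit squares, under the Rook's or the Queen's case.
lattice_neighbors <- function(row, col, nrow, ncol, queen = FALSE) {
  # Offsets to the four edge-sharing cells (Rook's case)
  offsets <- rbind(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))
  if (queen) {
    # The Queen's case adds the four vertex-sharing (diagonal) cells
    offsets <- rbind(offsets, c(-1, -1), c(-1, 1), c(1, -1), c(1, 1))
  }
  cand <- cbind(row + offsets[, 1], col + offsets[, 2])
  # Keep only candidates that fall inside the lattice
  inside <- cand[, 1] >= 1 & cand[, 1] <= nrow &
            cand[, 2] >= 1 & cand[, 2] <= ncol
  cand[inside, , drop = FALSE]
}

rook_centre  <- lattice_neighbors(2, 2, 3, 3, queen = FALSE)
queen_centre <- lattice_neighbors(2, 2, 3, 3, queen = TRUE)
queen_corner <- lattice_neighbors(1, 1, 3, 3, queen = TRUE)

nrow(rook_centre)   # 4 neighbours for the centre cell (Rook)
nrow(queen_centre)  # 8 neighbours for the centre cell (Queen)
nrow(queen_corner)  # 3 neighbours for a corner cell (Queen)
```

On a regular lattice the counts are fixed by position; irregular county polygons are where the two cases start to differ in less predictable ways.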

If you examine the Florida example of the Rook's and Queen's cases in the figure, it becomes obvious that if we are trying to identify spatial relationships between US counties, simply using a nearest-neighbor approach, such as defining an arbitrary radius of x miles, or using a Rook's case only will not capture all of the relationships we are interested in, at least not in the most precise way possible.

If you examine all of the 3,142 counties and county equivalents in the United States (the counties targeted in this analysis), you will see that many counties are not bounded by simple geometry. In fact, some counties may have only one or two neighbors, while others have seven or eight, under a Rook's case alone. Thus, the most sensible strategy for studying the Covid-19 data at the US county level was to use a Queen's case of shared borders, which maximizes the number of neighbors used in creating the space time bins.

O'Sullivan & Unwin (2010) continue: "In the measurement of autocorrelation, we need to capture the spatial relationship between all pairs of locations, and this is done using a spatial weights or spatial structure matrix generally denoted W. In the first row of the matrix, we record the spatial relationship between the first location and every other location in the map in turn, so that the value in the first row, second column of the matrix represents the relationship between the first and second locations in the map."

More generally, the element in row i, column j of the weights matrix, denoted Wij, represents the relationship between location i and location j, so each Wij depends on the spatial relationship between locations i and j. Here, Wij has a value of 1 if the two locations are adjacent and 0 if they are not. This is what we refer to as adjacency.
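As a rough illustration of a binary adjacency matrix, the base R sketch below builds W for a 3 x 3 lattice of unit squares under both cases. The lattice and the object names (cells, W_rook, W_queen) are assumptions made for this example only; the county analysis itself uses polygon contiguity.

```r
# Cell positions on a 3 x 3 lattice of unit squares (illustrative only)
cells <- expand.grid(col = 1:3, row = 1:3)

# Absolute row and column distances between every pair of cells
d_row <- abs(outer(cells$row, cells$row, "-"))
d_col <- abs(outer(cells$col, cells$col, "-"))

# Rook's case: Wij = 1 when cells i and j share an edge
W_rook <- (d_row + d_col == 1) * 1

# Queen's case: Wij = 1 when cells i and j share an edge or a vertex
W_queen <- (pmax(d_row, d_col) == 1) * 1

# Row sums give the number of neighbours of each cell;
# cell 5 is the centre of the lattice
rowSums(W_rook)
rowSums(W_queen)
```

Both matrices are symmetric with zeros on the diagonal, since adjacency is mutual and a cell is not its own neighbor.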

The figures below demonstrate how this process works using a simple example.

The following R code calculates the Queen's case contiguity of the US counties included in this analysis as a way of studying how many neighbors counties share nationwide.

# Covid-19 Analysis

# May 2020

# Mato Ohitika Analytics LLC

# Dr. Joseph Robertson

# Spatial Analysis of Lattice Data / Spatial Autocorrelation

library(spdep) # Loads sp and maptools as well

library(rgdal) # functions for spatial data input/output

# Import a STC shapefile as a map object that was specially designed for the STC Construction

ngp.spdf <- readOGR(dsn = "C:\\data\\2019_us_county_STC_ID.shp")

#Examine the Structure

str(ngp.spdf)

names(ngp.spdf)

plot(ngp.spdf)

##### Contiguity: we want to build a neighbor object based on Queen's contiguity

# This is accomplished using the poly2nb() function. With 3142 counties this neighbor list will be big!

# Passing the GEOID column to the row.names argument indexes the list by county FIPS code

ngp.nb.queen = poly2nb(ngp.spdf, queen=TRUE, row.names=ngp.spdf$GEOID)

ngp.nb.queen

str(ngp.nb.queen)

summary(ngp.nb.queen)

table(card(ngp.nb.queen)) # card() returns a vector with the number of neighbors each county has

Here are the results. The numeric IDs are the county FIPS codes, which makes it easy to check and audit the results:

- We have three regions with no links, which are in the Hawaiian Islands.
- The seventeen regions with one link are mostly townships treated as counties that lie within the boundaries of a larger county.
- There is one county with 14 neighbors, which is San Juan County in Utah! Wow, you can't even make that up.

Finally, it is helpful to know that the mean number of links in the US is about 6 neighbors when calculating a Queen's case for the emerging hot spot analysis. I used a more conservative minimum of 4 neighbors in the current map series, but future map series will reflect the intelligence gained through this analysis. A histogram of the distribution of the Queen's neighbors has also been included and, not surprisingly, it roughly follows a normal distribution!
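The summary described here (a table of link counts and their mean) can be sketched in base R on a toy neighbor list. The six regions below are hypothetical and are not the county data; in the analysis code above, card() plays the role that lengths() plays here.

```r
# Toy neighbour list for six hypothetical regions (not the county data).
# An empty vector marks a region with no links, such as an island.
toy_nb <- list(
  A = c("B", "C"),
  B = c("A", "C", "D"),
  C = c("A", "B", "D", "E"),
  D = c("B", "C", "E"),
  E = c("C", "D"),
  F = character(0)
)

n_links    <- lengths(toy_nb)   # number of neighbours per region
link_table <- table(n_links)    # how many regions have 0, 1, 2, ... links
mean_links <- mean(n_links)     # average number of links per region

link_table
mean_links
```

The same table is what a histogram of neighbor counts would be drawn from.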

The Space Time Cube is Now Live, Let's Take a Look!

The construction of the STC happens in ArcGIS Pro. I felt the need to lay all of this groundwork so that the data structures, once built, would be easy to use. This essentially follows the 80/20 rule: most statisticians spend 80% of their time cleaning and preparing the data and 20% actually analyzing it. This was no small feat.

The STC functions in ArcGIS Pro are very powerful tools, provided you have the know-how and the spatial statistics background to understand what is happening at every step. All of the phases of this Covid-19 project were designed to address how this entire process unfolds: from examining the stream of data coming in, to building the data structure through EDA, and finally using this information to build a spatial-temporal construct good enough to answer some questions about the Covid-19 pandemic, like "I know six weeks isn't a lot of time normally, but how can I visually understand what has happened over this time?"

These are also the questions I set out to understand since it has become very apparent we have switched from being fearful of the uncertainty of what the Covid-19 Virus will do to me and my family to fear of the uncertainty of what science will tell us that we do not want to hear.

So this is how I speak on Covid-19: through the lens of science, providing information to the best of my ability to inform the public about my findings and my interpretations of them. It is imperative to always be transparent in these processes, since our scientists in the government have bowed out in favor of magical thinking.

Having said that, let's look at the magic of the space time cube!

The first thing we do is write a master shapefile of the newly created data structure; this example uses the data collected through May 5, 2020. This process can be completely automated using a Jupyter notebook inside ArcGIS Pro, but since I have streamlined nearly all of it, the few clicks required in the UI are minimal. Take a look at this gallery to see the steps of how the cube is constructed:

The Construction of the Space Time Cube is Now Complete, Now What?

Discussion (5/6/2020)

Some things in life are never really finished. The journey to complete this exploratory analysis took many weeks of going over data sources and parsing through details that required great precision to understand what is truly going on. Let's now take a look at what the map gallery on the next pages will look like in context. Remember, context is everything.

In spatial statistics, there are two fundamental kinds of measures at work: global and local. Since I have deemed this to be an exploratory process, we are not bound by strict rules of interpretation, at least not yet. But the fact remains that, even though these maps were constructed using rather conservative parameters, the trend and hot spot maps in the following pages show a continued upward trajectory not only in specific cities and their surrounding counties but in many rural areas as well.

The next steps will begin constructing additional maps and time series that address the state by state trajectories. In the meantime, please have a look at what the Space Time Cube looks like in 3-space.

Specializing in American Indian and

Tribal Government Data Science Solutions

including Machine Learning and

Artificial Intelligence Research and Development

Copyright (2017-2020) Mato Ohitika Analytics LLC

All Images and Logos are Trademarks of

Mato Ohitika Analytics LLC

All Rights Reserved