Many geographical analyses use spatial autocorrelation, which allows us to study geographical evolution from different points of view. One measure of spatial autocorrelation is Moran's I, which is based on Pearson's correlation coefficient from general statistics <arXiv:1606.03658>.

## Performing the Analysis

This package offers a straightforward way to perform the whole analysis with the function `rescaleI`, which requires an input file in a specific format, described in the Loading data section.
```r
library(Irescale)
fileInput <- system.file("testdata", "chen.csv", package = "Irescale")
data <- loadFile(fileInput)
scaledI <- rescaleI(data, samples = 1000, scalingUpTo = "MaxMin")
fn <- file.path(tempdir(), "output.csv", fsep = .Platform$file.sep)
saveFile(fn, scaledI)
# Delete the output file if it exists
if (file.exists(fn)) file.remove(fn)
```
Alternatively, the analysis can be performed step by step, as follows.
The input file should have the following format:
```r
fileInput <- system.file("testdata", "chen.csv", package = "Irescale")
head(read.csv(fileInput))
```
Loading the data for the analysis is simple: the function `loadFile` provides the interface. `loadFile` returns a list with two elements, `data` and `varOfInterest`. The first one holds the latitude and longitude; `varOfInterest` is a matrix with all the measurements from the field.
```r
library(Irescale)
fileInput <- system.file("testdata", "chen.csv", package = "Irescale")
input <- loadFile(fileInput)
head(input$data)
head(input$varOfInterest)
```
If the data has a chessboard shape, that is, the file is organized in rows and columns where the rows represent latitude, the columns represent longitude, and the measurements are in the cells, the function `loadChessBoard` can be used to load it.
```r
library(Irescale)
# The file ships in the package's testdata directory
fileInput <- system.file("testdata", "chessboard.csv", package = "Irescale")
input <- loadChessBoard(fileInput)
head(input$data)
head(input$varOfInterest)
```
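To make the chessboard layout concrete, the sketch below turns a small rows-by-columns matrix into one row per cell, with the row index standing in for latitude and the column index for longitude. The helper `chessboardToLong` is hypothetical and is not necessarily how `loadChessBoard` works internally; it only illustrates the layout described above.

```r
# Hypothetical helper (not part of Irescale): turn a chessboard-shaped matrix,
# where rows play the role of latitude and columns of longitude, into one row
# per cell holding its coordinates and its measurement.
chessboardToLong <- function(board) {
  coords <- expand.grid(latitude = seq_len(nrow(board)),
                        longitude = seq_len(ncol(board)))
  # as.vector() reads the matrix column by column, which matches the order of
  # expand.grid() (its first variable varies fastest)
  coords$value <- as.vector(board)
  coords
}

board <- matrix(c(1, 2, 3,
                  4, 5, 6), nrow = 2, byrow = TRUE)
long <- chessboardToLong(board)
print(long)
```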
Once the data is loaded, the distance matrix (the distances between all pairs of points) can be calculated. If the points are geospatial locations, use `calculateEuclideanDistance`.
```r
library(Irescale)
fileInput <- system.file("testdata", "chen.csv", package = "Irescale")
input <- loadFile(fileInput)
distM <- calculateEuclideanDistance(input$data)
distM[1:5, 1:5]
```
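Base R's `dist()` builds the same kind of pairwise matrix, which is handy for checking results on a tiny example. The three points below are made-up planar coordinates; whether `calculateEuclideanDistance` applies any projection to latitude and longitude is not stated here, so this only illustrates the metric itself.

```r
# Three made-up points, one per row: (coordinate 1, coordinate 2)
pts <- matrix(c(0, 0,
                3, 4,
                0, 4), ncol = 2, byrow = TRUE)
# dist() returns the lower triangle; as.matrix() expands it to the full
# symmetric matrix with zeros on the diagonal
euclid <- as.matrix(dist(pts, method = "euclidean"))
print(euclid)
```

Here the first two points form a 3-4-5 right triangle, so their distance is 5.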
If the data comes from a chessboard-like field, the Manhattan distance can be used instead, through `calculateManhattanDistance`.
```r
library(Irescale)
# The file ships in the package's testdata directory
fileInput <- system.file("testdata", "chessboard.csv", package = "Irescale")
input <- loadChessBoard(fileInput)
distM <- calculateManhattanDistance(input$data)
distM[1:5, 1:5]
```
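The Manhattan distance sums the absolute coordinate differences, so on a grid of unit cells it counts the row and column steps between two cells. A small base R check with made-up points, using `dist()` rather than the package function:

```r
# Same three made-up points as rows of a matrix
pts <- matrix(c(0, 0,
                3, 4,
                0, 4), ncol = 2, byrow = TRUE)
# Manhattan (city-block) distance: |dx| + |dy|
manhattan <- as.matrix(dist(pts, method = "manhattan"))
print(manhattan)
```

Between the first two points this gives 3 + 4 = 7, compared with a straight-line distance of 5.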
The weighted distance matrix can be calculated with the function `calculateWeightedDistMatrix`; however, this step is not required, because `calculateMoranI` does it internally.
```r
library(Irescale)
fileInput <- system.file("testdata", "chen.csv", package = "Irescale")
input <- loadFile(fileInput)
distM <- calculateEuclideanDistance(input$data)
distW <- calculateWeightedDistMatrix(distM)
distW[1:5, 1:5]
```
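The weighting scheme used by `calculateWeightedDistMatrix` is not described in this section. As an illustration only, the sketch below uses one common choice, inverse-distance weights with a zero diagonal and optional row standardization; the helper `inverseDistanceWeights` is hypothetical.

```r
# Hypothetical helper (not the Irescale implementation): inverse-distance
# weights, w_ij = 1 / d_ij, with w_ii = 0.
inverseDistanceWeights <- function(distM, rowStandardize = FALSE) {
  W <- 1 / distM
  diag(W) <- 0  # a point is not its own neighbour; also replaces 1/0 = Inf
  # Note: duplicated points (d_ij = 0 off the diagonal) would still give Inf
  if (rowStandardize) W <- W / rowSums(W)  # each row then sums to 1
  W
}

pts <- matrix(c(0, 0,
                3, 4,
                0, 4), ncol = 2, byrow = TRUE)
distM <- as.matrix(dist(pts))
W <- inverseDistanceWeights(distM)
Ws <- inverseDistanceWeights(distM, rowStandardize = TRUE)
print(W)
print(Ws)
```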
Next, calculate the spatial autocorrelation statistic, Moran's I, with the function `calculateMoranI`, which requires the distance matrix and the variable you are interested in.
```r
library(Irescale)
fileInput <- system.file("testdata", "chen.csv", package = "Irescale")
input <- loadFile(fileInput)
distM <- calculateEuclideanDistance(input$data)
I <- calculateMoranI(distM = distM, varOfInterest = input$varOfInterest)
I
```
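For reference, the textbook definition of Moran's I for values $x_1, \dots, x_n$ with mean $\bar{x}$ and spatial weights $w_{ij}$ (with $w_{ii} = 0$) is

$$I = \frac{n}{\sum_{i}\sum_{j} w_{ij}}\,\frac{\sum_{i}\sum_{j} w_{ij}(x_i-\bar{x})(x_j-\bar{x})}{\sum_{i}(x_i-\bar{x})^2}.$$

The base R sketch below computes it directly. It assumes `calculateMoranI` uses this standard normalization; the weights it builds from `distM` may differ from the simple neighbour weights used here, so the two need not give identical numbers.

```r
# Textbook Moran's I for a numeric vector x and a weight matrix W (w_ii = 0)
moranI <- function(x, W) {
  z <- x - mean(x)                      # deviations from the mean
  n <- length(x)
  # sum(W * outer(z, z)) is sum_ij w_ij z_i z_j
  (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Four points on a line; adjacent points are neighbours with weight 1
W <- matrix(0, 4, 4)
W[abs(row(W) - col(W)) == 1] <- 1
x <- c(1, 2, 3, 4)
moranI(x, W)  # 0.3333333
```

A steadily increasing sequence puts similar values next to each other, so the result is positive (exactly 1/3 here).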
The scaling process uses a Monte Carlo resampling method: the values are shuffled and I is recalculated at least 1000 times. In the code below, after resampling I, a set of summary statistics is calculated for the generated vector.
```r
library(Irescale)
fileInput <- system.file("testdata", "chen.csv", package = "Irescale")
input <- loadFile(fileInput)
distM <- calculateEuclideanDistance(input$data)
I <- calculateMoranI(distM = distM, varOfInterest = input$varOfInterest)
vI <- resamplingI(1000, distM, input$varOfInterest) # This is the permutation
statsVI <- summaryVector(vI)
statsVI
```
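The resampling idea can be sketched in base R with `sample()` and `replicate()`: each shuffle breaks any spatial arrangement, so the recomputed values of I form a null distribution. This is an illustration on a toy line of 10 points, not the `resamplingI` implementation. Under random permutation the expected value of I is $-1/(n-1)$, here $-1/9$, which gives a quick sanity check.

```r
# Textbook Moran's I for values x and weight matrix W (w_ii = 0)
moranI <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

set.seed(42)
W <- matrix(0, 10, 10)
W[abs(row(W) - col(W)) == 1] <- 1   # 10 points on a line, adjacent ones are neighbours
x <- 1:10                           # toy variable of interest

# Monte Carlo null distribution: shuffle x and recompute I each time
nullI <- replicate(1000, moranI(sample(x), W))

# Summary statistics of the resampled vector
c(mean = mean(nullI), sd = sd(nullI), quantile(nullI, c(0.025, 0.975)))
```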
To see how the values of I are distributed, the function `plotHistogramOverlayNormal` draws a histogram of the vector generated by resampling, overlaid with a theoretical normal distribution.
```r
library(Irescale)
fileInput <- system.file("testdata", "chen.csv", package = "Irescale")
input <- loadFile(fileInput)
distM <- calculateEuclideanDistance(input$data)
I <- calculateMoranI(distM = distM, varOfInterest = input$varOfInterest)
vI <- resamplingI(1000, distM, input$varOfInterest) # This is the permutation
statsVI <- summaryVector(vI)
plotHistogramOverlayNormal(vI, statsVI, main = colnames(input$varOfInterest))
```
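The same kind of plot can be drawn with base graphics, which shows what the overlay involves: the histogram must be on the density scale (`freq = FALSE`) so that it is comparable with a normal density whose mean and standard deviation are taken from the resampled values. This sketch uses a toy permutation vector and is not the package's plotting code.

```r
# Textbook Moran's I for values x and weight matrix W (w_ii = 0)
moranI <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

set.seed(42)
W <- matrix(0, 10, 10)
W[abs(row(W) - col(W)) == 1] <- 1
nullI <- replicate(1000, moranI(sample(1:10), W))  # toy resampled vector

# Histogram on the density scale
h <- hist(nullI, breaks = 30, freq = FALSE,
          main = "Permutation distribution of I", xlab = "I")
# Normal curve with the mean and sd of the resampled values
curve(dnorm(x, mean = mean(nullI), sd = sd(nullI)), add = TRUE, lwd = 2)
```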
Once the null distribution has been calculated via resampling, it needs to be scaled by centering and stretching. The function `iCorrection` returns an object with the rescaled resampling vector and its summary statistics; the new value of I is returned in a variable named `newI`.
```r
library(Irescale)
fileInput <- system.file("testdata", "chen.csv", package = "Irescale")
input <- loadFile(fileInput)
distM <- calculateEuclideanDistance(input$data)
I <- calculateMoranI(distM = distM, varOfInterest = input$varOfInterest)
vI <- resamplingI(1000, distM, input$varOfInterest) # This is the permutation
statsVI <- summaryVector(vI)
corrections <- iCorrection(I, vI)
corrections$newI
```
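The exact centering and stretching performed by `iCorrection` is not spelled out in this section. Purely as a hypothetical illustration of the general idea, the sketch below centers the null distribution on its mean and divides by its largest absolute deviation, so the rescaled null values fall in [-1, 1]; the observed I is transformed the same way. The helper `rescaleCenterStretch` and the toy values are assumptions, not the package's method.

```r
# Hypothetical centre-and-stretch (NOT necessarily what iCorrection does):
# shift the null values so their mean is 0, then divide by the largest
# absolute deviation so the rescaled null values lie in [-1, 1].
rescaleCenterStretch <- function(I, nullI) {
  center <- mean(nullI)
  stretch <- max(abs(nullI - center))
  list(newI = (I - center) / stretch,          # observed I on the new scale
       scaledData = (nullI - center) / stretch) # rescaled null distribution
}

nullI <- c(-0.3, -0.1, 0, 0.1, 0.5)  # toy null values
res <- rescaleCenterStretch(0.4, nullI)
res$newI
```

An observed I more extreme than every resampled value would land outside [-1, 1] under this particular choice.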
To assess the significance of this new value, calculate the p-value with the function `calculatePvalue`. It requires the scaled vector (`scaledData`), the scaled I (`newI`), and the mean of `scaledData` (`summaryScaledD$mean`), all of which are returned by `iCorrection`.
```r
library(Irescale)
fileInput <- system.file("testdata", "chen.csv", package = "Irescale")
input <- loadFile(fileInput)
distM <- calculateEuclideanDistance(input$data)
I <- calculateMoranI(distM = distM, varOfInterest = input$varOfInterest)
vI <- resamplingI(1000, distM, input$varOfInterest) # This is the permutation
statsVI <- summaryVector(vI)
corrections <- iCorrection(I, vI)
pvalueIscaled <- calculatePvalue(corrections$scaledData, corrections$newI,
                                 corrections$summaryScaledD$mean)
pvalueIscaled
```
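Whether `calculatePvalue` computes a one-sided or two-sided p-value is not stated here. As a sketch of a common permutation-test definition, assumed rather than taken from the package, the p-value below is the share of null values at least as far from the null center as the observed statistic, with the usual +1 correction that counts the observed value itself. The helper name `permutationPvalue` is hypothetical.

```r
# Hypothetical two-sided permutation p-value (not the Irescale implementation)
permutationPvalue <- function(nullValues, observed, center = mean(nullValues)) {
  # Null values at least as extreme as the observed one, measured from the centre
  extreme <- sum(abs(nullValues - center) >= abs(observed - center))
  # +1 in numerator and denominator: p is never exactly 0
  (extreme + 1) / (length(nullValues) + 1)
}

nullValues <- c(-2, -1, 0, 1, 2)
permutationPvalue(nullValues, 1.5, center = 0)  # (2 + 1) / (5 + 1) = 0.5
```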
To determine how many iterations the resampling method needs, you can run a stability analysis. The function `buildStabilityTable` draws a chart, on a logarithmic scale (10^x), of the number of iterations needed to achieve stability in the Monte Carlo simulation.
```r
library(Irescale)
fileInput <- system.file("testdata", "chen.csv", package = "Irescale")
input <- loadFile(fileInput)
resultsChen <- buildStabilityTable(data = input, times = 100, samples = 1000, plots = TRUE)
```
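The idea behind a stability analysis can be sketched in base R: repeat the resampling with 10, 100, 1000 and 10000 shuffles and watch the summary statistics of the null distribution settle as the count grows, plotted on a log10 axis. This is a hypothetical illustration on toy data; it does not reproduce the table or plots produced by `buildStabilityTable`.

```r
# Textbook Moran's I for values x and weight matrix W (w_ii = 0)
moranI <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

set.seed(7)
W <- matrix(0, 10, 10)
W[abs(row(W) - col(W)) == 1] <- 1   # toy line of 10 neighbouring points
x <- 1:10

sampleSizes <- 10^(1:4)             # numbers of resamples to compare
stability <- data.frame(samples = sampleSizes, meanI = NA_real_, sdI = NA_real_)
for (k in seq_along(sampleSizes)) {
  nullI <- replicate(sampleSizes[k], moranI(sample(x), W))
  stability$meanI[k] <- mean(nullI)
  stability$sdI[k] <- sd(nullI)
}
print(stability)

# Mean of the null distribution against the number of resamples (log10 x-axis)
plot(stability$samples, stability$meanI, log = "x", type = "b",
     xlab = "Number of resamples", ylab = "Mean of resampled I")
```

Once consecutive rows agree to the precision you need, adding more resamples brings little benefit.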
Note: the data used in these examples (`chen.csv`) is taken from [@chen2009].