# Load the package and dependencies
library(ppgm)
#> Loading required package: sp
#> Loading required package: sf
#> Linking to GEOS 3.13.0, GDAL 3.10.1, PROJ 9.5.1; sf_use_s2() is TRUE
PaleoPhyloGeographic Models (PPGM) are a useful way to incorporate paleontological data into analysis of climatic niche shifts in modern taxa.
Here we will walk you through the basic steps of the PPGM workflow, using the included data within the package.
Three things are needed to run a PPGM. Species occurrences, a dated phylogenetic tree, and paleoclimate data. Fossils are optional but highly recommended.
There are many sources for species occurrences, which will depend on your specific analysis. For instructions on how to find and clean extant species occurrence data, please see this webpage.
We are using the Sceloporus data included in the package. This data was first collected for Lawing et al (2016).
After loading the package, you can load the data using the
data
function
This file contains Sceloporus occurrences for 53 species, Longitude and Latitude for each occurrence point, and all 19 bioclimatic variables for each of these points.
Only the first 3 columns are required for PPGM to run, as there is code that extracts bioclimate variables from the first layer of paleoclimate data. If the bioclimatic variables are included in the input dataset, then the package will default to estimating climate envelopes using these variables, rather than the paleoclimate data.
All 19 bioclimatic variables must be included if including any bioclimatic variables at this stage. If some variables are known, the columns may be left blank
# View the occurrence data to check all in order
head(occurrences,n=3)
#> Species Longitude Latitude bio1 bio2 bio3 bio4 bio5 bio6 bio7 bio8 bio9
#> 1 arenicolus -104.03 33.15 152 180 45 7808 342 -52 394 240 70
#> 2 arenicolus -104.02 33.18 151 181 45 7835 342 -53 395 240 69
#> 3 arenicolus -104.00 33.28 150 181 45 7823 341 -54 395 238 67
#> bio10 bio11 bio12 bio13 bio14 bio15 bio16 bio17 bio18 bio19
#> 1 251 50 350 62 9 65 171 31 157 32
#> 2 251 49 352 63 9 65 173 31 159 32
#> 3 249 48 357 64 10 66 174 32 163 33
PPGM requires dated phylogenetic trees. This is so that the code can associate ages of nodes with appropriate paleoclimate layers.
A sample of the Leache & Sites (2010) phylogenetic trees, trimmed for Sceloporus analysis by Lawing et al (2016), is included in the package. For the current analysis in the vignette we will use a subset of this dataset, in order to save computational time.
These trees can be accessed using the data
function.
We will store 10 trees in the variable vigtrees
for this
analysis to save on computational time. A larger number of phylogenies
will increase computational time. We recommend using the inbuilt
parallel function to speed large analyses up, see later.
We include a sample paleoclimate dataset within the package. This is a dataset
We will also use the Sceloporus fossil dataset collected for Lawing et al (2016), representing a sampling of fossils from the literature. This dataset includes 45 Sceloporus fossils identified to the genus level.
This dataset is included in the package, and can be accessed using:
data("scel_fossils")
head(scel_fossils)
#> MinAge MaxAge Longitude Latitude
#> [1,] 13.600 15.970 -100.0333 42.78333
#> [2,] 13.600 15.970 -97.6500 42.60000
#> [3,] 15.970 20.430 -103.3177 42.37400
#> [4,] 15.970 20.430 -103.1000 41.70000
#> [5,] 13.600 15.970 -100.0505 42.80610
#> [6,] 0.126 0.781 -105.7500 37.36667
This dataset includes minimum and maximum age for each fossil, with longitude and latitude for each point.
Fossil data is in the format of a table. It is important that the ages are in the first two columns, not the name of the species as with the modern occurrences.
PPGM uses phylogenetic comparative techniques to estimate climatic
envelopes through time. Setting appropriate bounds is an important step.
When no bounds are included, ppgm
will use the default
bounds from the fitContinuous
function, which is likely not
appropriate for analysis of climate niche.
Here we will set up a reasonable bounds for this analysis, using bioclimatic variable 1 (Mean Annual Temperature)
First we can look at the summary of bio1 in the occurrences dataset,
using getBioclimVars
.
bio1occ <- getBioclimVars(occurrences = occurrences, which.biovars=1)
summary(bio1occ$bio1)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -29 117 149 153 184 291
The minimum Mean Annual Temperature gets is -29 , and the max is 291, so we can use these values to give an idea for the bounds.
NOTE: If using multiple variables, you should use the bounds for the largest number. For example, seasonality:
bio4occ <- getBioclimVars(occurrences = occurrences, which.biovars=4)
summary(bio4occ$bio4)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 477 2444 5729 5032 7116 10861
PPGM will use the same bounds for all biovariables, so make sure to set bounds accordingly.
Here we will set bounds for the bio1 and bio4 analysis for a Brownian Motion model.
And set bounds for the bio1 analysis for an Ornstein-Uhlenbeck model.
PPGMs can be run for a variety of models, and using either just present occurrences, or fossil and present occurrences. The ‘present’ is set by the youngest tips in the phylogeny - PPGMs run on extinct clades must take this into account.
For this first run, we will show a ppgm run with 10 trees and no fossils.
We will use our previously made objects, occurrences
and
vigtrees
.
To select biovariables 1 and 4 we use
which.biovars = c(1,4)
.
To select the Brownian Motion model we use
model = "BM"
.
To view results in the format of a TraitGram, we use the parameter
plot.TraitGram
and set it to TRUE
.
To change the default name of the output (which defaults to bio[#]),
we use path = "BM"
. This will add the characters “BM” to
the start of every plot name produced by the PPGM.
To speed up processing time when using multiple phylogenies, you can
add the parameter use.parallel = TRUE
. This will detect the
number of cores on the current machine and run the node estimation
process on multiple cores.
ppgm_bm <- ppgm(occurrences, trees = vigtrees, bounds = bmbounds, control = contr,
which.biovars = c(1,4), model = "BM", path = "BM")
If you add the argument plot.TraitGram = TRUE, you will get a figure in your working directory called “BMbio1.pdf” and “BMbio4.pdf”.
Let’s take a look at the traitgram for mean annual temperature (bio 1).
Traitgram showing Mean Annual Temperature
This is a traitgram for the mean annual temperature variable. The minimum is shown in grey, and the maximum in blue.
Below is the traitgram for temperature seasonality.
Traitgram showing Temperature Seasonality
Let’s have a look at the outputs from another commonly used model, OU.
Remember that this model uses different bounds.
ppgm_ou <- ppgm(occurrences, trees = vigtrees, bounds = oubounds, control = contr,
which.biovars = c(1,4), model = "OU", path = "OU")
You can also examine how the reconstructed climate for each node differs from the Brownian Motion in the traitgram.
Traitgram showing Mean Annual Temperature
Traitgram showing Temperature Seasonality
The ppgm
function produces a list of outputs that can be
used separately with other functions in the package, or with further
downstream analysis. Let’s take a look at these.
The climate envelope for each species is stored in cem
.
This contains the minimum, mean, and max, for the biovariable used in
the ppgm workflow. As we have included two variables (bio1 and bio4), we
have 6 columns.
head(ppgm_bm$cem)
#> 1Min 4Min 1Mean 4Mean 1Max 4Max
#> arenicolus 139 7373 151.0787 7629.559 168 7835
#> bicanthalis 48 941 134.7313 1706.000 257 2362
#> cautus 137 2275 180.3571 3199.357 226 5066
#> clarkii 81 2211 192.1046 5449.590 255 7665
#> couchii 137 2603 205.8000 5036.850 230 6041
#> cryptus 121 1059 143.2727 1206.182 199 1474
The output envelope
is a list (of however many trees)
containing the climate envelope of each node of the tree. This is used
downstream in the function getLineageClimate
and is the
equivalent to the output of the function getEnvelopes
.
head(ppgm_bm$envelope[[1]])
#> , , 1
#>
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0 119 119 289 289
#> [2,] 0 156 156 246 246
#> [3,] 0 40 40 258 258
#> [4,] 0 204 204 240 240
#> [5,] 0 57 57 239 239
#> [6,] 0 153 153 239 239
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0 676 676 2136 2136
#> [2,] 0 2996 2996 5290 5290
#> [3,] 0 2840 2840 10000 10000
#> [4,] 0 3369 3369 4635 4635
#> [5,] 0 2884 2884 7497 7497
#> [6,] 0 3382 3382 4626 4626
The outputs treedata_min
and treedata_max
contain the minimum values and maximum climatic variables for every
tree.
Here we can take a look at the data that is in the first tree:
head(ppgm_bm$treedata_min[[1]]$data)
#> bio1 bio4
#> pyrocephalus 119 676
#> zosteromus 156 2996
#> magister 40 2840
#> hunsakeri 204 3369
#> orcutti 57 2884
#> licki 153 3382
If you didn’t plot a traitgram during the initial run of the PPGM,
these outputs can be used in the functions plotTraitGram
(for a single phylogeny) or plotTraitGramMultiPhylo
(for
many phylogenies).
The traitgram results show how these individual bioclimatic variables change through time. To see how these relate to geographic space, we can create a MESS map.
MESS, multivariate environmental suitability surfaces
To put the output of ppgm in the correct format, we need to extract
the minimum and maximum climate envelope from the cem
object. We need columns 1 and 2 for minimum, and 4 and 6 for the
maximum. We also need to make sure that the row names from the
cem
output are retained.
cem_min <- cbind(ppgm_bm$cem[,1], ppgm_bm$cem[,2])
cem_max <- cbind(ppgm_bm$cem[,5], ppgm_bm$cem[,6])
rownames(cem_min) <- rownames(cem_max) <- rownames(ppgm_bm$cem)
Now we can calculate the MESS scores and create the MESS map. We
select the option which.plot="mess"
to select the MESS
map.
mess <- ppgmMESS(cem_min, cem_max, ppgm_bm$node_est, tree = vigtrees, timeslice = 15,
which.biovars = c(1,4), which.plot = "none")
This will produce a plot in the working directory for the timeslices
selected using the option timeslice
. As we selected
timeslice 15 (representing 15 mya), we get a plot of the reconstructed
climate niche availability at 15 mya for the entire clade, output as a
plot called “MESS15Multi.pdf”.
MESS at 15mya
A PPGM can be run with fossils, and we recommend running a PPGM with and without fossils to see what the effect of fossil inclusion can have on the reconstruction of climate niche tolerances through time. The set up for the PPGM is the same as with the extant PPGM, only now we will use our dataset of fossils.
Here we’ll select the first 6 fossil occurrences, to show how only a few fossils can have a really interesting effect on results
This dataset includes 45 fossil Sceloporus occurrences ranging from early Miocene to late Pleistocene that are identified to genus level, but species identification is not possible.
One of the good things about PPGM is that we can include these
fossils even though their taxonomic position is unknown. When fossils
with unknown phylogenetic placement are included in the
ppgm
, the workflow utilises the function
addFossil
to randomly add fossils to appropriate branches
that are present in the phylogeny within the age range of the fossil as
given.
Here we will run a ppgm with the same variables as the previous models to see the changes that happen when fossils are included.
As the fossils are being added to the tree randomly, we can account
for the uncertainty in node placement introduced by alternate fossil
positions by increasing the number of permutations. Here we’ll set the
number of permutations to 5 using permut = 5
, however you
can experiment with your own data to see how many permutations are
necessary to include the range of uncertainty in node estimation.
Increasing the number of permutations will increase the computational time, so be aware that this step could take some time depending on your personal machine.
ppgm_bmfossil <- ppgm(occurrences, trees = vigtrees, fossils = vigfossils, bounds = bmbounds,
control = contr, which.biovars = c(1,4), model = "BM",
path = "BMfossil", permut = 5)
We can compare the results with the extant only ppgm
Traitgram without fossils showing Mean Annual Temperature
Traitgram with fossils showing Mean Annual Temperature
Even with the addition of only 6 fossils, the traitgram shows that the model reconstructs a different ancestral state for the climate tolerance of the Sceloporus total clade.
We can also create a MESS map with the fossil run and compare.
Format output for MESS as before:
fcem_min <- cbind(ppgm_bmfossil$cem[,1], ppgm_bmfossil$cem[,2])
fcem_max <- cbind(ppgm_bmfossil$cem[,5], ppgm_bmfossil$cem[,6])
rownames(fcem_min) <- rownames(fcem_max) <- rownames(ppgm_bmfossil$cem)
Run the MESS analysis for the fossil ppgm. Change the below code to
say which.plot = "all"
.
mess_fossil <- ppgmMESS(fcem_min, fcem_max, ppgm_bmfossil$node_est, tree=vigtrees, timeslice=15,
which.biovars = c(1,4), which.plot = "none", fossils = vigfossils,
path = "fossil")
MESS at 15mya, no fossils
MESS at 15mya, fossils
As you can see the MESS map using fossils is very different, suggesting climatic tolerances for Sceloporus much more North than using extant occurrences alone. This changes our understanding of where suitable climate for Sceloporus was during their early evolutionary history.