How to use PPGM

# Load the package and dependencies
library(ppgm)
#> Loading required package: sp
#> Loading required package: sf
#> Linking to GEOS 3.13.0, GDAL 3.10.1, PROJ 9.5.1; sf_use_s2() is TRUE

PaleoPhyloGeographic Models (PPGM) are a useful way to incorporate paleontological data into analysis of climatic niche shifts in modern taxa.

Here we will walk you through the basic steps of the PPGM workflow, using the included data within the package.

Input Data

Three things are needed to run a PPGM. Species occurrences, a dated phylogenetic tree, and paleoclimate data. Fossils are optional but highly recommended.

There are many sources for species occurrences, which will depend on your specific analysis. For instructions on how to find and clean extant species occurrence data, please see this webpage.

Extant Data

We are using the Sceloporus data included in the package. This data was first collected for Lawing et al (2016).

After loading the package, you can load the data using the data function

# load Sceloporus occurrence data
data(occurrences)

This file contains Sceloporus occurrences for 53 species, Longitude and Latitude for each occurrence point, and all 19 bioclimatic variables for each of these points.

Only the first 3 columns are required for PPGM to run, as there is code that extracts bioclimate variables from the first layer of paleoclimate data. If the bioclimatic variables are included in the input dataset, then the package will default to estimating climate envelopes using these variables, rather than the paleoclimate data.

All 19 bioclimatic variables must be included if including any bioclimatic variables at this stage. If some variables are known, the columns may be left blank

# View the occurrence data to check all in order
head(occurrences,n=3)
#>      Species Longitude Latitude bio1 bio2 bio3 bio4 bio5 bio6 bio7 bio8 bio9
#> 1 arenicolus   -104.03    33.15  152  180   45 7808  342  -52  394  240   70
#> 2 arenicolus   -104.02    33.18  151  181   45 7835  342  -53  395  240   69
#> 3 arenicolus   -104.00    33.28  150  181   45 7823  341  -54  395  238   67
#>   bio10 bio11 bio12 bio13 bio14 bio15 bio16 bio17 bio18 bio19
#> 1   251    50   350    62     9    65   171    31   157    32
#> 2   251    49   352    63     9    65   173    31   159    32
#> 3   249    48   357    64    10    66   174    32   163    33

Phylogenetic Trees

PPGM requires dated phylogenetic trees. This is so that the code can associate ages of nodes with appropriate paleoclimate layers.

A sample of the Leache & Sites (2010) phylogenetic trees, trimmed for Sceloporus analysis by Lawing et al (2016), is included in the package. For the current analysis in the vignette we will use a subset of this dataset, in order to save computational time.

These trees can be accessed using the data function.

We will store 10 trees in the variable vigtrees for this analysis to save on computational time. A larger number of phylogenies will increase computational time. We recommend using the inbuilt parallel function to speed large analyses up, see later.

# Load the Sceloporus phylogenies and select only 10
data(sampletrees)
vigtrees <- sample(sampletrees, size=10)

Paleoclimate Data

We include a sample paleoclimate dataset within the package. This is a dataset

Fossil Data

We will also use the Sceloporus fossil dataset collected for Lawing et al (2016), representing a sampling of fossils from the literature. This dataset includes 45 Sceloporus fossils identified to the genus level.

This dataset is included in the package, and can be accessed using:

data("scel_fossils")
head(scel_fossils)
#>      MinAge MaxAge Longitude Latitude
#> [1,] 13.600 15.970 -100.0333 42.78333
#> [2,] 13.600 15.970  -97.6500 42.60000
#> [3,] 15.970 20.430 -103.3177 42.37400
#> [4,] 15.970 20.430 -103.1000 41.70000
#> [5,] 13.600 15.970 -100.0505 42.80610
#> [6,]  0.126  0.781 -105.7500 37.36667

This dataset includes minimum and maximum age for each fossil, with longitude and latitude for each point.

Fossil data is in the format of a table. It is important that the ages are in the first two columns, not the name of the species as with the modern occurrences.

Set Up Analysis

Bounds

PPGM uses phylogenetic comparative techniques to estimate climatic envelopes through time. Setting appropriate bounds is an important step. When no bounds are included, ppgm will use the default bounds from the fitContinuous function, which is likely not appropriate for analysis of climate niche.

Here we will set up a reasonable bounds for this analysis, using bioclimatic variable 1 (Mean Annual Temperature)

First we can look at the summary of bio1 in the occurrences dataset, using getBioclimVars.

bio1occ <- getBioclimVars(occurrences = occurrences, which.biovars=1)
summary(bio1occ$bio1)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>     -29     117     149     153     184     291

The minimum Mean Annual Temperature gets is -29 , and the max is 291, so we can use these values to give an idea for the bounds.

NOTE: If using multiple variables, you should use the bounds for the largest number. For example, seasonality:

bio4occ <- getBioclimVars(occurrences = occurrences, which.biovars=4)
summary(bio4occ$bio4)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>     477    2444    5729    5032    7116   10861

PPGM will use the same bounds for all biovariables, so make sure to set bounds accordingly.

Here we will set bounds for the bio1 and bio4 analysis for a Brownian Motion model.

# Set bounds for Brownian Motion
bmbounds <- list(sigsq = c(min = 0, max = 10000))

And set bounds for the bio1 analysis for an Ornstein-Uhlenbeck model.

# Set bounds for OU
oubounds <- list(sigsq = c(min = 0, max = 1000000),alpha = c(min = 0, max = 100000))

Control

Remember to also set a control. These are the settings used by fitContinuous to optimize the model likelihood.

#set control for model
contr <- list(niter=20)

Run PPGM

PPGMs can be run for a variety of models, and using either just present occurrences, or fossil and present occurrences. The ‘present’ is set by the youngest tips in the phylogeny - PPGMs run on extinct clades must take this into account.

Many Trees - No Fossils - Brownian Motion

For this first run, we will show a ppgm run with 10 trees and no fossils.

We will use our previously made objects, occurrences and vigtrees.

To select biovariables 1 and 4 we use which.biovars = c(1,4).

To select the Brownian Motion model we use model = "BM".

To view results in the format of a TraitGram, we use the parameter plot.TraitGram and set it to TRUE.

To change the default name of the output (which defaults to bio[#]), we use path = "BM". This will add the characters “BM” to the start of every plot name produced by the PPGM.

To speed up processing time when using multiple phylogenies, you can add the parameter use.parallel = TRUE. This will detect the number of cores on the current machine and run the node estimation process on multiple cores.

ppgm_bm <- ppgm(occurrences, trees = vigtrees, bounds = bmbounds, control = contr, 
                    which.biovars = c(1,4), model = "BM",  path = "BM")

If you add the argument plot.TraitGram = TRUE, you will get a figure in your working directory called “BMbio1.pdf” and “BMbio4.pdf”.

Let’s take a look at the traitgram for mean annual temperature (bio 1).

Traitgram showing Mean Annual Temperature

Traitgram showing Mean Annual Temperature

This is a traitgram for the mean annual temperature variable. The minimum is shown in grey, and the maximum in blue.

Below is the traitgram for temperature seasonality.

Traitgram showing Temperature Seasonality

Traitgram showing Temperature Seasonality

Many Trees - No Fossils - Ornstein Uhlenbeck

Let’s have a look at the outputs from another commonly used model, OU.

Remember that this model uses different bounds.

ppgm_ou <- ppgm(occurrences, trees = vigtrees, bounds = oubounds, control = contr, 
                    which.biovars = c(1,4), model = "OU", path = "OU")

You can also examine how the reconstructed climate for each node differs from the Brownian Motion in the traitgram.

Traitgram showing Mean Annual Temperature

Traitgram showing Mean Annual Temperature

Traitgram showing Temperature Seasonality

Traitgram showing Temperature Seasonality

PPGM results

The ppgm function produces a list of outputs that can be used separately with other functions in the package, or with further downstream analysis. Let’s take a look at these.

Climate Envelope

The climate envelope for each species is stored in cem. This contains the minimum, mean, and max, for the biovariable used in the ppgm workflow. As we have included two variables (bio1 and bio4), we have 6 columns.

head(ppgm_bm$cem)
#>             1Min 4Min    1Mean    4Mean 1Max 4Max
#> arenicolus   139 7373 151.0787 7629.559  168 7835
#> bicanthalis   48  941 134.7313 1706.000  257 2362
#> cautus       137 2275 180.3571 3199.357  226 5066
#> clarkii       81 2211 192.1046 5449.590  255 7665
#> couchii      137 2603 205.8000 5036.850  230 6041
#> cryptus      121 1059 143.2727 1206.182  199 1474

The output envelope is a list (of however many trees) containing the climate envelope of each node of the tree. This is used downstream in the function getLineageClimate and is the equivalent to the output of the function getEnvelopes.

head(ppgm_bm$envelope[[1]])
#> , , 1
#> 
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0  119  119  289  289
#> [2,]    0  156  156  246  246
#> [3,]    0   40   40  258  258
#> [4,]    0  204  204  240  240
#> [5,]    0   57   57  239  239
#> [6,]    0  153  153  239  239
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3]  [,4]  [,5]
#> [1,]    0  676  676  2136  2136
#> [2,]    0 2996 2996  5290  5290
#> [3,]    0 2840 2840 10000 10000
#> [4,]    0 3369 3369  4635  4635
#> [5,]    0 2884 2884  7497  7497
#> [6,]    0 3382 3382  4626  4626

Treedata Objects

The outputs treedata_min and treedata_max contain the minimum values and maximum climatic variables for every tree.

Here we can take a look at the data that is in the first tree:

head(ppgm_bm$treedata_min[[1]]$data)
#>              bio1 bio4
#> pyrocephalus  119  676
#> zosteromus    156 2996
#> magister       40 2840
#> hunsakeri     204 3369
#> orcutti        57 2884
#> licki         153 3382

If you didn’t plot a traitgram during the initial run of the PPGM, these outputs can be used in the functions plotTraitGram (for a single phylogeny) or plotTraitGramMultiPhylo (for many phylogenies).

Viewing climate envelope occupancy geographically - MESS

The traitgram results show how these individual bioclimatic variables change through time. To see how these relate to geographic space, we can create a MESS map.

MESS, multivariate environmental suitability surfaces

To put the output of ppgm in the correct format, we need to extract the minimum and maximum climate envelope from the cem object. We need columns 1 and 2 for minimum, and 4 and 6 for the maximum. We also need to make sure that the row names from the cem output are retained.

cem_min <- cbind(ppgm_bm$cem[,1], ppgm_bm$cem[,2])
cem_max <- cbind(ppgm_bm$cem[,5], ppgm_bm$cem[,6])
rownames(cem_min) <- rownames(cem_max) <- rownames(ppgm_bm$cem)

Now we can calculate the MESS scores and create the MESS map. We select the option which.plot="mess" to select the MESS map.

mess <- ppgmMESS(cem_min, cem_max, ppgm_bm$node_est, tree = vigtrees, timeslice = 15,
                 which.biovars = c(1,4), which.plot = "none")

This will produce a plot in the working directory for the timeslices selected using the option timeslice. As we selected timeslice 15 (representing 15 mya), we get a plot of the reconstructed climate niche availability at 15 mya for the entire clade, output as a plot called “MESS15Multi.pdf”.

MESS at 15mya

MESS at 15mya

A PPGM with Fossils

A PPGM can be run with fossils, and we recommend running a PPGM with and without fossils to see what the effect of fossil inclusion can have on the reconstruction of climate niche tolerances through time. The set up for the PPGM is the same as with the extant PPGM, only now we will use our dataset of fossils.

Here we’ll select the first 6 fossil occurrences, to show how only a few fossils can have a really interesting effect on results

vigfossils <- head(scel_fossils)

This dataset includes 45 fossil Sceloporus occurrences ranging from early Miocene to late Pleistocene that are identified to genus level, but species identification is not possible.

One of the good things about PPGM is that we can include these fossils even though their taxonomic position is unknown. When fossils with unknown phylogenetic placement are included in the ppgm, the workflow utilises the function addFossil to randomly add fossils to appropriate branches that are present in the phylogeny within the age range of the fossil as given.

Here we will run a ppgm with the same variables as the previous models to see the changes that happen when fossils are included.

As the fossils are being added to the tree randomly, we can account for the uncertainty in node placement introduced by alternate fossil positions by increasing the number of permutations. Here we’ll set the number of permutations to 5 using permut = 5, however you can experiment with your own data to see how many permutations are necessary to include the range of uncertainty in node estimation.

Increasing the number of permutations will increase the computational time, so be aware that this step could take some time depending on your personal machine.

ppgm_bmfossil <- ppgm(occurrences, trees = vigtrees, fossils = vigfossils, bounds = bmbounds,
                      control = contr, which.biovars = c(1,4), model = "BM",
                      path = "BMfossil", permut = 5)

We can compare the results with the extant only ppgm

Traitgram without fossils showing Mean Annual Temperature

Traitgram without fossils showing Mean Annual Temperature

Traitgram with fossils showing Mean Annual Temperature

Traitgram with fossils showing Mean Annual Temperature

Even with the addition of only 6 fossils, the traitgram shows that the model reconstructs a different ancestral state for the climate tolerance of the Sceloporus total clade.

We can also create a MESS map with the fossil run and compare.

Format output for MESS as before:

fcem_min <- cbind(ppgm_bmfossil$cem[,1], ppgm_bmfossil$cem[,2])
fcem_max <- cbind(ppgm_bmfossil$cem[,5], ppgm_bmfossil$cem[,6])
rownames(fcem_min) <- rownames(fcem_max) <- rownames(ppgm_bmfossil$cem)

Run the MESS analysis for the fossil ppgm. Change the below code to say which.plot = "all".

mess_fossil <- ppgmMESS(fcem_min, fcem_max, ppgm_bmfossil$node_est, tree=vigtrees, timeslice=15,
                        which.biovars = c(1,4), which.plot = "none", fossils = vigfossils,
                        path = "fossil")
MESS at 15mya, no fossils

MESS at 15mya, no fossils

MESS at 15mya, fossils

MESS at 15mya, fossils

As you can see the MESS map using fossils is very different, suggesting climatic tolerances for Sceloporus much more North than using extant occurrences alone. This changes our understanding of where suitable climate for Sceloporus was during their early evolutionary history.