diff --git a/.Rbuildignore b/.Rbuildignore index ac4170d..e0f2835 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -8,3 +8,6 @@ misc/ ^packrat/ ^\.Rprofile$ ^\.github$ +^_pkgdown\.yml$ +^docs$ +^pkgdown$ diff --git a/.gitignore b/.gitignore index 7fbfdfe..787433b 100644 --- a/.gitignore +++ b/.gitignore @@ -7,3 +7,8 @@ vignettes/BRCindicators_cache/ vignettes/BRCindicators_files/ misc/ tests/testthat/Rplots.pdf +error_reproduction_scripts + +# To reduce repo size +vignettes/vignette_files +vignettes/vignette_cache \ No newline at end of file diff --git a/DESCRIPTION b/DESCRIPTION index a3c11cd..2c707b0 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -52,6 +52,6 @@ VignetteBuilder: Config/testthat/edition: 3 Encoding: UTF-8 remotes: BiologicalRecordsCentre/sparta -RoxygenNote: 7.2.3 +RoxygenNote: 7.3.1 SystemRequirements: JAGS (https://sourceforge.net/projects/mcmc-jags/files/JAGS/) diff --git a/README.md b/README.md index 4a60461..7ce94d7 100644 --- a/README.md +++ b/README.md @@ -12,6 +12,4 @@ Installing the package is easy and can be done in a couple of lines in R library(devtools) install_github(repo = 'biologicalrecordscentre/BRCindicators') -For more info read the vignette - - vignette('BRCindicators') +For more info read the vignette [here](https://biologicalrecordscentre.github.io/BRCindicators/articles/vignette.html) diff --git a/_pkgdown.yml b/_pkgdown.yml new file mode 100644 index 0000000..d71acfb --- /dev/null +++ b/_pkgdown.yml @@ -0,0 +1,4 @@ +url: ~ +template: + bootstrap: 5 + diff --git a/docs/404.html b/docs/404.html new file mode 100644 index 0000000..318d93d --- /dev/null +++ b/docs/404.html @@ -0,0 +1,91 @@ + + +
+ + + + +
+require(snowfall)
This document shows how to use the indicator pipeline to create +biodiversity indicators such as those for DEFRA’s Biodiversity Indicators in +Your Pocket. The pipeline is shared in the form of an R package +called ‘BRCindicators’ making it easy to share and maintain.
+The functions in BRCindicators work with yearly estimates of species +abundance or occurrence and aggregate them into an scaled indicator +value with bootstrapped confidence intervals
+This package has the ability to read in the output of occupancy +models created in the R package sparta, a package for estimating species +trends from occurrence data. This package can be installed from Github +and details of how to use the package are given in the +package vignette. There is no need to use sparta to create your +yearly species estimates as BRCindicators can also work with other +data.
+To create an indicator we first need to have species trends, let’s +create some using the sparta R package.
+If you already have yearly estimates of abundance or occurrence for +your species you can skip this stage. Here we show how you can create +these estimates from raw species observation data using sparta.
+
+# Install BRCindicators from github
+library(devtools)
+#install_github('biologicalrecordscentre/sparta')
Let’s assume you have some raw data already, we can under take +occupancy modelling like this
+
+# Create data
+n <- 8000 # size of dataset
+nyr <- 50 # number of years in data
+nSamples <- 200 # set number of dates
+nSites <- 100 # set number of sites
+set.seed(125) # set a random seed
+
+# Create somes dates
+first <- as.Date(strptime("1950/01/01", "%Y/%m/%d"))
+last <- as.Date(strptime(paste(1950+(nyr-1),"/12/31", sep=''), "%Y/%m/%d"))
+dt <- last-first
+rDates <- first + (runif(nSamples)*dt)
+
+# taxa are set semi-randomly
+taxa_probabilities <- seq(from = 0.1, to = 0.7, length.out = 26)
+taxa <- sample(letters, size = n, TRUE, prob = taxa_probabilities)
+
+# sites are visited semi-randomly
+site_probabilities <- seq(from = 0.1, to = 0.7, length.out = nSites)
+site <- sample(paste('A', 1:nSites, sep=''), size = n, TRUE, prob = site_probabilities)
+
+# the date of visit is selected semi-randomly from those created earlier
+time_probabilities <- seq(from = 0.1, to = 0.7, length.out = nSamples)
+time_period <- sample(rDates, size = n, TRUE, prob = time_probabilities)
+
+myData <- data.frame(taxa, site, time_period)
For demonstration purposes I have a faked dataset of 8000 species +observations. In my dataset the species are named after the letters in +the alphabet. Below I show how I can use the Bayesian occupancy models +in sparta to create yearly estimates of occurrence. For more information +please see the vignette +for sparta
+
+# Preview of my data
+head(myData)
## taxa site time_period
+## 1 r A51 1970-01-14
+## 2 v A87 1980-09-29
+## 3 e A56 1996-04-14
+## 4 z A28 1959-01-16
+## 5 r A77 1970-09-21
+## 6 x A48 1990-02-25
+
+# First format our data
+formattedOccData <- formatOccData(taxa = myData$taxa,
+ site = myData$site,
+ survey = myData$time_period)
## Warning in errorChecks(taxa = taxa, site = site, survey = survey, replicate =
+## replicate, : 94 out of 8000 observations will be removed as duplicates
+
+# Here we are going to use the package snowfall to parallelise
+library(snowfall)
+
+# I have 4 cpus on my PC so I set cpus to 4
+# when I initialise the cluster
+sfInit(parallel = TRUE, cpus = 4)
## R Version: R version 4.4.0 (2024-04-24 ucrt)
+## snowfall 1.84-6.3 initialized (using snow 0.4-4): parallel execution on 4 CPUs.
+
+# Export my data to the cluster
+sfExport('formattedOccData')
+
+# I create a function that takes a species name and runs my model
+occ_mod_function <- function(taxa_name){
+
+ library(sparta)
+
+ # Note that this will write you results to your computer
+ # the location is set to your user folder
+ occ_out <- occDetFunc(taxa_name = as.character(taxa_name),
+ n_iterations = 200,
+ burnin = 15,
+ occDetdata = formattedOccData$occDetdata,
+ spp_vis = formattedOccData$spp_vis,
+ write_results = TRUE,
+ output_dir = '~/Testing_indicator_pipe',
+ seed = 123)
+}
+# I then run this in parallel
+system.time({
+para_out <- sfClusterApplyLB(unique(myData$taxa), occ_mod_function)
+})
## user system elapsed
+## 0.06 0.07 162.00
+
+# Stop the cluster
+sfStop()
##
+## Stopping cluster
+
+# We can see all the files this has created
+list.files('~/Testing_indicator_pipe')
## [1] "a.rdata" "b.rdata" "c.rdata" "d.rdata" "e.rdata" "f.rdata" "g.rdata"
+## [8] "h.rdata" "i.rdata" "j.rdata" "k.rdata" "l.rdata" "m.rdata" "n.rdata"
+## [15] "o.rdata" "p.rdata" "q.rdata" "r.rdata" "s.rdata" "t.rdata" "u.rdata"
+## [22] "v.rdata" "w.rdata" "x.rdata" "y.rdata" "z.rdata"
+Installing the package is easy and can be done in a couple of +lines
+
+library(devtools)
+
+install_github('biologicalrecordscentre/BRCindicators')
Now that we have some species trends data to work with (no doubt you +already have your own) we can use the first function in BRCindicators. +This function reads in all the output files from sparta (which are quite +large and complex) and returns a simple summary table that we can use +for calculating the indicator. If you have done your analysis without +using sparta you can skip to the next step.
+
+library(BRCindicators)
+
+# All we have to supply is the directory where out data is saved
+# You will note this is the 'output_dir' passed to sparta above.
+trends_summary <- summarise_occDet(input_dir = '~/Testing_indicator_pipe')
## Loading data...done
+
+# Lets see the summary
+head(trends_summary[,1:5])
## year a b c d
+## [1,] 1950 0.71950820 0.7444262 0.5445902 0.7289617
+## [2,] 1951 0.64792350 0.7115847 0.6653552 0.5564481
+## [3,] 1952 0.61234973 0.6830055 0.2850820 0.5503279
+## [4,] 1953 0.47890710 0.4786339 0.5400546 0.6146448
+## [5,] 1954 0.40245902 0.5300546 0.6093443 0.4449180
+## [6,] 1955 0.04765027 0.4463934 0.6180874 0.3305464
+Returned from this function is a summary of the data as a matrix. In +each row we have the year, specified in the first column, and each +subsequent column is a species. The values in the table are the mean of +the posterior for the predicted proportion of sites occupied, a measure +of occurrence.
+Once we have species-year indicies we are in a position to proceed to +calculating an indictor. To do this there are a number of mehods +available, some of which are presented here in ‘BRCindicators’
+The geometric mean method is often used with data that do not have +errors associated with them.
+The first step is to re-scale the data so that the value for all +species in the first year is the same. Once this is done we calculate +the geometric mean across species for each year creating the indicator +value. This function also accounts for species that have no data at the +beginning of the dataset by entering them at the geometric mean for that +year, this stops them dramatically changing the indicator value in the +year they join the dataset. It also accounts for species that leave the +dataset before the end by holding them at their last value. Finally +limits to species values can be given, preventing extremely high or low +values biasing the indicator.
+The data I have generated in ‘trends_summary’ is very easy to work +with but to show off what this function can do I’m going to mess it up a +bit.
+
+trends_summary[1:3, 'a'] <- NA
+trends_summary[1:5, 'b'] <- NA
+trends_summary[2:4, 'c'] <- 1000
+trends_summary[45:50, 'd'] <- NA
+
+# Let's have a look at these changes
+head(trends_summary[,1:5])
## year a b c d
+## [1,] 1950 NA NA 0.5445902 0.7289617
+## [2,] 1951 NA NA 1000.0000000 0.5564481
+## [3,] 1952 NA NA 1000.0000000 0.5503279
+## [4,] 1953 0.47890710 NA 1000.0000000 0.6146448
+## [5,] 1954 0.40245902 NA 0.6093443 0.4449180
+## [6,] 1955 0.04765027 0.4463934 0.6180874 0.3305464
+
+tail(trends_summary[,1:5])
## year a b c d
+## [45,] 1994 0.3276503 0.5717486 0.1508743 NA
+## [46,] 1995 0.5151913 0.4853552 0.6555738 NA
+## [47,] 1996 0.4673224 0.4795628 0.4220765 NA
+## [48,] 1997 0.3340984 0.6895628 0.7624590 NA
+## [49,] 1998 0.6545355 0.1724044 0.3939891 NA
+## [50,] 1999 0.5532787 0.3660656 0.7278142 NA
+Now that I have ‘messed up’ the data a bit we have two species with +data missing at the beginning and one species with data missing at the +end. We also have one species with some very high values.
+Now lets run this through the re-scaling function.
+
+# Let's run this data through our scaling function (all defaults used)
+rescaled_trends <- rescale_species(Data = trends_summary)
+
+# Here's the result
+head(rescaled_trends[,c('year', 'indicator', 'a', 'b', 'c', 'd')])
## year indicator a b c d
+## [1,] 1950 100.00000 NA NA 100.0000 100.00000
+## [2,] 1951 113.66130 NA NA 10000.0000 76.33433
+## [3,] 1952 124.43795 NA NA 10000.0000 75.49475
+## [4,] 1953 112.24009 112.24009 NA 10000.0000 84.31784
+## [5,] 1954 92.53638 94.32317 NA 111.8904 61.03448
+## [6,] 1955 94.92278 11.16766 94.92278 113.4959 45.34483
+
+## year indicator a b c d
+## [45,] 1994 94.69719 76.79046 121.57878 27.70419 96.72414
+## [46,] 1995 109.25300 120.74391 103.20775 120.37929 96.72414
+## [47,] 1996 103.80961 109.52502 101.97604 77.50351 96.72414
+## [48,] 1997 110.66363 78.30168 146.63123 140.00602 96.72414
+## [49,] 1998 93.88742 153.40162 36.66071 72.34598 96.72414
+## [50,] 1999 100.38848 129.67035 77.84156 133.64439 96.72414
+You can see that species ‘a’ and ‘b’ enter the dataset at the +geometric mean (the indicator value), all species are indexed at 100 in +the first year and the very high values in ‘c’ are capped at 10000 at +the end ‘d’ has been held at it’s end value.
+The ‘indicator’ column that is returned here is our indicator, +calculated as the geometric mean of all the species in the data set.
+We can get confidence intervals for this indicator by bootstrapping +across species. We have a function for that too!
+
+# This function takes just the species columns
+scaled_species <- rescaled_trends[,!colnames(rescaled_trends) %in% c('year', 'indicator')]
+indicator_CIs <- bootstrap_indicator(Data = scaled_species)
## | | | 0% | | | 1% | |= | 1% | |= | 2% | |== | 2% | |== | 3% | |== | 4% | |=== | 4% | |=== | 5% | |==== | 5% | |==== | 6% | |===== | 6% | |===== | 7% | |===== | 8% | |====== | 8% | |====== | 9% | |======= | 9% | |======= | 10% | |======= | 11% | |======== | 11% | |======== | 12% | |========= | 12% | |========= | 13% | |========= | 14% | |========== | 14% | |========== | 15% | |=========== | 15% | |=========== | 16% | |============ | 16% | |============ | 17% | |============ | 18% | |============= | 18% | |============= | 19% | |============== | 19% | |============== | 20% | |============== | 21% | |=============== | 21% | |=============== | 22% | |================ | 22% | |================ | 23% | |================ | 24% | |================= | 24% | |================= | 25% | |================== | 25% | |================== | 26% | |=================== | 26% | |=================== | 27% | |=================== | 28% | |==================== | 28% | |==================== | 29% | |===================== | 29% | |===================== | 30% | |===================== | 31% | |====================== | 31% | |====================== | 32% | |======================= | 32% | |======================= | 33% | |======================= | 34% | |======================== | 34% | |======================== | 35% | |========================= | 35% | |========================= | 36% | |========================== | 36% | |========================== | 37% | |========================== | 38% | |=========================== | 38% | |=========================== | 39% | |============================ | 39% | |============================ | 40% | |============================ | 41% | |============================= | 41% | |============================= | 42% | |============================== | 42% | |============================== | 43% | |============================== | 44% | |=============================== | 44% | |=============================== | 45% | |================================ | 45% | |================================ | 46% | |================================= | 46% | |================================= | 47% | |================================= | 48% | |================================== | 48% | |================================== | 49% | |=================================== | 49% | |=================================== | 50% | |=================================== | 51% | |==================================== | 51% | |==================================== | 52% | |===================================== | 52% | |===================================== | 53% | |===================================== | 54% | |====================================== | 54% | |====================================== | 55% | |======================================= | 55% | |======================================= | 56% | |======================================== | 56% | |======================================== | 57% | |======================================== | 58% | |========================================= | 58% | |========================================= | 59% | |========================================== | 59% | |========================================== | 60% | |========================================== | 61% | |=========================================== | 61% | |=========================================== | 62% | |============================================ | 62% | |============================================ | 63% | |============================================ | 64% | |============================================= | 64% | |============================================= | 65% | |============================================== | 65% | |============================================== | 66% | |=============================================== | 66% | |=============================================== | 67% | |=============================================== | 68% | |================================================ | 68% | |================================================ | 69% | |================================================= | 69% | |================================================= | 70% | |================================================= | 71% | |================================================== | 71% | |================================================== | 72% | |=================================================== | 72% | |=================================================== | 73% | |=================================================== | 74% | |==================================================== | 74% | |==================================================== | 75% | |===================================================== | 75% | |===================================================== | 76% | |====================================================== | 76% | |====================================================== | 77% | |====================================================== | 78% | |======================================================= | 78% | |======================================================= | 79% | |======================================================== | 79% | |======================================================== | 80% | |======================================================== | 81% | |========================================================= | 81% | |========================================================= | 82% | |========================================================== | 82% | |========================================================== | 83% | |========================================================== | 84% | |=========================================================== | 84% | |=========================================================== | 85% | |============================================================ | 85% | |============================================================ | 86% | |============================================================= | 86% | |============================================================= | 87% | |============================================================= | 88% | |============================================================== | 88% | |============================================================== | 89% | |=============================================================== | 89% | |=============================================================== | 90% | |=============================================================== | 91% | |================================================================ | 91% | |================================================================ | 92% | |================================================================= | 92% | |================================================================= | 93% | |================================================================= | 94% | |================================================================== | 94% | |================================================================== | 95% | |=================================================================== | 95% | |=================================================================== | 96% | |==================================================================== | 96% | |==================================================================== | 97% | |==================================================================== | 98% | |===================================================================== | 98% | |===================================================================== | 99% | |======================================================================| 99% | |======================================================================| 100%
+
+# Returned are the CIs for our indicator
+head(indicator_CIs)
## quant_025 quant_975
+## [1,] 100.00000 100.0000
+## [2,] 79.51664 185.3712
+## [3,] 92.57867 197.7989
+## [4,] 71.91175 186.9973
+## [5,] 76.48790 111.0772
+## [6,] 74.68679 116.8558
+It is sometimes desirable to create a smoothed indicator value from +the raw values. This can be achieved by fitting a GAM (general additive +model) to the indicator using a spline. This spline is a smoothed curve +that goes through the raw values for the indicator and is fitted using +the function ‘gam’ in the ‘mgcv’ R package.
+
+# The smoothing function takes the indicator values
+smoothed_indicator <- GAM_smoothing(rescaled_trends[,'indicator'])
+
+# In this example there is little support for a non-linear trend and
+# so the line almost linear
+plot(x = rescaled_trends[,'year'], y = rescaled_trends[,'indicator'])
+lines(x = rescaled_trends[,'year'], y = smoothed_indicator, col = 'red')
+# But if our indicator did support a non-linear trend it might look
+# like this
+eg_indicator <- jitter(sort(rnorm(50)), amount = 0.5)
+eg_smoothed <- GAM_smoothing(eg_indicator)
+plot(x = 1:50, y = eg_indicator)
+lines(x = 1:50, y = eg_smoothed, col = 'red')
Where there is little support for a non-linear trend a GAM smoothed +line will tend towards linear. Where there is good support for a +non-linear trend the smoothed line will become more ‘bendy’.
+We now have our indicator and the confidence intervals around it. The +next step is to plot it. We have included a function that creates a +simple plot using ggplot2, however you could easily create your own +plots in R using the data.
+
+# Plot our indicator.
+plot_indicator(indicator = rescaled_trends[,'indicator'],
+ smoothed_line = smoothed_indicator,
+ CIs = indicator_CIs)
In this plot you can see the high upper confidence interval in years +2-4, this is due to the artificially high values we gave to species +‘c’.
+The Bayesian Meta-Analysis method, or BMA, is suited to data with +standard errors associated with them. As with other methods we require +data from more than one species, across a number of years, with an error +for each species-year estimate.
+
+# Here is an example dataset for the BMA method
+data <- data.frame(species = rep(letters, each = 50),
+ year = rep(1:50, length(letters)),
+ index = runif(n = 50 * length(letters), min = 0, max = 1),
+ se = runif(n = 50 * length(letters), min = 0.01, max = .1))
+head(data)
## species year index se
+## 1 a 1 0.4783269 0.02657337
+## 2 a 2 0.4443901 0.08308297
+## 3 a 3 0.9653410 0.07088607
+## 4 a 4 0.2612954 0.06101668
+## 5 a 5 0.7607192 0.04758781
+## 6 a 6 0.7500249 0.01699128
+It is important that your data is in the same format and that your
+columns are in the same order and have the same names. Remember you can
+use the function read.csv()
to read in the data from a .csv
+on your computer.
BMA is run using the function bma
, here we will use the
+default settings and then see what we can change.
+bma_indicator <- bma(data)
## [1] "Warning: No negative index values detected. Are you sure you transformed the data?"
+##
+## Processing function input.......
+##
+## Done.
+##
+## Compiling model graph
+## Resolving undeclared variables
+## Allocating nodes
+## Graph information:
+## Observed stochastic nodes: 1274
+## Unobserved stochastic nodes: 1291
+## Total graph size: 8158
+##
+## Initializing model
+##
+## Adaptive phase.....
+## Adaptive phase complete
+##
+##
+## Burn-in phase, 5000 iterations x 3 chains
+##
+##
+## Sampling from joint posterior, 5000 iterations x 3 chains
+##
+##
+## Calculating statistics.......
+##
+## Done.
+
+The function returns a plot to your screen which is a diagnostic plot
+of the model. When the model has converged (i.e. reached a point where
+the three chains agree on the answer) the lines on the plots on the left
+will sit on top of one another and the plots on the right will have a
+nice bell shape. You can turn off this plot by setting plot
+to FALSE
. By default the method runs the chains in series.
+Running them in parallel makes the models run faster (about half the
+time) but will slow down your computer more. We can change this with the
+parameter parallel
. The number of iterations the model runs
+is controlled by n.iter
and defaults to 10000. If you can
+it is better to run it for more iterations, though this will take
+longer. m.scale
gives the scale your data is on. It is very
+important that this is correct, choose from ‘loge’ (natural log,
+sometimes simply called ‘log’), ‘log10’ (log to the base 10), or ‘logit’
+(output from models of proportions or probabilities).
Let’s implement a few of these changes
+
+bma_indicator2 <- bma(data,
+ parallel = TRUE,
+ n.iter = 500,
+ m.scale = 'log10')
## [1] "Warning: No negative index values detected. Are you sure you transformed the data?"
+##
+## Processing function input.......
+##
+## Done.
+##
+## Beginning parallel processing using 11 cores. Console output will be suppressed.
+##
+## Parallel processing completed.
+##
+## Calculating statistics.......
+##
+## Done.
+
+Because we have reduced the number of interations the model no longer +has a good convergence. The lines on the graphs on the left do not +overlap and the graphs on the right are no longer a smooth bell +shape.
+The object that is returned is a data.frame with years as rows and
+columns giving the year value, index value and confidence intervals. You
+can write this to a csv using the function write.csv
.
+head(bma_indicator)
## Year Index.Mprime lowerCI.Mprime upperCI.Mprime Index.M lowerCI.M upperCI.M
+## 1 1 100.00000 100.00000 100.0000 100.00000 100.00000 100.0000
+## 2 2 99.28257 96.55888 101.9275 99.36687 96.95978 101.7398
+## 3 3 98.70232 94.93140 102.6840 98.96504 94.93866 103.0453
+## 4 4 98.33416 93.95436 102.8283 98.76247 93.68349 104.0300
+## 5 5 98.37354 93.83166 103.2512 98.72980 92.84002 104.7310
+## 6 6 98.82353 94.46779 103.5405 98.83989 92.47682 105.3083
+We can use the plotting function in BRCindicators to plot the results +of this analysis, which in this case are not all that interesting!
+
+plot_indicator(indicator = bma_indicator[,'Index.M'],
+ CIs = bma_indicator[,c(3,4)])
The multi-species indicator method was developed by Statistics
+Netherlands and the code is made available on their
+website. To find out more about the inner working of this method
+please read the detailed
+documentation on the authors website. Here is a simple example of
+how this method runs in BRCindicators
.
+# Create some example data in the format required
+nyr = 20
+species = rep(letters, each = nyr)
+year = rev(rep(1:nyr, length(letters)))
+
+# Create an index value that increases with time
+index = rep(seq(50, 100, length.out = nyr), length(letters))
+# Add randomness to species
+index = index * runif(n = length(index), 0.7, 1.3)
+# Add correlated randomness across species, to years
+index = index * rep(runif(0.8, 1.2, n = nyr), length(letters))
+
+se = runif(n = nyr * length(letters), min = 10, max = 20)
+
+data <- data.frame(species, year, index, se)
+
+# Our species are decreasing
+plot(data$year, data$index)
+# Species index values need to be 100 in the base year. Here I use
+# the first year as my base year and rescale to 100. The standard error
+# in the base year should be 0.
+min_year <- min(data$year)
+
+for(sp in unique(data$species)){
+
+ subset_data <- data[data$species == sp, ]
+ multi_factor <- 100 / subset_data$index[subset_data$year == min_year]
+ data$index[data$species == sp] <- data$index[data$species == sp] * multi_factor
+ data$se[data$species == sp] <- data$se[data$species == sp] * multi_factor
+ data$se[data$species == sp][1] <- 0
+
+}
+
+# Our first year is now indexed at 100
+plot(data$year, data$index)
+# Alternativly I could read in data from a csv
+# data <- read.csv('path/to/my/data.csv')
+
+# Run the MSI function
+msi_out <- msi(data)
+head(msi_out$CV)
## species mean_CV
+## 1 a 0.2073205
+## 2 b 0.2097742
+## 3 c 0.2327759
+## 4 d 0.2181222
+## 5 e 0.2148589
+## 6 f 0.2171923
+
+# I can capture the output figures too
+# pdf('test.pdf')
+# msi_out <- msi(data)
+# dev.off()
The code returns two plots to the console, the first plot shows the
+coefficient of variation (CV) for each of the species. Species with high
+values of CV may adversly effect the relaibility of the trend
+estimation. Use this graph to identify the CV values of the species and
+use the maxCV
parameter to set a threshold above which
+species will be excluded. The results of excluding species in this way
+can be tested by comparing trend plots. The CV values are hard to assign
+to species from this plot as the species are coded to numbers. To see
+the raw values look at the CV component of msi_out
+(i.e. msi_out$CV
). The second plot shows the smoothed trend
+and the MSI values. These two figures can be captured in the usual way
+in R by using pdf()
for example. In the example I create a
+dataset from random numbers but usually you would use
+read.csv()
to read in data from a local file.
Here is a second example which sets some additional parameters. The
+parameters for msi
get passed to msi_tool
so
+to see a list of all the parameters you can change look at the help
+documentation in msi_tool
usign ?msi_tool
at
+the R console. I cover most of hte important ones here.
+msi_out <- msi(data,
+ nsim = 500, # The number of Mote Carlo simulations
+ SEbaseyear = 10, # The year to index on
+ plotbaseyear = 15, # The year to set as 100 in plots
+ index_smoot = 'INDEX', # plotbaseyear uses MSI not trend
+ span = 0.7, # 'wigglyness' of line, between 0 and 1
+ lastyears = 5, # last X years of time series for short-term trends
+ maxCV = 10, # maximum allowed Coefficient of Variation
+ changepoint = 10, # compare trends before and after this year
+ truncfac = 8, # max year-to-year index ratio
+ TRUNC = 5, #set all indices below TRUNC to this
+ plot = TRUE # should the plots be returned?)
+ )
This set of parameters is unrealistic but shows the options +available. Note that in the second graph the year 10 point now has a se += 0, year 15 MSI is set to 100, and the short term trend is reported for +the last 5 years.
+The analysis also returns data which provide more insights into the +analysis and let you create your own plots if required.
+
+# The returned object has 2 elements
+head(msi_out$results)
## year MSI sd_MSI lower_CL_MSI upper_CL_MSI Trend lower_CL_trend
+## 1 1 174.55 8.59 158.52 192.22 165.98 153.83
+## 2 2 160.34 7.87 145.63 176.55 172.92 164.98
+## 3 3 189.88 9.07 172.92 208.52 177.94 171.72
+## 4 4 169.84 9.00 153.08 188.44 180.72 173.44
+## 5 5 164.29 9.15 147.30 183.24 180.97 173.44
+## 6 6 198.95 9.07 181.95 217.54 179.10 171.72
+## upper_CL_trend trend_class
+## 1 178.73 moderate_decline
+## 2 182.34 moderate_decline
+## 3 184.17 moderate_decline
+## 4 187.89 moderate_decline
+## 5 187.89 moderate_decline
+## 6 187.89 moderate_decline
+The first of the two elements (results
) returned gives
+all the data, and a little more, that is presented in the second
+figure.
+# The returned object has 2 elements
+msi_out$trends
## Measure value significance
+## 1 overall trend 0.9633 moderate decline
+## 2 SE overall trend 0.0021
+## 3 trend last 5 years 0.9359 moderate decline
+## 4 SE trend last 5 years 0.0166
+## 5 changepoint (10) 10.0000
+## 6 trend before changepoint (10) 0.9891 moderate decline
+## 7 SE trend before changepoint (10) 0.0048
+## 8 trend after changepoint (10) 0.9621 moderate decline
+## 9 SE trend after changepoint (10) 0.0045
+## 10 % change -43.3920 p<0.01
+## 11 SE % change 3.0630
+## 12 % change last 5 years -9.6510 p<0.01
+## 13 SE % change last 5 years 2.5580
+## 14 changepoint NA p<0.01
+
+# I could write this as a csv too
+# write.csv(msi_out$trends, file = 'path/to/my/output.csv')
The second element (trends
) returned give a summary of
+various trend assessments across the time series.
We have also added a plot method for the MSI output which provides a +plot similar to that of the second figure we have seen already. Lets use +this plot method to explore the effect of changing the span value in the +analysis
+
+for(i in c(0.3, 0.5, 0.7)){ # use a range of values for span
+
+ msi_out <- msi(data, span = i, # span is set to i
+ nsim = 200, plot = FALSE)
+ print( # print makes the plot visible in the for loop
+ plot(msi_out, title = paste('MSI - span =', i)) # plot
+ )
+
+}
As the value of span gets closer to 1 the trend line gets +smoother.
+The lambda indicator calculates an indicator using growth rates from +one year to the next. Formulating the indicator in terms of growth rates +has two distinct advantages over the conventional approach to +constructing indicators. First, it means that the categorisation of +species as ‘increasing’ or ‘decreasing’ can be made from the same set of +data (the growth rates) as the construction of the indicator. Second, it +provides an elegant solution to the problem of species that join the +indicator after the first year (i.e. where the first year is +unreliable): other indicators typically adopt a complicated rescaling +approach to ensure that species entering the indicator after the first +year do not bias the overall assessment. It also makes a simple and +robust, though untestable, assumption about species that drop out of the +indicator prior to the final year: specifically it assumes that their +fluctuations are the same, in aggregate, as those of the species that +remain in the indicator. For more details see http://webarchive.nationalarchives.gov.uk/20170302170037/http://jncc.defra.gov.uk/Docs/UKBI2015_TechBG_C4b-D1c_Bayesian_Final.docx
+Very few species’ models produced reliable occupancy estimates for +every year, so a majority of the time series contain missing values. +This presents a problem for estimating growth rates for each +species-year combination. Missing values of growth rate that would be +equivalent to linear interpolation of the log odds between adjacent +years with reliable estimates were therefore calculated. This indicator +can therefore work with missing values.
+Input data is on the occupancy scale, and is therefore bounded +between 0 and 1.
+
+# number of species
+nsp = 50
+
+# number of years
+nyr = 40
+
+#number of iterations
+iter = 500
+
+# Build a random set of data
+myArray <- array(data = rnorm(n = nsp*nyr*iter,
+ mean = 0.5,
+ sd = 0.1),
+ dim = c(nsp, nyr, iter),
+ dimnames = list(paste0('SP',1:nsp),
+ 1:nyr,
+ 1:iter))
+
+# Ensure values are bounded by 0 and 1
+myArray[myArray > 1] <- 1
+myArray[myArray < 0] <- 0
+
+str(myArray)
## num [1:50, 1:40, 1:500] 0.501 0.44 0.454 0.309 0.307 ...
+## - attr(*, "dimnames")=List of 3
+## ..$ : chr [1:50] "SP1" "SP2" "SP3" "SP4" ...
+## ..$ : chr [1:40] "1" "2" "3" "4" ...
+## ..$ : chr [1:500] "1" "2" "3" "4" ...
+lambda_indicator
takes in an array of data, a
+three-dimensional matrix. The dimensions of this array represent
+species, years, and iterations. Each row represents a species and each
+column a year. The third dimension of the array contains the iterations.
+Essentially each slice contains occupancy estimates for each species
+year combination for a single iteration and the overall array contains
+as many slices as there are iterations.
+# Run the lambda_interpolation method on this data
+myIndicator <- lambda_indicator(myArray)
+
+# Plot the indicator
+plot_indicator(myIndicator$summary[,'indicator'],
+ myIndicator$summary[,c('lower' ,'upper')])
There are a number of options available in the
+lambda_indicator
function
+myIndicator <- lambda_indicator(myArray,
+ index = 1, # Set the index value to 1 not 100
+ year_range = c(30,40), # Set year range
+ threshold_yrs = 5) # set a threshold
+plot_indicator(myIndicator$summary[,'indicator'],
+ myIndicator$summary[,c('lower' ,'upper')])
Note that there are a range of threshold functions that allow you to +adjust which data points are used in the indicator. There are options to +remove species year estimates based on their standard deviaction, Rhat +value and based on the number of years a species is present in the +dataset. Note that the Rhat threshold can only be used if you are using +a directory path as you input rather than an array.
+We have demonstrated how you might run the indicator functions one at +a time, however in a ‘pipeline’ we want data to flow through seamlessly. +Additionally there are a number of parameters in the functions that we +have not shown you that you might find useful. Here is an example of how +you can create your own pipeline function. Our function will wrap around +the functions described above, setting the parameters to meet our needs. +Once we have done this it will allow use to execute our pipeline in one +line.
+
+# I call my function 'run_pipeline' and the only arguement it
+# takes is the directory of sparta's output
+run_pipeline <- function(input_dir){
+
+ require(sparta)
+ require(BRCindicators)
+
+ # Create the trends summary
+ trends_summary <- summarise_occDet(input_dir = input_dir)
+
+ # Rescale the values and get the indicator values
+ # Here I set the index to 1 and change the value limits
+ rescaled_trends <- rescale_species(Data = trends_summary,
+ index = 1,
+ max = 100,
+ min = 0.001)
+
+ # Bootstrap the indicator to get CIs
+ scaled_species <- rescaled_trends[,!colnames(rescaled_trends) %in% c('year', 'indicator')]
+ # This time I set the iterations to twice the default and
+ # use custom confidence intervals
+ indicator_CIs <- bootstrap_indicator(Data = scaled_species,
+ CI_limits = c(0.25, 0.75),
+ iterations = 20000)
+
+ # Get the smoothed indicator line
+ smoothed_indicator <- GAM_smoothing(rescaled_trends[,'indicator'])
+
+ # This time I specify the years and index value
+ plot_indicator(indicator = rescaled_trends[,'indicator'],
+ year = rescaled_trends[,'year'],
+ index = 1,
+ CIs = indicator_CIs,
+ smoothed_line = smoothed_indicator)
+
+ ## I'll return all my data
+ return(cbind(smoothed_indicator, indicator_CIs, as.data.frame(trends_summary)))
+ }
Once we have created this function we can run this pipeline on a +directory in one line, or put it in a loop to run across many +directories.
+
+# Now we can run the pipeline in one line, like a boss
+indicator_data <- run_pipeline(input_dir = '~/Testing_indicator_pipe')
## Loading data...done
+## | | | 0% | | | 1% | |= | 1% | |= | 2% | |== | 2% | |== | 3% | |== | 4% | |=== | 4% | |=== | 5% | |==== | 5% | |==== | 6% | |===== | 6% | |===== | 7% | |===== | 8% | |====== | 8% | |====== | 9% | |======= | 9% | |======= | 10% | |======= | 11% | |======== | 11% | |======== | 12% | |========= | 12% | |========= | 13% | |========= | 14% | |========== | 14% | |========== | 15% | |=========== | 15% | |=========== | 16% | |============ | 16% | |============ | 17% | |============ | 18% | |============= | 18% | |============= | 19% | |============== | 19% | |============== | 20% | |============== | 21% | |=============== | 21% | |=============== | 22% | |================ | 22% | |================ | 23% | |================ | 24% | |================= | 24% | |================= | 25% | |================== | 25% | |================== | 26% | |=================== | 26% | |=================== | 27% | |=================== | 28% | |==================== | 28% | |==================== | 29% | |===================== | 29% | |===================== | 30% | |===================== | 31% | |====================== | 31% | |====================== | 32% | |======================= | 32% | |======================= | 33% | |======================= | 34% | |======================== | 34% | |======================== | 35% | |========================= | 35% | |========================= | 36% | |========================== | 36% | |========================== | 37% | |========================== | 38% | |=========================== | 38% | |=========================== | 39% | |============================ | 39% | |============================ | 40% | |============================ | 41% | |============================= | 41% | |============================= | 42% | |============================== | 42% | |============================== | 43% | |============================== | 44% | |=============================== | 44% | |=============================== | 45% | |================================ | 45% | |================================ | 46% | |================================= | 46% | |================================= | 47% | |================================= | 48% | |================================== | 48% | |================================== | 49% | |=================================== | 49% | |=================================== | 50% | |=================================== | 51% | |==================================== | 51% | |==================================== | 52% | |===================================== | 52% | |===================================== | 53% | |===================================== | 54% | |====================================== | 54% | |====================================== | 55% | |======================================= | 55% | |======================================= | 56% | |======================================== | 56% | |======================================== | 57% | |======================================== | 58% | |========================================= | 58% | |========================================= | 59% | |========================================== | 59% | |========================================== | 60% | |========================================== | 61% | |=========================================== | 61% | |=========================================== | 62% | |============================================ | 62% | |============================================ | 63% | |============================================ | 64% | |============================================= | 64% | |============================================= | 65% | |============================================== | 65% | |============================================== | 66% | |=============================================== | 66% | |=============================================== | 67% | |=============================================== | 68% | |================================================ | 68% | |================================================ | 69% | |================================================= | 69% | |================================================= | 70% | |================================================= | 71% | |================================================== | 71% | |================================================== | 72% | |=================================================== | 72% | |=================================================== | 73% | |=================================================== | 74% | |==================================================== | 74% | |==================================================== | 75% | |===================================================== | 75% | |===================================================== | 76% | |====================================================== | 76% | |====================================================== | 77% | |====================================================== | 78% | |======================================================= | 78% | |======================================================= | 79% | |======================================================== | 79% | |======================================================== | 80% | |======================================================== | 81% | |========================================================= | 81% | |========================================================= | 82% | |========================================================== | 82% | |========================================================== | 83% | |========================================================== | 84% | |=========================================================== | 84% | |=========================================================== | 85% | |============================================================ | 85% | |============================================================ | 86% | |============================================================= | 86% | |============================================================= | 87% | |============================================================= | 88% | |============================================================== | 88% | |============================================================== | 89% | |=============================================================== | 89% | |=============================================================== | 90% | |=============================================================== | 91% | |================================================================ | 91% | |================================================================ | 92% | |================================================================= | 92% | |================================================================= | 93% | |================================================================= | 94% | |================================================================== | 94% | |================================================================== | 95% | |=================================================================== | 95% | |=================================================================== | 96% | |==================================================================== | 96% | |==================================================================== | 97% | |==================================================================== | 98% | |===================================================================== | 98% | |===================================================================== | 99% | |======================================================================| 99% | |======================================================================| 100%
+
+
+head(indicator_data)
## smoothed_indicator quant_25 quant_75 year a b c
+## 1 0.9176004 1.0000000 1.0000000 1950 0.71950820 0.7444262 0.5445902
+## 2 0.9180913 0.8852408 1.0106197 1951 0.64792350 0.7115847 0.6653552
+## 3 0.9185822 0.9424826 1.0381675 1952 0.61234973 0.6830055 0.2850820
+## 4 0.9190731 0.8238429 1.0059378 1953 0.47890710 0.4786339 0.5400546
+## 5 0.9195640 0.8421503 0.9556367 1954 0.40245902 0.5300546 0.6093443
+## 6 0.9200549 0.8428048 1.0049734 1955 0.04765027 0.4463934 0.6180874
+## d e f g h i j
+## 1 0.7289617 0.5140437 0.5191257 0.6771585 0.77109290 0.3362842 0.6269399
+## 2 0.5564481 0.4214208 0.6875956 0.6949180 0.15005464 0.5807104 0.6743716
+## 3 0.5503279 0.4503825 0.4346995 0.8420765 0.75590164 0.7197268 0.4124590
+## 4 0.6146448 0.4059563 0.7480874 0.4806557 0.02382514 0.8195082 0.7612022
+## 5 0.4449180 0.8789071 0.8591257 0.3380874 0.49256831 0.4837705 0.6448634
+## 6 0.3305464 0.8898907 0.6350820 0.8497814 0.81546448 0.1274863 0.5738251
+## k l m n o p q
+## 1 0.6919126 0.7751366 0.4428415 0.9062295 0.5555191 0.3422404 0.8508743
+## 2 0.3614754 0.7466667 0.7452459 0.8836066 0.9113661 0.7820219 0.3772131
+## 3 0.8724590 0.6345902 0.6355191 0.6290710 0.7693443 0.6283607 0.8969399
+## 4 0.6209836 0.6358470 0.8611475 0.9265027 0.7610929 0.4493989 0.9014754
+## 5 0.3384153 0.7289617 0.5253552 0.7102186 0.9492896 0.7979781 0.8237158
+## 6 0.5411475 0.7955191 0.8191257 0.8801639 0.9270492 0.9244262 0.6891803
+## r s t u v w x
+## 1 0.3624590 0.8491257 0.9315847 0.8820765 0.7602186 0.9207650 0.8680328
+## 2 0.7379235 0.6858470 0.4902732 0.9032240 0.6163388 0.8891803 0.7600546
+## 3 0.8562295 0.8755738 0.5028415 0.9459563 0.6913115 0.6890710 0.7570492
+## 4 0.6147541 0.8377049 0.9469945 0.7153552 0.9408743 0.7243169 0.8547541
+## 5 0.4848634 0.3191803 0.9298361 0.3034426 0.8632787 0.8720765 0.4791257
+## 6 0.6475410 0.8012568 0.6865574 0.9415301 0.8552459 0.7222404 0.8656284
+## y z
+## 1 0.8845355 0.9492350
+## 2 0.8683607 0.8152459
+## 3 0.8835519 0.8234973
+## 4 0.8295082 0.7040437
+## 5 0.9681421 0.8576503
+## 6 0.8666667 0.7646448
+