
Well Sampled Sites (WSS)

Dr Tom August edited this page Apr 11, 2014 · 1 revision

This function undertakes a 'Well Sampled Sites' (WSS) analysis as laid out by Roy et al. (2012). The method accounts for variation in recording intensity between sites and excludes data that may introduce error into trend estimates.

The approach is based on a generalised linear mixed-effects model and, in contrast to some of the other methods in sparta, treats time as continuous. Rather than comparing two time periods, the WSS method models the trend in occurrence over time.
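As a rough illustration of what a continuous-time trend model of this kind looks like, the sketch below fits a binomial GLMM to simulated presence/absence data. It assumes the lme4 package is installed. The data, the column names (site, year_c, present) and the model formula are illustrative assumptions, not sparta's internal code.

```r
# Illustrative sketch only: simulated data, not sparta's internal code.
# Assumes the lme4 package is installed.
library(lme4)

set.seed(42)

# Hypothetical visit table: 40 sites, each visited once a year for 20 years
visits <- expand.grid(site = factor(1:40), year = 1981:2000)

# Centre year so the intercept refers to the middle of the series
visits$year_c <- visits$year - mean(visits$year)

# Simulate presence/absence with a site-level random intercept
# and a declining trend on the logit scale
site_effect <- rnorm(40, mean = 0, sd = 1)
log_odds <- -0.5 - 0.05 * visits$year_c + site_effect[as.integer(visits$site)]
visits$present <- rbinom(nrow(visits), size = 1, prob = plogis(log_odds))

# Binomial GLMM: year as a continuous fixed effect, site as a random intercept
fit <- glmer(present ~ year_c + (1 | site), data = visits, family = binomial)

# The year coefficient is the estimated trend (change in log-odds per year)
trend <- fixef(fit)["year_c"]
print(trend)
```

The year coefficient plays the same role as the trend estimate reported by WSS: a single slope over the whole period rather than a difference between two time blocks.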

Another strength of WSS is that poor-quality data that might result in inaccurate models are removed prior to analysis.

# Load the sparta library
library(sparta)

# Load example dataset
data(ex_dat)

# Run the mixed model analysis
MM_out <- WSS(Data = ex_dat,
              year_range = c(1970, 2000),
              min_list = 1,
              min_years = 2,
              site_col = 'kmsq',
              sp_col = 'CONCEPT',
              start_col = 'TO_STARTDATE',
              end_col = 'Date')

min_list and min_years set the requirements that data must meet to be included in the analysis. In this instance we only include data from visits that return more than 1 species, and only from sites where this criterion was fulfilled in 2 or more years. We can also specify the year range using year_range; this has the same effect as subsetting your data manually and then running WSS without specifying year_range. Since every dataset is different, WSS has arguments that allow you to specify which columns hold your dates, species, sites, etc. There is also flexibility as to whether each observation has a known year (use 'year_col') or is associated with a year range (use 'start_col' and 'end_col').
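To make the two thresholds concrete, here is a minimal base R sketch of this kind of filtering on a toy dataset. The data frame, its column names and the exact comparison operators are assumptions that follow the wording above. This is not the code WSS runs internally.

```r
# Toy records: one row per species observation (hypothetical values)
records <- data.frame(
  site    = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "C", "C"),
  date    = as.Date(c("1990-05-01", "1990-05-01", "1991-06-10", "1991-06-10",
                      "1992-07-03", "1990-05-02", "1990-05-02",
                      "1990-04-20", "1991-04-21", "1992-04-22", "1993-04-23")),
  species = c("sp1", "sp2", "sp1", "sp3", "sp1", "sp1", "sp2",
              "sp1", "sp1", "sp1", "sp1"),
  stringsAsFactors = FALSE
)
records$year <- as.integer(format(records$date, "%Y"))

min_list  <- 1
min_years <- 2

# A visit is a unique site/date combination; list length = species per visit
records$visit <- paste(records$site, records$date)
list_length <- tapply(records$species, records$visit,
                      function(x) length(unique(x)))
records$list_length <- as.integer(list_length[records$visit])

# Keep visits with more than min_list species (operator follows the prose above)
kept <- records[records$list_length > min_list, ]

# Keep sites with qualifying visits in at least min_years distinct years
years_per_site <- tapply(kept$year, kept$site, function(x) length(unique(x)))
good_sites <- names(years_per_site)[years_per_site >= min_years]
well_sampled <- kept[kept$site %in% good_sites, ]

print(unique(well_sampled$site))
```

In this toy example only site A survives: site B has a qualifying visit in a single year, and site C is visited in four years but never yields more than one species.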

While the analysis is running, progress is reported to the console:

[1] "Starting list-length model"
[1] "Recasting data..."
[1] "Modelling Species 1 - Species 1 of 52"
[1] "Modelling Species 10 - Species 2 of 52"
[1] "Modelling Species 11 - Species 3 of 52"
[1] "Modelling Species 12 - Species 4 of 52"
[1] "Modelling Species 13 - Species 5 of 52"
...
...
...

Fitting models to large datasets can take some time and use a significant amount of memory. For example, a dataset of 614 species totalling 420,000 observations takes approximately 30 minutes on a regular PC.

The mixed model requires that observations are accurate to the day. As a result, observations are removed from the analysis if their start and end dates differ. This happens in the above example:

Warning messages:
1: In WSS(Data = ex_dat, year_range = c(1970, 2000), min_list = 1,  :
  1307 rows of data were removed as start date and end date were from different dates
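If you want to see in advance how many records would be affected, the check can be approximated in base R. The sketch below uses the column names from the example call with made-up values, and assumes both columns are stored as Date objects.

```r
# Toy records using the column names from the example (values are hypothetical)
obs <- data.frame(
  CONCEPT      = c("Species 1", "Species 1", "Species 10", "Species 11"),
  TO_STARTDATE = as.Date(c("1985-06-01", "1985-06-01", "1990-01-01", "1992-08-15")),
  Date         = as.Date(c("1985-06-01", "1985-06-30", "1990-12-31", "1992-08-15")),
  stringsAsFactors = FALSE
)

# Day-precision records are those whose start and end dates coincide
same_day  <- obs$TO_STARTDATE == obs$Date
n_removed <- sum(!same_day)
obs_day   <- obs[same_day, ]

message(n_removed, " rows would be removed before modelling")
```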

Additionally, we will get any warnings generated during the modelling process. In this example a few models had problems reaching convergence:

2: In mer_finalize(ans) : false convergence (8)
3: In mer_finalize(ans) : false convergence (8)
4: In mer_finalize(ans) : false convergence (8)
5: glm.fit: algorithm did not converge 

These model warnings are captured in the output:

head(MM_out)

              CONCEPT         year    year_SE year_zscore year_pvalue  intercept
Species 1   Species 1 -0.006134995 0.01534148  -0.3998960   0.6892331  -2.272305
Species 10 Species 10 -0.012316615 0.04308105  -0.2858940   0.7749593  -8.038884
Species 11 Species 11 -0.994450942 3.88958208  -0.2556704   0.7982054 -29.005549
Species 12 Species 12  0.665963463 0.65654133   1.0143512   0.3104152 -22.674402
Species 13 Species 13 -0.011563443 0.01453529  -0.7955424   0.4262980  -2.047039
Species 14 Species 14  0.051192771 0.10629963   0.4815894   0.6300977  -9.496142
           intercept_SE yearZero Ymin Ymax cvg_code pVisitsUsed nVisitsUsed
Species 1     0.1277307     1985  -15   15        4   0.3609172        1086
Species 10    0.9948553     1985  -15   15        8   0.3609172        1086
Species 11   52.9839105     1985  -15   15        8   0.3609172        1086
Species 12   17.5863212     1985  -15   15        4   0.3609172        1086
Species 13    0.1161715     1985  -15   15        4   0.3609172        1086
Species 14    1.7510260     1985  -15   15        4   0.3609172        1086
           nSpeciesObs   change_10yr
Species 1          142 -4.956766e+00
Species 10          35 -9.962461e+00 
Species 11           2 -3.225806e+01
Species 12           2  1.532374e+10
Species 13         155 -8.529735e+00
Species 14          14  1.175604e+02

For details of what these results mean, see the WSS help file by typing '?WSS' into the R console. In short, the year columns give details of the trend estimate; change_... gives the percentage change over a given time period and depends on the arguments trend_option and NYears; and cvg_code captures information on model warnings. cvg (convergence) codes can be looked up using the function cvg_codes (use ?cvg_codes for more information).
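One practical use of these columns is to screen out species whose estimates rest on very little data. In the output above, Species 11 and Species 12 have only 2 observations each, along with large standard errors and extreme change_10yr values. The sketch below re-enters a few of the printed columns by hand and applies a hypothetical minimum of 10 observations. That threshold is an illustrative assumption, not a sparta default.

```r
# Subset of the columns printed above, re-entered by hand for illustration
MM_head <- data.frame(
  CONCEPT     = c("Species 1", "Species 10", "Species 11",
                  "Species 12", "Species 13", "Species 14"),
  year        = c(-0.006134995, -0.012316615, -0.994450942,
                  0.665963463, -0.011563443, 0.051192771),
  year_pvalue = c(0.6892331, 0.7749593, 0.7982054,
                  0.3104152, 0.4262980, 0.6300977),
  cvg_code    = c(4, 8, 8, 4, 4, 4),
  nSpeciesObs = c(142, 35, 2, 2, 155, 14),
  stringsAsFactors = FALSE
)

# Hypothetical screening rule: at least 10 observations of the species
min_obs  <- 10
screened <- MM_head[MM_head$nSpeciesObs >= min_obs, ]

# Tabulate convergence codes among the remaining species
print(table(screened$cvg_code))

# List the remaining species ordered by the p-value of the year effect
print(screened[order(screened$year_pvalue), c("CONCEPT", "year", "year_pvalue")])
```

With the full results the same filter can be applied directly to MM_out. The meaning of each cvg_code value should still be looked up with cvg_codes rather than inferred from the numbers.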


### References

Hill, M.H. (2011) Local frequency as a key to interpreting species occurrence data when recording effort is not known. Methods in Ecology & Evolution, 3 (1), 195-205.
