Shareholder quorum subsampling is a method I invented in 2009 that is similar in some ways to rarefaction, but conceptually very different. Rarefaction tells you how many species you would find in a given ecological sample given a fixed, uniform sample size. SQS tells you how many species you would find given fixed "coverage" of the underlying abundance distribution. "Coverage" (Good 1953) is simply the sum of the frequencies of the species you have sampled (so, it is 100% when all species are known). When coverage is fixed, individual samples may be large (given even abundances) or small (given uneven abundances). It all depends on the data.
The important point is that when samples have identical coverage, the ratio of their species counts should exactly mirror the ratio one would obtain if one could examine their entire respective sampling universes. That is, with SQS "twice as diverse" really does mean TWICE AS DIVERSE. The same is not true of rarefaction (much less raw species counts).
This page gives instructions for downloading and using version 3.3 of the R function I have written to perform SQS. Version 1.0 was released in July, 2009; versions 2.0 and 3.1 featured successively improved subsampling algorithms (respectively released in March and July, 2011); version 3.2 has slightly better reporting of stats (released August 2011); and version 3.3 (released December 2011) includes a minor bug fix. And you know something, 3.3 really rocks — it's almost 100% unbiased regardless of what the abundance distribution looks like.
Meanwhile, Fossilworks has a built-in diversity curve generator that implements SQS, and I have also written a Perl script to produce SQS diversity curves from PaleoDB occurrence data files. Write me an e-mail if you want it.
Do not use the R function with taxonomic occurrence data because literature compilations introduce biases that have to be dealt with using algorithms I describe in my 2014 Paleobiology paper on turnover rates. Use the Perl program or PaleoDB website instead.
And who am I, anyway? I love those existential questions.
Arguments
The R function is called sqs(). It expects a single specimen count array (i.e., the abundance data from a single fossil collection/quadrat/census/whatever) and takes the following arguments:
Important note: the 3.3 algorithm is not guaranteed to return subsamples with the absolute exact quorum that was requested. Version 4.0 will. But at the moment, for example, if you enter q=0.80 you may get back a mean subsampled u (see below) of 0.78. This problem is more serious for samples with very high dominance and/or analyses employing very low q values. My recommendation is to (1) see if it helps to exclude the dominant; (2) not perform analyses with q below about 0.4; and (3) show subsampling curves with subsampled richness on the y-axis and the returned average u values on the x-axis (instead of the submitted q values).
Returned values
On the plus side, the R script is dead-simple and returns a whole bunch of standard ecological statistics. Use them at your own peril, however, as most of them either don't work or don't tell you much. The returned values are:
Examples
Read an array and run a basic SQS analysis with a quorum of 0.8:
x = scan("/Users/me/datafile")
sqs(x,0.8,100)["subsampled richness"]
Perform a standard rarefaction analysis with a quota of 200 items and print all the stats:
sqs(x,200,100,"rarefaction")
Iterate through a series of quorum levels and plot the results (a subsampling curve):
levels = seq(.05,.95,.05)
r = array()
u = array()
for (i in 1:length(levels)) {
s = sqs(x,levels[i],300,"","no")
r[i] = s["subsampled richness"]
u[i] = s["subsampled u"]
}
plot(u,r,col="darkorange",cex=0.6,type="o",log="y",xlab="quorum",ylab="richness")
Read a PaleoDB occurrence file, turn the data into a count vector, and analyze it (even though I just told you not to):
o = read.table("/Users/me/download.csv",header=T,sep=",")
t = table(o[,"occurrence.genus_name"])
sqs(t,0.5,100)
A note on nomenclature
As mentioned, SQS and rarefaction aim to do fundamentally different things: the first equates fair sampling with uniform "coverage," and the second equates fair sampling with uniform sampling. In other words, the disagreement is on how to answer the classic philosophical question "What Is The Good?" We might just as well be talking about the Meaning Of Life here.
So, whatever you do, do not call SQS rarefaction. If the phrase "quorum subsampling" makes you want to puke or something, call it "fixed coverage subsampling" or some other bland boring thing — but don't confuse yourself and everyone else by equating it with a method whose founding premise couldn't possibly be more different.
Nonetheless, there are general similarities between rarefaction and SQS. Both seek to make sampling "fair" in some sense by looking at subsamples, and both are general methodologies with different possible implementations. For example, there is a standard algorithmic method for performing rarefaction (Simberloff 1972) in addition to an exact equation (Hurlbert 1971). And there are several algorithms that yield good approximations of fixed coverage under differing assumptions (e.g., under the assumption that published literature tends to overrepresent occurences of newly described taxa). These algorithms are described in my three 2010 papers and in one of my 2014 papers (see my CV).
The general idea of quorum subsampling and my name for it were both first published in my 2009 GSA abstract. By a remarkable coincidence, Lou Jost shortly afterwards published a paper that briefly described the idea of fixed coverage subsampling, which was suggested to him by Anne Chao. However, Jost's paper gave no details about the method, didn't name it, didn't use it, and advocated it only as a means of improving certain statistics that are used to describe abundance distributions (such as Pielou's J).
Chao and Jost (2012) went on to rename the method "coverage-based rarefaction". Ecologists continue to use that name and attribute the method to them. But there shouldn't be a priority issue here — SQS is not "rarefaction" in any sense, and because Jost thought he was dealing with some kind of "rarefaction" but didn't actually name the method, I certainly have name priority in addition to general priority.