
Forbes similarity coefficient R functions

The Forbes coefficient is a measure of binary similarity between lists of things (such as species found in samples) that was proposed by Stephen Alfred Forbes in 1907. The basic equation is:

F = a N/[(a + b) (a + c)]

where N = the number of different things, a = the number found in both lists, b = the number found only in the first list, c = the number found only in the second, and d = the number found in neither list. By definition, N = a + b + c + d.
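To make the definitions concrete, here is a small R sketch that counts a, b, c and d for two lists and evaluates the basic equation. The pool of N things and the helper name forbes_counts() are hypothetical, chosen only for illustration, and the sketch assumes neither list contains duplicates.

```r
# Hypothetical helper: tally a, b, c and d for two lists drawn from a known pool.
forbes_counts <- function(x, y, pool) {
  c(a = length(intersect(x, y)),                # found in both lists
    b = length(setdiff(x, y)),                  # found only in the first list
    c = length(setdiff(y, x)),                  # found only in the second list
    d = length(setdiff(pool, union(x, y))))     # found in neither list
}

pool <- letters[1:10]                # assumed set of all N things ('a' to 'j')
x <- c('a','b','c','d')
y <- c('a','c','e','f','g','h')

k <- forbes_counts(x, y, pool)       # a = 2, b = 2, c = 4, d = 2
N <- sum(k)                          # N = a + b + c + d = 10
F_basic <- k[["a"]] * N / ((k[["a"]] + k[["b"]]) * (k[["a"]] + k[["c"]]))
F_basic                              # 20/24, about 0.833
```

If the two unseen things are dropped (d = 0, so N = 8), the same lists give 16/24, about 0.667, so the basic F shifts with however many absent things are counted in N.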

I have proposed correcting the Forbes equation by dropping the d term and adding some constants:

F' = a (n + sqrt n)/[a (n + sqrt n) + 3/2 b c]

where n = a + b + c. I retain the symbol F for the original equation with d set to zero, so that N = n. The following paper describes both indices and shows that they are relatively accurate when sampling is incomplete:
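Written as code, the two equations need only the counts a, b and c. The following R sketch uses hypothetical function names (it is not the forbes() function described on this page) and plugs in the counts produced by the two lists in the Examples section, a = 2, b = 2 and c = 4.

```r
# Corrected coefficient F' (hypothetical helper name, not the page's forbes()).
forbes_corrected <- function(a, b, c) {
  n <- a + b + c
  a * (n + sqrt(n)) / (a * (n + sqrt(n)) + 1.5 * b * c)
}

# Original coefficient F with d set to zero, so that N = n = a + b + c.
forbes_basic_d0 <- function(a, b, c) {
  n <- a + b + c
  a * n / ((a + b) * (a + c))
}

forbes_corrected(2, 2, 4)   # about 0.643
forbes_basic_d0(2, 2, 4)    # 16/24, about 0.667
```

When b or c is zero both versions equal 1, and when a is zero (with b and c positive) both equal 0.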

J. Alroy. In press. A new twist on a very old binary similarity coefficient. Ecology.

This page gives instructions for downloading and using two R functions that implement the original and corrected Forbes equations.

And who am I, anyway? I love those existential questions.

Arguments

The first R function is called forbes(). It expects two arrays and takes the following arguments:

x = the first array.
y = the second array.
corrected = a logical value indicating whether to use the corrected equation (TRUE, the default, or FALSE).

The arrays can be either lists of things, such as c('a','b','c') and c('a','c','d'), or paired arrays of counts, such as c(1,1,1,0,0) and c(1,0,1,1,0).
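Both input forms reduce to the same three counts. The R sketch below shows one way that reduction could work. The helper abc_counts() is hypothetical, and the rule it uses to tell the forms apart (two numeric arrays of equal length are treated as paired counts, with any nonzero value meaning present) is an assumption, not a description of forbes() itself.

```r
# Hypothetical helper: reduce either input form to the counts a, b, c.
# Assumption: numeric vectors of equal length are paired counts (nonzero = present);
# anything else is treated as a list of names.
abc_counts <- function(x, y) {
  if (is.numeric(x) && is.numeric(y) && length(x) == length(y)) {
    in_x <- x > 0
    in_y <- y > 0
    c(a = sum(in_x & in_y), b = sum(in_x & !in_y), c = sum(!in_x & in_y))
  } else {
    c(a = length(intersect(x, y)),   # set operations drop duplicate names
      b = length(setdiff(x, y)),
      c = length(setdiff(y, x)))
  }
}

abc_counts(c('a','b','c'), c('a','c','d'))    # a = 2, b = 1, c = 1
abc_counts(c(1,1,1,0,0), c(1,0,1,1,0))        # a = 2, b = 1, c = 1
```

The two example pairs give identical counts. In the paired form the last position, absent from both arrays, is a d item and is simply ignored.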

The second function is called forbesMatrix(). It expects a square numerical matrix in which zero means absent and there are no negative numbers. There are only two arguments:

x = the data matrix.
corrected = a logical value indicating whether to use the corrected equation (TRUE, the default, or FALSE).

forbesMatrix() calls forbes(), so both functions have to be created in order for forbesMatrix() to work.
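Because forbesMatrix() calls forbes() and returns a square matrix of coefficients, it presumably applies forbes() to each pair of samples. A minimal R sketch of that idea follows. It assumes that rows are things and columns are samples, which may not match how forbesMatrix() orients its input, and all function names in it are hypothetical.

```r
# Hypothetical pairwise coefficient for two columns of counts (nonzero = present).
forbes_pair <- function(x, y, corrected = TRUE) {
  a  <- sum(x > 0 & y > 0)     # present in both samples
  b  <- sum(x > 0 & y == 0)    # present only in the first
  cc <- sum(x == 0 & y > 0)    # present only in the second (c in the equations)
  n  <- a + b + cc
  if (corrected) {
    a * (n + sqrt(n)) / (a * (n + sqrt(n)) + 1.5 * b * cc)
  } else {
    a * n / ((a + b) * (a + cc))
  }
}

# Hypothetical matrix version: compare every pair of columns.
forbes_matrix_sketch <- function(m, corrected = TRUE) {
  k <- ncol(m)
  out <- diag(1, k)            # 1 on the diagonal: each sample matches itself
  dimnames(out) <- list(colnames(m), colnames(m))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      out[i, j] <- out[j, i] <- forbes_pair(m[, i], m[, j], corrected)
    }
  }
  out
}

m <- matrix(c(1, 1, 1, 0, 0,     # s1
              1, 0, 1, 1, 0,     # s2
              0, 0, 1, 1, 1),    # s3
            ncol = 3, dimnames = list(NULL, c("s1", "s2", "s3")))
round(forbes_matrix_sketch(m), 3)
```

Filling only the upper triangle and mirroring it halves the work, since both equations are symmetric in b and c.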

Returned values

forbes() returns the value of either F' or F (again, ignoring d), and forbesMatrix() returns a square matrix of coefficients with 1 on the diagonal.

Examples

Create two arrays and compute the corrected Forbes similarity:

x <- c('a','b','c','d')
y <- c('a','c','e','f','g','h')
forbes(x,y)

Create a distance matrix based on the basic F instead of the corrected F' and use it in a cluster analysis:

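# myMatrix is your own data matrix; hclust() needs a "dist" object, hence as.dist()
d <- as.dist(1 - forbesMatrix(myMatrix, corrected = FALSE))
plot(hclust(d))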
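The cluster step can be tried without real data. In this R sketch the similarity matrix is made up for illustration and only stands in for forbesMatrix() output. The point is that hclust() expects a "dist" object, so the matrix of 1 minus similarity is passed through as.dist() first.

```r
# Made-up similarities standing in for forbesMatrix() output (illustration only).
labs <- c("s1", "s2", "s3")
sim <- matrix(c(1.0, 0.9, 0.5,
                0.9, 1.0, 0.6,
                0.5, 0.6, 1.0),
              nrow = 3, dimnames = list(labs, labs))

d <- as.dist(1 - sim)        # keeps the lower triangle as a "dist" object
tree <- hclust(d)            # default method is complete linkage
plot(tree)
cutree(tree, k = 2)          # s1 and s2 group together; s3 stands apart
```

Other linkage rules can be chosen through the method argument of hclust(), for example method = "average".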