The Forbes coefficient is a measure of binary similarity between lists of things (such as species found in samples) that was proposed by Stephen Alfred Forbes in 1907. The basic equation is:
F = a N/[(a + b) (a + c)]
Where N = the number of different things, a = the number found in both lists, b = the number found only in the first list, c = the number found only in the second, and d = the number found in neither one. By definition, N = a + b + c + d.
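As a quick illustration, the basic equation can be written directly in R. This is a minimal sketch under the definitions above. forbes_original() is a hypothetical name, not one of the functions distributed on this page.

```r
# Original Forbes (1907) coefficient from the counts a, b, c and d.
# forbes_original() is a hypothetical name used only for illustration.
forbes_original <- function(a, b, c, d) {
  N <- a + b + c + d              # total number of different things
  a * N / ((a + b) * (a + c))
}

# 3 things in both lists, 2 only in the first, 1 only in the second,
# 4 in neither
forbes_original(3, 2, 1, 4)       # 1.5
```

Here N = 10 and the result is 30/20 = 1.5. Because (a + b)(a + c)/N is the number of shared things expected if the two lists were independent draws from the N things, F is the ratio of observed to expected overlap, so it exceeds 1 whenever the overlap is larger than expected.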
I have proposed correcting the Forbes equation by dropping the d term and adding some constants:
F' = a (n + sqrt n)/[a (n + sqrt n) + (3/2) b c]
where n = a + b + c. I retain the symbol F for the original equation with d set to zero. The following paper describes the indices and shows that they are relatively accurate when sampling is incomplete:
J. Alroy. In press. A new twist on a very old binary similarity coefficient. Ecology.
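With d set to zero, N becomes n, and because (a + b)(a + c) = a n + b c, the original equation can also be written F = a n/(a n + b c). Seen that way, F' replaces n with n + sqrt n and b c with (3/2) b c. The sketch below codes both versions from the counts a, b and c. forbes_zero_d() and forbes_corrected() are hypothetical names for illustration, not the functions provided here.

```r
# Forbes coefficients computed with d set to zero (so N = n = a + b + c).
# Both function names are hypothetical; this is not the distributed forbes().
forbes_zero_d <- function(a, b, c) {
  n <- a + b + c
  a * n / ((a + b) * (a + c))         # original F with d = 0
}

forbes_corrected <- function(a, b, c) {
  n <- a + b + c
  k <- n + sqrt(n)                    # n + sqrt n stands in for n
  a * k / (a * k + 1.5 * b * c)       # (3/2) b c in the denominator
}

forbes_zero_d(2, 2, 4)                # 16/24, about 0.667
forbes_corrected(2, 2, 4)             # about 0.643
```

Written in the a n/(a n + b c) form, both values stay between 0 and 1, since the b c term in the denominator is never negative.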
This page gives instructions for downloading and using two R functions that implement the original and corrected Forbes equations.
And who am I, anyway? I love those existential questions.
Arguments
The first R function is called forbes(). It expects two arrays and takes the following arguments:
The arrays can be either lists of things such as c('a','b','c') and c('a','c','d') or paired arrays of counts such as c(1,1,1,0,0) and c(1,0,1,1,0).
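Both input formats reduce to the same three counts. The sketch below is a minimal illustration, assuming that two numeric vectors are paired counts with zero meaning absent and that anything else is a list of things (so numeric IDs such as species numbers would be misread as counts). abc_counts() is a hypothetical helper, not part of the distributed code.

```r
# Reduce either input format described above to the counts a, b and c.
# abc_counts() is a hypothetical helper. Assumption: two numeric vectors are
# paired counts (zero = absent); anything else is treated as lists of things.
abc_counts <- function(x, y) {
  if (is.numeric(x) && is.numeric(y)) {
    stopifnot(length(x) == length(y))   # counts must be paired position by position
    px <- x > 0
    py <- y > 0
    c(a = sum(px & py), b = sum(px & !py), c = sum(!px & py))
  } else {
    # intersect() and setdiff() drop duplicates, so repeated entries count once
    c(a = length(intersect(x, y)),
      b = length(setdiff(x, y)),
      c = length(setdiff(y, x)))
  }
}

abc_counts(c('a','b','c'), c('a','c','d'))   # a = 2, b = 1, c = 1
abc_counts(c(1,1,1,0,0), c(1,0,1,1,0))      # a = 2, b = 1, c = 1
```

The two example pairs describe the same data if the five count positions are read as things a to e, so both give a = 2, b = 1 and c = 1; only the count version also records the fifth thing found in neither list.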
The second function is called forbesMatrix(). It expects a square numerical matrix in which zero means absent and there are no negative numbers. There are only two arguments:
forbesMatrix() calls forbes(), so both functions have to be created in order for forbesMatrix() to work.
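forbesMatrix() fills a square matrix with pairwise coefficients. One way to do that is sketched below, but it is hypothetical: forbes_matrix_sketch() is not the distributed function, and it assumes that the things being compared are the columns of the input. Off-diagonal entries involving a column with no nonzero values come out as NaN, because the ratio becomes 0/0.

```r
# Pairwise Forbes similarities between the columns of a nonnegative matrix.
# Hypothetical sketch, not the distributed forbesMatrix(): it assumes the
# things being compared are the columns and that zero means absent.
forbes_matrix_sketch <- function(m, corrected = TRUE) {
  stopifnot(is.matrix(m), is.numeric(m), all(m >= 0))
  p <- m > 0                                   # presence/absence
  k <- ncol(p)
  out <- diag(1, k)                            # 1 on the diagonal
  dimnames(out) <- list(colnames(m), colnames(m))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      a <- sum(p[, i] & p[, j])                # in both columns
      b <- sum(p[, i] & !p[, j])               # only in column i
      c <- sum(!p[, i] & p[, j])               # only in column j
      n <- a + b + c
      s <- if (corrected) {
        a * (n + sqrt(n)) / (a * (n + sqrt(n)) + 1.5 * b * c)   # F'
      } else {
        a * n / ((a + b) * (a + c))                               # F with d = 0
      }
      out[i, j] <- out[j, i] <- s
    }
  }
  out
}

m <- cbind(s1 = c(1, 1, 1, 0, 0),
           s2 = c(1, 0, 1, 1, 0),
           s3 = c(0, 0, 0, 1, 1))
S <- forbes_matrix_sketch(m)
round(S, 3)
```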
Returned values
forbes() returns the value of either F' or F (again, ignoring d), and forbesMatrix() returns a square matrix of coefficients with 1 on the diagonal.
Examples
Create two arrays and compute the corrected Forbes similarity:
x <- c('a','b','c','d')
y <- c('a','c','e','f','g','h')
forbes(x,y)
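Tallying this example by hand gives a = 2, b = 2, c = 4 and n = 8. The sketch below repeats that arithmetic in base R without calling forbes(); if forbes() implements F' exactly as written above, forbes(x,y) should return about 0.643, while the uncorrected F (with d = 0) is 16/24, about 0.667.

```r
# Hand tally of the example above with base R set operations.
x <- c('a','b','c','d')
y <- c('a','c','e','f','g','h')
a  <- length(intersect(x, y))   # 2: 'a' and 'c'
b  <- length(setdiff(x, y))     # 2: 'b' and 'd'
cc <- length(setdiff(y, x))     # 4: 'e' to 'h' (cc is c in the text)
n  <- a + b + cc                # 8
F_prime  <- a * (n + sqrt(n)) / (a * (n + sqrt(n)) + 1.5 * b * cc)
F_zero_d <- a * n / ((a + b) * (a + cc))
round(c(F_prime = F_prime, F = F_zero_d), 3)   # 0.643 and 0.667
```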
Create a distance matrix based on the basic F instead of the corrected F' and use it in a cluster analysis:
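d <- 1 - forbesMatrix(myMatrix, corrected = FALSE)   # myMatrix stands for your own data matrix
plot(hclust(as.dist(d)))                             # hclust() needs a "dist" object, not a plain matrix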
See also
dist() is R's workhorse function for computing a matrix of distances, analogous to forbesMatrix(). vegdist() in package vegan computes distances based on other metrics such as the Bray-Curtis dissimilarity coefficient (which is one minus the Dice similarity coefficient when the data are binary).
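The Bray-Curtis/Dice relationship is easy to check on binary data. Bray-Curtis dissimilarity is the sum of absolute differences divided by the sum of all values, which on 0/1 data is (b + c)/(2a + b + c), and the Dice similarity is 2a/(2a + b + c). The sketch uses base R only, so it does not depend on vegan; the two vectors are the paired-count example from the Arguments section.

```r
# Check, on two presence/absence vectors, that Bray-Curtis dissimilarity
# equals 1 minus the Dice similarity. Base R only; vegan is not needed here.
x <- c(1, 1, 1, 0, 0)
y <- c(1, 0, 1, 1, 0)
a  <- sum(x > 0 & y > 0)                 # in both
b  <- sum(x > 0 & y == 0)                # only in x
cc <- sum(x == 0 & y > 0)                # only in y (c in the text)
dice <- 2 * a / (2 * a + b + cc)         # 4/6
bray <- sum(abs(x - y)) / sum(x + y)     # 2/6
all.equal(bray, 1 - dice)                # TRUE
```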