Ecological Archives M074-012-A1

James P. Grover and Thomas H. Chrzanowski. 2004. Limiting resources, disturbance, and diversity in phytoplankton communities. Ecological Monographs 74:533–551.

Appendix A. A description of computational procedures.

Analyzing the association between diversity and number of limiting resources raises questions of estimation and inference similar to those in familiar analyses such as regression modeling, but conventional approaches (e.g., least squares, maximum likelihood) are problematic here. The relationship between the parameters and the statistic to be optimized is nonlinear and discontinuous, the solution to the optimization problem is not necessarily unique, and it is difficult to specify appropriate error terms and sampling distributions. Estimation of threshold parameters defining resource limitation was accomplished by a computationally intensive, exhaustive search algorithm that optimizes the correlation between diversity and number of limiting resources; this appendix describes that algorithm. Significance tests for the optimized correlation were accomplished with a randomization approach described in the main text. A graphical evaluation of uncertainty in parameter estimates is also described in the main text.

The Exhaustive Search Algorithm

Here we describe the exhaustive search algorithm used to find thresholds defining resource limitation that maximize the correlation between diversity and number of limiting resources. Data for this procedure consist of a measure of diversity, Yt, and a set of resource availabilities, Rj,t, where j indexes the resources (j = 1, …, J), and t indexes the sampling time (t = 1, …, N). Formally, a count of the number of limiting resources at any time t, Xt, is developed from resource availabilities through an indicator function, Ij,t(θj), that designates a resource as limiting when its availability falls below a limitation threshold θj:

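\[ I_{j,t}(\theta_j) = \begin{cases} 1 & \text{if } R_{j,t} < \theta_j \\ 0 & \text{otherwise} \end{cases} \]
(A.1)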

Thus the number of limiting resources is the sum (over resources) of the indicator functions:

\[ X_t(\boldsymbol{\theta}) = \sum_{j=1}^{J} I_{j,t}(\theta_j) \]
(A.2)

where θ = (θ1, …, θJ) is a parameter vector of the thresholds (θj) defining resource limitation. Then, the optimization problem to be solved is

\[ \underset{\boldsymbol{\theta}}{\text{maximize}} \quad r\bigl(Y_t, X_t(\boldsymbol{\theta})\bigr) \]

To present the exhaustive search procedure that achieves this, and its potential problems, we introduce an example data set small enough to illustrate all the calculations. The hypothetical data have J = 3 resources and N = 3 sampling times, and the example calculation of the correlation between diversity and number of limiting resources is based on thresholds θ1 = 0.22, θ2 = 8.15, θ3 = 20 (Table A1).
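The example calculation can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' software: the function names `count_limiting` and `pearson` are hypothetical, and r is assumed to be the ordinary Pearson correlation, an assumption that reproduces the value reported in Table A1.

```python
import math

# Example data from Table A1; each row is one sampling time t = 1..3,
# each column one resource j = 1..3.
R = [
    [0.30, 11.3, 21.4],
    [0.20, 8.9, 31.8],
    [0.24, 7.4, 18.6],
]
Y = [4.4, 7.3, 10.8]          # diversity index
theta = [0.22, 8.15, 20.0]    # thresholds used in Table A1


def count_limiting(R, theta):
    """X_t: number of resources whose availability falls below its threshold."""
    return [sum(1 for r, th in zip(row, theta) if r < th) for row in R]


def pearson(x, y):
    """Pearson correlation (assumed here to be the r of the appendix)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


X = count_limiting(R, theta)
r = pearson(X, Y)
print(X, round(r, 4))  # [0, 1, 2] 0.9985
```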

A given value of the threshold θj partitions the resource availability data Rj,t into limiting and non-limiting cases, and the exact value of θj achieving this is not unique. For example, the threshold θ1 = 0.22 indicates that the observation R1,t = 0.2 is a limiting resource availability, and indicates that the observations R1,t = 0.24 and 0.3 are non-limiting. However, any value of θ1 between 0.2 and 0.24 would produce the same partition of the R1,t data. Thus no unique estimate of θ1 is associated with a particular correlation between diversity and the number of limiting resources. To proceed further, we arbitrarily limit the values of θj considered. We define admissible values of θj as the midpoints between successive ranked values of Rj,t, and we define the lowest admissible θj as half the lowest observed value of Rj,t, and the highest admissible θj as the highest observed value of Rj,t plus the lowest θj. Following these rules, we obtain these sets of possible thresholds for the example data of Table A1: for R1,t, {0.1, 0.22, 0.27, 0.4}; for R2,t, {3.7, 8.15, 10.1, 15}; and for R3,t, {9.3, 20, 26.6, 41.1}. Although we thus define away the problem of non-unique thresholds, we cannot define away the inherent imprecision involved in their estimation. At best, the analysis will constrain an estimate to lie in the interval between two successive ranked observations of Rj,t.
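The rules for admissible thresholds translate directly into code. The sketch below uses a hypothetical helper name, `admissible_thresholds`, and assumes the observed values of a resource are distinct; tied observations would need separate handling that the text does not specify.

```python
def admissible_thresholds(values):
    """Admissible thresholds for one resource, following the rules above.

    Assumes the observed availabilities are distinct.
    """
    ranked = sorted(values)
    lowest = ranked[0] / 2                       # half the lowest observation
    midpoints = [(a + b) / 2 for a, b in zip(ranked, ranked[1:])]
    highest = ranked[-1] + lowest                # highest observation plus lowest threshold
    return [lowest] + midpoints + [highest]


# Example data of Table A1, one list per resource.
R1 = [0.30, 0.20, 0.24]
R2 = [11.3, 8.9, 7.4]
R3 = [21.4, 31.8, 18.6]

for name, values in (("R1", R1), ("R2", R2), ("R3", R3)):
    # Rounded only for display; floating-point midpoints may differ in the last digits.
    print(name, [round(th, 2) for th in admissible_thresholds(values)])
```

Each resource with N observations yields N + 1 thresholds, which is what makes the search space finite.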

The sets of admissible thresholds are finite, making it possible to search exhaustively for the values that produce the highest correlation between diversity and the number of limiting resources. For a data set with J resources and N sample times, there are (N + 1)^J combinations of thresholds for the different resources; hence 64 combinations for the example data set, and about 7 × 10^6 for the lake data analyzed in the main text. Software that calculates the correlation between diversity and the number of limiting resources for each possible combination of thresholds, and records the highest, is straightforward, but two problems must be accommodated.

A first problem is that some combinations of thresholds produce an invariant series for the number of limiting resources Xt, and thus its correlation with diversity cannot be computed. Most obviously, when θj is at its lowest or highest value for resource j, then that resource is either always non-limiting (Ij,t(θj) = 0 for all t), or always limiting (Ij,t(θj) = 1 for all t). If this is the case for all resources, then the number of limiting resources is invariant. Regarding the set of indicator functions as a matrix (with N rows and J columns), in such cases each column consists entirely of either 0 or 1, thus each row sum (i.e., the count of limiting resources) is the same. In the example data set, one such invariant sequence for Xt arises when (θ1, θ2, θ3) = (0.1, 15, 41.1); here resource 1 has its lowest threshold, and the other two their highest. Resource 1 is never limiting, so I1,t = 0 for all sample times t, while resources 2 and 3 are always limiting, so that I2,t = I3,t = 1 for all sample times; thus the number of limiting resources is always 2. For a data set with J resources, there are 2^J combinations of thresholds that involve the highest or lowest possible threshold for each resource, hence 2^J invariant sequences of Xt arise this way. There are 8 such invariant sequences for the example data set, and 16 for the lake data analyzed in the main text.

These inevitable invariant sequences, with thresholds for all resources at their highest or lowest values, are biologically uninteresting. They portray all resources as either always limiting or never limiting through their observed range of variation, and in such cases it is not theoretically credible to attribute variation in diversity to variation in the number of limiting resources. To avoid such problems, resource availabilities should be examined prior to analysis to ensure that some fall below biologically plausible thresholds for limitation, and some above.

Invariant sequences for the number of limiting resources Xt also arise from any other combinations of thresholds producing indicator matrices whose row sums are equal. In the example data, two threshold combinations produce invariant Xt: (θ1, θ2, θ3) = (0.22, 3.7, 26.6) and (θ1, θ2, θ3) = (0.22, 15, 26.6). In both, exactly one of resources 1 and 3 is limiting at every sample time, while resource 2 is either never limiting (θ2 = 3.7) or always limiting (θ2 = 15). The first combination therefore indicates that there is always 1 limiting resource for the example data, and the second that there are always 2.

Combinatorial considerations suggest that the problem of invariant Xt will be most serious when N is small. With J resources and N sample times, the number of binary indicator matrices with equal row sums is

\[ \sum_{k=0}^{J} \binom{J}{k}^{N} \]

where \binom{J}{k} is the number of combinations of J things taken k at a time. The total number of possible binary indicator matrices is 2^(JN). As N gets large, the total number of possible indicator matrices grows much faster than the number with equal row sums, so it should become less likely that the indicator matrix for a given combination of thresholds has equal row sums. For the example data set, J = N = 3, and there are 512 possible indicator matrices, 56 of which have equal row sums. Thus a search of the 64 possible threshold combinations is likely to turn up a large proportion of invariant sequences, and 10 are found in this example. Of these, 8 are inevitable because they combine the highest and lowest thresholds for all three resources, and 2 are "accidental", resulting from patterns in the particular data set.
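These counts are easy to check numerically. In the sketch below the function names are hypothetical, and the count of equal-row-sum matrices is taken as the sum over k of C(J, k)^N (each of the N rows independently chooses which k of the J entries equal 1), which reproduces the figures quoted for J = N = 3.

```python
from math import comb


def equal_row_sum_matrices(J, N):
    """Number of N-by-J binary matrices in which every row has the same sum."""
    return sum(comb(J, k) ** N for k in range(J + 1))


def all_indicator_matrices(J, N):
    """Total number of N-by-J binary matrices."""
    return 2 ** (J * N)


# Example data set: J = N = 3.
print(equal_row_sum_matrices(3, 3), all_indicator_matrices(3, 3))  # 56 512

# Proportion for a data set of roughly the size analyzed in the main text
# (N = 50 is used here only as an illustrative round number).
print(equal_row_sum_matrices(4, 50) / all_indicator_matrices(4, 50))
```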

In our software implementing the exhaustive search for thresholds producing the highest correlation between diversity and number of limiting resources, we log the number of invariant sequences, and arbitrarily assign them a correlation of -1. This effectively removes such cases from consideration in the search. For the lake data analyzed in the main text, there are about 10^61 possible indicator matrices, 10^39 of which have equal row sums, a proportion of about 10^-22. Thus on combinatorial grounds we expect few invariant sequences to be found, even when about 7 × 10^6 threshold combinations are evaluated. For EML, we found only the 16 inevitable invariant sequences resulting from combinations of highest and lowest thresholds for all resources. For JPL, we found 24 invariant sequences: the 16 inevitable ones, plus 8 accidental ones.
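The search with invariant-sequence handling can be sketched as follows for the example data. This is an illustrative reimplementation under stated assumptions, not the authors' program: all function names are hypothetical, r is assumed to be the Pearson correlation, and ties for the best correlation are resolved by keeping the first combination encountered.

```python
import itertools
import math

# Example data from Table A1 (rows are sampling times, columns are resources).
R = [[0.30, 11.3, 21.4], [0.20, 8.9, 31.8], [0.24, 7.4, 18.6]]
Y = [4.4, 7.3, 10.8]


def admissible_thresholds(values):
    """Half the lowest value, midpoints of ranked values, highest plus lowest threshold."""
    ranked = sorted(values)
    lowest = ranked[0] / 2
    mids = [(a + b) / 2 for a, b in zip(ranked, ranked[1:])]
    return [lowest] + mids + [ranked[-1] + lowest]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def exhaustive_search(R, Y):
    J = len(R[0])
    grids = [admissible_thresholds([row[j] for row in R]) for j in range(J)]
    n_combos = n_invariant = 0
    best_r, best_theta = -1.0, None
    for theta in itertools.product(*grids):
        n_combos += 1
        X = [sum(r < th for r, th in zip(row, theta)) for row in R]
        if len(set(X)) == 1:
            # Invariant X_t: correlation undefined, so log it and assign -1.
            n_invariant += 1
            r = -1.0
        else:
            r = pearson(X, Y)
        if r > best_r:
            best_r, best_theta = r, theta
    return best_r, best_theta, n_combos, n_invariant


best_r, best_theta, n_combos, n_invariant = exhaustive_search(R, Y)
print(n_combos, n_invariant, round(best_r, 4))  # 64 10 0.9985
```

For data sets of the size analyzed in the main text, the same loop visits several million combinations, so a compiled or vectorized implementation would likely be preferable.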

The presence of such accidental invariant sequences precludes some threshold combinations from consideration as best-fit estimates, and thus could contribute to error in estimated thresholds. Again, this problem is minimized by a large sample size, N. The number of thresholds defined for each resource is N + 1, so larger samples produce a finer dissection of the range of variation in resource availability. Hence, if a given threshold value is rejected because it produces an invariant Xt sequence, large N makes it likely that a threshold with a similar numerical value will prove acceptable in the search.

A second problem arises in the exhaustive search to find limitation thresholds that maximize the correlation between diversity and the number of limiting resources: the optimal combination of thresholds may not be unique. In the example data set, the highest correlation between diversity and the number of limiting resources is 0.9985, and 8 different combinations of thresholds yield this result (calculations for one combination are shown in Table A1). This problem could greatly contribute to uncertainty in estimated thresholds, and to doubt in the significance of the correlation. Combinatorial considerations and exploratory calculations suggest that this problem is most severe for small data sets. The number of unique sequences for the number of limiting resources Xt is (J + 1)^N, while the number of possible threshold combinations to be searched is (N + 1)^J. When the number of sample times N greatly exceeds the number of resources J, the number of unique sequences for Xt is much larger than the number of threshold combinations, making it unlikely that two threshold combinations share the same sequence Xt, including the sequence that correlates most strongly with diversity (one class of exceptions to this statement is discussed below).
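The non-uniqueness in the example data can be inspected by keeping every evaluated combination rather than only the best one. The sketch below is illustrative only: it hard-codes the admissible thresholds listed earlier in this appendix, assumes r is the Pearson correlation, and uses an arbitrary tolerance of 1e-12 to decide that two correlations are tied.

```python
import itertools
import math

# Example data from Table A1 and the admissible thresholds listed in the text.
R = [[0.30, 11.3, 21.4], [0.20, 8.9, 31.8], [0.24, 7.4, 18.6]]
Y = [4.4, 7.3, 10.8]
GRIDS = [
    [0.1, 0.22, 0.27, 0.4],
    [3.7, 8.15, 10.1, 15.0],
    [9.3, 20.0, 26.6, 41.1],
]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


results = []
for theta in itertools.product(*GRIDS):
    X = tuple(sum(r < th for r, th in zip(row, theta)) for row in R)
    if len(set(X)) > 1:                      # skip invariant sequences
        results.append((pearson(X, Y), theta, X))

best_r = max(r for r, _, _ in results)
ties = [(theta, X) for r, theta, X in results
        if math.isclose(r, best_r, abs_tol=1e-12)]
distinct_sequences = sorted({X for _, X in ties})

print(len(ties), distinct_sequences)  # 8 [(0, 1, 2), (1, 2, 3)]
```

In this small example the tied combinations produce only two distinct sequences for Xt, which differ by a constant.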

For the data analyzed in the main text (N ≈ 50, J = 4), there are about 7 × 10^6 possible threshold combinations, and >10^34 unique sequences for the number of limiting resources. Though it seems unlikely that two or more threshold combinations would share the same sequence for Xt, our search program records the 3000 highest correlations between diversity and number of limiting resources, and the associated thresholds for limitation θj. Inspecting these, we found the set of thresholds producing the highest correlation between diversity and number of limiting resources was not unique for either lake. For some resources, it turned out that the correlation between diversity and the number of limiting resources was maximized at the lowest or highest admissible threshold. In such cases, the same maximized correlation r*(Yt, Xt) is shared by both the lowest and highest threshold for that resource. For the highest admissible threshold, the resource in question is indicated to be always limiting, and the sequence of number of limiting resources Xt is just one plus the sequence associated with the lowest admissible threshold, which portrays the resource as never limiting. Adding a constant (1 in this case) to one variable does not change its correlation with another variable. As discussed in the main text, such a finding implies biologically that variations in the limitation status of this resource do not affect variations in diversity.
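The invariance of the correlation under a constant shift follows in one line if r is taken to be the Pearson correlation (an assumption here, consistent with the value in Table A1). Shifting every Xt by a constant c shifts the mean by the same c, so all deviations from the mean are unchanged:

```latex
\[
r(Y, X + c)
= \frac{\sum_{t} (Y_t - \bar{Y})\,\bigl[(X_t + c) - (\bar{X} + c)\bigr]}
       {\sqrt{\sum_{t} (Y_t - \bar{Y})^2}\;\sqrt{\sum_{t} \bigl[(X_t + c) - (\bar{X} + c)\bigr]^2}}
= \frac{\sum_{t} (Y_t - \bar{Y})(X_t - \bar{X})}
       {\sqrt{\sum_{t} (Y_t - \bar{Y})^2}\;\sqrt{\sum_{t} (X_t - \bar{X})^2}}
= r(Y, X).
\]
```

With c = 1 this is the case described above, where moving one resource from its lowest to its highest admissible threshold adds 1 to every Xt.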

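Table A1. Hypothetical data, and calculation of the correlation between diversity and the number of limiting resources, for the thresholds θ1 = 0.22, θ2 = 8.15, θ3 = 20. Columns R1,t to R3,t are resource availabilities; I1,t to I3,t are the indicator functions; Xt is the number of limiting resources; Yt is the diversity index.

| R1,t | R2,t | R3,t | I1,t | I2,t | I3,t | Xt | Yt   |
|------|------|------|------|------|------|----|------|
| 0.3  | 11.3 | 21.4 | 0    | 0    | 0    | 0  | 4.4  |
| 0.2  | 8.9  | 31.8 | 1    | 0    | 0    | 1  | 7.3  |
| 0.24 | 7.4  | 18.6 | 0    | 1    | 1    | 2  | 10.8 |

r(Xt,Yt) = 0.9985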


