Ecological Archives E083-047-A2

Elizabeth E. Holmes and William F. Fagan. 2002. Validating population viability analysis for corrupted data sets. Ecology 83:2379-2386.

A pdf version of Appendix B is also available for viewing.

Appendix B. Derivations of the distributions of the estimated parameters.

Distribution of the $\hat{\sigma}^2_{slp}$ estimate, Eq. 2

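Consider, for the moment, an idealized $\hat{\sigma}^2_{slp}$ estimate that uses subsampled data to eliminate overlap in the $N_{t+\tau}/N_t$ ratios and $L = 1$. We can derive the distribution of this idealized estimate by observing that the slope of $\mathrm{var}(\ln N_{t+\tau} - \ln N_t)$ vs. $\tau$ ($\tau = 1, 2, \ldots, \tau'$) is basically

$$\frac{\mathrm{var}(\ln(N_{t+\tau'}/N_t)) - \mathrm{var}(\ln(N_{t+1}/N_t))}{\tau' - 1}$$

since the $\mathrm{var}(\ln N_{t+\tau} - \ln N_t)$ vs. $\tau$ line is generally straight. Using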

where $\Gamma(\alpha, \beta)$ is a gamma distribution with shape $\alpha$ and scale $\beta$, the distribution of the idealized estimate is straightforward to derive.

(B.1)

Note that the sequential $N_{t+\tau'}/N_t$ ratios are chosen so that there is no overlap; thus each ratio is independent. Assume, for the moment, that the two gamma distributions are independent (which they are not). In this case, we can show as follows that the limiting distribution of Eq. B.1 as $df_{\tau'}$ and $df_1$ become large is $\chi^2$ with $df_{\tau'}$ degrees of freedom.

The moment generating function of $\Gamma(\alpha, \beta)$ is $(1 - \beta t)^{-\alpha}$. Thus the moment generating function for the distribution in Eq. B.1 is

.

Take the natural log of this to get,

.

Using the Taylor expansion for ln(1+x) and multiplying the second element by ,

.

Ignoring higher order terms, the ln(mgf) has the form:

which is the ln(mgf) of the following $\chi^2$ distribution:

As noted, the gamma distributions for $\mathrm{var}(\ln(N_{t+\tau'}/N_t))$ and $\mathrm{var}(\ln(N_{t+1}/N_t))$ are actually correlated. The effect of the correlation, as seen in numerical experiments, is to make the distribution in Eq. B.1 approach the limiting distribution faster (i.e., when the degrees of freedom of the gamma distributions are smaller).
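The idealized estimate can be explored numerically. The following Python sketch is only an illustration under stated assumptions: it uses $L = 1$ and no non-process error, takes the slope from the two end points ($\tau = 1$ and $\tau = \tau'$), subsamples every $\tau'$-th point so that the long-lag ratios do not overlap, and treats `sigma_p` as a standard deviation. The function names and parameter values are hypothetical. Both variances are computed from the same series, so they are correlated, as noted above.

```python
import random
import statistics

def idealized_slope_estimate(log_n, tau_max):
    """Two-point slope of var(ln(N[t+tau]/N[t])) vs. tau, using tau = 1 and tau = tau_max."""
    lag1 = [b - a for a, b in zip(log_n, log_n[1:])]
    sub = log_n[::tau_max]  # every tau_max-th point: non-overlapping long-lag ratios
    lag_max = [b - a for a, b in zip(sub, sub[1:])]
    return (statistics.variance(lag_max) - statistics.variance(lag1)) / (tau_max - 1)

def simulate_log_n(rng, n, mu, sigma_p):
    """Log population sizes from N[t+1] = N[t] * exp(mu + eps_p), with no observation error."""
    log_n = [0.0]
    for _ in range(n - 1):
        log_n.append(log_n[-1] + rng.gauss(mu, sigma_p))
    return log_n

rng = random.Random(1)
mu, sigma_p, n, tau_max = -0.01, 0.3, 41, 4  # illustrative values only (sigma_p is an assumed sd)
estimates = [
    idealized_slope_estimate(simulate_log_n(rng, n, mu, sigma_p), tau_max)
    for _ in range(4000)
]
mean_estimate = statistics.fmean(estimates)
share_negative = sum(e < 0 for e in estimates) / len(estimates)
print(f"mean estimate {mean_estimate:.4f}, sigma_p**2 = {sigma_p**2:.4f}")
print(f"share of negative estimates: {share_negative:.3f}")
```

Under these assumptions the two sample variances are unbiased for $\tau'\sigma_p^2$ and $\sigma_p^2$, so the mean of the simulated estimates should sit close to $\sigma_p^2$, while individual estimates can be negative.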

The $\hat{\sigma}^2_{slp}$ used in the Dennis-Holmes method differs somewhat from the idealized $\hat{\sigma}^2_{slp}$ used in this derivation. First, the $N_{t+\tau'}/N_t$ ratios cannot generally be subsampled because the time series are short. This means the ratios are correlated and $df_{\tau'}$ is substantially less than the number of ratios minus one; additionally, the lack of subsampling makes $\hat{\sigma}^2_{slp}$ biased. Second, the data are running-sum transformed ($L > 1$); this leads to further bias. These are trade-offs that improve estimation for short corrupted time series by reducing the number of negative variance estimates (percent errors column in Table B1). Despite the differences, understanding the limiting distribution for the idealized $\hat{\sigma}^2_{slp}$ helps explain why, when we estimated a non-idealized $\hat{\sigma}^2_{slp}$ ($L > 1$ and data not subsampled) from simulated data, we observed that $\hat{\sigma}^2_{slp}$ showed a distribution of $\chi^2$ form for a wide range of time series lengths, non-process to process error ratios, and filter lengths (Table B1).

Monte Carlo estimation was used to numerically estimate the $\chi^2$ distributions for the $\hat{\sigma}^2_{slp}$ estimates used in the Dennis-Holmes method ($\hat{\sigma}^2_{slp}$ = the slope of $\mathrm{var}(\ln(R_{t+\tau}/R_t))$ vs. $\tau$ for $\tau = 1, 2, 3, 4$). Monte Carlo estimation uses parameter estimates from samples of data generated with simulations to calculate the distribution of the parameter estimate (this is akin to parametric bootstrapping). We generated 5000 time series of length $n$ using the model $N_{t+1} = N_t \exp(\mu + \varepsilon_p)$, $O_t = N_t \exp(\varepsilon_{np})$, where the process error $\varepsilon_p \sim \mathrm{Normal}(0, \sigma_p)$ and the non-process error $\varepsilon_{np} \sim \mathrm{Normal}(0, \sigma_{np})$. Let mean($\hat{\sigma}^2_{slp}$) denote the mean of all 5000 estimates. For each simulation, we calculated the ratio of $\hat{\sigma}^2_{slp}$ to mean($\hat{\sigma}^2_{slp}$). We then found the best fitting $df_{slp}$ parameter such that

$$df_{slp}\,\hat{\sigma}^2_{slp} \,/\, \mathrm{mean}(\hat{\sigma}^2_{slp}) \sim \chi^2_{df_{slp}}.$$

This was done by finding the $df_{slp}$ that maximized the P value from a Kolmogorov-Smirnov goodness-of-fit test. The fitting process was repeated for different time series lengths ($n$), filter lengths ($L$), ratios of process to non-process error ($\sigma_p/\sigma_{np}$), and values of $\mu$ and $\sigma_p$. The best fitting $df_{slp}$ values for different $n$, $L$, and $\sigma_p/\sigma_{np}$ are given in Table B1 with the P values for the fitted distribution. The observed bias and parameters from the simulations are given in Table B2. The degrees of freedom depended mainly on the length of the time series, $n$, and the length of the filter, $L$. There was an approximately linear relationship between $n$, $L$, and the $df_{slp}$ values in Table B1. The following formula gives a close approximation of the numerically calculated $df_{slp}$:

$df_{slp} = 0.333 + 0.212\,n - 0.387\,L$   for   $n > 15$.
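The procedure above can be sketched in Python. This is a rough illustration, not the code used for Table B1, and it rests on several assumptions not stated in the text: the second arguments of the Normal distributions are treated as standard deviations, the running sum is taken as the sum of $L$ consecutive observations, each variance is a sample variance over all overlapping pairs, and the slope is an ordinary least-squares slope. In place of the Kolmogorov-Smirnov fit, the degrees of freedom are obtained by moment matching (for a $\chi^2_{df}/df$ variable the variance is $2/df$), which is a cheaper stand-in and need not reproduce the tabulated values. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics

def df_slp_approx(n, L):
    """Linear approximation quoted in the text, stated for n > 15."""
    if n <= 15:
        raise ValueError("approximation is given only for n > 15")
    return 0.333 + 0.212 * n - 0.387 * L

def slope_estimate(obs, L, max_tau=4):
    """Least-squares slope of var(ln(R[t+tau]/R[t])) against tau = 1..max_tau."""
    # Assumed running sum: L consecutive observations starting at t.
    log_r = [math.log(sum(obs[t:t + L])) for t in range(len(obs) - L + 1)]
    taus = list(range(1, max_tau + 1))
    variances = [
        statistics.variance([b - a for a, b in zip(log_r, log_r[tau:])])
        for tau in taus
    ]
    tau_bar, var_bar = statistics.fmean(taus), statistics.fmean(variances)
    num = sum((t - tau_bar) * (v - var_bar) for t, v in zip(taus, variances))
    den = sum((t - tau_bar) ** 2 for t in taus)
    return num / den

def simulate_observations(rng, n, mu, sigma_p, sigma_np):
    """O[t] = N[t] * exp(eps_np), N[t+1] = N[t] * exp(mu + eps_p); sigmas are assumed sds."""
    log_n, obs = 0.0, []
    for _ in range(n):
        obs.append(math.exp(log_n + rng.gauss(0.0, sigma_np)))
        log_n += rng.gauss(mu, sigma_p)
    return obs

rng = random.Random(0)
n, L = 20, 3  # illustrative choice
estimates = [
    slope_estimate(simulate_observations(rng, n, -0.01, 0.3, 0.3), L)
    for _ in range(2000)
]
positive = [e for e in estimates if e > 0]           # negative estimates are dropped
ratios = [e / statistics.fmean(positive) for e in positive]
df_moments = 2.0 / statistics.variance(ratios)       # moment matching, not the KS fit
pct_negative = 100.0 * (len(estimates) - len(positive)) / len(estimates)
print(f"moment-matched df: {df_moments:.2f}")
print(f"linear approximation: {df_slp_approx(n, L):.3f}")
print(f"negative estimates: {pct_negative:.1f}%")
```

For $n = 20$ and $L = 3$ the linear approximation gives 3.412, which can be compared with the corresponding row of Table B1.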

Variance of $\hat{\mu}$, Eq. 3

Given $n$ observations, $O_1, O_2, O_3, \ldots, O_n$, of the true population size, $N_1, N_2, N_3, \ldots, N_n$, the $O_t$ series is transformed into a running sum, $R_1, R_2, R_3, \ldots, R_r$, where $r = n - L + 1$ and $R_t = \sum_{i=0}^{L-1} O_{t+i}$.
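A minimal sketch of the running-sum transformation, assuming each $R_t$ is the sum of $L$ consecutive observations starting at $O_t$ (the form implied by $r = n - L + 1$). The function name `running_sum` and the example counts are made up for illustration.

```python
def running_sum(obs, L):
    """Return R_1..R_r with r = n - L + 1; each R_t sums L consecutive observations."""
    if not 1 <= L <= len(obs):
        raise ValueError("L must be between 1 and the number of observations")
    return [sum(obs[t:t + L]) for t in range(len(obs) - L + 1)]

counts = [10, 12, 9, 11, 14]   # made-up observations, for illustration only
print(running_sum(counts, 3))  # [31, 32, 34]
```

With $n = 5$ and $L = 3$ the transformed series has $r = 3$ entries, and $L = 1$ returns the observations unchanged.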

Denote as $\bar{N}_t$ the mean of the $N$'s that comprise the running sum $R_t$: $\bar{N}_t = \frac{1}{L}\sum_{i=0}^{L-1} N_{t+i}$, and recall $O_t = N_t \exp(\varepsilon_{np,t})$.

Note that $\hat{\mu}$ is the mean of the $\ln(R_{t+1}/R_t)$ ratios from the time series; however, for corrupted time series, the variance of $\hat{\mu}$ is not $1/(n-L)$ times the variance of the $\ln(R_{t+1}/R_t)$ ratios, as would be the case for uncorrupted time series.

Using the variance of the $\ln(R_{t+1}/R_t)$ ratios would lead to a large overestimate of the variance of $\hat{\mu}$. This overestimation is greater for smaller $L$.
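The overestimation can be illustrated by simulation. The following Python sketch is not a derivation of Eq. 3; it simply compares the naive value, $1/(n-L)$ times the sample variance of the $\ln(R_{t+1}/R_t)$ ratios, with the variance of $\hat{\mu}$ observed across many simulated corrupted series. It assumes the running sum is the sum of $L$ consecutive observations and that `sigma_p` and `sigma_np` are standard deviations; the function names and parameter values are hypothetical.

```python
import math
import random
import statistics

def log_running_sum_ratios(obs, L):
    """ln(R[t+1]/R[t]) for the running sum of L consecutive observations."""
    log_r = [math.log(sum(obs[t:t + L])) for t in range(len(obs) - L + 1)]
    return [b - a for a, b in zip(log_r, log_r[1:])]

def simulate_observations(rng, n, mu, sigma_p, sigma_np):
    """O[t] = N[t] * exp(eps_np), N[t+1] = N[t] * exp(mu + eps_p); sigmas are assumed sds."""
    log_n, obs = 0.0, []
    for _ in range(n):
        obs.append(math.exp(log_n + rng.gauss(0.0, sigma_np)))
        log_n += rng.gauss(mu, sigma_p)
    return obs

rng = random.Random(42)
n, L, mu, sigma_p, sigma_np = 30, 3, -0.01, 0.1, 0.3  # illustrative values only
mu_hats, naive_vars = [], []
for _ in range(2000):
    ratios = log_running_sum_ratios(simulate_observations(rng, n, mu, sigma_p, sigma_np), L)
    mu_hats.append(statistics.fmean(ratios))                   # mu-hat for this series
    naive_vars.append(statistics.variance(ratios) / (n - L))   # naive variance of mu-hat

empirical_var = statistics.variance(mu_hats)  # spread of mu-hat across simulations
naive_var = statistics.fmean(naive_vars)
print(f"empirical var(mu-hat): {empirical_var:.6f}")
print(f"mean naive estimate:   {naive_var:.6f}")
```

The gap arises because the non-process errors largely cancel in the sum of successive log ratios but not in their variance. Rerunning with a smaller `L` shows how the gap changes with filter length.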

 

Estimate of the distribution of $\hat{\mu}$ from data, Eq. 4

If and were known, it would be straight-forward to specify the distribution of , (i.e., Normal() however, instead we have to use estimates of and which themselves have some distribution. Below is outlined an estimate of the distribution of which uses only . Deriving a distribution based on both and appears problematic given the nature of the distribution of (see below) and given that the estimate is not independent of .

By simple algebra, we can rewrite as

 

Point estimate of

A point estimate of can be calculated by noting that , thus

where from the data.

Tables

Table B1. Numerically calculated degrees of freedom for the $\chi^2$ distribution describing the ratio of $\hat{\sigma}^2_{slp}$ to its mean, and percent errors (negative variance estimates). P values give the fit of the empirical distribution from a Kolmogorov-Smirnov goodness-of-fit test. For the degrees of freedom calculations, negative $\hat{\sigma}^2_{slp}$ were removed from the sample. For the simulations, $\mu$ = -0.01 and the process error was 0.1. The results were not sensitive to alternate parameter values in the ranges -0.2 < $\mu$ < 0.2 or 0.01 < process error < 0.3.

 

 

           σ_np = 0                σ_np/σ_p = 0.5          σ_np/σ_p = 1            σ_np/σ_p = 2            σ_np/σ_p = 3
Years  L   df  P value  % errors   df  P value  % errors   df  P value  % errors   df  P value  % errors   df  P value  % errors
10     3  1.3     0.02        15  1.4     0.09        19  1.4     0.03        25  1.4     0.03        31  1.4     0.10        32
10     4  1.3     0.02        19  1.4     0.02        21  1.3     0.00        26  1.1     0.00        33  0.9     0.00        35
10     5  1.5     0.28        38  1.5     0.05        42  1.5     0.18        50  0.9     0.00        58  0.7     0.00        58
15     3  2.1     0.19         1  2.2     0.51         2  2.2     0.52         3  2.1     0.35         7  2.1     0.60         8
15     4  1.8     0.04         0  1.9     0.08         1  2.0     0.06         1  1.9     0.06         3  1.9     0.09         3
15     5  1.6     0.01         2  1.6     0.01         3  1.6     0.03         5  1.6     0.01        10  1.6     0.00         9
20     3  3.3     0.29         0  3.1     0.18         0  3.2     0.54         1  3.4     0.85         2  3.4     0.75         3
20     4  2.8     0.08         0  2.8     0.05         0  2.9     0.03         0  3.3     0.23         0  3.2     0.27         0
20     5  2.4     0.06         0  2.3     0.06         0  2.3     0.06         0  2.4     0.12         2  2.4     0.11         2
30     3  5.7     0.20         0  5.5     0.31         0  6.0     0.78         0  5.9     0.98         0  5.7     0.94         0
30     4  4.8     0.10         0  4.9     0.12         0  5.4     0.18         0  5.7     0.22         0  5.6     0.33         0
30     5  4.1     0.07         0  4.1     0.16         0  4.5     0.30         0  4.5     0.31         0  4.3     0.31         0
40     3  7.8     0.49         0  7.9     0.34         0  8.0     0.28         0  8.4     0.93         0  7.8     0.94         1
40     4  7.0     0.17         0  6.7     0.04         0  7.4     0.15         0  8.2     0.81         0  8.1     0.68         0
40     5  6.3     0.09         0  6.6     0.06         0  6.4     0.17         0  6.6     0.82         0  6.2     0.87         0

 

Table B2. Numerically calculated mean bias between and , expressed as a percentage of , and parameter (Eq. 11) describing the relationship between and . Negative estimates were removed from the sample before calculations.  For the simulations, = -0.01 and = 0.1. The results were not sensitive to alternate parameter values in the ranges: –0.2<<0.2 or 0.01<<0.3.

 

 

           σ_np = 0             σ_np/σ_p = 0.5       σ_np/σ_p = 1         σ_np/σ_p = 2         σ_np/σ_p = 3
Years  L  % bias  Eq. 11 par.  % bias  Eq. 11 par.  % bias  Eq. 11 par.  % bias  Eq. 11 par.  % bias  Eq. 11 par.
10     3     -75         0.25     -71         0.27     -61         0.35     -17         0.60      54         0.45
10     4     -87         0.12     -86         0.14     -80         0.18     -53         0.35      -1         0.32
10     5     -95         0.05     -94         0.05     -91         0.08     -77         0.17     -51         0.16
15     3     -48         0.52     -44         0.55     -31         0.65      24         1.01     120         0.92
15     4     -61         0.40     -58         0.41     -48         0.50      -7         0.79      70         0.79
15     5     -73         0.26     -71         0.28     -66         0.33     -40         0.51       7         0.53
20     3     -34         0.65     -31         0.68     -17         0.80      33         1.15     128         1.15
20     4     -45         0.54     -43         0.57     -31         0.67      11         1.00      93         1.09
20     5     -57         0.42     -55         0.44     -48         0.51     -20         0.72      34         0.81
30     3     -22         0.77     -19         0.81      -7         0.91      44         1.32     140         1.48
30     4     -33         0.67     -29         0.70     -20         0.79      25         1.16     110         1.42
30     5     -43         0.56     -41         0.58     -35         0.64      -5         0.89      53         1.09
40     3     -19         0.81     -15         0.84      -4         0.94      46         1.36     136         1.63
40     4     -29         0.71     -25         0.74     -15         0.83      28         1.21     110         1.56
40     5     -39         0.61     -37         0.63     -30         0.68      -1         0.94      54         1.20



