Ecological Archives E093-073-A3

J. C. Douma, B. Shipley, J. P. M. Witte, R. Aerts, and P. M. van Bodegom. 2012. Disturbance and resource availability act differently on the same suite of plant traits: revisiting assembly hypotheses. Ecology 93:825–835.

Appendix C. Control analyses for robustness against missing trait values.

Allowing missing trait values when calculating a plot mean might affect the selection of sites for analysis and the trait-trait and trait-environment relationships. In this appendix it is shown that (i) the selected sites are not a biased selection from all sites available, (ii) the average percentage of species used to calculate a plot mean is in fact much higher than the lower bounds set (50%, or 20% for RGR and LPC), (iii) the slopes of the regression lines are not significantly affected by introducing missing trait values, and the increase in the uncertainty of the slopes is smaller than the increase in the percentage of missing trait data, and finally, (iv) the structure and significance of the SEM are not affected by the RGR values used. As a result, we conclude that our main conclusions, that nutrient availability and disturbance partly affect the same suite of traits and that trait-trait constraints play an important role, still hold. Steps 1 and 2 are done for both the simplified model (Fig. 3) and the complex model of Appendix F. Step 3 is done for the model in Appendix F.

1. A biased subset of the total available number of sites?

The six available data sources represented 19 different vegetation types. The selected set of sites for which trait information was available was not biased compared to all available sites, because selected sites covered all 19 vegetation types. Additionally, the sites not included in the analysis were randomly spread over these 19 vegetation types: The correlation between the number of sites per vegetation type for the selected sites and the total number of available sites was 0.94 (on 10log transformed data to fulfill homogeneity of variance). Therefore the selection criteria have not led to an unbalanced data set.
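As an illustration of this check, the correlation on log10-transformed site counts can be sketched as follows. This is a minimal sketch only: the counts per vegetation type are hypothetical placeholders (six types rather than 19), not the data of the study, and the helper name `pearson` is ours.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical numbers of sites per vegetation type (NOT the study's data).
available = [120, 45, 300, 18, 60, 9]   # all available sites
selected = [30, 10, 80, 5, 14, 2]       # sites passing the selection criteria

# Correlate the counts after a log10 transformation, as described in the text.
r = pearson([math.log10(v) for v in available],
            [math.log10(v) for v in selected])
print(round(r, 2))
```

A correlation close to 1 indicates that the selected sites are spread over the vegetation types in roughly the same proportions as the available sites.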

2. A biased estimate of trait plot means?

Within our data set, plots were chosen that had sufficient information for at least 50% of the species (or 20% in the case of RGR and LPC). However, might this selection and incomplete information have led to biased results? If missing data are non-randomly distributed, this can lead to a biased estimate of the plot mean and to biased environment-trait and trait-trait relationships, and thus to a different conclusion about the relative contribution of environmental drivers and trait-trait constraints. We performed a three-step analysis to test this crucial issue. First, we calculated the actual percentage of species with trait information that were used to calculate plot means. Next, we tested whether the slopes of the paths of our SEM are significantly affected when allowing trait plot means to be based on incomplete data. Finally, we incorporated modelled RGR values in a SEM (of Appendix F) and tested whether the structure still holds.

Step 1: Was the percentage of species with trait information to calculate plot means really that low?

Our selection criterion set a minimum on the availability of trait information in order to include the majority of the species per plot. This threshold was 50% of the species, assuming that these species give a good estimate of the true plot trait mean; this minimum was lowered to > 20% for LPC and RGR, as these traits were less well covered in the database but were core traits and essential to the analysis. In fact, the average percentage of species used to calculate a plot mean was much higher than this minimum percentage (Fig. C1). In reality, the chances of a potential bias (which would additionally require that species selection was selective and not random) are therefore much smaller than might have been concluded based only on the threshold stated in the manuscript. The only exception is RGR, for which on average indeed only 50% of the species had trait information. Given that we had already combined all available trait databases, we tested as a second step whether correlations (and thus paths in our SEM) could have been affected by the incomplete trait information.



FIG. C1. The percentage of species used to calculate the plot mean for different traits (abbreviations of traits: leaf nitrogen content (LNC), leaf phosphorus content (LPC), specific leaf area (SLA), 10log seed mass of the germinule (SM_g), 10log seed mass of the dispergule (SM_d), 10log maximum canopy height (maxCH), growth form (GF), seedling relative growth rate (RGR), 10log germination onset (GO), flowering onset (FO)).
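The coverage percentage of Step 1 and the threshold rule can be sketched as below. This is a hedged illustration, not the code used for the analysis: the species names and trait values are hypothetical, the function names are ours, and the plot mean is taken as an unweighted mean over species, which is an assumption.

```python
def trait_coverage(plot_species, trait_values):
    """Percentage of a plot's species that have a value for the trait."""
    known = [s for s in plot_species if trait_values.get(s) is not None]
    return 100.0 * len(known) / len(plot_species)

def plot_mean(plot_species, trait_values, min_coverage):
    """Unweighted mean over species with data; None if coverage is too low."""
    values = [trait_values[s] for s in plot_species
              if trait_values.get(s) is not None]
    if 100.0 * len(values) / len(plot_species) < min_coverage:
        return None  # plot is excluded from the analysis for this trait
    return sum(values) / len(values)

# Hypothetical plot with four species, two of which have an SLA value.
plot = ["sp_a", "sp_b", "sp_c", "sp_d"]
sla = {"sp_a": 20.0, "sp_b": 30.0, "sp_c": None}

print(trait_coverage(plot, sla))   # percentage of species with trait data
print(plot_mean(plot, sla, 50.0))  # mean at the 50% threshold
print(plot_mean(plot, sla, 75.0))  # excluded at a stricter threshold
```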

Step 2: The comparison of the slopes of a complete set and the available data set and estimation of the uncertainty in the slopes

Our claim that ‘regenerative and establishment traits are linked’ was based both on the overall fit of the SEM and on the significance of those path coefficients linking the two groups of traits.  We deal with the overall fit in the next section.  Here we consider how missing data may affect the significance of the relevant path coefficients in our models. Since the path coefficients in a SEM are conceptually similar to the slopes of the equivalent regressions, the SEM model should be robust against missing trait values if the slopes of the relevant single regressions obtained from our data set (as used in this paper) and a smaller subset of the data set for which trait information is available for all species are not significantly different. Additionally, if the slopes of these regression lines are not significantly different for these two data sets, then the relative contribution of environmental drivers to trait selection and the significance of traits to trait selection will by definition remain unchanged.

To test for this, we used two data sets. The subset with which we compared our data set was defined by selecting those sites that had more than 90% of the species trait data available. Setting this criterion at 90% (and 70% for RGR analyses) ensured that at least 10 sites (average 42 sites) were available for the regression analysis. Note that this 90% means that on average only 1.4 species per plot were missing (a plot contained on average 18 species). The full set was defined as all 156 sites used in the SEM.

A significance test between the slopes of the two regression lines was performed as follows: a dummy variable (0 and 1 for the two data sets, respectively) was included in the regression: Y = a × X + b + c × group + d × group × X. If the slope of the subset is significantly different from that of the full set, then the parameter d will be significantly different from zero. Running these regressions for all environment-trait and trait-trait relationships of the SEM model presented in Fig. 3 of the manuscript and the model presented in Appendix F (Fig. F1) showed that none of the regressions of the full set differed significantly from the subset (P > 0.05); in other words, the slopes of the regressions were not significantly affected by allowing missing trait values up to a maximum of 80% for LPC and RGR and 50% for the other traits (see Step 1, Fig. C1). This implies that the SEM would have the same slopes and the same cause-effect relationships if it had been based on the subset (but with much less power, given the fewer degrees of freedom). Our claim that ‘regenerative and establishment traits are linked’ thus holds. In Table C1 we present the P values and estimates of the slopes.
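The dummy-variable regression described above can be sketched in a few lines. Because the model Y = a × X + b + c × group + d × group × X fits a separate line to each group, d equals the difference between the two slopes and its standard error follows from the pooled residual variance. The sketch below is illustrative only: the data are hypothetical, the function names are ours, and the two data sets are treated as two separate groups as in the stated model. The P value would follow from a t distribution with the returned degrees of freedom, which is not computed here.

```python
import math

def fit_line(x, y):
    """Least-squares line: returns slope, intercept, Sxx and residual SS."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return slope, intercept, sxx, rss

def interaction_t(x_full, y_full, x_sub, y_sub):
    """Estimate d (slope difference), its t statistic and the residual df."""
    a0, _, sxx0, rss0 = fit_line(x_full, y_full)   # group = 0 (full set)
    a1, _, sxx1, rss1 = fit_line(x_sub, y_sub)     # group = 1 (subset)
    df = len(x_full) + len(x_sub) - 4              # four fitted parameters
    sigma2 = (rss0 + rss1) / df                    # pooled residual variance
    d = a1 - a0
    se_d = math.sqrt(sigma2 * (1.0 / sxx0 + 1.0 / sxx1))
    return d, d / se_d, df

# Hypothetical data (NOT the study's data).
x_full = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_full = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x_sub = [1.0, 3.0, 5.0]
y_sub = [2.0, 6.3, 9.9]

d, t, df = interaction_t(x_full, y_full, x_sub, y_sub)
print(round(d, 4), round(t, 3), df)
```

A small absolute t statistic for d indicates that the slope of the subset does not differ detectably from that of the full set.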

TABLE C1. Comparison of slopes between subset and full set for all relationships used in the SEM (including the number of sites used (N), estimates of the parameters and P values). Non-significant parameters are indicated in bold.

Independent Dependent N Model Estimate P
log10 Soil C/N LNC 13 Intercept 28.619 0.0000
log10 Soil CN -4.967 0.0000
group (0 = full set, 1 = subset) 5.531 0.5020
group × log10 Soil CN -3.950 0.5230
log10 Soil C/N SLA 50 Intercept 24.695 0.0000
log10 Soil CN -2.549 0.0749
group (0 = full set, 1 = subset) 1.104 0.7698
group × log10 Soil CN -1.629 0.5815
log10 Soil C/N LPC 13 Intercept 2.6463 0.0000
log10 Soil CN -0.6484 0.0007
group (0 = full set, 1 = subset) -0.0324 0.9810
group × log10 Soil CN -0.0474 0.9627
log10 Soil C/N SM_g 20 Intercept -0.4638 0.0208
log10 Soil CN -0.5799 0.0000
group (0 = full set, 1 = subset) 0.0017 0.9979
group × log10 Soil CN -0.1915 0.7088
log10 Soil C/P LPC 70 Intercept 3.493 0.0000
log10 Soil CP -0.751 0.0000
group (0 = full set, 1 = subset) -1.352 0.0559
group × log10 Soil CP 0.546 0.0803
log10 Soil C/P LNC 13 Intercept 32.782 0.0000
log10 Soil CP -4.696 0.0000
group (0 = full set, 1 = subset) -3.873 0.3790
group × log10 Soil CP 1.706 0.3860
log10 Soil C/P SLA 50 Intercept 29.025 0.0000
log10 Soil CP -3.408 0.0000
group (0 = full set, 1 = subset) -1.170 0.7210
group × log10 Soil CP 0.093 0.9490
log10 Soil C/P GO 30 Intercept 0.263 0.0278
log10 Soil CP 0.124 0.0203
group (0 = full set, 1 = subset) -0.123 0.6444
group × log10 Soil CP 0.010 0.9311
log10 Soil C/P FO 139 Intercept 26.505 0.0000
log10 Soil CP 0.542 0.0203
group (0 = full set, 1 = subset) 0.544 0.4795
group × log10 Soil CP -0.218 0.5280
log10 Soil C/P SM_g 20 Intercept 0.431 0.0316
log10 Soil CP -0.313 0.0006
group (0 = full set, 1 = subset) -1.216 0.0457
group × log10 Soil CP 0.449 0.0943
log10 Soil C/P RGR 11 Intercept 0.1599 0.0000
log10 Soil CP -0.0041 0.5790
group (0 = full set, 1 = subset) 0.0316 0.6530
group × log10 Soil CP -0.0352 0.3020
TSD SLA 50 Intercept 20.936 0.0000
TSD 0.037 0.0015
group (0 = full set, 1 = subset) -0.419 0.5361
group × TSD -0.038 0.2293
TSD maxCH 42 Intercept -0.172 0.0000
TSD 0.012 0.0000
group (0 = full set, 1 = subset) -0.035 0.4850
group × TSD 0.002 0.3250
TSD SM_d 44 Intercept -0.438 0.0000
TSD 0.018 0.0000
group (0 = full set, 1 = subset) -0.045 0.6020
group × TSD 0.007 0.1290
TSD FO 139 Intercept 28.003 0.0000
TSD -0.019 0.0000
group (0 = full set, 1 = subset) 0.052 0.6940
group × TSD -0.001 0.8470
TSD RGR 11 Intercept 0.1634 0.0000
TSD -0.0008 0.0000
group (0 = full set, 1 = subset) -0.0305 0.0680
group × TSD 0.0002 0.6210
maxCH SLA 27 Intercept 21.509 0.0000
maxCH 1.302 0.0932
group (0 = full set, 1 = subset) -2.148 0.0054
group × maxCH -1.950 0.3016
maxCH LNC 13 Intercept 22.4026 0.0000
maxCH 3.059 0.0000
group (0 = full set, 1 = subset) -1.139 0.2730
group × maxCH -0.456 0.7680
maxCH LPC 12 Intercept 1.837 0.0000
maxCH 0.246 0.0211
group (0 = full set, 1 = subset) -0.339 0.2561
group × maxCH 0.034 0.9276
maxCH SM_g 14 Intercept -0.270 0.0000
maxCH 0.819 0.0000
group (0 = full set, 1 = subset) 0.275 0.2143
group × maxCH 1.277 0.1947
maxCH FO 39 Intercept 27.721 0.0000
maxCH -1.451 0.0000
group (0 = full set, 1 = subset) 0.035 0.8360
group × maxCH -0.022 0.9616
maxCH GO 16 Intercept 0.184 0.0000
maxCH -0.153 0.0000
group (0 = full set, 1 = subset) 0.000 0.9782
group × maxCH -0.004 0.8721
maxCH GF 42 (Intercept) 0.1649 0.0000
maxCH 0.6159 0.0000
group (0 = full set, 1 = subset) 0.0083 0.5051
group × maxCH -0.0328 0.3329
GF FO 139 (Intercept) 28.1282 0.0000
GF -2.4638 0.0000
group (0 = full set, 1 = subset) 0.0416 0.7442
group × GF 0.0128 0.9763
GF GO 30 (Intercept) 0.6737 0.0000
GF -0.7824 0.0000
group (0 = full set, 1 = subset) 0.0042 0.9176
group × GF -0.0095 0.9249
GF SM_d 44 (Intercept) -0.5324 0.0000
GF 2.2220 0.0000
group (0 = full set, 1 = subset) -0.0261 0.6891
group × GF 0.3868 0.1680
GF RGR 11 (Intercept) 0.1670 0.0000
GF -0.0919 0.0000
group (0 = full set, 1 = subset) 0.0098 0.6116
group × GF -0.0179 0.5623
SM_g SM_d 15 Intercept 0.191 0.0000
SM_g 1.302 0.0000
group (0 = full set, 1 = subset) -0.165 0.4473
group × SM_g -0.242 0.5420
SM_g FO 19 Intercept 27.362 0.0000
SM_g -1.305 0.0000
group (0 = full set, 1 = subset) 0.831 0.0589
group × SM_g 0.734 0.3793
LNC RGR 11 (Intercept) 1.1357 0.0000
LNC 0.0007 0.4140
group (0 = full set, 1 = subset) -0.2021 0.0175
group × LNC 0.0065 0.0566
SLA RGR 11 (Intercept) 1.1280 0.0000
SLA 0.0011 0.1392
group (0 = full set, 1 = subset) -0.1301 0.0195
group × SLA 0.0042 0.0954
RGR maxCH 11 (Intercept) 7.5344 0.0000
RGR -6.5316 0.0000
group (0 = full set, 1 = subset) 3.5674 0.2805
group × RGR -2.8471 0.3369

Although for a SEM it is much more important to test to what extent the slopes of the relationships are significantly affected by missing trait data, we additionally investigated the effect of missing trait data on the uncertainty of the slope estimates. To estimate the effect of missing trait data on the standard error of the slope, we ran a rarefying method that makes the trait data increasingly sparse. However, running the rarefying method and putting the newly calculated trait averages in the SEM 500 or 1000 times would be a huge effort. Therefore, analogous to the robustness test above, we ran the rarefying method for the bivariate trait-trait relationships that occur in the SEM. The proportion of missing trait data in the dependent variable was increased in steps of 5% up to 35%, relative to the currently available data for that trait. Then new trait means were calculated for the plots and a regression was run on all plots to determine the slope and its standard error. Next, the standard error of the slope was expressed relative to the standard error of the slope of the bivariate relationship with the current number of available trait data. This allows us to compare the increase in standard error among the bivariate relationships. This procedure was repeated 500 times to get a robust estimate of the standard error. The results are shown in Table C2. In all cases the standard error of the slope increases with an increasing amount of missing trait data. The results show that on average the standard error increases by 7% if 10% of the trait data is deleted. RGR is particularly sensitive to missing trait data, but this is probably due to the already relatively low availability of this trait. Germination onset (GO) is also sensitive to omissions of trait data. Because only 22% of the trait data is missing for GO, we think that this sensitivity results from the ordinal three-point scale of this trait.
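The rarefying procedure can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: the data are simulated, the function names are ours, and species values are deleted at random within each plot (the text does not specify whether deletion was done per plot or over the whole data set).

```python
import math
import random

def slope_se(x, y):
    """Standard error of the least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    rss = sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))
    return math.sqrt(rss / (n - 2) / sxx)

def rarefied_plot_means(plots, fraction, rng):
    """Drop `fraction` of each plot's species values at random, then average."""
    means = []
    for values in plots:
        keep = max(1, round(len(values) * (1.0 - fraction)))
        means.append(sum(rng.sample(values, keep)) / keep)
    return means

def relative_se(x, plots, fraction, repeats=500, seed=0):
    """Mean slope SE after rarefying, relative to the SE with all data."""
    rng = random.Random(seed)
    base = slope_se(x, [sum(v) / len(v) for v in plots])
    total = 0.0
    for _ in range(repeats):
        total += slope_se(x, rarefied_plot_means(plots, fraction, rng))
    return total / repeats / base

# Simulated example (NOT the study's data): 30 plots with 10 species each.
sim = random.Random(1)
x = [i / 10 for i in range(30)]
plots = [[2.0 * xi + sim.gauss(0.0, 1.0) for _ in range(10)] for xi in x]

for pct in range(0, 40, 5):  # 0% to 35% additional missing data
    print(pct, round(relative_se(x, plots, pct / 100), 3))
```

Regressing the printed relative standard errors on the percentage of deleted data would give an intercept and slope of the kind reported per relationship in Table C2.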

Based on these results, we consider the slope estimates relatively robust against missing trait data, as the relative increase in the standard error of the slope is for most traits much smaller than the relative increase in missing trait data. Although the relative increase in the SE for GO is larger than 10% with an increase of 10% in missing trait data, we expect that this does not materially affect the SEM, because GO is only affected by other traits and is not a parent of any other trait, and because the number of trait data available for GO is among the highest of the traits (see Table 1 of the manuscript), so the actual bias is relatively small. The increase in the SE of the slope for RGR is also larger than 10% with 10% more trait data missing. In the next section the effect of missing trait data on RGR is analyzed in more detail.

TABLE C2. Relationship between the % missing trait data and the standard error of the slope for bivariate relationships. Slope indicates the increase of the standard error with increasing number of missing species. The last column indicates the % increase in standard error given a 10% loss of species trait data.

X Y Intercept Slope % increase in st. error for 10%
maxCH SLA 0.99 0.0016 0.05
maxCH LPC 1.00 0.0012 0.04
maxCH LNC 1.00 0.0011 0.02
maxCH SM_g 1.00 0.0013 0.05
maxCH FO 0.99 0.0013 0.05
maxCH GO 0.99 0.0036 0.12
maxCH GF 1.00 0.0017 0.06
GF FO 0.99 0.0014 0.05
GF GO 0.99 0.0035 0.12
GF SM_d 0.99 0.0014 0.05
GF RGR 1.00 0.0042 0.14
SM_g SM_d 0.98 0.0019 0.07
SM_g FO 0.99 0.0016 0.05
LNC RGR 1.00 0.0024 0.08
SLA RGR 1.00 0.0024 0.08
RGR maxCH 1.00 0.0005 0.02

Step 3: Test of SEM with modelled RGR values

In contrast to other relationships from the full data set vs. the subset, the relationships of the leaf traits vs. RGR were close to being significantly different for the two data sets. Also the standard error of the slopes was relatively large compared to the other traits. This probably means that the plot means of RGR deviated to some extent from the ‘real’ plot mean.

To test whether the structure and significance of the SEM were affected by the deviating estimates of RGR, we ran an additional SEM (tested on the extended model of Appendix F only) that included better estimates of the RGR plot means. We did not run a SEM for only those plots for which we had sufficient trait information for RGR, as this would have left too few degrees of freedom to run this SEM model. Instead, we fitted a multiple regression model in which RGR was predicted from growth form, LNC and SLA for the subset with known unbiased estimates of plot means for RGR (at least 70% of the species cover available). The parameter estimates of the multiple regression were used to predict the RGR values for the remaining sites with insufficient trait information. To avoid over-fitting, a random number was added to the predicted values (drawn from a normal distribution with a mean of zero and a standard deviation equal to the standard deviation of the residuals of the multiple regression). This procedure ensured that relations between RGR and growth form, LNC and SLA were not made stronger than in the default model. These predicted RGR values replaced the original RGR values and were used in the SEM (everything else kept equal – Fig. F1). This procedure was repeated multiple times, because the numbers are randomly drawn from a normal distribution and can thus lead to an over- or underestimation of the fit. It showed that neither the validity of the full model (P values remained equal), nor the structure of the full model, nor the significance of any individual path differed from the original model. Additionally, for all traits, the dominant drivers and the dominant trait-trait constraints remained unchanged. Furthermore, the relative contribution of the traits and the environmental drivers remained equal.
There was only a slight increase in the role of the leaf traits in determining RGR and in the explained variance of RGR (from 0.12 to 0.14 and from 0.49 to 0.56, respectively; compare Table C3 below and Table F8 in the manuscript), and the explained variance of SLA and maxCH increased slightly. Therefore, the plot mean RGR values as calculated in the paper did not change the interpretation of the results or the conclusions about the contribution of the environmental drivers and the role of trait-trait constraints in trait assembly (see Table C3).
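The prediction of RGR with added random noise (a form of stochastic regression imputation) can be sketched as below. This is a minimal sketch under stated assumptions, not the code used for the analysis: the data are simulated, the function names are ours, and the residual standard deviation is computed with n − k degrees of freedom, which the text does not specify.

```python
import math
import random

def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) and residual SD."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, d)) for d, yi in zip(design, y)]
    sd = math.sqrt(sum(e * e for e in resid) / (len(y) - k))  # assumed df: n - k
    return beta, sd

def impute(rows, beta, sd, rng):
    """Predicted value plus a normal random deviate with the residual SD."""
    return [sum(b * v for b, v in zip(beta, [1.0] + list(r))) + rng.gauss(0.0, sd)
            for r in rows]

# Simulated example (NOT the study's data): predictors are GF, LNC and SLA.
rng = random.Random(42)
known_x = [(rng.random(), rng.uniform(15, 30), rng.uniform(15, 30))
           for _ in range(40)]
known_y = [0.16 - 0.09 * gf + 0.001 * lnc + 0.001 * sla + rng.gauss(0.0, 0.01)
           for gf, lnc, sla in known_x]
beta, sd = fit_ols(known_x, known_y)

# Sites with insufficient RGR information receive a predicted value plus noise.
other_x = [(rng.random(), rng.uniform(15, 30), rng.uniform(15, 30))
           for _ in range(5)]
print(impute(other_x, beta, sd, rng))
```

Adding the random deviate keeps the scatter of the imputed values around the regression comparable to that of the observed values, so the imputed relationships are not artificially tightened. Repeating the draw with different seeds mirrors the repeated runs described in the text.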

TABLE C3. The effect of environmental constraints (cause; columns) on the selection of individual traits (effect; rows) relative to the effect of trait-trait constraints, with the modeled RGR values. The rightmost column gives the explained variance of the SEM with the plot mean RGR values as used in the manuscript.

Cause   Environmental constraint     Trait–trait constraints                  Dominant     Dominant           R2: final model  R2: final
Effect  Nutrient      TSD   DE > IE  Leaf    Allometric  Seed    Relative     driver       trait              RGR modeled*     model*
        availability                 traits  traits      traits  growth rate
LNC     1.00          0.00  yes      0.00    0.00        0.00    0.00         Nutrients                       0.82             0.82
SLA     0.31          0.02           0.06    0.50        0.00    0.11         Nutrients    Allometric traits  0.74             0.71
LPC     0.67          0.13  yes      0.02    0.15        0.00    0.03         Nutrients    Allometric traits  0.97             0.97
RGR     0.12          0.23           0.14    0.44        0.00    0.07         Disturbance  Allometric traits  0.56             0.49
maxCH   0.06          0.51  yes      0.07    0.23        0.00    0.13         Disturbance  Allometric traits  0.73             0.69
GF      0.04          0.47           0.05    0.37        0.00    0.08         Disturbance  Allometric traits  0.94             0.94
SM_g    0.23          0.30           0.04    0.35        0.00    0.08         Disturbance  Allometric traits  0.63             0.63
SM_d    0.10          0.33           0.03    0.25        0.24    0.05         Disturbance  Seed traits        0.92             0.93
GO      0.08          0.35           0.04    0.45        0.00    0.07         Disturbance  Allometric traits  0.69             0.69
FO      0.16          0.10  yes      0.01    0.61        0.09    0.02         Nutrients    Allometric traits  0.57             0.56
