Supplement 1. R code and the data set necessary to conduct the Random Forest analysis.
File List
dreissena_in_lakes_of_belarus.csv (MD5: 3dc2d2f89af3064223358983c785771d)
r_script_random_forest.R (MD5: af1295890d60bc832955e940889e4575)
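The MD5 checksums listed above can be used to confirm that both files were downloaded intact. The following is a minimal sketch using the base R function tools::md5sum; the helper name verify_md5 and its dir argument are illustrative assumptions and are not part of the supplement.

```r
# Expected checksums, copied from the File List above.
expected <- c(
  "dreissena_in_lakes_of_belarus.csv" = "3dc2d2f89af3064223358983c785771d",
  "r_script_random_forest.R"          = "af1295890d60bc832955e940889e4575"
)

# Hypothetical helper (not part of the supplement): compares the MD5 hash of
# each named file in `dir` with its expected value. Files that are missing or
# unreadable yield NA rather than TRUE/FALSE.
verify_md5 <- function(expected, dir = ".") {
  paths  <- file.path(dir, names(expected))
  actual <- unname(tools::md5sum(paths))  # NA for files that do not exist
  setNames(actual == unname(expected), names(expected))
}

print(verify_md5(expected))
```

A value of TRUE for each file indicates a match; FALSE suggests that the file was altered or corrupted (for example, by a line-ending conversion during download).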
Description
This supplement contains the two files needed to fully reproduce the results obtained with the Random Forest classifier. The first file, dreissena_in_lakes_of_belarus.csv, is a plain-text table of 553 records, each described by the following variables:
1. Lake_Code: numeric code uniquely identifying each lake (for reference only, not used explicitly in the analysis).
2. ZMpresence: indicator of whether a lake is infested with zebra mussel (0 = non-infested, 1 = infested).
3. LAREA: lake area
4. LVOL: lake volume
5. MAXD: maximal depth
6. AVED: average depth
7. SPECWATSHED: specific watershed (i.e., drainage area)
8. TRANSP: Secchi depth
9. COLOR: water color
10. pH: water pH
11. HCO3: HCO3 content
12. SO4: SO4 content
13. Cl: Cl content
14. Ca: Ca content
15. Mg: Mg content
16. TDS: total dissolved solids
17. Fe: Fe content
18. Si: Si content
19. NH4: NH4 content
20. NO2: NO2 content
21. PO4: PO4 content
22. PermOx: permanganate oxidizability
23. N: latitude (decimal degrees)
24. E: longitude (decimal degrees)
Missing values in the data set are denoted as NA.
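As an illustration of how the table described above might be read and checked, the sketch below loads the file and compares its dimensions with the description (553 records, 24 variables). It is not taken from r_script_random_forest.R: the helper name load_lakes, the factor labels, and the assumption that the file is comma-separated with a header row are all illustrative.

```r
# Hypothetical helper (not from the supplement's script): reads the table and
# checks it against the description above. Assumes a comma-separated file
# with a header row.
load_lakes <- function(path = "dreissena_in_lakes_of_belarus.csv") {
  lakes <- read.csv(path, na.strings = "NA", stringsAsFactors = FALSE)

  # The description lists 553 records and 24 variables.
  if (nrow(lakes) != 553 || ncol(lakes) != 24) {
    warning("Unexpected dimensions: ", nrow(lakes), " x ", ncol(lakes))
  }

  # Recode the 0/1 infestation indicator as a two-level factor, which is the
  # form classification functions in R usually expect for the response.
  lakes$ZMpresence <- factor(lakes$ZMpresence, levels = c(0, 1),
                             labels = c("non_infested", "infested"))
  lakes
}

# Run only if the data file is present in the current working directory.
if (file.exists("dreissena_in_lakes_of_belarus.csv")) {
  lakes <- load_lakes()
  print(colSums(is.na(lakes)))  # number of missing values per variable
}
```

Counting the NA values per variable before modelling is useful because some Random Forest implementations do not accept missing predictor values without prior imputation or row removal.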
The second file, r_script_random_forest.R, loads the data into R (assuming that dreissena_in_lakes_of_belarus.csv is stored in the current R working directory), fits the Random Forest model, and plots the results. The analysis relies on four add-on packages: caret, geosphere, randomForest, and ggplot2. These packages are assumed to be already installed on the user's computer. If they are not, they can be freely downloaded from the Comprehensive R Archive Network (cran.r-project.org) or installed directly from within R using the following command: install.packages(c("caret", "geosphere", "randomForest", "ggplot2")).
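Before running the script, it may be convenient to check which of the packages named above are still missing, so that only those are installed. The sketch below uses the base R function requireNamespace; the helper name missing_packages is an illustrative assumption and does not appear in the supplement.

```r
required <- c("caret", "geosphere", "randomForest", "ggplot2")

# Hypothetical helper: returns the names of the packages in `pkgs` that are
# not installed. Note that requireNamespace() loads the namespace of every
# package it finds, but does not attach it to the search path.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

The install.packages call is left commented out so that nothing is downloaded without the user's explicit decision.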