Ecological Archives A013-011-A1

Benjamin Bolker, Toshinori Okuyama, Karen Bjorndal, and Alan Bolten. 2003. Sea turtle stock estimation using genetic markers: accounting for sampling error of rare genotypes. Ecological Applications 13:763–775.

Appendix A. A description of CML/UML methods.

To find maximum likelihood estimates (either CML or UML), we define a function that calculates the multinomial likelihood given in the text for specified parameter values, along with its derivatives with respect to the parameters. We then pass this function to a standard quasi-Newton optimizer (the L-BFGS-B optimizer [Byrd et al. 1995], as implemented in the R language by Brian Ripley). Parts of the likelihood and derivative calculations are written in C for speed.
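To illustrate the kind of function handed to the optimizer, the Python sketch below computes a mixed-stock multinomial log-likelihood and its gradient with respect to the rookery contributions. It is a simplified, hypothetical example, not the authors' R/C code. It writes $\theta_r$ for the contribution of rookery $r$ and $f_{rh}$ for the frequency of haplotype $h$ in rookery $r$ (our notation), treats the $f_{rh}$ as known (a CML-style simplification), omits the rookery baseline samples that a UML likelihood would also include, and ignores the sum-to-one constraint on the contributions. The function name `mixture_loglik` and its argument layout are assumptions made for illustration.

```python
import math

# Hypothetical sketch (not the authors' R/C code): mixed-stock multinomial
# log-likelihood with rookery haplotype frequencies treated as known.
def mixture_loglik(theta, freqs, counts):
    """Return (log-likelihood, gradient with respect to theta).

    theta:  contribution of each rookery (length R)
    freqs:  freqs[r][h] = frequency of haplotype h in rookery r (assumed known)
    counts: counts[h] = number of mixed-stock individuals with haplotype h
    The multinomial coefficient is dropped because it does not depend on theta.
    """
    n_rook = len(theta)
    loglik = 0.0
    grad = [0.0] * n_rook
    for h, n_h in enumerate(counts):
        if n_h == 0:
            continue  # unobserved haplotypes contribute nothing
        # Expected frequency of haplotype h in the mixed stock
        q_h = sum(theta[r] * freqs[r][h] for r in range(n_rook))
        loglik += n_h * math.log(q_h)
        # d/d theta_r of n_h * log(q_h) is n_h * f_rh / q_h
        for r in range(n_rook):
            grad[r] += n_h * freqs[r][h] / q_h
    return loglik, grad


# Example: two rookeries, three haplotypes
freqs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
counts = [10, 5, 8]
print(mixture_loglik([0.4, 0.6], freqs, counts))
```

In R, a value and gradient of this kind can be supplied to `optim(..., method = "L-BFGS-B")` through its `fn` and `gr` arguments; because `optim` minimizes by default, one would pass the negative log-likelihood or set `control = list(fnscale = -1)`.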

The optimizer works well, although it is not as fast as more specialized expectation-maximization methods [Smouse et al. 1990]. Maximizing the likelihood directly is conceptually simpler than, but gives the same results as, a least-squares algorithm that accounts for multinomial variances [Xu et al. 1994]. The only tricky part is the constraint that the haplotype frequencies within each rookery, and the contributions of the rookeries, must each sum to 1; this constraint does not fit within the "box constraints" (min ≤ x ≤ max) supported by our optimizer. Instead, we transform a set of frequency parameters $p_i$ into $p'_i = p_i/(1-\sum_{j=1}^{i-1} p_j)$; this expresses the parameters in terms of the fraction of the remaining probability that rookery $i$ (or haplotype $i$) contributes, rather than the fraction of the overall probability. Each $p'_i$ lies between 0 and 1, so the transformed parameters satisfy simple box constraints, and for the last of $n$ categories $p'_n = 1$ automatically, so only $n-1$ parameters are free. The likelihood is calculated from these transformed parameters, and the derivative vector is transformed accordingly by the chain rule.

We have also implemented an expectation-maximization method for finding UML estimates. As expected, it is much faster than the direct optimization method described above, but expectation-maximization combined with bootstrapping is still slower than MCMC for estimating confidence limits.
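The reparameterization of the frequencies in terms of the fraction of the remaining probability can be sketched in Python as follows, assuming the frequencies are held in a plain list `p` of length $n$ that sums to 1 with every entry strictly positive. The names `to_conditional`, `from_conditional`, and `grad_to_conditional` are hypothetical. The last function applies the chain rule to turn a gradient with respect to $p$ into a gradient with respect to the $n-1$ free values $p'_i$; it does so as a reverse pass through `from_conditional`, which avoids dividing by $1-p'_k$. This is one possible way to carry out the derivative transformation, not necessarily the one used in the original C code.

```python
# Hypothetical sketch of the "fraction of the remaining probability"
# reparameterization; all p_i are assumed to be strictly positive.

def to_conditional(p):
    """Map frequencies p (length n, summing to 1) to the n-1 free values
    p'_i = p_i / (1 - sum_{j<i} p_j); p'_n is always 1, so it is dropped."""
    q = []
    remaining = 1.0
    for p_i in p[:-1]:
        q.append(p_i / remaining)
        remaining -= p_i
    return q


def from_conditional(q):
    """Inverse map: each p'_i in [0, 1] claims that fraction of the probability
    not yet assigned; the last category receives whatever is left."""
    p = []
    remaining = 1.0
    for q_i in q:
        p.append(q_i * remaining)
        remaining *= 1.0 - q_i
    p.append(remaining)
    return p


def grad_to_conditional(q, grad_p):
    """Chain rule: turn dL/dp (length n) into dL/dp' (length n-1) by a
    reverse pass through from_conditional (no division by 1 - p'_k)."""
    # R[k] = probability still unassigned before category k
    R = [1.0]
    for q_i in q:
        R.append(R[-1] * (1.0 - q_i))
    grad_q = [0.0] * len(q)
    adj_R = grad_p[-1]  # dL/dR for the last category, since p_n = R[n-1]
    for k in range(len(q) - 1, -1, -1):
        # Forward step was p_k = q_k * R[k] and R[k+1] = R[k] * (1 - q_k)
        grad_q[k] = (grad_p[k] - adj_R) * R[k]
        adj_R = grad_p[k] * q[k] + adj_R * (1.0 - q[k])
    return grad_q
```

For example, `to_conditional([0.5, 0.3, 0.2])` gives approximately `[0.5, 0.6]`: the first category takes half of the probability, and the second takes 60% of the half that remains. Any values in $[0, 1]$ passed to `from_conditional` yield frequencies that sum to 1, which is what lets a box-constrained optimizer handle the sum-to-one constraint.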
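The expectation-maximization approach can be sketched in a simplified setting. The Python function below (`em_contributions`, a hypothetical name) estimates only the rookery contributions $\theta_r$ while holding the rookery haplotype frequencies $f_{rh}$ fixed (our notation); a UML version would also have to update the rookery frequencies using the rookery samples, which this sketch does not attempt. Each iteration apportions the mixed-stock individuals carrying haplotype $h$ among rookeries in proportion to $\theta_r f_{rh}$ (E-step) and then sets each contribution to its expected share of the sample (M-step). No reparameterization is needed here, because these updates keep the contributions nonnegative and summing to 1.

```python
# Hypothetical EM sketch: rookery contributions only, with rookery
# haplotype frequencies freqs[r][h] held fixed (not the authors' UML code).
def em_contributions(freqs, counts, n_iter=500, tol=1e-10):
    n_rook = len(freqs)
    total = float(sum(counts))
    theta = [1.0 / n_rook] * n_rook  # start from equal contributions
    for _ in range(n_iter):
        expected = [0.0] * n_rook
        for h, n_h in enumerate(counts):
            if n_h == 0:
                continue
            q_h = sum(theta[r] * freqs[r][h] for r in range(n_rook))
            # E-step: expected number of haplotype-h individuals from rookery r
            for r in range(n_rook):
                expected[r] += n_h * theta[r] * freqs[r][h] / q_h
        # M-step: contributions become the expected proportions
        new = [x / total for x in expected]
        converged = max(abs(a - b) for a, b in zip(new, theta)) < tol
        theta = new
        if converged:
            break
    return theta
```

Like the direct likelihood maximization, this sketch returns only point estimates; confidence limits would still require something like bootstrapping or MCMC on top of it.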

Literature cited

Byrd, R. H., P. Lu, J. Nocedal, and C. Zhu. 1995. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing 16:1190–1208.

Smouse, P. E., R. S. Waples, and J. A. Tworek. 1990. A genetic mixture analysis for use with incomplete source population data. Canadian Journal of Fisheries and Aquatic Sciences 47:620–634.

Xu, S., C. J. Kobak, and P. E. Smouse. 1994. Constrained least squares estimation of mixed population stock composition from mtDNA haplotype frequency data. Canadian Journal of Fisheries and Aquatic Sciences 51:417–425.
