Coalescent-based species delimitation is sensitive to geographic sampling and isolation by distance

Nicholas A. Mason, Cornell University
Nicholas K. Fletcher, Cornell University
Brian A. Gill, Colorado State University
W. Chris Funk, Colorado State University
Kelly R. Zamudio, Cornell University

Abstract

© The Trustees of the Natural History Museum, London 2020. All Rights Reserved.

Species are fundamental units of biodiversity that are increasingly delimited using genetic data and coalescent-based methods. Despite the widespread use of coalescent-based species delimitation, we do not fully understand the sensitivity of these methods to potential sources of bias and violations of their underlying assumptions. One implicit assumption of coalescent-based species delimitation is that geographic sampling is adequate and representative of genetic variation among populations within the lineage of interest. Yet exhaustive geographic sampling is logistically difficult, if not impossible, for many taxa that span large geographic expanses or occupy remote regions. Here, we examine the impact of geographic sampling on the output of Bayes-factor delimitation with SNAPP, a popular coalescent-based species delimitation pipeline. First, we demonstrate the problematic nature of sparse geographic sampling and isolation by distance for species delimitation using simulated datasets of populations connected by different levels of gene flow. We then examine whether similar trends are present in an empirical dataset of Andesiops mayflies (Ephemeroptera: Baetidae) from a high-elevation transect in the Ecuadorian Andes. In both the simulated and empirical analyses, we systematically exclude geographically intermediate sites to quantify the impact of geographic sampling and isolation by distance on coalescent-based species delimitation. We find that removing intermediate sites with genetically admixed individuals incorrectly favors multi-species delimitation scenarios. Oversplitting is especially pronounced when isolation by distance is strong, but exists even when gene flow among neighboring populations is relatively high. These findings highlight the importance of adequate geographic sampling in species delimitation and urge caution in interpreting the output of such methods when species’ distributions are sparsely sampled and in systems characterized by strong patterns of isolation by distance.
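
For readers unfamiliar with Bayes-factor delimitation, the comparison of competing delimitation scenarios can be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: it assumes that a log marginal likelihood estimate has already been obtained for each scenario (for example, by path sampling), and it uses the common convention of reporting 2 × (ln ML of the alternative − ln ML of the null). The function names, model names, and numeric values are all hypothetical.

```python
def bayes_factor(log_ml_alt, log_ml_null):
    """Return 2 * (ln ML_alt - ln ML_null).

    Assumed convention: positive values favor the alternative
    scenario, negative values favor the null scenario.
    """
    return 2.0 * (log_ml_alt - log_ml_null)


def rank_models(log_mls, null_model):
    """Rank delimitation scenarios by Bayes factor against a null.

    log_mls maps a scenario name to its log marginal likelihood
    estimate. Returns (name, bayes_factor) pairs, best supported first.
    """
    null_value = log_mls[null_model]
    scored = [
        (name, bayes_factor(value, null_value))
        for name, value in log_mls.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical log marginal likelihoods, NOT results from the study.
    hypothetical_log_mls = {
        "one_species": -5230.4,
        "two_species": -5221.9,
        "three_species": -5226.1,
    }
    for name, bf in rank_models(hypothetical_log_mls, "one_species"):
        print(f"{name}: 2lnBF vs. one_species = {bf:.1f}")
```

Under this convention, repeating the comparison after dropping geographically intermediate sites and observing the Bayes factor shift toward a multi-species scenario would correspond to the oversplitting pattern the abstract describes.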