Model Choice, Missing Data, and Taxon Sampling Impact Phylogenomic Inference of Deep Basidiomycota Relationships
Resolving deep divergences in the tree of life is challenging even for analyses of genome-scale phylogenetic data sets. Relationships between Basidiomycota subphyla, the rusts and allies (Pucciniomycotina), smuts and allies (Ustilaginomycotina), and mushroom-forming fungi and allies (Agaricomycotina) were found particularly recalcitrant both to traditional multigene and genome-scale phylogenetics. Here, we address basal Basidiomycota relationships using concatenated and gene tree-based analyses of various phylogenomic data sets to examine the contribution of several potential sources of bias. We evaluate the contribution of biological causes (hard polytomy, incomplete lineage sorting) versus unmodeled evolutionary processes and factors that exacerbate their effects (e.g., fast-evolving sites and long-branch taxa) to inferences of basal Basidiomycota relationships. Bayesian Markov Chain Monte Carlo and likelihood mapping analyses reject the hard polytomy with confidence. In concatenated analyses, fast-evolving sites and oversimplified models of amino acid substitution favored the grouping of smuts with mushroom-forming fungi, often leading to maximal bootstrap support in both concatenation and coalescent analyses. On the contrary, the most conserved data subsets grouped rusts and allies with mushroom-forming fungi, although this relationship proved labile, sensitive to model choice, to different data subsets and to missing data. Excluding putative long-branch taxa, genes with high proportions of missing data and/or with strong signal failed to reveal a consistent trend toward one or the other topology, suggesting that additional sources of conflict are at play. While concatenated analyses yielded strong but conflicting support, individual gene trees mostly provided poor support for any resolution of rusts, smuts, and mushroom-forming fungi, suggesting that the true Basidiomycota tree might be in a part of tree space that is difficult to access using both concatenation and gene tree-based approaches. Inference-based assessments of absolute model fit strongly reject best-fit models for the vast majority of genes, indicating a poor fit of even the most commonly used models. While this is consistent with previous assessments of site-homogenous models of amino acid evolution, this does not appear to be the sole source of confounding signal. Our analyses suggest that topologies uniting smuts with mushroom-forming fungi can arise as a result of inappropriate modeling of amino acid sites that might be prone to systematic bias. We speculate that improved models of sequence evolution could shed more light on basal splits in the Basidiomycota, which, for now, remain unresolved despite the use of whole genome data.
Publication Source (Journal or Book title)
Prasanna, A. N., Gerber, D., Kijpornyongpan, T., Aime, M. C., Doyle, V. P., & Nagy, L. G. (2020). Model Choice, Missing Data, and Taxon Sampling Impact Phylogenomic Inference of Deep Basidiomycota Relationships. Systematic biology, 69 (1), 17-37. https://doi.org/10.1093/sysbio/syz029