This repository contains tutorials on inferring species trees from gene trees that vary according to some process (such as coalescent sorting, horizontal transfer, or gene duplication and loss). We will use RevBayes for inference under the full multispecies coalescent model. We will also use ASTRAL for fast species tree analyses that scale to large datasets but come with some tradeoffs. You will need RevBayes and ASTRAL available on your machine, along with FigTree or another tree viewer.
Now we'll carry out fast species tree analyses with ASTRAL. We can start by re-analyzing the same data that we just analyzed in RevBayes. To do so, download this file, which contains unrooted gene tree estimates for the same loci. You can run a quick ASTRAL analysis of these gene trees with:
astral4 -i /path/to/unrooted_trees.tre -o species.tre
Open this estimate of the species tree with FigTree and compare it to your estimate from RevBayes. Are the trees similar? What is likely driving any differences?
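If you would like a number to go with the visual comparison, one option is the Robinson-Foulds distance, which counts the splits (bipartitions of the taxa) present in one tree but not the other. The sketch below is a minimal, self-contained illustration and is not part of RevBayes or ASTRAL: the helper names (`parse_newick`, `splits`, `rf_distance`) and the five-taxon example trees are made up for this purpose. It assumes each tree is available as a single Newick string and that both trees contain the same taxa. Trees are compared as unrooted, so a difference in root placement alone does not register as a difference.

```python
import re

def parse_newick(text):
    """Parse one Newick string into nested lists of leaf names.

    Branch lengths, internal node labels, and [bracketed] comments are dropped.
    """
    text = re.sub(r"\[[^\]]*\]", "", text)  # remove bracketed annotations
    tokens = re.findall(r"[(),;]|[^(),;]+", text)
    stack, current, previous = [], [], ""
    for token in tokens:
        if token == "(":
            stack.append(current)
            current = []
        elif token == ")":
            finished, current = current, stack.pop()
            current.append(finished)
        elif token not in ",;" and previous != ")":
            # A chunk that directly follows ")" is an internal label, so skip it.
            name = token.split(":")[0].strip()
            if name:
                current.append(name)
        previous = token
    return current[0]

def splits(tree):
    """Return the non-trivial splits of a tree, treating it as unrooted."""
    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves(child) for child in node))

    all_leaves = leaves(tree)
    anchor = min(all_leaves)  # each split is stored as the side without this taxon
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return found

def rf_distance(newick_a, newick_b):
    """Number of splits found in exactly one of the two trees."""
    return len(splits(parse_newick(newick_a)) ^ splits(parse_newick(newick_b)))

# Made-up example trees standing in for the two estimates.
tree_one = "(((A:0.1,B:0.1):0.2,C:0.3):0.1,(D:0.2,E:0.2):0.2);"
tree_two = "((A,B)0.98:1.2,(C,D)0.61:0.4,E);"
print(rf_distance(tree_one, tree_two))
```

A distance of zero means the two unrooted topologies are identical. Larger values tell you how many splits differ, but not why, so the questions above still need a look at the trees themselves.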
ASTRAL and related programs offer much more functionality and can scale to thousands of gene trees, whole-genome alignments, and multi-copy gene families. The following documentation explains much of this extra functionality and provides examples of very large-scale analyses.
You might also be interested in SVDquartets, which provides another way to estimate species trees using quartets. This approach estimates quartets directly from the distribution of site patterns in the data, and so relies less on the assumption that gene trees have been estimated correctly.
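To make the idea of a quartet concrete: any four taxa can be related by one of three unrooted topologies (AB|CD, AC|BD, or AD|BC). The sketch below tallies how a handful of gene trees resolve a single quartet. This is the gene-tree-based view of quartets; the site-pattern approach described above obtains quartets differently, so treat this only as an illustration of what a quartet topology is. The function names (`clades`, `quartet_topology`) and the three toy gene trees are hypothetical, and the trees are written as nested Python tuples to avoid parsing.

```python
from collections import Counter

def clades(tree):
    """Return the set of leaves below every internal node of a nested-tuple tree."""
    found = []

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        found.append(below)
        return below

    visit(tree)
    return found

def quartet_topology(tree, taxa):
    """Return how the tree resolves four taxa, as an unordered pair of pairs.

    Returns None if the tree leaves the quartet unresolved.
    """
    taxa = frozenset(taxa)
    for clade in clades(tree):
        inside = clade & taxa
        if len(inside) == 2:
            # The branch above this clade separates two of the taxa from the other two.
            return frozenset([inside, taxa - inside])
    return None

# Toy gene trees, made up for illustration.
gene_trees = [
    (("A", "B"), ("C", "D"), "E"),
    ((("A", "B"), "C"), ("D", "E")),
    (("A", "C"), ("B", "D"), "E"),
]

counts = Counter(quartet_topology(tree, "ABCD") for tree in gene_trees)
for topology, count in counts.items():
    label = "|".join(sorted("".join(sorted(pair)) for pair in topology))
    print(label, count)
```

Here two of the toy gene trees support AB|CD and one supports AC|BD. Quartet-based methods build on this kind of signal across all sets of four taxa.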
Multiple software packages are also available that allow for species tree estimation when reticulation has occurred. Two widely used options include:
PhyloNet, which provides several ways of estimating phylogenetic networks from both sequence and SNP data.
PhyloNetworks, which provides methods for estimating phylogenetic networks using quartets, as well as several tools for working with, plotting, and carrying out comparative analyses with these networks.
Finally, duplication and loss within gene families can also lead to variation among gene trees. Available tools for estimating phylogeny in this scenario include:
ASTRAL-PRO, which uses quartet scores to infer phylogeny in a manner similar to ASTRAL, but accounts for duplication and loss within gene families.
GeneRax, which performs maximum likelihood estimation of gene family histories, including duplication and loss, as well as species trees.