This is an archived static version of the original phylobabble.org discussion site.

Model testing large multigene families

Gadget

Hi folks, I am generating phylogenies for a large number of GPCR gene families from a variety of organisms. Many of these gene families contain 2,000-3,000 genes, and I can run ProtTest on them. As they are transmembrane proteins, JTT+G is the best-fitting model for some of the alignments. However, some families have more than 4,000 sequences, which ProtTest cannot handle. Does anyone know of an alternative multithreaded program for determining the best-fit amino acid model of sequence evolution for large datasets? The 4,000+ gene families are close orthologues of the smaller families, but that alone surely does not justify assuming the larger families must be JTT too. Can anyone share some advice on how to proceed? Thank you!

cmeehan

You could try PartitionFinder (http://www.robertlanfear.com/partitionfinder/). It searches for partitions in your data and also returns the best-fitting model. I don’t know how it copes with datasets this large, but hopefully it does well. PartitionFinder is also implemented in the latest test version of PAUP*, which is itself good at handling large datasets.
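For what it's worth, the comparison these model-testing tools make comes down to scoring each candidate model with an information criterion and keeping the lowest score. Below is a minimal Python sketch of that step, not what PartitionFinder or ProtTest actually run internally. The log-likelihoods, parameter counts, and alignment length are made-up placeholders, and the function names are hypothetical. It also assumes every model was fitted on the same fixed tree, so branch-length parameters are shared and left out of the count.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln(n) - 2 lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def best_model(fits, n_sites, criterion="bic"):
    """Return (name of lowest-scoring model, dict of all scores).

    fits maps a model name to (log_likelihood, n_free_params).
    """
    scores = {}
    for name, (lnl, k) in fits.items():
        if criterion == "bic":
            scores[name] = bic(lnl, k, n_sites)
        else:
            scores[name] = aic(lnl, k)
    return min(scores, key=scores.get), scores


# Hypothetical values for illustration only, not real results.
# Parameter counts cover only the extra model parameters (e.g. the
# gamma shape), assuming branch lengths are shared across models.
fits = {
    "JTT": (-10500.0, 0),
    "JTT+G": (-10200.0, 1),
    "WAG+G": (-10350.0, 1),
}
best, scores = best_model(fits, n_sites=300, criterion="bic")
print(best)
```

If the likelihoods can be obtained from any program that scales to 4,000+ sequences, the ranking itself is cheap to do by hand like this.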