This is an archived static version of the original phylobabble.org discussion site.

Pruning off long branches

kmkozak87

Dear All,

I am building a pipeline to automatically generate gene trees for about 10,000 CDS alignments (all genes from an exome). The genes were sequenced for 150 individuals in multiple species. Some individuals are worse than others and occasionally have little data in some alignments, and end up on obviously artificially inflated branches. Is anyone aware of a tool to prune those automatically? (I will also use tools to get rid of poor sequence first, but that’s a different topic.)

Many thanks, Krzysztof Kozak

lpryszcz

Hi Krzysztof,

Have a look at the ETE toolkit for tree handling (Python-based). You will have to write a simple script that recognises and prunes long branches. Here is a pruning example.
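Independently of ETE, the "recognise" step can be sketched in plain Python with only the standard library. This is a rough illustration, not ETE code: the function names `terminal_branch_lengths` and `flag_long_tips`, the regular expression, and the rule "longer than `factor` times the median terminal branch" are all assumptions made for the example. It also assumes plain Newick with unquoted labels and a branch length on every tip.

```python
import re
from statistics import median

# Matches a tip label and its branch length, e.g. "(A:0.1" or ",B:0.2".
# Internal node labels follow ")" and are therefore not matched.
# Assumption: plain Newick, unquoted labels, a length on every tip.
TIP_PATTERN = re.compile(r"[(,]\s*([^():,;\s]+)\s*:\s*([0-9.eE+-]+)")


def terminal_branch_lengths(newick):
    """Return {tip_name: terminal_branch_length} for a Newick string."""
    return {name: float(length) for name, length in TIP_PATTERN.findall(newick)}


def flag_long_tips(newick, factor=10.0):
    """Return the tips whose terminal branch exceeds factor * median.

    The median-based cutoff and the default factor are illustrative
    choices, not a recommendation for any particular data set.
    """
    lengths = terminal_branch_lengths(newick)
    if not lengths:
        return []
    cutoff = factor * median(lengths.values())
    return sorted(name for name, length in lengths.items() if length > cutoff)


# Hypothetical gene tree in which ind4 sits on an inflated branch.
tree = "((ind1:0.01,ind2:0.02):0.01,(ind3:0.015,ind4:0.9):0.02,ind5:0.012);"
print(flag_long_tips(tree))  # ['ind4']
```

A relative cutoff like this adapts to fast- and slow-evolving genes, which may matter when the same script runs over thousands of alignments. The factor would need tuning on real trees.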

You may also want to look at the phylomeDB phylogenetic tree reconstruction pipeline for ideas on how to improve alignments by using multiple aligners and trimming poorly aligned regions automatically. It also describes ways to improve the phylogenetic reconstruction itself.

Good luck!

yangya

I wrote a simple Python script to prune branches longer than a certain threshold. See the script “final_cut_long_branched.py” in https://bitbucket.org/yangya/phylogenomic_dataset_construction. The tree libraries phylo3.py and newick3.py, written by Stephen Smith, are included in the same repository.
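For readers who just want to see the general idea of threshold-based pruning, here is a minimal standalone sketch. It is not the code from the repository above and does not use phylo3.py or newick3.py. The names `parse_newick`, `prune_long_tips` and `to_newick`, the nested-dict tree representation, and the 0.5 threshold are all hypothetical. It assumes plain Newick without quotes, comments or whitespace, and it only removes tips (not long internal branches) in a single pass.

```python
def parse_newick(text):
    """Parse plain Newick into nested dicts with name, length, children."""
    text = text.strip().rstrip(";")
    pos = 0

    def read_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            children.append(read_node())
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(read_node())
            pos += 1  # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        length = 0.0  # missing lengths are treated as zero
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return read_node()


def prune_long_tips(node, max_length):
    """Return a copy without tips longer than max_length (None if empty)."""
    if not node["children"]:
        return None if node["length"] > max_length else dict(node)
    pruned = (prune_long_tips(child, max_length) for child in node["children"])
    kept = [child for child in pruned if child is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # Collapse a node left with one child and sum the branch lengths.
        only = dict(kept[0])
        only["length"] += node["length"]
        return only
    return {"name": node["name"], "length": node["length"], "children": kept}


def to_newick(node, is_root=True):
    """Serialise the nested-dict tree back to a Newick string."""
    label = node["name"]
    if node["children"]:
        inner = ",".join(to_newick(child, False) for child in node["children"])
        label = "(" + inner + ")" + label
    if is_root:
        return label + ";"
    return f"{label}:{node['length']:g}"


# Hypothetical gene tree; 0.5 is an arbitrary absolute threshold.
tree = "((ind1:0.01,ind2:0.02):0.01,(ind3:0.015,ind4:0.9):0.02,ind5:0.012);"
print(to_newick(prune_long_tips(parse_newick(tree), 0.5)))
# ((ind1:0.01,ind2:0.02):0.01,ind3:0.035,ind5:0.012);
```

Note that removing ind4 leaves its sister ind3 attached directly to the parent, with the two branch lengths summed, so the remaining distances in the tree are preserved.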