This paper appeared in a journal I don’t commonly read, so I wanted to highlight it. The ideas are not new (as the authors acknowledge), but it’s a good reminder that we should be fighting model mis-specification on all fronts. Comments, anyone?
[Paper] Correcting for sequencing error in maximum likelihood phylogeny inference
If anyone wants to use this model (described by Felsenstein in Inferring Phylogenies) in a Bayesian context, it is implemented in BEAST. We describe our implementation in Rambaut et al. (2008) MBE [doi:10.1093/molbev/msn256]. In that paper we were modelling postmortem DNA damage, so we provide extensions in which the error rate is a function of time in the ground and in which specific types of nucleotide replacement occur. But the basic homogeneous error model is also available (turned on in the ‘Sites’ panel in BEAUti).
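For anyone who hasn't seen the basic homogeneous error model written out, here is a minimal sketch of how it changes the tip likelihoods. It assumes a single error rate `eps` with a miscalled base equally likely to be read as any of the other three; the function name and the handling of gaps are my own illustrative choices, not BEAST's code or parameterization.

```python
BASES = "ACGT"

def tip_partials(observed, eps):
    """Return [P(observed | true = b) for b in "ACGT"] under a uniform error model.

    eps is the per-site probability that the true base was misread
    (ASSUMPTION: errors are spread evenly over the three other bases).
    """
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must be in [0, 1]")
    if observed not in BASES:
        # Gap or ambiguity code: treated here as missing data (illustrative choice).
        return [1.0] * 4
    return [1.0 - eps if b == observed else eps / 3.0 for b in BASES]

print(tip_partials("A", 0.0))   # no error: the usual one-hot tip vector
print(tip_partials("A", 0.03))  # a little weight leaks onto C, G and T
```

With `eps = 0` this reduces to the standard tip vector, so the usual pruning algorithm runs unchanged on top of it; only the tip vectors differ.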
I think this simulation is useful to demonstrate that sequencing error should be modeled somewhere. Incorporating the uncertainty in the observation before the alignment phase might improve things further, since a misread character may result in the alignment opening a gap. Then, you should be able to propagate the error in the (marginal) tip state in a similar manner.
There are a number of multiple sequence alignment programs that output a per-column measure of uncertainty. I know that FSA’s is expressed as an expected accuracy, which could be used as a prior on the error in the tip state.
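One way a per-column accuracy could feed into the tip state, as a rough sketch only: treat the accuracy `a` as the probability that the character in that column is placed correctly, and otherwise treat it as uninformative. The mixture form, the function name, and the flat 1/4 term are my assumptions for illustration; this does not parse FSA output and is not how any existing program does it.

```python
BASES = "ACGT"

def column_tip_partials(observed, accuracy):
    """Tip vector for one alignment column, down-weighted by alignment accuracy.

    ASSUMPTION: with probability `accuracy` the character is placed correctly
    (one-hot on the observed base); otherwise it carries no information about
    the true state, contributing a flat 1/4 to every state.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    if observed not in BASES:
        return [1.0] * 4  # gap or ambiguity code: missing data
    flat = (1.0 - accuracy) / 4.0
    return [accuracy * (1.0 if b == observed else 0.0) + flat for b in BASES]

# Hypothetical per-column accuracies for one taxon's row of the alignment.
row = "ACG-T"
accuracies = [0.99, 0.90, 0.50, 0.80, 1.00]
for base, acc in zip(row, accuracies):
    print(base, acc, column_tip_partials(base, acc))
```

A column with accuracy 1 gives the ordinary tip vector, and a column with accuracy 0 is effectively treated as missing, so poorly aligned columns contribute less to the likelihood rather than being dropped outright.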
Re BEAST, here’s the link (note that paper info is pasted automatically if a pubmed link is given on its own line):
A simulation study sure seems like low-hanging fruit for some student or postdoc. I’ve set up a framework that makes it easy to run lots of simulations using INDELible. Anyone keen?
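The error-injection step of such a study is simple enough to sketch. This assumes the simulated sequences are already in memory as plain strings (the taxon names and sequences below are made up) and applies a uniform per-site error rate; it does not call INDELible and is not part of the framework mentioned above.

```python
import random

BASES = "ACGT"

def add_sequencing_errors(seq, eps, rng):
    """Return a copy of seq with each nucleotide miscalled with probability eps.

    A miscalled base is replaced by one of the other three bases, chosen
    uniformly. Gaps and any non-ACGT characters are left untouched.
    """
    out = []
    for base in seq:
        if base in BASES and rng.random() < eps:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

rng = random.Random(1)  # fixed seed so a simulation replicate is reproducible
clean = {"taxon1": "ACGTACGT--AC", "taxon2": "ACGTTCGTGGAC"}  # made-up data
noisy = {name: add_sequencing_errors(s, 0.05, rng) for name, s in clean.items()}
print(noisy)
```

Running inference on `clean` and `noisy` with and without the error model, over a range of `eps` values, would give the basic comparison.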