Predicting the most likely location for a tip with an unknown discrete trait on a phylogeny


I’m looking for a method that would let me predict, with some measure of confidence, the most likely trait value for a tip on my phylogeny. For example, I have a set of sequences from different locations, but the location for one of the sequences is missing. I would like to predict this missing trait and get a probability for the assignment. I thought that something like ancestral character estimation (ace in APE) might work, but it only seems to allow for known traits at the tips.

I’m currently assuming that my phylogeny is correct, but a method that allowed for uncertainty in the phylogeny, by taking a sample of trees from a Bayesian phylogenetic reconstruction, would be very useful.

Any suggestions or guidance would be very much appreciated.

If you’re still looking into this, my two cents are that it’s not liable to go very well unless you have many traits and a correlation structure.

If you just estimate ancestral sequences, your best guess for any unobserved tip is just the most likely ancestral sequence, but the uncertainty will be larger the longer the branch is. Eventually, with a long enough branch, you just predict the stationary distribution for the missing tip. Bottom line: unless the branch is quite short, there will likely be a lot of uncertainty about the state. Is a guess of “A” over “C” really useful if it’s a 55-45 split on the probability of each state?
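
To make the branch-length point concrete, here is a minimal Python sketch. It is an illustration under assumptions, not a recipe from any package: it assumes a symmetric k-state Mk model (every state changes to every other state at the same rate), and it assumes you already have a posterior over states at the node where the missing tip attaches. The function names, the example posterior, and the rate are all made up for illustration.

```python
import math

def mk_transition_probs(k, rate, t):
    """Transition probabilities for a symmetric k-state Mk model.

    rate is the instantaneous rate of change to each specific other state,
    t is the branch length. Returns (P(stay in same state), P(move to one
    particular other state)).
    """
    decay = math.exp(-k * rate * t)
    p_same = 1.0 / k + (k - 1.0) / k * decay
    p_diff = 1.0 / k - decay / k
    return p_same, p_diff

def predict_tip(node_posterior, rate, t):
    """Push the posterior at the attachment node down a branch of length t.

    node_posterior is a list of state probabilities summing to 1
    (hypothetical input, e.g. from an ancestral state reconstruction).
    """
    k = len(node_posterior)
    p_same, p_diff = mk_transition_probs(k, rate, t)
    # P(tip = j) = P(node = j) * p_same + P(node != j) * p_diff
    return [p * p_same + (1.0 - p) * p_diff for p in node_posterior]

if __name__ == "__main__":
    posterior = [0.90, 0.05, 0.05]  # made-up posterior over three locations
    for t in (0.0, 0.1, 0.5, 2.0, 10.0):
        tip = predict_tip(posterior, rate=1.0, t=t)
        print(t, [round(p, 3) for p in tip])
```

At t = 0 the tip prediction equals the node posterior, and as t grows every state drifts toward 1/k, which is the stationary distribution of this model. With an asymmetric rate matrix you would need a matrix exponential instead of the closed form, but the qualitative behaviour is the same.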

If you have multiple traits and you can infer an evolutionary correlation structure for your traits (as with a threshold model or phylogenetic factor analysis), you’d be able to leverage extra information and your estimates would be a lot better.