This is an archived static version of the original discussion site.

Calculating distances with ambiguity codes (R, Y, N etc) in the data


Are there any distance calculators, such as DNAdist in PHYLIP, that can be set to treat ambiguity codes as a partial match? For example, I want an R to be counted as half a match to A or G. I believe that PHYLIP DNAdist counts R as a full match to either A or G.

For diploid organisms, an “R” usually indicates that one allele had A and the other G. But for populations such as a swarm of HIV-1 in a single patient, the R usually means that part of the population had A and the other part G.
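
To make the "half a match" idea concrete, here is a minimal Python sketch of a p-distance in which each IUPAC code is scored by the chance that it agrees with the other site. It assumes each code resolves uniformly and independently to one of its bases; that assumption, and the names `match_fraction` and `partial_p_distance`, are illustrative only and are not taken from PHYLIP or any other tool mentioned in this thread.

```python
# IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def match_fraction(x, y):
    """Chance that x and y agree if each code resolves uniformly at random.

    Assumption: the two resolutions are independent of each other.
    """
    bx, by = IUPAC[x], IUPAC[y]
    shared = len(set(bx) & set(by))
    return shared / (len(bx) * len(by))


def partial_p_distance(seq1, seq2):
    """Proportion of differing sites, scoring ambiguity codes fractionally."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    mismatch = 0.0
    sites = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in IUPAC or y not in IUPAC:
            continue  # skip gaps and any symbol outside the IUPAC table
        mismatch += 1.0 - match_fraction(x, y)
        sites += 1
    return mismatch / sites if sites else float("nan")
```

With this scoring, R against A counts as half a match, and N against A as a quarter. Note that under the independence assumption R against R also scores as half a match rather than a full one; whether that is appropriate depends on whether the two sequences sample the same population.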


There was a discussion of this topic on a mailing list a couple years ago that offered some possible options.

– Chris


There’s also this:

I have some Julia code to calculate distances with ambiguities too, but I haven’t committed the code to the BioJulia repository yet.




Hi Brian,

As @sdwfrost suggests, take a look at our TN93 calculator. It has fairly comprehensive ambiguity handling, including partial matching (like what you want), partial matching subject to constraints (e.g. two-fold ambiguities only), and corresponding ambiguity-to-ambiguity matching. The code is easy to modify to compute any other nucleotide distance for which you have a closed-form expression.

Best, Sergei
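
As a rough illustration of the two ideas in the reply above (restricting partial matching to two-fold ambiguities, and plugging the result into a closed-form distance), here is a short Python sketch. It is not the TN93 calculator's code or interface: it uses the simpler Jukes-Cantor (JC69) formula, d = -(3/4) ln(1 - 4p/3), in place of TN93, and the names `site_mismatch`, `jc69_distance`, and `max_fold` are hypothetical. Sites carrying a code with more than `max_fold` possible bases are skipped here, which is one possible reading of the constraint.

```python
import math

# IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def site_mismatch(x, y, max_fold=2):
    """Expected mismatch at one site, or None if the site is skipped.

    Assumption: codes resolve uniformly and independently; codes with more
    than max_fold possible bases (e.g. N) cause the site to be skipped.
    """
    bx, by = IUPAC.get(x), IUPAC.get(y)
    if bx is None or by is None:
        return None  # gap or unknown symbol
    if len(bx) > max_fold or len(by) > max_fold:
        return None  # ambiguity too broad under the constraint
    shared = len(set(bx) & set(by))
    return 1.0 - shared / (len(bx) * len(by))


def jc69_distance(seq1, seq2, max_fold=2):
    """JC69 distance computed from the fractionally scored mismatch rate."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(seq1.upper(), seq2.upper())
    scores = [site_mismatch(x, y, max_fold) for x, y in pairs]
    scores = [s for s in scores if s is not None]
    if not scores:
        return float("nan")  # no comparable sites
    p = sum(scores) / len(scores)
    if p >= 0.75:
        return float("inf")  # JC69 is undefined at or beyond saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

Swapping in another closed-form distance only means replacing the last line with that formula, although models such as TN93 need separate counts for transitions and transversions rather than a single mismatch rate.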