OK, your older post is indeed convincing: there are cases where you will never have more than a few informative sites.

If we want to formalize this and get a bit more quantitative, for me it would come down to this:

(1) What proportion of true/false discoveries do we get for a given number of informative sites, for a given problem, under a particular inference method, and using a given threshold? Ideally, one would like the methods to advertise a reasonably reliable estimate of the risk of making such false discoveries. If they can do this even in a low-diversity regime, then users have enough information to make up their own minds.
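To make point (1) concrete, here is a minimal sketch of how one would measure this empirically. It assumes you have simulated data with a known true tree, so that each claim (say, each inferred split) comes as a pair of (support value, whether it is actually true); the function name and toy numbers are mine, purely illustrative.

```python
def empirical_fdr(calls, threshold):
    """Proportion of false discoveries among the claims whose support
    reaches `threshold`.  `calls` is a list of (support, is_true) pairs,
    e.g. one pair per inferred split in simulations where the true tree
    is known."""
    accepted = [is_true for support, is_true in calls if support >= threshold]
    if not accepted:
        return float("nan")  # no discoveries made at this threshold
    return 1.0 - sum(accepted) / len(accepted)

# Toy example: three claims, two of them strongly supported.
calls = [(0.99, True), (0.99, False), (0.50, True)]
print(empirical_fdr(calls, 0.95))  # one of the two accepted claims is false -> 0.5
```

In a real study, `calls` would be pooled over many simulation replicates at the relevant number of informative sites, so that the estimate is tied to the low-diversity regime one actually cares about.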

(2) Then, once the false discovery rate is at least approximately known, the second question is: for a given false discovery rate, imposed as a prerequisite, which method gives you the most informative answer?

In general, we care about such good calibration properties mostly in the strong-support (low false discovery rate) regime. In many practical situations, we don't really care whether a nominal false discovery rate of 20% in fact corresponds to 50%, because 20% is already too inconclusive anyway. But we do care whether an advertised 1% really means 1% false discovery (or at least no more than a few percent), because these strongly supported claims are the ones relied upon for downstream publication or decision making.
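The calibration check this implies can be sketched as follows: for each nominal support cutoff, compare the advertised false discovery rate (1 minus the cutoff) to the one realized in simulations with a known truth. Again, the function name and the toy data are mine, not from any particular package.

```python
def calibration_table(calls, cutoffs=(0.80, 0.90, 0.95, 0.99)):
    """For each nominal support cutoff, compare the advertised false
    discovery rate (1 - cutoff) to the realized one among accepted
    claims.  `calls` is a list of (nominal_support, is_true) pairs
    obtained from simulations where the truth is known."""
    rows = []
    for cutoff in cutoffs:
        accepted = [is_true for support, is_true in calls if support >= cutoff]
        realized = (1.0 - sum(accepted) / len(accepted)) if accepted else float("nan")
        rows.append((cutoff, round(1.0 - cutoff, 4), realized))
    return rows

# A well-calibrated toy set: 99 true and 1 false claim, all at support 0.99,
# so the realized false discovery rate matches the advertised 1%.
calls = [(0.99, True)] * 99 + [(0.99, False)]
for cutoff, advertised, realized in calibration_table(calls, cutoffs=(0.99,)):
    print(cutoff, advertised, realized)
```

The interesting failure mode is precisely when the high-cutoff rows diverge: an advertised 1% paired with a realized 10% is the kind of miscalibration that matters, whereas divergence in the 20% rows is largely inconsequential.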

In any case, I think the false/true discovery rate is the crucial property we care about. And note that this is not just a small-sample-size question, but it becomes particularly critical for low-diversity alignments. We can discuss this further, if you want, in a separate channel.

Concerning polytomies, rjMCMC and the star-tree paradox: I think this is a different question. Priors that accommodate polytomies are important when polytomies exist in the true tree in the first place. But here, it is not that we believe the tree is intrinsically polytomous; rather, we simply do not have enough information to estimate a tree that may just be boringly bifurcating. (Of course, it may also be polytomous, but then it is another problem, fundamentally one of model violation, not of sample size.)