Dear All,

I am new to the forum and happy that it exists.

I have been using phylogenetic tree reconstruction methods on non-biological data for some years now, yet I still run into trouble from time to time.

My main question is this: What is the relationship between additive distance matrices (or tree-like data) and the need for validating your cluster analysis?

I know that if a distance matrix is additive, Neighbor-Joining will reconstruct the "correct tree" (and so will UPGMA, provided the matrix is also ultrametric). However, in the broader field of cluster analysis and statistics, people usually expect you to validate your analysis using, for example, PCA or silhouettes.
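To be concrete about what I mean by "additive": a minimal sketch of the check, assuming the four-point condition as the criterion (for every four taxa, the two largest of the three pairwise sums must be equal). The function name is_additive, the tolerance, and the toy matrices are my own illustrative choices, not taken from any package.

```python
from itertools import combinations

def is_additive(D, tol=1e-9):
    """Return True if the symmetric distance matrix D (list of lists)
    satisfies the four-point condition for every quartet of taxa."""
    n = len(D)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([
            D[i][j] + D[k][l],
            D[i][k] + D[j][l],
            D[i][l] + D[j][k],
        ])
        # Additivity requires the two largest sums to coincide.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Toy example: tree ((A,B),(C,D)) with pendant edges 1, 2, 3, 4
# and an internal edge of length 5.
D_tree = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]

# Same matrix with d(A,D) perturbed from 10 to 12, which breaks additivity.
D_noisy = [
    [0, 3, 9, 12],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [12, 11, 7, 0],
]

print(is_additive(D_tree))
print(is_additive(D_noisy))
```

This loops over all quartets, so it is O(n^4) and only suitable for small matrices. With real data one would presumably need a tolerance larger than floating-point noise, which is part of what I am asking about.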

So again, how do we go about validation if the data are provably tree-like, i.e., using one of the tests from J.A. Hartigan, "Statistical theory in clustering", Journal of Classification 2, 63-76 (1985)?

Also, are there any publications that deal with the properties of caterpillar trees w.r.t. the data they are based on?

I hope this is the right place to post such a question. If not, I would be very glad if you could direct me to the correct one.

Many thanks and best wishes, Tudor