Statistical consistency of OTOL methods?

Hey @karen_cranston and @blackrim

Are there any efforts to prove statistical consistency of the methods that y'all use to build the OTOL? I'm thinking of analogs of papers like the "few logs" paper by @mathmomike and @tandy, and the work of Degnan and Rosenberg and many others on species tree inference. There are also some interesting and possibly relevant observations in the ML supertrees paper by Steel and Rodrigo.

You would need some random process that would generate observed trees from the true TOL. Not sure what that would be exactly.
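As a toy example of such a process (a hypothetical sketch, not anything OTOL actually assumes): take the true tree and, for each input tree, keep each taxon independently with probability `p` and return the induced subtree. This model has no topological error, so it is the easy case; a realistic model would also need to perturb the topologies. `TRUE_TREE`, `restrict`, and `sample_input_tree` are made-up names for illustration.

```python
import random

# Rooted trees as nested tuples; leaves are taxon-name strings.
# Hypothetical "true TOL", for illustration only.
TRUE_TREE = ((("A", "B"), "C"), (("D", "E"), "F"))

def leaves(tree):
    """Return the set of leaf labels below `tree`."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def restrict(tree, keep):
    """Induced subtree on the leaves in `keep`, suppressing unary nodes.

    Returns None if no leaves are kept, or a bare string if only one is.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [c for c in (restrict(child, keep) for child in tree) if c is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]  # collapse a node left with a single child
    return tuple(children)

def sample_input_tree(tree, p, rng):
    """Assumed error-free model: keep each taxon with probability p, then restrict."""
    keep = {leaf for leaf in leaves(tree) if rng.random() < p}
    return restrict(tree, keep)

# Usage: draw a few "observed" trees from the true tree.
rng = random.Random(1)
observed = [sample_input_tree(TRUE_TREE, 0.6, rng) for _ in range(5)]
```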

For those who don’t know about the OTOL methods, here’s @blackrim’s phyloseminar:

I am not sure much of this has been explored. As we use the algorithm for OTOL, it isn't really making any decisions that the user hasn't already specified. However, the system also has built-in functionality to search tree space and behave like a supertree method, where some decisions are made by the algorithm rather than by the user. It would be great to explore these more. They certainly aren't very developed yet, though, because that isn't what we use for OTOL at the moment. Happy to go into more detail though!

Ah. So if I understand correctly, you make this big graph, and then the user goes through the graph and picks a tree that seems supported by the most edges?
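To make that reading concrete, here is a toy sketch of "pick the tree supported by the most inputs". It is a greedy-consensus-style heuristic, not the OTOL synthesis algorithm: count how many input trees contain each clade, then accept clades in order of support, skipping any that conflict with one already accepted. It assumes all input trees share the same taxon set, which real OTOL inputs do not; the function names are hypothetical.

```python
from collections import Counter

def clades(tree):
    """Non-trivial clades (leaf sets of internal non-root nodes) of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    found.discard(walk(tree))  # the root clade contains everything, so drop it
    return found

def compatible(a, b):
    """Two clades can coexist in one tree iff they are nested or disjoint."""
    return a <= b or b <= a or not (a & b)

def greedy_clades(trees):
    """Accept clades by decreasing support, keeping only mutually compatible ones."""
    support = Counter(c for t in trees for c in clades(t))
    chosen = []
    # Ties broken by clade size, then sorted labels, so the result is deterministic.
    for clade, _count in sorted(support.items(),
                                key=lambda kv: (-kv[1], len(kv[0]), sorted(kv[0]))):
        if all(compatible(clade, c) for c in chosen):
            chosen.append(clade)
    return chosen

inputs = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "C"), "B"), "D"),
]
result = greedy_clades(inputs)
```

Here `{A,B}` and `{A,B,C}` each appear in two inputs and are kept; `{A,C}` and `{A,B,D}` each appear once and conflict with clades already chosen, so they are dropped.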