
Problems with recent Drosophila phylogeny


This post is about a recent Drosophila phylogeny published in MPE, a critique of that paper, and whether MPE has done enough by just publishing the critique. Opinions welcome.

The original paper presented new data and a new tree of Drosophilidae. Obviously lots of people care about this tree since it encompasses some of the best-studied model organisms we’ve got. It’s been cited 23 times since 2012 according to Google Scholar. Here’s the original:

Increasing the data size to accurately reconstruct the phylogenetic relationships between nine subgroups of the Drosophila melanogaster species group (Drosophilidae, Diptera). Yang Y, Hou ZC, Qian YH, Kang H, Zeng QT.

A critique has just been published, showing lots of issues with the original analysis (full disclosure - I have published with the first author of this critique, although I had nothing to do with this critique and hadn’t read it until today). Here’s the critique:

Problems with data quality in the reconstruction of evolutionary relationships in the Drosophila melanogaster species group: Comments on Yang et al 2012. Catullo RA, Oakeshott JG.

In short - they found many issues with the data in the ms (problems with ~150 of the ~800 sequences), and couldn’t replicate the original results. Most worryingly, they show at least one example where this published tree may have already led to incorrect inferences in a published comparative study that relied on the tree.

What seems odd to me is that although the critique seems fairly damning, nothing has changed on the original paper. My understanding was that this is what the COPE guidelines were for:

While there is no evidence of fraud here, if you take the critique at face value then there were a lot of mistakes in the original article and the validity of the results is certainly in question.

MPE is a premier venue for publishing trees, and it would be nice to think they were committed to their publications being accurate. So I’d be interested to hear others’ opinions on this paper and the critique. Specifically, have MPE done enough here by just publishing the critique? Should they issue a correction, an expression of concern, or something stronger for the original article? Or should the original article stand unchanged despite the critique?




Update: I asked the first author of the critique to share all of her data. You can find it here:


The comment paper is correct: a significant percentage of the accession numbers listed in the first paper are for human sequences and other non-Drosophila data. What boggles the mind is that they are not even close to what they should be. For example, one listed as a Drosophila ADH gene is not human ADH but a human immunoglobulin gene. If it was just one or two sequences that were off like that, it could be due to typos in the paper, but with more than 20 of the accession numbers leading to non-Drosophila data, it can’t be just typos. In my opinion, the Drosophila paper should be retracted.
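
For anyone who wants to repeat this kind of check, here is a minimal sketch of one way to do it, not the method used in the critique. It assumes you have already downloaded the records for the listed accession numbers as GenBank flat-file text, and it simply reads the ORGANISM line of each record and flags anything outside the expected genus. The function names and the two sample records (with made-up accessions FAKE00001 and FAKE00002) are hypothetical placeholders, not data from either paper.

```python
def organisms_by_accession(genbank_text):
    """Map each record's accession to its ORGANISM name.

    Expects GenBank flat-file text, where records end with a '//' line.
    """
    result = {}
    for record in genbank_text.split("\n//"):
        accession = None
        organism = None
        for line in record.splitlines():
            if line.startswith("ACCESSION"):
                parts = line.split()
                if len(parts) > 1:
                    accession = parts[1]
            elif line.strip().startswith("ORGANISM"):
                organism = line.strip()[len("ORGANISM"):].strip()
        if accession and organism:
            result[accession] = organism
    return result


def flag_unexpected(genbank_text, expected_genus="Drosophila"):
    """Return {accession: organism} for records outside the expected genus."""
    return {
        acc: org
        for acc, org in organisms_by_accession(genbank_text).items()
        if org.split()[0] != expected_genus
    }


# Invented placeholder records, for illustration only.
SAMPLE = """\
LOCUS       FAKE00001   100 bp    DNA     linear   INV 01-JAN-2000
ACCESSION   FAKE00001
SOURCE      Drosophila melanogaster (fruit fly)
  ORGANISM  Drosophila melanogaster
//
LOCUS       FAKE00002   100 bp    DNA     linear   PRI 01-JAN-2000
ACCESSION   FAKE00002
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
//
"""

if __name__ == "__main__":
    for acc, org in flag_unexpected(SAMPLE).items():
        print(f"{acc}: listed as Drosophila data, but the record is {org}")
```

A check like this only catches records from the wrong genus. It would not catch a Drosophila record assigned to the wrong species or the wrong gene, so those still need to be compared against the table in the paper by hand.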