This is an archived static version of the original phylobabble.org discussion site.

About parsing newick format

guangchuangyu

Dear all,

I am new to phylogenetics and have started learning by solving problems on ROSALIND. I am stuck on this problem: http://rosalind.info/problems/nwck/

My solution is to define a NEWICK class that stores the tree; once the current node has been set, the getParentList() method returns all of its ancestor nodes.

The problem then reduces to finding the most recent common ancestor of the two nodes, from which the distance between them can be calculated.
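To make the idea concrete, here is a minimal sketch of the ancestor-list approach. It is not the code from the repository: the class name NewickDistanceSketch and its methods are hypothetical, and it assumes an unweighted tree, unique node names, and that unnamed leaves can be skipped because they are never queried. Unnamed internal nodes get generated names such as "#0".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not NWCK.java or Newick.java from the repository.
class NewickDistanceSketch {

    // Parse an unweighted Newick string into a child -> parent map.
    // Assumes unique node names; empty leaf names are skipped.
    static Map<String, String> parentMap(String newick) {
        Map<String, String> parent = new HashMap<>();
        Deque<List<String>> open = new ArrayDeque<>(); // children seen so far per open '('
        int unnamed = 0;
        int i = 0;
        int n = newick.length();
        while (i < n) {
            char c = newick.charAt(i);
            if (c == '(') {
                open.push(new ArrayList<>());
                i++;
            } else if (c == ',' || c == ';' || Character.isWhitespace(c)) {
                i++;
            } else {
                List<String> children = null;
                if (c == ')') {          // a group closes; its label (if any) follows
                    children = open.pop();
                    i++;
                }
                int start = i;
                while (i < n && "(),;".indexOf(newick.charAt(i)) < 0) {
                    i++;
                }
                String name = newick.substring(start, i).trim();
                if (name.isEmpty()) {
                    name = "#" + unnamed++; // generated name for an unlabeled internal node
                }
                if (children != null) {
                    for (String child : children) {
                        parent.put(child, name);
                    }
                }
                if (!open.isEmpty()) {
                    open.peek().add(name);
                }
            }
        }
        return parent;
    }

    // Path from a node up to the root, starting with the node itself.
    static List<String> pathToRoot(Map<String, String> parent, String node) {
        List<String> path = new ArrayList<>();
        for (String cur = node; cur != null; cur = parent.get(cur)) {
            path.add(cur);
        }
        return path;
    }

    // Number of edges between a and b: steps from each node up to their
    // most recent common ancestor, added together.
    static int distance(String newick, String a, String b) {
        Map<String, String> parent = parentMap(newick);
        List<String> pathA = pathToRoot(parent, a);
        List<String> pathB = pathToRoot(parent, b);
        for (int i = 0; i < pathA.size(); i++) {
            int j = pathB.indexOf(pathA.get(i));
            if (j >= 0) {
                return i + j; // first shared node is the most recent common ancestor
            }
        }
        throw new IllegalArgumentException("nodes are not in the same tree");
    }
}
```

In this sketch each path starts with the node itself rather than with its parent, so when one queried node is an ancestor of the other, that node is found as the common ancestor. For example, NewickDistanceSketch.distance("(cat)dog;", "dog", "cat") gives 1 and NewickDistanceSketch.distance("(dog,cat);", "dog", "cat") gives 2.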

Source code and sample data can be found at https://github.com/GuangchuangYu/ROSALIND

java/NWCK.java
java/tree/Newick.java
java/FILE/ReadFile.java
DATA/rosalind_nwck.txt

Most of the time my code gives the correct answer, except when the Newick text contains no ‘,’. For example:

(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((sibiricus_molurus)sibiricus_deminutus)Ursus_means)Tupinambus_dactylisonans)Trachemys_cancerides)Tadorna_ferina)Tadarida_albertisii)Syrrhaptes_ulikovskii)Strepsilas_subrubrum)Spizaetus_vittatus)Sitta_plathyrhychos)Selenocosmia_barbatus)Scaphiopus_not)Saxicola_politus)Salmo_hosii)Saga_subruficollis)Saga_rufodorsata)Ruticilla_fiber)Rufibrenta_vanellus)Rhynchaspis_parvus)Rhombomys_erythronotus)Rhacodactylus_spaldingi)Remiz_porzana)Ptyodactylus_ignicapillus)Pseudemys_felderi)Prunella_lagopus)Poephagus_turtur)Phrynocephalus_chrysargos)Pelomedusa_mlokosiewiczi)Pelomedusa_arcticus)Pelodytes_rusticolus)Parus_altaicus)Pandion_grossmani)Oceanodroma_glareola)Nucifraga_insularis)Nucifraga_deremensis)Nipponia_pelagicus)Mochlus_buccata)Marmota_duplex)Madagascarophis_hodgsoni)Lystrophis_isabellina)Lycaenopsis_timidus)Limnodromus_horridum)Liasis_angulifer)Lepus_sudanensis)Lepus_arenarius)Leptopelis_totanus)Leptobrachium_zagrosensis)Ingerophrynus_hypoleucus)Iguana_albatrus)Hydrosaurus_schreibersi)Homopholis_tarandus)Holaspis_querquedula)Heteroscodra_lasiopterus)Haplopelma_turtur)Geochelone_constrictor)Gavia_metallica)Fuligula_vulpes)Fuligula_capreolus)Felis_saxatilis)Felis_middendorffi)Eurynorhynchus_taeniura)Eunectes_weberi)Eucratoscelus_doctus)Erpeton_squamatus)Equus_parreyssi)Equus_cristatus)Cygnus_pachypus)Cuculus_bimaculata)Colaeus_mexicana)Citellus_castaneus)Circaetus_scripta)Chen_corticale)Chelydra_mycterizans)Cardiocranius_aspera)Capella_baeri)Bubulcus_arvensis)Bronchocela_piscator)Branta_cristatellus)Bradyporus_ornata)Bradyporus_ladogensis)Arenaria_campestris)Archispirostreptus_maldivarum)Amphiuma_longicaudata)Amphiuma_bukhunensis)Alauda_opimus);
Felis_middendorffi Liasis_angulifer

For this input, my code returns the correct answer plus 2.

I have spent a lot of time trying to track down this bug, and I still have no idea why it happens.

Does anyone have any ideas?

Thank you!