How to get accession numbers from sequences or "GI numbers" in GenBank as a batch job?


Hi geniuses,

A real headache has been haunting me recently, and I need your insightful suggestions to help me out:

I have a large matrix of about 12,000 taxa, and the data is formatted as below:


So, using information such as the taxon name or GI number, how can I get the corresponding accession number for each taxon in this large matrix via the NCBI website as a batch job?

Any ideas or experience to share?




One of the most efficient ways to do this kind of thing is to use Perl and BioPerl. You can see examples here of people looking to do similar things: Extremely powerful!

Maybe someone else can come up with a “scripting-less” approach.

Good luck,



Thank you very much, Laura, for your insightful suggestion!

Those examples are very good; I’ll start with Perl.

Thanks again!



Well, I actually found a “scripting-less” approach on the NCBI website:

Hope it will be helpful for someone in the future.



The other option, which I find very straightforward, is to use eBot from NCBI to create a Perl script. You can query Entrez databases and select the type of output you prefer. I often parse the outputs using regular expressions.

Alternatively, NCBI recently published some standalone tools designed exactly for batch queries.
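If you would rather not go through a generated Perl script, the same kind of Entrez query can be sketched in Python against the E-utilities `efetch` endpoint, asking for `rettype=acc` so that only accession numbers come back as plain text. This is only a rough sketch and not what eBot produces: the function names, the batch size of 200, and the regex are my own assumptions, and the regex is a loose heuristic for "accession.version" lines rather than a full specification of accession formats. It also does not pair each accession with its GI number, so check the output before relying on the order.

```python
import re
import time
import urllib.parse
import urllib.request

# NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Heuristic (assumption): lines such as U49845.1 or NM_000546.6.
ACCESSION_RE = re.compile(r"^[A-Z]+_?[A-Z]*\d+\.\d+$")


def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_query(gi_batch, db="nuccore"):
    """Encode the form fields for one efetch request returning accessions."""
    return urllib.parse.urlencode({
        "db": db,
        "id": ",".join(gi_batch),
        "rettype": "acc",
        "retmode": "text",
    })


def parse_accessions(text):
    """Keep only the lines that look like accession.version."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if ACCESSION_RE.match(line)]


def fetch_accessions(gi_numbers, batch_size=200):
    """Query efetch batch by batch (hypothetical helper; batch size is a guess)."""
    results = []
    for batch in chunks(gi_numbers, batch_size):
        data = build_query(batch).encode("ascii")
        # Passing `data` makes this a POST, which suits long ID lists.
        with urllib.request.urlopen(EFETCH_URL, data=data) as response:
            results.extend(parse_accessions(response.read().decode("utf-8")))
        time.sleep(0.4)  # stay under 3 requests per second without an API key
    return results
```

To use it, read the GI numbers from your matrix into a list of strings and call `fetch_accessions(gi_list)`. With about 12,000 entries and batches of 200, that is roughly 60 requests.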

Good Luck