Wednesday, June 26, 2013

EuMicrobedb, Transcriptomicsdb and TOOLKIT with EMBOSS interface are finally resurrected

Countless emails, worries, warnings, anger and frustration over these databases are finally behind us. The good news is that Eumicrobedb, TranscriptomicsDB and the toolkit, which served the oomycete community for a long time, are now up and available again. They are now hosted at IICB instead of Virginia Tech. For seven long years, these resources were among the first efforts to organize and disseminate oomycete genomes and transcriptomes. In October 2012, after a hacker attack on the server, the website went down. Many of the web files were subsequently erased completely, and we could recover only part of the software from the backup server. We tried to revive it at VBI, but the attempt failed, and the resources remained unavailable for 8 months.

Why did it take almost 8 months to recreate it?
After joining IICB, I thought I would just maintain a mirror site, with the main site remaining at VBI. But I realized that the cost of maintaining a server at VBI was one of the biggest obstacles to keeping this resource alive. The job of recreating it then fell entirely on me, and I was not fully prepared for it. One of the biggest problems was that this database had been designed on an Oracle framework with many complicated tables and views. Having used it for a long time, I knew that we used only about 50% of the total architecture for real-time data storage. I was aware of this but never really had the time or inclination to sit down and rework the schema. Now that I had only small servers and no Oracle license, I was pushed into a corner and had to deal with it. Thus began the effort to recreate the schema in MySQL.

Eventually there was a lot of pressure to do this (I work best under pressure!), and I spent two solid months working on the schema literally day and night. I merged several tables, removed several obsolete fields and created a few new tables to arrive at a new schema. We have retained most of the nicest features of GUS and substantially reduced the overhead.

Another challenging aspect of this schema is data upload and its dependencies. Earlier, a BioPerl layer with several plugins served this purpose. But keeping it would have meant another round of installations and dependencies, so I decided to get rid of all that. We rewrote most of the interfacing software in Python, with some Perl, and the result is extremely lightweight.

One of the most challenging tasks was transferring the data from the VBI Oracle server to the new schema. The data had to be remapped at various points. At the same time, the VBI data (VMD) needed some cleaning, because I had kept many genomes that were either obsolete or not very clean. Removing the unwanted material required a lot of soul searching. I have now remapped the data, reanalyzed it where necessary and uploaded it to our new schema.

The front end of the database was also riddled with challenges. The queries had to be rewritten completely, and the site had to be migrated to a different Linux server configuration with only a limited set of backed-up software. All of this has finally been done, and today I released the first MySQL version of Eumicrobedb. There are still many glitches that I know of, and some unknown bugs too. These will be fixed in the coming days.

So, altogether it was a herculean task that Akash, one of my MTech students, and I undertook amid our other responsibilities.
The database is now available at
