Computational Genomics Lab at IICB: June 2013

Wednesday, June 26, 2013

EuMicrobedb, Transcriptomicsdb and TOOLKIT with EMBOSS interface are finally resurrected

The countless emails, worries, warnings, anger and frustration over these databases are finally over. The good news is that Eumicrobedb, transcriptomicsDB and the toolkit, which served the oomycetes community for a long time, are finally up and available again. They are now hosted at IICB instead of Virginia Tech. These resources were among the first efforts to organize and disseminate oomycete genomes and transcriptomes, and they did so for 7 long years. In October 2012, after a hacker attack on the server, the web site went down; subsequently many of the web files were completely erased, and we could recover only part of the software from the backup server. We tried to revive it at VBI, but that failed, and the site remained unavailable for 8 months.

Why did it take almost 8 months to recreate it?
After joining IICB, I thought I would just maintain a mirror site, with the main site at VBI. But I realized that the cost of maintaining a server at VBI was one of the biggest limiting factors in keeping this resource alive. It then fell entirely to me to recreate it, for which I was not completely prepared. One of the biggest problems was that the database was designed on an Oracle framework with many complicated tables and views. I knew from using it for a long time that we only used about 50% of the total architecture for real-time data storage, but I never really had the time or inclination to sit down and rework the schema. Now that I had only small servers and no Oracle license, I was pushed into a corner. Thus began the effort to recreate this schema in MySQL.

Finally there was a lot of pressure to do this (I work best under pressure!) and I spent a solid two months, sitting literally day and night, working on the schema. I merged several tables, removed several obsolete fields and created a few new tables to finally come up with a new schema. We have retained most of the nicest features of GUS while substantially reducing the overhead.

Another challenging aspect of this schema is data upload and its dependencies. Earlier, a BioPerl layer with several plugins served the purpose. But keeping it would again mean installation and dependencies, so I decided to get rid of all that. We rewrote most of the interfacing software in Python, plus some Perl, keeping it extremely lightweight.

One of the most challenging tasks was the data transfer from the VBI Oracle server to the new schema. The data needed remapping at various points. At the same time the VBI data (VMD) needed some cleaning, because I had kept many genomes that were either obsolete or not very clean. So this required a lot of soul searching to remove unwanted material. Finally I have been able to remap the data, reanalyze it where necessary, and upload it to our new schema.

The front end of the database also posed several challenges, such as a complete change in queries and migration to a different Linux server setup with only partially backed-up software. Finally all of this has been done, and today I released the first MySQL version of Eumicrobedb. There are still a lot of glitches that I know exist, and some unknown bugs too. In the coming days all of these will be cleared.

So, altogether it was a herculean task that one of my MTech students, Akash, and I undertook amidst our other responsibilities...
The database is now available at www.eumicrobedb.org/eumicrobedb/

Sunday, June 23, 2013

Installation of Samtools and BEDTools on my new RH 6 64-bit server

Troubles come as a package when you start a new lab. In my effort to resurrect our old database Eumicrobedb.org, I am facing many issues, such as installing a myriad of open source packages. What I did over a period of a decade now needs to be done right away...

I do not recall ever having any issues installing easy-to-install packages such as samtools and BEDTools. One reason may be that the system admin was taking care of several system-level library installations that were prerequisites. While compiling, BEDTools complained about "undefined reference to `gzopen64'". Searching the forums, I figured out that it was complaining about not finding zlib. I checked the installation several times: zlib was right there and on the path. Checking the stdout of the make commands, I found that it searched for zlib in the right place yet came back in the end complaining about it. Finally, changing the LIBS path in the Makefile solved the problem.

Look for this line in the Makefile:
'export LIBS'

Change it to:
export LIBS = YOUR_PATH/libz.so.1.2.7 -lz

$ make clean
$ make
It should compile fine.
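
When zlib is installed and on the path but the linker still reports gzopen64 as undefined, one plausible cause (an assumption, not something confirmed above) is that more than one libz is present and the linker resolves -lz to a copy that does not export that symbol. A minimal Python sketch for checking a particular libz file is shown here. The function name exports_symbol is hypothetical, and the library paths you pass in are whatever copies exist on your own system.

```python
import ctypes
import sys


def exports_symbol(library_path, symbol):
    """Return True if the shared library at library_path exports symbol.

    Returns False when the library cannot be loaded at all.
    """
    try:
        library = ctypes.CDLL(library_path)
    except OSError:
        return False
    # ctypes raises AttributeError when a symbol is missing,
    # which hasattr() turns into False.
    return hasattr(library, symbol)


if __name__ == "__main__":
    # Pass the libz files to inspect on the command line, for example
    # the copies under your system and local library directories.
    for path in sys.argv[1:]:
        if exports_symbol(path, "gzopen64"):
            print(f"{path}: exports gzopen64")
        else:
            print(f"{path}: lacks gzopen64 (or could not be loaded)")
```

If one copy exports gzopen64 and another does not, pointing LIBS at the full path of the good copy, as done above, is one way to make sure the right one is linked.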

Samtools:

With Samtools, the problem lasted for a very long time. It always exited with the error
"samtools error bam_import.c:76: undefined reference to gzopen64"

I tried re-installing zlib, adding the zlib path to LD_LIBRARY_PATH, and installing the latest version of zlib; nothing worked.
Finally, changing CFLAGS in the Makefile did the trick:

Change
CFLAGS= -g -Wall -O2
to
CFLAGS= -g -Wall -O2 -L /usr/local/lib
(/usr/local/lib is where my zlib libraries are located.)

.... It finally worked.

NOTE: Install as a regular user. You can copy the binary files to the system directories later.
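
If the same change has to be applied to several source trees, the CFLAGS edit can be scripted instead of done by hand. The following is only a sketch under stated assumptions: the function name add_library_dir is hypothetical, it assumes the Makefile contains a plain "CFLAGS=" assignment like the one shown above, and it writes the flag in the joined form -L/usr/local/lib, which GCC treats the same as the spaced form used above.

```python
import re


def add_library_dir(makefile_text, lib_dir, variable="CFLAGS"):
    """Append -L<lib_dir> to the first assignment of `variable`.

    The text is returned unchanged if the flag is already present
    or if no such assignment exists.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(variable) + r"[ \t]*=)(.*)$",
        re.MULTILINE,
    )
    flag = "-L" + lib_dir

    def patch(match):
        value = match.group(2).rstrip()
        if flag in value.split():
            return match.group(0)  # already patched, leave as is
        return match.group(1) + value + " " + flag

    return pattern.sub(patch, makefile_text, count=1)


if __name__ == "__main__":
    original = "CC= gcc\nCFLAGS= -g -Wall -O2\n"
    print(add_library_dir(original, "/usr/local/lib"))
```

Running the function twice on the same text leaves it unchanged after the first pass, so it is safe to re-run. Keep a copy of the original Makefile before overwriting it.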

Wednesday, June 12, 2013

Bioinformatics: An Inseparable part of Systems Biology.

Hello everybody,

As the title suggests, today I'm going to discuss two widely used terms: Bioinformatics and Systems Biology. This time I'm not here to offer new information, since neither term is newly coined; rather, my objective is to share my personal view of these popular disciplines of modern biology, and of their relationship and interdependence.

Let's first come to Systems Biology; what does it mean?

Systems biology deals with biological phenomena at the systems level, where a system is defined as a cluster or collection of smaller components in which we, the viewers, are interested. If we consider the cell as a system, then its building materials are organelles, macro- and micromolecules, and the genetic material (DNA and RNA). If we consider an organ as a system, then its component tissues and the signalling system are its building materials. So what emerges from the discussion so far is that the term "system" in a biological context is very flexible and follows a kind of hierarchical construction.

We have gathered a huge amount of information about these biological entities from various types of experiments, and it needs to be organized and accessible so that further valuable information can be extracted from it; this is nothing but bioinformatics. So, without proper organization of biological data and without the means to extract valuable information from it, it is impossible to stitch those entities together to build a system.

Thus, Bioinformatics should be viewed as an inseparable, rather than merely an important, part of systems biology.

Monday, June 3, 2013

NANO SLEUTH

Came across a very interesting article this morning and wanted to share it.

Researchers Yamuna Krishnan & Souvik Modi of the National Centre for Biological Sciences (NCBS), Bangalore, have gifted the world a tool that could play sleuth (detective) within a living cell & help scientists develop the best treatment for a disease.

They demonstrated that 2 or more extremely tiny DNA devices, which they call nano-devices, can be dispatched inside a cell to report on goings-on within. They created the device by cooking & cooling commercially available DNA strands with potassium chloride.

These tiny devices, each 14 nm long and 2 nm in diameter, helped the scientists accurately measure the pH values of the subcellular locations where they were parked. The authors further added that "if the pH is found to be different from normal, we know something is amiss".

The application potential of the technique is huge. It can be a great tool for drug discoverers, and it may also give us the next generation of probes for sensing intracellular signals.

This study appeared in Nature Nanotechnology recently.

