Computational Genomics Lab at IICB: 2012

Monday, December 24, 2012

Gratitude always...

Guys, I don't want to fill this space with spiritual ramblings, but since it is year end and I have read a few great articles on gratitude, I thought I would share some thoughts here.

As the year comes to an end, we should look back and say a grateful thank you for every small thing that we so effortlessly ignore. The fact that we are all part of a CSIR group that is ranked 82nd in the world (much higher than IISC, IITs) in terms of scientific output is something we should feel grateful for. We should continue to give more than 100% wherever possible and stay honest, hardworking and truthful no matter what.

In the coming year, I urge each one of you to stay grateful for whatever happens. Nothing is good or bad in itself; only the mirror through which we look at things makes it look good or bad. If many bad things seem to happen in a day, quickly introspect and see what is going wrong within.

Say thank you to anyone who does even a small thing for you, and say it from the heart. Appreciate others meaningfully, give one extra rupee as a tip, try to help a needy person, and most importantly express gratitude for the food you eat, the beautiful world you see, the melodious sounds you can hear - for many people are deprived of that privilege. Never take anything for granted!!

Staying grateful is an important feeling - it opens your mind and widens your horizon. Most importantly, it generates a feel-good thought that is much more empowering than anything else.

I wish you all a very happy new year and a successful year research-wise.

Friday, December 14, 2012

How to estimate significance in genomewide studies

By Subhadeep Das

With the advent of large-scale sequencing and microarray technology, genome-wide studies have become a powerful tool to delineate the regulatory networks of genes. They are widely used to find differentially expressed genes, to detect regulatory regions across the genome, and so on. While this power produces valuable information, a major difficulty lies in picking out the truly significant results from the huge datasets produced. Analysing these data involves running statistical significance tests on thousands of features (e.g. genes, enhancer sites, transcription factor binding sites, SNPs) simultaneously. In traditional linkage studies (similar to present genome-wide studies), strict P value cutoffs were used to keep the number of false positives low. In a genome-wide study, however, many more than one or two of the tested features are expected to be truly significant, so a strict P value cutoff may lead to many missed findings.

Here, two newer quantities, the false discovery rate (FDR) and the q value, come to the rescue. So, what do the terms mean? Let us take a look:

P value- The probability, when a feature is truly null, of observing a score as extreme as or more extreme than the score actually observed for that feature.

FDR- The expected proportion of false positives among all the features called significant in a study at a given P value cutoff.

FDR is expressed as the expectation of the ratio F/S and written as E[F(t)/S(t)], where F = number of false positives, S = total number of features called significant, and t = P value threshold or cutoff.

Q value- The minimum FDR incurred when this feature (the one whose q value is being calculated) is called significant, i.e. the expected proportion of false positives among all features at least as significant as this one.

Thus, if a feature has a q value of 5%, then calling it (and every feature with a smaller q value) significant means that about 5% of the features called significant are expected to be false positives.

The next major challenge is calculating the FDR and the q value: the q value is derived from the FDR, and the number of false positives F cannot be observed directly. So these terms are estimated rather than calculated, and to do so they are broken into several new terms. The simplified form of the FDR estimate is

FDR(t) = P*m*t/(number of features with P value <= t)

where P = proportion of features that are truly null, i.e. m0/m, with m0 = number of truly null features and m = total number of features. The numerator P*m*t is the expected number of null features with a P value <= t, because the P values of null features are uniformly distributed between 0 and 1.

So, using these simple but useful terms, we can filter out the true positives that really contribute to the result of the genome-wide study.
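To make the estimate concrete, here is a minimal Python sketch of the formula above. It is only an illustration, not the code of any particular package: the function names and the example P values are made up, and the proportion of true nulls (P in the text, pi0 in the code) is assumed to be supplied by the user. With pi0 = 1 the estimate is the most conservative one.

```python
def estimate_fdr(p_values, t, pi0=1.0):
    """Estimate FDR(t) = pi0 * m * t / (number of features with p <= t)."""
    m = len(p_values)
    n_significant = sum(1 for p in p_values if p <= t)
    if n_significant == 0:
        return 0.0  # nothing is called significant at this cutoff
    return min(1.0, pi0 * m * t / n_significant)


def q_values(p_values, pi0=1.0):
    """q value of a feature = smallest estimated FDR over all cutoffs t >= its p value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest, carrying the minimum FDR seen so far.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        fdr_at_p = pi0 * m * p_values[i] / rank  # 'rank' features have p <= p_values[i]
        running_min = min(running_min, fdr_at_p)
        q[i] = running_min
    return q


if __name__ == "__main__":
    # Hypothetical P values for 10 features, for illustration only.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.5, 0.9]
    print(round(estimate_fdr(example, 0.05), 3))  # 5 features pass, 10 * 0.05 / 5 = 0.1
    print([round(q, 3) for q in q_values(example)])
```

In this toy example a cutoff of t = 0.05 calls 5 of the 10 features significant, and about 10% of them are expected to be false positives. Taking the running minimum is what keeps the q values from decreasing as the P values increase.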

Tuesday, December 4, 2012

Resetting root password in a RH linux version 6.0

We have installed the latest version, Redhat.Enterprise.Linux.v6.UPDATE.2.X86_64, on one of our servers. The grub timeout is probably set to 0, so we never saw a grub prompt when the computer booted. Numerous posts indicate that one has to set the run level to 1 at boot time, so that the kernel starts in single user mode, and then simply set a new password. However, on our machine, we could not get to that prompt at all.

Yesterday, one of my students, Dhritiman Jana, was in the lab and spent a few hours solving the issue. Following are his notes on how to get to the grub prompt:

Once you get to the command prompt, type startx. From there, you can reboot the system.

As soon as the computer starts rebooting, keep tapping a key. Do not pause at any point, but time the taps so that you press about once every 2 seconds. This brings up the grub prompt. Press Enter there to get to a list of items, one of which says kernel. Put the cursor on that line and press 'e'. The screen changes to another page, where you will see the string 'rhgb' at the prompt; add a space followed by 1 and then press Enter. If you do not see the 'rhgb' string, use the arrow keys to get to the line that contains it.

This takes you back to the list with the kernel line. This time press 'b'. The computer now boots in single user mode, and that user is root, so once it has booted you are already logged in as root. You can do as you please as root, and one of the things you will want to do is change the password: type the passwd command and change the password to a suitable one.

Hope this solution works for all!

Tuesday, November 6, 2012

creating password less connections to multiple remote machines

Often, while working on multiple machines, you may want to automate processes in which programs on one server access information or data on another directly, without a password. This can be done with SSH key-based authentication, using keys generated by ssh-keygen.

So, what happens here is this: you have a local machine, let's call it 'A', and a remote machine, let's call it 'B'. You have an account on 'B', say 'myname'. Every time you log into that machine using ssh, you have to do something like:

$ ssh myname@B

Password:

myname@B:~$

To log into a machine directly without a password, you have to generate a pair of keys, called a public key and a private key. The public key is public information, while the private key stays only on your local machine, i.e. 'A'. You can use ssh-keygen to create the pair. This is how you should proceed:

$ ssh-keygen -t rsa

Generating public/private rsa key pair.
Enter file in which to save the key (/home/Sucheta/.ssh/id_rsa): /home/Sucheta/.ssh/iicb_rsa [Remember to enter a new file name here, else you may overwrite an id_rsa file saved earlier for another computer]
Enter passphrase (empty for no passphrase): [Enter a passphrase that is more than 4 characters long. Note that ssh will then ask for this passphrase whenever the key is used, unless the key is loaded into ssh-agent]
Enter same passphrase again:
Your identification has been saved in /home/Sucheta/.ssh/iicb_rsa.
Your public key has been saved in /home/Sucheta/.ssh/iicb_rsa.pub.
The key fingerprint is:
95:13:96:1b:66:ef:36:74:25:76:05:23:64:58:bb:94 Sucheta@Sucheta-PC
The key's randomart image is:
+--[ RSA 2048]----+
|          o== o.o|
|         .*+ +o.o|
|         o++E. + |
|         ..oo.. |
|        S o..   |
|            +    |
|           . .   |
|                 |
|                 |
+-----------------+
Then do:

$ ssh-copy-id -i /home/Sucheta/.ssh/iicb_rsa.pub myname@B

[This command appends your public key to the ~/.ssh/authorized_keys file on the remote host; the -i option names the public key to copy, which is needed here because the key was not saved under the default name. You can also do this manually by logging in to the remote computer and copy-pasting your public key into the 'authorized_keys' file. Make sure the public key is pasted as one single line.]
Another thing to remember: depending on the OS and version, the file that holds the public keys on the remote machine may have a different name. To confirm that it is indeed called "authorized_keys", do the following:
[root@Apala ssh]# cat /etc/ssh/sshd_config | grep Keys
# HostKeys for protocol version 2
#AuthorizedKeysFile .ssh/authorized_keys
#AuthorizedKeysCommand none
#AuthorizedKeysCommandRunAs nobody

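This tells you that the file storing public keys on your remote computer is indeed named authorized_keys (the line is commented out, so it shows the default value).

The next time you want to create a passwordless connection to another computer, just repeat the above steps. Always remember to save each new public/private key pair under its own file name, else you risk overwriting the contents of the id_rsa and id_rsa.pub files. Because the key is not stored under the default name, tell ssh which key to use when you connect, e.g. ssh -i /home/Sucheta/.ssh/iicb_rsa myname@B. Add a passphrase too.

One more important thing to remember is to check the permissions on the remote machine: set the ~/.ssh directory to 700 and the "authorized_keys" file to 600, otherwise sshd may ignore the key.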
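The permission check can be scripted. The sketch below is only an illustration and is meant to be run on the remote machine; the function names are made up, and it assumes the common convention of mode 700 for the ~/.ssh directory and 600 for the authorized_keys file (sshd in its default strict mode rejects key files that other users can write to).

```python
import os
import stat


def mode_of(path):
    """Return the permission bits of path, e.g. 0o600."""
    return stat.S_IMODE(os.stat(path).st_mode)


def tighten_ssh_permissions(ssh_dir):
    """Set ssh_dir to 700 and its authorized_keys file, if present, to 600."""
    os.chmod(ssh_dir, 0o700)
    keys_file = os.path.join(ssh_dir, "authorized_keys")
    if os.path.isfile(keys_file):
        os.chmod(keys_file, 0o600)


if __name__ == "__main__":
    # Assumes the keys live in the default location under the home directory.
    ssh_dir = os.path.expanduser("~/.ssh")
    if os.path.isdir(ssh_dir):
        tighten_ssh_permissions(ssh_dir)
        print(oct(mode_of(ssh_dir)))
```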

Using this, you can also automate file transfers with sftp, scp or other SSH-based tools.
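As a rough sketch of such automation, the Python snippet below builds and runs an scp command that uses the key created earlier. The function names and the example file paths are hypothetical. BatchMode=yes makes scp fail instead of stopping to ask for a password or passphrase, which is usually what an unattended script wants; it also means the key must have no passphrase or be loaded into ssh-agent.

```python
import subprocess


def build_scp_command(key_file, local_path, remote_user, remote_host, remote_path):
    """Return the scp argument list for a non-interactive, key-based copy."""
    return [
        "scp",
        "-i", key_file,           # private key to authenticate with
        "-o", "BatchMode=yes",    # never prompt; fail if the key is not accepted
        local_path,
        f"{remote_user}@{remote_host}:{remote_path}",
    ]


def copy_file(key_file, local_path, remote_user, remote_host, remote_path):
    """Run scp and raise CalledProcessError if the transfer fails."""
    cmd = build_scp_command(key_file, local_path, remote_user, remote_host, remote_path)
    subprocess.run(cmd, check=True)
```

For the machines in this post, a call could look like copy_file("/home/Sucheta/.ssh/iicb_rsa", "results.txt", "myname", "B", "results.txt"), where results.txt is just a placeholder file name.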

Sunday, September 2, 2012

Welcome to the blog

I joined the Indian Institute of Chemical Biology on 1st August 2012 and have been working on building my team ever since. The objective behind creating this site is for my group to be able to share their ideas, knowledge and work with each other and with the outside world. The topics discussed here will cover genomics, transcriptomics and new algorithms, along with regular day-to-day work and research findings, like an open lab notebook.
