Computational Genomics Lab at IICB: July 2013

Tuesday, July 30, 2013

Synteny and the world of genomics

Comparative genomics is one of my interests. In India, many people know bioinformatics mainly as drug design, simulation, protein folding and 3D structure prediction, but I have found the right place to work on comparative genomics, so I will share the little that I have learnt...

Comparative genomics:

Comparative genomics is a bioinformatics approach in which unknown genomes are compared with known genomes in terms of genome size, number of genes and chromosomes. By comparing two genomes we can identify the core set of genes shared by the two organisms, as well as genes involved in mutations or in the ability to cause disease. Comparative genomics also reveals the evolutionary relationships between organisms.
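Once each gene has been assigned to an orthologous family, finding the core set is essentially a set intersection. A minimal Python sketch, assuming the family labels were already produced by some homology search (the gene IDs and family names below are made-up toy data):

```python
def core_and_unique(genome_a, genome_b):
    """Return (core, only_a, only_b) as sets of family IDs.

    Each genome is a mapping gene_id -> family_id; the family IDs are
    assumed to come from an earlier orthology assignment.
    """
    fam_a = set(genome_a.values())
    fam_b = set(genome_b.values())
    core = fam_a & fam_b  # families present in both genomes
    return core, fam_a - fam_b, fam_b - fam_a


# Hypothetical toy data: gene IDs and family labels are invented.
known = {"g1": "fam1", "g2": "fam2", "g3": "fam3"}
unknown = {"x1": "fam2", "x2": "fam3", "x3": "fam4"}
core, only_known, only_unknown = core_and_unique(known, unknown)
print(sorted(core), sorted(only_known), sorted(only_unknown))
```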

Synteny: strictly, synteny means that two or more genes are located on the same chromosome, i.e. they are linked. In practice the term is mostly used for collinearity (conserved gene order), which indicates that two or more chromosomes or chromosomal segments are derived from a common ancestor. Synteny is therefore used to identify homologous genes and trace them to their ancestral chromosomal positions. For example, humans have 23 pairs of chromosomes, and human chromosome 17 corresponds almost entirely to a large part of mouse chromosome 11.
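One simple way to see conserved gene order in data is to count gene neighbours that are preserved between two genomes. The sketch below assumes each chromosome is already written as an ordered list of shared family labels (hypothetical toy data); it is an illustration of the idea, not how any particular tool defines synteny.

```python
def adjacencies(order):
    """Unordered neighbour pairs along one chromosome (list of family IDs)."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def conserved_adjacencies(order_a, order_b):
    """Neighbour pairs found in both gene orders: a crude signal of synteny."""
    return adjacencies(order_a) & adjacencies(order_b)


# Hypothetical gene orders using shared family labels.
chrom_a = ["A", "B", "C", "D", "E"]
chrom_b = ["X", "C", "B", "A", "E", "D"]
shared = conserved_adjacencies(chrom_a, chrom_b)
print(sorted(tuple(sorted(p)) for p in shared))
```

Adjacencies are stored without orientation, so the inverted segment C-B-A in the second chromosome still shares the A-B and B-C neighbours with the first.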

Some terms related to synteny:

Single gene transposition:

It refers to the insertion of a single gene into a new location.

Fractionation:

The process by which a duplicated gene, chromosomal segment or genome tends to return to its pre-duplication gene content.

Subfunctionalization:

The selectively neutral tendency of a duplicated cis-acting unit of function to lose dispensable sequences on one, but not both, of the duplicates, such that the ancestral function is spread over both duplicates.

CoGe is one of the online platforms used for analysing synteny.

Syntenic regions can be identified in two steps:

1. Find putative homologous genes or regions between the two genomes.

2. Identify co-linear sets of genes or regions of sequence similarity.
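For step 1, a common and simple choice (not necessarily what CoGe does internally) is reciprocal best BLAST hits: genes a and b are putative homologs if each is the other's best-scoring hit. The sketch assumes BLAST tabular output (`-outfmt 6`), where column 12 is the bit score; the `row` helper and the gene names are hypothetical.

```python
def best_hits(blast_lines):
    """Best subject per query, by bit score, from BLAST -outfmt 6 lines."""
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


def row(q, s, evalue, bits):
    # Hypothetical helper: fills the 12 standard outfmt 6 columns;
    # only query, subject, e-value and bit score matter here.
    return "\t".join([q, s, "90.0", "100", "10", "0", "1", "100",
                      "1", "100", str(evalue), str(bits)])


a_vs_b = [row("a1", "b1", 1e-50, 200), row("a1", "b2", 1e-10, 80),
          row("a2", "b2", 1e-30, 150)]
b_vs_a = [row("b1", "a1", 1e-50, 200), row("b2", "a1", 1e-35, 160),
          row("b2", "a2", 1e-30, 150)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here a2's best hit is b2, but b2's best hit is a1, so (a2, b2) is dropped as non-reciprocal.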

SynMap methods:

1. Extracts the sequences for comparison and builds the FASTA files.

2. Creates the database and compares the sequences using the BLAST algorithm.

3. Uses a default e-value cutoff of 0.001.

4. Identifies tandem gene duplicates using blast2raw.

5. Filters repetitive matches.

6. Identifies syntenic pairs by finding co-linear putative homologous sequences.

7. Calculates the synonymous and non-synonymous substitution rates for syntenic gene pairs.

8. Generates a dotplot of the putative homologous matches.

9. Colours the dotplot based on the synonymous and non-synonymous substitution rates.
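Steps 3 and 4 can be sketched as: keep hits at or below the 0.001 e-value cutoff, then collapse hits from one query onto neighbouring subject genes (a run of tandem copies) into a single representative. This is an illustrative approximation, not blast2raw itself; the hit tuples, the `window` parameter and the single-chromosome gene indexing are assumptions.

```python
def filter_evalue(hits, cutoff=0.001):
    """Keep hits whose e-value is at or below the cutoff."""
    return [h for h in hits if h[2] <= cutoff]


def collapse_tandems(hits, window=1):
    """Collapse hits from one query to subject genes lying within `window`
    positions of each other, keeping the lowest e-value in each run.

    hits: list of (query_gene, subject_index, evalue), where subject_index
    is the gene's position along one subject chromosome (assumption).
    """
    by_query = {}
    for q, s, e in hits:
        by_query.setdefault(q, []).append((s, e))
    kept = []
    for q, targets in by_query.items():
        targets.sort()
        run = [targets[0]]
        for s, e in targets[1:]:
            if s - run[-1][0] <= window:
                run.append((s, e))  # still inside the tandem run
            else:
                kept.append((q, *min(run, key=lambda t: t[1])))
                run = [(s, e)]
        kept.append((q, *min(run, key=lambda t: t[1])))
    return sorted(kept)


# Hypothetical hits: q1 hits three adjacent subject genes and one distant gene.
hits = [("q1", 10, 1e-40), ("q1", 11, 1e-60), ("q1", 12, 1e-20),
        ("q1", 50, 1e-30), ("q2", 5, 0.5)]
print(collapse_tandems(filter_evalue(hits)))
```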

Analysis options:

1. Breaks the sequences into multiple pieces and searches them.

2. Filter repetitive matches:

Adjusts the e-values of the BLAST hits to lower the significance of sequences that occur multiple times in the genome.

3. DAGchainer options:

Identify syntenic regions between genomes and gene spaces within genomes.

4. Average distance between syntenic regions.

5. Maximum distance between two matches.

6. Minimum number of aligned pairs.

7. Syntenic depth: gives the best syntenic regions that cover each genome.
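The idea behind options 5 and 6 can be sketched as a small dynamic program: chain putative homologous pairs whose positions increase in both genomes, allow at most `max_dist` genes between consecutive pairs, and keep only chains with at least `min_pairs` pairs. This is a simplified stand-in for DAGchainer (which also scores matches and penalises gaps); the parameter names and toy positions are hypothetical, and only same-orientation chains are found.

```python
def find_chains(pairs, max_dist=20, min_pairs=3):
    """Repeatedly extract the longest collinear chain of (x, y) pairs.

    x is a gene position in genome A, y in genome B. Consecutive pairs in a
    chain must increase in both x and y by at most max_dist. Stops when the
    best remaining chain has fewer than min_pairs pairs.
    """
    remaining = sorted(set(pairs))
    chains = []
    while remaining:
        n = len(remaining)
        score = [1] * n   # length of the best chain ending at each pair
        prev = [-1] * n   # back-pointer for reconstructing that chain
        for i in range(n):
            xi, yi = remaining[i]
            for j in range(i):
                xj, yj = remaining[j]
                if (xj < xi and yj < yi
                        and xi - xj <= max_dist and yi - yj <= max_dist
                        and score[j] + 1 > score[i]):
                    score[i] = score[j] + 1
                    prev[i] = j
        best = max(range(n), key=score.__getitem__)
        if score[best] < min_pairs:
            break
        chain = []
        k = best
        while k != -1:
            chain.append(remaining[k])
            k = prev[k]
        chain.reverse()
        chains.append(chain)
        used = set(chain)
        remaining = [p for p in remaining if p not in used]
    return chains


# Hypothetical gene-position pairs: one collinear block plus scattered hits.
pairs = [(1, 101), (2, 102), (3, 103), (4, 104), (10, 50), (30, 5), (31, 6)]
print(find_chains(pairs, max_dist=5, min_pairs=3))
```

The two-pair run (30, 5), (31, 6) is found too, but it is discarded because it is shorter than `min_pairs`.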

As an example, here is the syntenic dotplot constructed between E. coli genomes, so that everybody can get an idea of what a dotplot is and how it looks.

[Figure: syntenic dotplot between E. coli genomes]
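To get a feel for how such a dotplot is built, here is a minimal text version in Python: each putative homologous pair (x, y) becomes a dot, so a collinear block shows up as a diagonal and an inversion as an anti-diagonal. The positions, genome lengths and grid size are made-up assumptions.

```python
def ascii_dotplot(pairs, len_a, len_b, width=20, height=10):
    """Render homologous pairs as a text dotplot.

    pairs: (x, y) positions, x along genome A (columns), y along genome B (rows).
    len_a, len_b: genome lengths in the same units as the positions.
    """
    grid = [["." for _ in range(width)] for _ in range(height)]
    for x, y in pairs:
        col = min(int(x / len_a * width), width - 1)
        row = min(int(y / len_b * height), height - 1)
        grid[height - 1 - row][col] = "*"  # origin at the bottom-left
    return "\n".join("".join(r) for r in grid)


# Hypothetical positions: a collinear block plus an inverted block.
pairs = [(i, i) for i in range(0, 50, 5)] + [(60 + i, 95 - i) for i in range(0, 30, 5)]
print(ascii_dotplot(pairs, len_a=100, len_b=100))
```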

Saturday, July 27, 2013

UIDs of cyanobacteria

I was tired of downloading the genomes and genes of cyanobacteria one by one. Then I came across UIDs and collected the UIDs of all my genomes. I think this list will be useful for everybody working with cyanobacteria.

cyanobacterium_UCYN_A_uid43697

Nostoc_azollae__0708_uid49725

Trichodesmium_erythraeum_IMS101_uid57925

Thermosynechococcus_elongatus_BP_1_uid57907

Synechocystis_PCC_6803_uid57659

Synechocystis_PCC_6803_uid189748

Synechocystis_PCC_6803_uid159873

Synechocystis_PCC_6803_substr__PCC_N_uid159835

Synechocystis_PCC_6803_substr__GT_I_uid158059

Synechocystis_PCC_6803_substr__GT_I_uid157913

Synechococcus_elongatus_PCC_7942_uid58045

Synechococcus_elongatus_PCC_6301_uid58235

Synechococcus_WH_8102_uid61581

Synechococcus_WH_7803_uid61607

Synechococcus_RCC307_uid61609

Synechococcus_PCC_7502_uid183008

Synechococcus_PCC_7002_uid59137

Synechococcus_PCC_6312_uid182934

Synechococcus_JA_3_3Ab_uid58535

Synechococcus_JA_2_3B_a_2_13__uid58537

Synechococcus_CC9902_uid58323

Synechococcus_CC9605_uid58319

Synechococcus_CC9311_uid58123

Rivularia_PCC_7116_uid182929

Prochlorococcus_marinus_pastoris_CCMP1986_uid57761

Prochlorococcus_marinus_NATL2A_uid58359

Prochlorococcus_marinus_NATL1A_uid58423

Prochlorococcus_marinus_MIT_9515_uid58313

Prochlorococcus_marinus_MIT_9313_uid57773

Prochlorococcus_marinus_MIT_9312_uid58357

Prochlorococcus_marinus_MIT_9303_uid58305

Prochlorococcus_marinus_MIT_9301_uid58437

Prochlorococcus_marinus_MIT_9215_uid58819

Prochlorococcus_marinus_MIT_9211_uid58309

Prochlorococcus_marinus_CCMP1375_uid57995

Prochlorococcus_marinus_AS9601_uid58307

Oscillatoria_acuminata_PCC_6304_uid183003

Oscillatoria_PCC_7112_uid183110

Nostoc_punctiforme_PCC_73102_uid57767

Nostoc_PCC_7524_uid182933

Nostoc_PCC_7120_uid57803

Nostoc_PCC_7107_uid182932

Microcystis_aeruginosa_NIES_843_uid59101

Microcoleus_PCC_7113_uid183114

Leptolyngbya_PCC_7376_uid182928

Halothece_PCC_7418_uid183338

Gloeocapsa_PCC_7428_uid183112

Gloeobacter_violaceus_PCC_7421_uid58011

Geitlerinema_PCC_7407_uid183007

Dactylococcopsis_salina_PCC_8305_uid183341

Cylindrospermum_stagnale_PCC_7417_uid183111

Cyanothece_PCC_8802_uid59143

Cyanothece_PCC_8801_uid59027

Cyanothece_PCC_7822_uid52547

Cyanothece_PCC_7425_uid59435

Cyanothece_PCC_7424_uid59025

Cyanothece_ATCC_51142_uid59013

Cyanobium_gracile_PCC_6307_uid182931

Cyanobacterium_stanieri_PCC_7202_uid183337

Cyanobacterium_PCC_10605_uid183340

Crinalium_epipsammum_PCC_9333_uid183113

Chroococcidiopsis_thermalis_PCC_7203_uid183002

Chamaesiphon_PCC_6605_uid183005

Calothrix_PCC_7507_uid182930

Calothrix_PCC_6303_uid183109

Arthrospira_platensis_NIES_39_uid197171

Anabaena_variabilis_ATCC_29413_uid58043

Anabaena_cylindrica_PCC_7122_uid183339

Anabaena_90_uid179383

Acaryochloris_marina_MBIC11017_uid58167

Using these UIDs you can download the genome information with the help of NCBI E-utilities and a few Perl scripts, which saves a lot of time.
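The `_uidNNNNN` suffix can be split off and turned into an E-utilities request. This sketch assumes the number is an NCBI BioProject ID and asks `elink` for the linked nucleotide records; the helper names are hypothetical, and nothing is downloaded here.

```python
import re
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def parse_uid(name):
    """Split 'Nostoc_PCC_7120_uid57803' into ('Nostoc PCC 7120', '57803').

    The organism part is only approximate: underscores become spaces.
    """
    m = re.fullmatch(r"(.+?)_+uid(\d+)", name)
    if m is None:
        raise ValueError(f"no _uid suffix in {name!r}")
    organism = re.sub(r"_+", " ", m.group(1))
    return organism, m.group(2)


def elink_url(uid):
    """elink URL for nuccore records linked to a BioProject UID (assumption)."""
    params = {"dbfrom": "bioproject", "db": "nuccore", "id": uid}
    return EUTILS + "elink.fcgi?" + urlencode(params)


for name in ["Nostoc_PCC_7120_uid57803", "Synechocystis_PCC_6803_uid57659"]:
    organism, uid = parse_uid(name)
    print(organism, elink_url(uid))
```

The URLs can then be fetched with `urllib.request.urlopen` (or with Perl scripts, as above); NCBI asks users to send no more than three requests per second without an API key.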

Monday, July 22, 2013

Importing & exporting MySQL dump files including/excluding data

Suppose we have a MySQL database named testDB.

EXPORT

1. DATA + STRUCTURE
[user@pc]$ mysqldump -u user -p testDB > /path-to-export/testDB.sql

2. STRUCTURE Only
[user@pc]$ mysqldump -u user -p --no-data testDB > /path-to-export/testDB.sql

3. DATA only
[user@pc]$ mysqldump -u user -p --no-create-db --no-create-info testDB > /path-to-export/testDB.sql
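If you script these exports, a small wrapper that builds the same three mysqldump variants helps avoid typos such as a missing user name. A minimal Python sketch; the function names are hypothetical, and `-p` without a value makes mysqldump prompt for the password.

```python
import shlex
import subprocess


def mysqldump_cmd(user, database, mode="all"):
    """Argument list for a data+structure, structure-only or data-only dump."""
    args = ["mysqldump", "-u", user, "-p"]
    if mode == "structure":
        args.append("--no-data")
    elif mode == "data":
        args += ["--no-create-db", "--no-create-info"]
    elif mode != "all":
        raise ValueError(f"unknown mode: {mode}")
    return args + [database]


def export(user, database, path, mode="all"):
    """Run mysqldump and write the dump to `path` (not called below)."""
    with open(path, "w") as out:
        subprocess.run(mysqldump_cmd(user, database, mode), stdout=out, check=True)


for mode in ("all", "structure", "data"):
    print(shlex.join(mysqldump_cmd("user", "testDB", mode)))
```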

IMPORT

1. STRUCTURE + DATA
[user@pc]$ mysql -u user -p testDB < /path-from-import/testDB.sql

I would like to mention an issue that you might face while importing a huge dump file (especially genome databases). With the default MySQL configuration you will get the error:
"Got a packet bigger than 'max_allowed_packet' bytes"

The solution is to increase max_allowed_packet on both the MySQL client and the server:

1. While importing, add --max_allowed_packet=100M (or your preferred size) to the client command.
e.g.: [user@pc]$ mysql -u user -p --max_allowed_packet=100M testDB < /path-from-import/testDB.sql

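2. Open another terminal, log in to the MySQL server as root and raise the server-side limits. New connections pick up the new global values; the server caps net_buffer_length at 1M and max_allowed_packet at 1G, so larger values are reduced with a warning:
mysql> set global net_buffer_length=1048576;
mysql> set global max_allowed_packet=1073741824;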

Now proceed with the import command.
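The server does not accept arbitrarily large values for these variables: max_allowed_packet is capped at 1G and net_buffer_length at 1M, and larger requests are reduced to the maximum. A small Python sketch that converts MySQL-style sizes (K, M and G are multiples of 1024) and applies those caps; the helper names are hypothetical.

```python
MAX_ALLOWED_PACKET_LIMIT = 1024 ** 3  # 1G, server maximum for max_allowed_packet
NET_BUFFER_LENGTH_LIMIT = 1024 ** 2   # 1M, server maximum for net_buffer_length


def parse_size(text):
    """Convert MySQL-style sizes such as '100M' or '16384' to bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def effective(requested, limit):
    """Value after capping at the server maximum.

    Ignores the rounding down to a multiple of 1024 that the server also
    applies to max_allowed_packet.
    """
    return min(parse_size(requested), limit)


print(parse_size("100M"))
print(effective("10000000000", MAX_ALLOWED_PACKET_LIMIT))
print(effective("10000000", NET_BUFFER_LENGTH_LIMIT))
```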

Hope it helps you!

Friday, July 19, 2013

mysql.sock file missing or appears to be missing

I had an episode when we tried to put the MySQL data directory in a location other than the standard /var/lib/mysql on our server. We uninstalled MySQL several times, but some remnants were still left elsewhere and kept interfering. If you need to reinstall MySQL, just do the following.

1. yum erase mysql -> This removes the installed MySQL packages
2. yum install mysql-server -> This installs MySQL again
3. Log in as root (or use sudo) and run: service mysqld start -> This starts the MySQL server
4. Set a root password for MySQL using: mysqladmin -u root password lklklklskl -> This sets the root password to lklklklskl
5. If step 4 runs fine, then all is well. But in my case it complained a lot:
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)'
Check that mysqld is running and that the socket: '/var/lib/mysql/mysql.sock' exists!

I googled a lot and then found that the path of the socket file is set in /etc/my.cnf. Inside my.cnf the path was /var/run/mysqld/mysqld.sock. Notice that it is mysqld.sock, NOT mysql.sock. So, carefully, I created a symlink named mysql.sock in /var/lib/mysql pointing to it:
ln -s /var/run/mysqld/mysqld.sock /var/lib/mysql/mysql.sock

Then I retried step 4 and it ran fine.
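Because the server ([mysqld]) and the client ([client]) sections of my.cnf can each set their own socket path, it helps to list every socket path and compare them. A minimal Python sketch that handles only simple key=value lines and section headers; the sample text is hypothetical, and on a real system you would pass in `open("/etc/my.cnf").read()`.

```python
def socket_paths(cnf_text):
    """Return {section: socket path} for every 'socket' option in my.cnf text."""
    paths = {}
    section = None
    for raw in cnf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments and whitespace
        if not line or line.startswith(("!", ";")):
            continue  # skip blanks, ';' comments and !include directives
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
        elif section and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "socket":
                paths[section] = value
    return paths


# Hypothetical my.cnf where server and client disagree about the socket.
sample = """[mysqld]
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock  # where the server creates it

[client]
socket=/var/lib/mysql/mysql.sock
"""
paths = socket_paths(sample)
print(paths)
print("mismatch:", paths.get("mysqld") != paths.get("client"))
```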

If the problem persists and the client still cannot connect through the mysql.sock file, try connecting through TCP/IP. When the host is localhost (the default), the mysql client connects through the Unix socket file, whereas connecting to 127.0.0.1 uses TCP/IP. So you can try connecting with:

mysql --protocol TCP -u xxx -p

or, if you don't want to create a symlink to the mysqld.sock file, give the path to the socket file directly:

mysql --socket=/var/run/mysqld/mysqld.sock -u xxx -p

Alternatively, you can set the host in the [client] section of /etc/my.cnf to 127.0.0.1 instead of localhost (host=127.0.0.1). The client will then connect over TCP/IP and will not look for the socket file.
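The two fallbacks above can be combined: use the socket file when it exists, otherwise force TCP/IP to 127.0.0.1. A minimal Python sketch that only builds the argument list (you could run it with `subprocess.run`); the function name is hypothetical.

```python
import os


def client_args(user, socket_path):
    """mysql client arguments: socket file if it exists, else TCP/IP."""
    if os.path.exists(socket_path):
        return ["mysql", f"--socket={socket_path}", "-u", user, "-p"]
    # No socket file: bypass it and connect over TCP/IP instead.
    return ["mysql", "--protocol=TCP", "-h", "127.0.0.1", "-u", user, "-p"]


print(client_args("xxx", "/var/run/mysqld/mysqld.sock"))
```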
