Tuesday, July 30, 2013

Synteny and the world of genomics

Comparative genomics is one of my interests. In India, many people know bioinformatics mainly as drug design, simulation, protein folding, and 3D structure prediction. But I found the right place to work on comparative genomics, and I will share the little I have learnt.


Comparative genomics:
Comparative genomics is a bioinformatics approach in which we compare unknown genomes with known genomes by genome size, number of genes, and number of chromosomes. By comparing two genomes we can identify the core set of genes shared by the two organisms, and we can also identify genes that carry mutations or are able to cause disease. Comparative genomics also reveals the evolutionary relationships between organisms.
Synteny:
Synteny refers to two or more genes located on the same chromosome, whether or not they show genetic linkage. In comparative genomics the term is mostly used on the basis of collinearity: two or more chromosomes or chromosomal segments that carry the same genes in the same order are inferred to derive from a common ancestor. Synteny is therefore used to trace homologous genes back to their ancestral chromosomal position. For example, humans have 23 pairs of chromosomes, and human chromosome 17 corresponds largely to a portion of mouse chromosome 11.
Some terms related to synteny:
Single gene transposition:
The insertion of one gene into a new location.
Fractionation:
The process by which a duplicated gene, chromosomal segment, or genome tends to return to its pre-duplication gene content.
Subfunctionalization:
The selectively neutral tendency of a duplicated cis-acting unit of function to lose dispensable sequences on one duplicate or the other, but not both, such that the ancestral function is spread over both duplicates.
CoGe is one of the online platforms used for analysing synteny.
Syntenic regions can be identified by:
1. Finding the putative homologous genes or regions between the two genomes.
2. Identifying the co-linear sets of genes or regions of sequence similarity.
SynMap methods:
1. It extracts the sequences for comparison and builds the FASTA files.
2. It creates the database and compares the sequences using the BLAST algorithm.
3. It uses a default E-value cutoff of 0.001.
4. It identifies tandem gene duplicates with blast2raw.
5. It filters the repetitive matches.
6. It identifies the syntenic pairs by finding co-linear putative homologous sequences.
7. It calculates the synonymous and nonsynonymous mutation rates for syntenic gene pairs.
8. It generates the dotplot of putative homologous matches.
9. It colors the dotplot based on the synonymous and nonsynonymous mutation rates.
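
To make the idea of "co-linear putative homologous sequences" in step 6 a bit more concrete, here is a toy sketch in Python. This is not DAGChainer and not what SynMap actually runs; the function name, the max_gap and min_pairs parameters, and the gene positions are all made up for illustration. Each homologous pair is given as (gene order index in genome A, gene order index in genome B), and the code simply groups pairs into runs where both positions move forward together. A real chainer also handles inverted blocks and noise hits that fall between the members of a chain, which this greedy version does not.

```python
def collinear_blocks(pairs, max_gap=2, min_pairs=3):
    """Group homologous gene pairs into collinear blocks (toy version).

    pairs: list of (index_in_A, index_in_B) gene-order positions.
    A pair extends the current block if it follows the previous pair
    by at most max_gap genes in both genomes (forward orientation only).
    Blocks with fewer than min_pairs pairs are discarded.
    """
    blocks = []
    current = []
    for a, b in sorted(pairs):
        if current:
            prev_a, prev_b = current[-1]
            step_a = a - prev_a
            step_b = b - prev_b
            if 0 < step_a <= max_gap and 0 < step_b <= max_gap:
                current.append((a, b))
                continue
            # The chain is broken: keep the block only if it is long enough.
            if len(current) >= min_pairs:
                blocks.append(current)
        current = [(a, b)]
    if len(current) >= min_pairs:
        blocks.append(current)
    return blocks


# Hypothetical hits: two collinear runs plus one isolated match (7, 3).
hits = [(1, 10), (2, 11), (3, 12), (4, 14), (7, 3),
        (20, 40), (21, 41), (22, 42)]
for block in collinear_blocks(hits):
    print(block)
```

In a dotplot, each block found this way would show up as a short diagonal line, while the isolated match would be a single stray dot.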

Analysis options:
1. Break the sequences into multiple pieces before searching.
2. Filter repetitive matches:
This adjusts the E-values of the BLAST hits to lower the significance of sequences that occur many times in the genome.
3. DAGChainer options:
Identify syntenic regions between genomes and gene spaces in genomes.
4. Average distance between syntenic regions.
5. Maximum distance between two matches.
6. Minimum number of aligned pairs.
7. Syntenic depth: it keeps the best syntenic regions that cover each genome.



The syntenic dotplot here was constructed between E. coli genomes. I am giving it just as an example, so that everybody can get an idea of what a dotplot is and how it looks.


Saturday, July 27, 2013

UIDs of cyanobacteria

I got tired of downloading the genomes and genes of cyanobacteria one by one. Then I came across something called UIDs, and I collected the UIDs of all my genomes. I think this will be useful for everybody who works with cyanobacteria.
cyanobacterium_UCYN_A_uid43697
Nostoc_azollae__0708_uid49725
Trichodesmium_erythraeum_IMS101_uid57925
Thermosynechococcus_elongatus_BP_1_uid57907
Synechocystis_PCC_6803_uid57659
Synechocystis_PCC_6803_uid189748
Synechocystis_PCC_6803_uid159873
Synechocystis_PCC_6803_substr__PCC_N_uid159835
Synechocystis_PCC_6803_substr__GT_I_uid158059
Synechocystis_PCC_6803_substr__GT_I_uid157913
Synechococcus_elongatus_PCC_7942_uid58045
Synechococcus_elongatus_PCC_6301_uid58235
Synechococcus_WH_8102_uid61581
Synechococcus_WH_7803_uid61607
Synechococcus_RCC307_uid61609
Synechococcus_PCC_7502_uid183008
Synechococcus_PCC_7002_uid59137
Synechococcus_PCC_6312_uid182934
Synechococcus_JA_3_3Ab_uid58535
Synechococcus_JA_2_3B_a_2_13__uid58537
Synechococcus_CC9902_uid58323
Synechococcus_CC9605_uid58319
Synechococcus_CC9311_uid58123
Rivularia_PCC_7116_uid182929
Prochlorococcus_marinus_pastoris_CCMP1986_uid57761
Prochlorococcus_marinus_NATL2A_uid58359
Prochlorococcus_marinus_NATL1A_uid58423
Prochlorococcus_marinus_MIT_9515_uid58313
Prochlorococcus_marinus_MIT_9313_uid57773
Prochlorococcus_marinus_MIT_9312_uid58357
Prochlorococcus_marinus_MIT_9303_uid58305
Prochlorococcus_marinus_MIT_9301_uid58437
Prochlorococcus_marinus_MIT_9215_uid58819
Prochlorococcus_marinus_MIT_9211_uid58309
Prochlorococcus_marinus_CCMP1375_uid57995
Prochlorococcus_marinus_AS9601_uid58307
Oscillatoria_acuminata_PCC_6304_uid183003
Oscillatoria_PCC_7112_uid183110
Nostoc_punctiforme_PCC_73102_uid57767
Nostoc_PCC_7524_uid182933
Nostoc_PCC_7120_uid57803
Nostoc_PCC_7107_uid182932
Microcystis_aeruginosa_NIES_843_uid59101
Microcoleus_PCC_7113_uid183114
Leptolyngbya_PCC_7376_uid182928
Halothece_PCC_7418_uid183338
Gloeocapsa_PCC_7428_uid183112
Gloeobacter_violaceus_PCC_7421_uid58011
Geitlerinema_PCC_7407_uid183007
Dactylococcopsis_salina_PCC_8305_uid183341
Cylindrospermum_stagnale_PCC_7417_uid183111
Cyanothece_PCC_8802_uid59143
Cyanothece_PCC_8801_uid59027
Cyanothece_PCC_7822_uid52547
Cyanothece_PCC_7425_uid59435
Cyanothece_PCC_7424_uid59025
Cyanothece_ATCC_51142_uid59013
Cyanobium_gracile_PCC_6307_uid182931
Cyanobacterium_stanieri_PCC_7202_uid183337
Cyanobacterium_PCC_10605_uid183340
Crinalium_epipsammum_PCC_9333_uid183113
Chroococcidiopsis_thermalis_PCC_7203_uid183002
Chamaesiphon_PCC_6605_uid183005
Calothrix_PCC_7507_uid182930
Calothrix_PCC_6303_uid183109
Arthrospira_platensis_NIES_39_uid197171
Anabaena_variabilis_ATCC_29413_uid58043
Anabaena_cylindrica_PCC_7122_uid183339
Anabaena_90_uid179383
Acaryochloris_marina_MBIC11017_uid58167

Using these UIDs, you can download the genome information with the help of NCBI E-utilities. A few Perl scripts can save a lot of time.
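
As a rough sketch of how the names above can be turned into E-utilities requests, here is a small example. I mentioned Perl, but the same idea is shown here in Python. Two things are assumptions on my part: that the number after "uid" is the BioProject ID of the genome, and that an ELink query from bioproject to nuccore is a sensible way to reach its sequences. The helper names split_uid and elink_url are made up for this example. The code only builds the URL and does not contact NCBI.

```python
import re
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def split_uid(name):
    """Split 'Organism_name_uid12345' into ('Organism_name', '12345')."""
    match = re.fullmatch(r"(.+?)_+uid(\d+)", name)
    if match is None:
        raise ValueError(f"no uid suffix in {name!r}")
    return match.group(1), match.group(2)


def elink_url(uid, dbfrom="bioproject", db="nuccore"):
    """Build an ELink URL for records linked to a UID.

    Assumption: the UID is a BioProject ID, so we link from
    'bioproject' to 'nuccore' by default.
    """
    query = urlencode({"dbfrom": dbfrom, "db": db, "id": uid})
    return EUTILS + "elink.fcgi?" + query


organism, uid = split_uid("Synechocystis_PCC_6803_uid57659")
print(organism, uid)
print(elink_url(uid))
```

The IDs returned by such a query could then be passed to EFetch to download the sequences. Please check the NCBI usage guidelines before looping over the whole list.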

Monday, July 22, 2013

Importing & exporting MySQL dump files including/excluding data


Let us suppose we have a MySQL database named testDB.

EXPORT

1. DATA + STRUCTURE
[user@pc]$ mysqldump -u user -p testDB > /path-to-export/testDB.sql

2. STRUCTURE Only
[user@pc]$ mysqldump -u user -p --no-data testDB > /path-to-export/testDB.sql

3. DATA only
[user@pc]$ mysqldump -u user -p --no-create-db --no-create-info testDB > /path-to-export/testDB.sql
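
If you run these exports from a script, the three variants differ only in a couple of flags. Here is a minimal Python sketch of that idea. The function name mysqldump_args and the mode names "all", "structure" and "data" are made up for this example; the mysqldump flags are the same ones used in the commands above. The function only builds the argument list and does not run mysqldump.

```python
def mysqldump_args(user, database, mode="all"):
    """Return the mysqldump argument list for one of the three export modes."""
    flags = {
        "all": [],                                       # data + structure
        "structure": ["--no-data"],                      # structure only
        "data": ["--no-create-db", "--no-create-info"],  # data only
    }
    if mode not in flags:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["mysqldump", "-u", user, "-p", *flags[mode], database]


print(" ".join(mysqldump_args("user", "testDB", mode="structure")))
```

The returned list can be passed to subprocess.run together with an open output file as stdout, which takes the place of the shell redirection.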

IMPORT

1. STRUCTURE + DATA
[user@pc]$ mysql -u user -p testDB < /path-from-import/testDB.sql

I would like to mention an issue that you might face while importing a huge dump file (especially genome databases). With the default MySQL configuration you will get this error:
 "Got a packet bigger than 'max_allowed_packet' bytes"

The solution is to increase the maximum packet size on both the MySQL client and the MySQL server.

1. While importing, add --max_allowed_packet=100M or the size you need.
e.g.: [user@pc]$ mysql -u user -p --max_allowed_packet=100M testDB < /path-from-import/testDB.sql

2. Open another terminal, log in to the MySQL server as root, and run the following (these are the largest values MySQL accepts for the two variables):
mysql>  set global net_buffer_length=1048576;
mysql>  set global max_allowed_packet=1073741824;

Now proceed with the import command.

Hope it helps you!

Friday, July 19, 2013

mysql.sock file missing or appears to be missing

I ran into this when we tried to put the mysql data directory in a location other than the standard /var/lib location on our server. We uninstalled mysql several times, but some remnants were still left elsewhere and kept interfering. If you need to re-install mysql, just do the following.

1. yum erase mysql  -> This removes the installed mysql package
2. yum install mysql-server -> This installs mysql again
3. As root (or with sudo), run: service mysqld start -> This starts mysql
4. Create a root password for mysql using:  mysqladmin -u root password lklklklskl -> This sets the root password to lklklklskl
5. If step 4 runs fine, then all is well. But in my case it complained a lot:
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)'
Check that mysqld is running and that the socket: '/var/lib/mysql/mysql.sock' exists!

I googled a lot and then found that the path to the socket file is written in the /etc/my.cnf file. Inside my.cnf the path was /var/run/mysqld/mysqld.sock. Notice that it is mysqld.sock, NOT mysql.sock. With a lot of caution, I created a symlink to mysqld.sock at /var/lib/mysql using:
ln -s /var/run/mysqld/mysqld.sock /var/lib/mysql/mysql.sock

Then I retried command 4 and it ran fine.

If your problem persists and the client is still unable to connect through the mysql.sock file, then try connecting through TCP/IP. By default, mysql connects to the server at localhost, which uses the socket file. Connecting to 127.0.0.1 uses TCP/IP instead. So, you can try connecting using:

mysql --protocol TCP -u xxx -p

or
If you don't want to create a symlink to the mysqld.sock file, try giving the path to the socket file with:

mysql --socket=/var/run/mysqld/mysqld.sock -u xxx -p

Alternatively, you can change the host name in the /etc/my.cnf file from localhost to 127.0.0.1. Then the client will not look for the socket file.
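
Since the whole problem came from the socket path in my.cnf not matching the path the client was looking for, a quick check of what my.cnf actually says can save some googling. Here is a small Python sketch that lists the socket setting of every section. The function name socket_settings and the sample configuration text are made up for this example. It is a deliberately simple line parser, so it does not follow !include or !includedir directives.

```python
def socket_settings(cnf_text):
    """Map each [section] of a my.cnf-style text to its socket path, if any."""
    sockets = {}
    section = None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "!")):
            continue  # blank line, comment, or !include directive
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        key, sep, value = line.partition("=")
        if sep and section and key.strip() == "socket":
            sockets[section] = value.strip()
    return sockets


# Hypothetical my.cnf where server and client disagree about the socket.
sample = """
[mysqld]
datadir=/var/lib/mysql
socket=/var/run/mysqld/mysqld.sock

[client]
socket = /var/lib/mysql/mysql.sock
"""

found = socket_settings(sample)
print(found)
if len(set(found.values())) > 1:
    print("Sections point to different socket files")
```

To check a real server, read the text from /etc/my.cnf instead of using the sample string. If the [mysqld] and [client] sections disagree, making them agree avoids both the symlink and the --socket option.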