tag:blogger.com,1999:blog-77571182438035223942023-11-16T08:23:03.435-08:00Computational Genomics Lab at IICBWe are a new group that came into existence on August 1st 2012 at the Indian Institute Of Chemical Biology, Kolkata, India. Our group comprises enthusiastic researchers working in the areas of genomics, transcriptomics and computational biology, all aimed at finding solutions for the fuel crisis, crop destruction and human diseases. Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.comBlogger67125tag:blogger.com,1999:blog-7757118243803522394.post-9880810981647236052021-10-19T06:43:00.004-07:002021-10-19T06:43:36.307-07:00A dangerous $# in perl arrays - Not recommended for iteration <p> The innocent-looking $# syntax that we often use to get the maximum index of an array in a for loop can sometimes be dangerous.</p><p>I spent a sizable amount of time wondering why my for loop had become an infinite loop, without realizing that the loop body was changing the maximum index ($# value) of the array.</p><p>Say you have a 2-dimensional array @sorted and you want to print its elements. The easiest way would be:</p><p>for(my $i=0; $i <= $#sorted; $i++){</p><p> for(my $j=0; $j <= @{$sorted[$i]}; $j++){</p><p> print "$sorted[$i][$j]\t";</p><p> }</p><p> print "$#sorted\n";</p><p> }</p><p><br /></p><p>The output will be a neat:</p><p><br /></p><p>3349 4097 gene-PR001_g8806 9 ----> The last column indicates the max index </p><p>6662 6832 gene-PR001_g8807 9</p><p>11316 11696 gene-PR001_g8808 9</p><p>13158 13334 gene-PR001_g8809 9</p><p>18688 19095 gene-PR001_g8810 9</p><p>25175 25342 gene-PR001_g8811 9</p><p>26554 26883 gene-PR001_g8812 9</p><p>28100 29059 gene-PR001_g8813 9</p><p>29128 30235 gene-PR001_g8814 9</p><p>30266 30786 gene-PR001_g8815 9</p><div><br /></div><p>Here the last column shows the max index of the array, which remains unchanged, and hence the loop terminates.</p><p>However, something as innocent as a <b><u>C-style</u></b> loop that accesses the $i+1 element of the array actually changes the array's maximum index! This came as a surprise to me when I hit an infinite loop that led to an out-of-memory condition and a locked-file alert.</p><p>Check this out:</p><p>for(my $i=0; $i <= $#sorted; $i++){</p><p> for(my $j=0; $j <= @{$sorted[$i]}; $j++){</p><p> print "$sorted<b>[$i+1]</b>[$j]\t";---> Accessing the $i+1 value rather than $i value</p><p> }</p><p> print "$#sorted\n";</p><p> }</p><div>Here all hell breaks loose and you hit an infinite loop when you least suspect it. The reason is autovivification: when $i is the last index, reading $sorted[$i+1][$j] makes Perl create an empty row at $sorted[$i+1], so $#sorted grows by one. The inner condition $j <= @{$sorted[$i]} also runs one step past the last element (the usual bound is $j < @{$sorted[$i]}), so the inner loop executes once even for an empty row, and each newly created row creates the next one.</div><div>Therefore, it will be prudent to first copy the value of $#sorted into a variable and loop over that value instead of looping directly over $#sorted.</div><div><br /></div><div><p>3349 4097 gene-PR001_g8806 9 </p><p>6662 6832 gene-PR001_g8807 9</p><p>11316 11696 gene-PR001_g8808 9</p><p>13158 13334 gene-PR001_g8809 9</p><p>18688 19095 gene-PR001_g8810 9</p><p>25175 25342 gene-PR001_g8811 9</p><p>26554 26883 gene-PR001_g8812 9</p><p>28100 29059 gene-PR001_g8813 9</p><p>29128 30235 gene-PR001_g8814 9</p><p>30266 30786 gene-PR001_g8815 9</p><p>10</p><p>11</p><p>... --> Increases infinitely!</p><p><br /></p><p>A potentially dangerous infinite loop where you least suspect it!</p><p>A neat solution for this problem will be:</p><p><b>my $index = $#sorted;--> notice this statement</b></p><p> for(my $i=0; $i <= $index; $i++){</p><p> for(my $j=0; $j <= @{$sorted[$i]}; $j++){</p><p> print "$sorted[$i+1][$j]\t";</p><p> }</p><p> print "$#sorted\n";</p><p> }</p><div><br /></div><p><br /></p><p>6662 6832 gene-PR001_g8807 9</p><p>11316 11696 gene-PR001_g8808 9</p><p>13158 13334 gene-PR001_g8809 9</p><p>18688 19095 gene-PR001_g8810 9</p><p>25175 25342 gene-PR001_g8811 9</p><p>26554 26883 gene-PR001_g8812 9</p><p>28100 29059 gene-PR001_g8813 9</p><p>29128 30235 gene-PR001_g8814 9</p><p>30266 30786 gene-PR001_g8815 9</p><div>10 --> Notice this 10. </div><div><br /></div><div><b><u>This means that the max index still rises, but the loop terminates nevertheless. Is this a bug in Perl? No: it is autovivification. The row at $sorted[$i+1] is still created on the last pass, which is why $#sorted reports 10, but the loop bound was copied into $index beforehand and no longer follows it.</u></b></div><p><br /></p></div>Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.com0tag:blogger.com,1999:blog-7757118243803522394.post-78886758049232826542020-12-10T02:21:00.001-08:002020-12-10T02:21:06.487-08:00Piping server for transferring data back and forth between any device<p> Today I came across a system called a piping server. 
The beauty of this technology is that you can transfer any file between devices using simple commands such as curl. If you use Ubuntu or any other Linux distribution, transferring files from machine A to machine B takes only the following simple commands.</p><p>Suppose you have a file called mandel1.jpg on 'A' and you need to transfer it to 'B'. Simply go to the terminal on A and type:</p><p><b>$ curl -T mandel1.jpg https://ppng.io/mandel1</b></p><p><b>[The following will appear in your terminal @ A]</b></p><p><b>[INFO] Waiting for 1 receiver(s)...</b></p><div><br /></div><div>Then go to the terminal on 'B' and type:</div><div><br /></div><div><br /></div><div><div><b>sutripa@amrit:~$ curl https://ppng.io/mandel1 > mandel1.jpg</b></div><div><b><br /></b></div><div><b>You will get the following output @B </b></div><div> % Total % Received % Xferd Average Speed Time Time Time Current</div><div> Dload Upload Total Spent Left Speed</div><div>100 137 100 137 0 0 2 0 0:01:08 0:01:06 0:00:02 33</div></div><div><br /></div><div>Then finally do an ls at B</div><div><br /></div><div>and you will see the file you have transferred, e.g. <b>mandel1.jpg.</b></div><div><b><br /></b></div><div>Transferring directories between devices is also pretty easy using this server.</div><div><br /></div><div>On the sending machine, pack the directory into an archive stream and upload it:</div><div><br /></div><div><b>$tar zfcp - ./QC | curl -T - https://ppng.io/such</b></div><div>or, if you want to compress using zip, do the following:</div><div><b>$zip -q -r - ./QC </b> |<b> curl -T - https://ppng.io/such</b></div><div><br /></div><div>On the receiving machine, save the incoming archive under an archive name:</div><div><br /></div><div><b>sutripa@amrit:~$ curl https://ppng.io/such > QC.tar.gz</b></div><div><b><br /></b></div><div>Then you will see</div><div><br /></div><div><div> % Total % Received % Xferd Average Speed Time Time Time Current</div><div> Dload Upload Total Spent Left Speed</div><div>100 855k 0 855k 0 0 60467 0 --:--:-- 0:00:14 --:--:-- 216k</div></div><div><br /></div><div>What arrives is the compressed archive, not the directory itself. Unpack it with <b>tar zxf QC.tar.gz</b> (if you sent a zip archive, save it as QC.zip and run <b>unzip QC.zip</b>) and the directory is transferred.</div><div><br /></div><div>For more information check this link out:</div><div><br /></div><div>https://ostechnix.com/transfer-files-between-any-devices-using-piping-server/ </div><div><br /></div><div><br /></div><div><br /></div>Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.com0tag:blogger.com,1999:blog-7757118243803522394.post-69667299383720941502017-04-11T03:05:00.000-07:002017-04-16T07:19:27.866-07:00two-speed genome analysis using R and perl<div dir="ltr" style="text-align: left;" trbidi="on">
I discussed two-speed genome analysis in my previous post; here I describe the steps to carry it out.<br />
<br />
1. Calculate the intergenic distances for your organism "x" from its GTF file using the Perl script. I used the GTF file predicted by AUGUSTUS. The file needs to be adapted to what the Perl script expects, because the script looks for a particular feature and pattern (a gene/exon/mRNA feature).<br />
<br />
Sample of the AUGUSTUS GTF file<br />
(make sure the last column is filled in on the gene row):<br />
<br />
scaffold_1904 AUGUSTUS gene 1 775 0.09 + . transcript_id "g18705"; gene_id "temp";<br />
scaffold_1 AUGUSTUS transcript 1 4141 0.05 + . g1.t1<br />
scaffold_1 AUGUSTUS intron 1 180 0.55 + . transcript_id "g1.t1"; gene_id "g1";<br />
scaffold_1 AUGUSTUS intron 2022 3511 0.21 + . transcript_id "g1.t1"; gene_id "g1";<br />
scaffold_1 AUGUSTUS CDS 181 2021 0.09 + 2 transcript_id "g1.t1"; gene_id "g1";<br />
scaffold_1 AUGUSTUS CDS 3512 4141 0.49 + 0 transcript_id "g1.t1"; gene_id "g1";<br />
scaffold_1 AUGUSTUS stop_codon 4139 4141 . + 0 transcript_id "g1.t1"; gene_id "g1";<br />
<div>
<br /></div>
<div>
Run the Perl script Calculate_FIR_length.pl on the GTF file to calculate the intergenic distances between the features. complementBed already exists, but it is not what we need here because it serves a different purpose; I learned the difference when I asked the author on GitHub: https://github.com/Adamtaranto/density-Mapr/issues/1#issuecomment-291475238 </div>
<div>
<br /></div>
<div>
The intergenic distance file for the genes looks like this:</div>
<div>
<div>
"geneid","strand","fiveprime","threeprime"</div>
<div>
"g10","+",5020,927</div>
<div>
"g84","+",3316,8625</div>
<div>
"g42","+",1558,1773</div>
<div>
"g156","+",4460,13837</div>
<div>
"g93","-",2035,553</div>
<div>
"g30","+",117,361</div>
<div>
"g106","+",1656,874</div>
<div>
"g39","+",1380,720</div>
<div>
"g1","+",NA,1222</div>
<div>
"g70","+",2614,419</div>
</div>
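<div>
To make the idea behind this table concrete, here is a minimal sketch of how flanking intergenic distances could be computed with sort and awk. It is not the Calculate_FIR_length.pl script and may define distances differently. The input layout (TAB-separated scaffold, start, end, strand, gene id) and the function name fir_lengths are assumptions made for illustration only.</div>

```shell
#!/bin/sh
# Sketch only: flanking intergenic distances from a simple gene table.
# Assumed (hypothetical) input: TAB-separated scaffold, start, end, strand, gene id.
# Distance = number of bases strictly between neighbouring genes on one scaffold;
# overlapping genes therefore give negative values, and scaffold ends give NA.
fir_lengths() {
  sort -k1,1 -k2,2n "$1" | awk -F'\t' '
    BEGIN { print "\"geneid\",\"strand\",\"fiveprime\",\"threeprime\"" }
    {
      n++
      scaf[n] = $1; gstart[n] = $2; gend[n] = $3; strand[n] = $4; id[n] = $5
    }
    END {
      for (i = 1; i <= n; i++) {
        left = "NA"; right = "NA"
        if (i > 1 && scaf[i-1] == scaf[i]) left  = gstart[i] - gend[i-1] - 1
        if (i < n && scaf[i+1] == scaf[i]) right = gstart[i+1] - gend[i] - 1
        # On the minus strand the 5-prime side is the right-hand neighbour.
        if (strand[i] == "-") { five = right; three = left }
        else                  { five = left;  three = right }
        printf "\"%s\",\"%s\",%s,%s\n", id[i], strand[i], five, three
      }
    }'
}

# Usage (hypothetical file names): fir_lengths genes.tsv > intergene_whole.csv
```

<div>
Swapping the two sides for minus-strand genes is what makes the columns strand-aware rather than simply "left" and "right".</div>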
<div>
<br /></div>
<div>
2. Make a GTF file of the Avh effectors with their locations and calculate their 5' and 3' intergenic distances in the same way. The intergenic distance file for the effectors should look like this:</div>
<div>
<div>
"geneid","strand","fiveprime","threeprime"</div>
<div>
"Avh92","-",4946,80942</div>
<div>
"Avh48","+",137224,80942</div>
<div>
"Avh102","+",38474,24067</div>
<div>
"Avh127","-",137224,6955</div>
<div>
"Avh304","-",7882,12043</div>
<div>
"Avh313","+",23825,26166</div>
<div>
"Avh91","-",24826,4946</div>
<div>
"Avh303","+",16698,12043</div>
<div>
"Avh34","-",41377,26166</div>
<div>
"Avh61","+",4031,5317</div>
<div>
"Avh311","-",14467,1021</div>
<div>
"Avh310","+",4389,1021</div>
<div>
"Avh93","+",41377,38474</div>
</div>
<div>
<br /></div>
<div>
3. Then run the R script pasted below, changing only the file names. If points fall outside the plot limits, change the bin size and the number of bins before the heatmap is created, not after it has been made.</div>
<div>
<br /></div>
<div>
<div>
whole_intergene=read.csv(file="intergene_whole.csv",sep=",")</div>
<div>
NumBins=50</div>
<div>
if ((max(whole_intergene$fiveprime, na.rm=TRUE)>max(whole_intergene$threeprime, na.rm=TRUE)) == TRUE) { whole_intergene2Bin=whole_intergene$fiveprime} else { whole_intergene2Bin=whole_intergene$threeprime}</div>
<div>
whole_intergene2Bin=whole_intergene2Bin[which(whole_intergene2Bin!=0)]</div>
<div>
whole_intergene2Bin=na.omit(whole_intergene2Bin)</div>
<div>
BinSteps=round(length(whole_intergene2Bin)/ (NumBins-20) , digits=10)</div>
<div>
whole_intergene2BinOrd=sort(whole_intergene2Bin)</div>
<div>
#### Note: [2*BinSteps] was producing an error; after googling the error, changing it to [1*BinSteps] fixed it</div>
<div>
TempBinLimits=whole_intergene2BinOrd[seq(whole_intergene2BinOrd[2*BinSteps],length(whole_intergene2BinOrd),BinSteps)]</div>
<div>
TempBinLimits[length(TempBinLimits)+1]=max(whole_intergene2Bin, na.rm=TRUE)</div>
<div>
x<-seq(length(TempBinLimits))</div>
<div>
fit<-nls(log(TempBinLimits) ~ a*x + b, start= c(a=0, b=0),algorithm='port',weights=((x-1.0* NumBins)^2))</div>
<div>
pred=predict(fit, x)</div>
<div>
BinLimits=c(1, round(exp(pred),0), max(whole_intergene2Bin))</div>
<div>
xbin=cut(whole_intergene$fiveprime, breaks=c(BinLimits))</div>
<div>
ybin=cut(whole_intergene$threeprime, breaks=c(BinLimits))</div>
<div>
whole_intergene=cbind(whole_intergene, xbin, ybin, genevalue=rep(1, length (whole_intergene$fiveprime)))</div>
<div>
GenValMatrix<-with(whole_intergene, tapply (genevalue, list(xbin, ybin), sum))</div>
<div>
x<-1:ncol(GenValMatrix)</div>
<div>
y<-1:nrow(GenValMatrix)</div>
<div>
zlim = range(as.numeric (unlist(GenValMatrix)) , finite=TRUE)</div>
<div>
mypalette<-colorRampPalette(c( "white","darkblue", "forestgreen", "goldenrod1","orangered", "red3", "darkred"), space="rgb")</div>
<div>
mycol=mypalette(2*max(GenValMatrix, na.rm=TRUE))</div>
<div>
mylabels<-paste(BinLimits[1:(length(BinLimits)-1)], BinLimits[2:length(BinLimits)], sep="- ", collapse=NULL)</div>
<div>
filled.contour(x, y, z=GenValMatrix,plot.title = title(main ="Phytophthora ramorum Pr102 genome",xlab = "five prime intergenic regions", ylab= "three prime intergenic regions", cex.main=0.8, cex.lab=0.8),key.title = title(main ="Number of genes", cex.main=0.5,line=1),col=mycol,levels = pretty(zlim, 1*max(GenValMatrix,na.rm=TRUE)),plot.axes={axis(1,at=x, labels=mylabels, las=2,cex.axis=0.5);axis(2,at=y, labels=mylabels,cex.axis=0.5)})</div>
<div>
#wget http://wiki.cbr.washington.edu/qerm/sites/qerm/images/1/16/Filled.contour3.R</div>
<div>
source('Filled.contour3.R')</div>
<div>
library(png)</div>
<div>
library(gridExtra)</div>
<div>
library(ggplot2)</div>
<div>
image_name<-paste(as.character(format(Sys.time(),"%Y%m%d%H%M%S")), "_graph", sep="")</div>
<div>
png(filename = paste(image_name, ".png", sep=""))</div>
<div>
par(mar=c(0,0,0,0))</div>
<div>
filled.contour(x, y, z=GenValMatrix,col=mycol,levels = pretty(zlim, 2*max(GenValMatrix,na.rm=TRUE)),frame.plot = FALSE,axes = FALSE)</div>
<div>
dev.off()</div>
<div>
img <- readPNG(paste(image_name, ".png", sep=""))</div>
<div>
library(grid)</div>
<div>
library(lattice)</div>
<div>
g <- rasterGrob(img, interpolate=TRUE)</div>
<div>
rxlrData=as.data.frame(read.csv('rxrl_whole_intergenic.csv',header=TRUE))</div>
<div>
ggplot(data=rxlrData,aes(x=fiveprime,y=threeprime))+annotation_custom(g,xmin=-Inf,xmax=Inf,ymin=-Inf,ymax=Inf)+coord_fixed(ratio=1)+geom_point(shape=21,fill="red",colour="black",size=2,alpha=0.7,na.rm=FALSE)+scale_y_log10(breaks=BinLimits[2:length(BinLimits)],limits=c(BinLimits[2],BinLimits[NumBins +1]))+scale_x_log10(breaks= BinLimits[2:length(BinLimits)],limits= c(BinLimits[2] ,BinLimits[NumBins +1]))+theme(axis.text.y=element_text(size = 10,vjust=0.5))+theme(axis.text.x=element_text(size=10,vjust=0.5,angle=90))+theme(axis.title.x = element_text(face="bold",size=12))+xlab("five prime intergenic region")+theme(axis.title.y = element_text(face="bold",size=12)) + ylab("three prime intergenic region")</div>
</div>
<div>
<br /></div>
<div>
Below is a screenshot for the Phytophthora sojae genome from a sample dataset.</div>
<div>
Minor edits can be made to improve the plot.</div>
<div>
</div>
<div>
The code and concept are taken from http://biorxiv.org/content/early/2015/07/01/021774 </div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhy3NUJK6NcsTrm-6O8T5-Pgq0jrdNV87OtUUYAH8FZRUN2bNrHxR7FG92_EJxIzD56iNuGAoO0YenMGEhBaJDvPwpTRgzs5BAUY5Y7VMBIkGe1534Lq3UH3MBUqfjyJs41RK1xFWLergQ/s1600/test.tif" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhy3NUJK6NcsTrm-6O8T5-Pgq0jrdNV87OtUUYAH8FZRUN2bNrHxR7FG92_EJxIzD56iNuGAoO0YenMGEhBaJDvPwpTRgzs5BAUY5Y7VMBIkGe1534Lq3UH3MBUqfjyJs41RK1xFWLergQ/s320/test.tif" width="320" /></a></div>
<div>
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/18071472759535434809noreply@blogger.com0tag:blogger.com,1999:blog-7757118243803522394.post-70619771548031922432017-03-26T22:48:00.000-07:002017-04-18T23:07:40.519-07:00analysing the alleles from haplotypes on pacbio data<div dir="ltr" style="text-align: left;" trbidi="on">
I have been working with PacBio data, trying to identify the alleles from haplotypes in a diploid assembly. I hit many errors at the very first steps because I was following the method used for Illumina datasets, and the tools developed for short reads behave strangely with long-read data. I was stuck for 3-4 days, searched extensively and tried various approaches. Finally I posted in the forums and interacted with the GATK developers, who suggested a simple solution. For those who are working with long reads and want to identify haplotypes, here is my verified command line.<br />
[Any aligner can be used, even BLASR. Initially I thought the problem was my aligner, but it was not. There is also no need to mark duplicates for long reads; the developers recommend that step only for Illumina reads. I have reached the HaplotypeCaller step so far and it is running smoothly without errors. If I change commands or face any problems I will update this post, and once the output is ready I may paste some of it.]<br />
<br />
bwa index 2017_V6_Pr102_assembly.fa<br />
bwa mem -x pacbio 2017_V6_Pr102_assembly.fa /data/results1/STLab/Takao_data/Raw_data/ND886/all_ND886.fastq > aln.sam<br />
samtools view -b -S aln.sam -o aln.bam<br />
samtools sort aln.bam > aln_sorted.bam<br />
samtools index aln_sorted.bam<br />
samtools mpileup -uf 2017_V2_ND886_assembly.fa aln_sorted.bam | /share/apps/bcftools-1.2/bcftools call -cv - > out.vcf [use bcftools 1.2, otherwise the genotype information is not produced; the FASTA given to mpileup must be the same reference the reads were aligned to]<br />
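<br />
The commands above index and align against one FASTA file but pass a differently named FASTA to mpileup. If that is not intentional, a quick sanity check is to compare the sequence names in the SAM header with those in the FASTA index. The sketch below is only an illustration: the function name check_reference_names is made up, and it assumes a .fai index already exists for the FASTA (for example one created with samtools faidx).<br />

```shell
#!/bin/sh
# Sketch only: report @SQ sequence names in a SAM header that are absent
# from a FASTA index (.fai). The function name is hypothetical.
# Assumes the .fai file is non-empty (column 1 = sequence name).
check_reference_names() {
  sam="$1"; fai="$2"
  missing=$(awk -F'\t' '
    NR == FNR { known[$1] = 1; next }   # first file: the .fai index
    /^@SQ/ {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) {
          name = substr($i, 4)
          if (!(name in known)) print name
        }
      next
    }
    !/^@/ { exit }                      # stop at the first alignment record
  ' "$fai" "$sam")
  if [ -n "$missing" ]; then
    printf 'Sequences in %s missing from %s:\n%s\n' "$sam" "$fai" "$missing" >&2
    return 1
  fi
}

# Usage: check_reference_names aln.sam 2017_V2_ND886_assembly.fa.fai
```

A non-zero return status means the alignments refer to sequences that the FASTA given to mpileup does not contain.<br />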
<br />
Use your favourite haplotype phaser (WhatsHap / HapCUT) with the BAM and VCF files produced above.<br />
<br />
<div>
Now you have the phased alleles from the haplotypes; you can compare them and use them for downstream analysis.<br />
<br />
<br />
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/18071472759535434809noreply@blogger.com0tag:blogger.com,1999:blog-7757118243803522394.post-90353484657833520792017-03-24T06:38:00.003-07:002017-04-16T07:19:59.528-07:00Fancy genomics “Iam taking you all to the world of two-Speed genomes concept"<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
My PhD problem covers various approaches to solving genome
assembly problems. While working on an oomycete project, I was attracted by
effector proteins, evolution, pathogenicity, synteny, transposons and repeat
regions. The fancy idea came to mind after reading an
interesting bioRxiv paper on the Verticillium genome: a group from the Netherlands
sequenced strains and studied the two-speed genome concept among them. <a href="http://genome.cshlp.org/content/early/2016/07/12/gr.204974.116.full.pdf+html">http://genome.cshlp.org/content/early/2016/07/12/gr.204974.116.full.pdf+html</a>
I was impressed by the work and showed it to my PI, who was
equally impressed by two-speed genomes. I work in a collaborative program, and
my collaborator was also fascinated by the
two-speed genome work. <o:p></o:p></div>
<div class="MsoNormal">
Let me explain what two-speed genomes are.<o:p></o:p></div>
<div class="MsoNormal">
It was already known that the genomes of fungi and other plant pathogens
encode effector proteins, which play an important role in
causing pathogenicity in the host. These effector genes are not randomly distributed
across the genome; they tend to be associated with compartments enriched with
repeat sequences and transposons. This <span class="apple-converted-space"><span style="background: white; color: #2e2e2e; font-family: "arial" , sans-serif;">led</span></span><span style="background: white; color: #2e2e2e; font-family: "arial" , sans-serif;"> to the
‘two-speed genome’ model in which filamentous pathogen genomes have a bipartite
architecture with gene sparse, repeat rich compartments for adaptive evolution</span>.<span class="apple-converted-space"> </span>The unusual genome architecture and
occurrence of effector genes in specific genome compartments is a feature that
has evolved repeatedly in independent phylogenetic lineages of filamentous
pathogens. Genome analyses of<span class="apple-converted-space"> </span><em style="word-spacing: -1.244px;"><span style="border: none 1.0pt; font-family: "arial" , sans-serif; padding: 0in;">P. infestans</span></em><span class="apple-converted-space"><span style="word-spacing: -1.244px;"> </span>and three of its sister species
revealed uneven evolutionary rates across genomes with genes in repeat-rich
regions showing higher rates of structural polymorphisms and positive selection</span>.
<span class="apple-converted-space"> </span>In a two-speed genome architecture,
the effector genes populate the more rapidly evolving sections of the
genome.<span class="apple-converted-space"> </span>Lineages that
acquired two-speed genomes have increased survivability: they are
less likely to go extinct than lineages with less adaptable genomes,
which are more likely to be purged from the biota as their hosts develop
full resistance or become extinct. In this ‘jump or die’ model, pathogen
lineages that have an increased likelihood to produce virulent genotypes on
resistant hosts and non-hosts benefit from a macroevolutionary advantage and
end up dominating the biota. Several filamentous plant pathogens have
evolved by shifting or jumping from one host plant to another.<o:p></o:p></div>
<div class="MsoNormal">
This information comes from a detailed
review by Sophien and Raffaele et al., available here: <a href="http://www.sciencedirect.com/science/article/pii/S0959437X15000945">http://www.sciencedirect.com/science/article/pii/S0959437X15000945</a>
.<o:p></o:p></div>
<div class="MsoNormal">
For those who do not have access to ScienceDirect, the same paper
is available in the bioRxiv repository: http://biorxiv.org/content/early/2015/07/01/021774<o:p></o:p></div>
<br />
<div class="MsoNormal">
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/18071472759535434809noreply@blogger.com0tag:blogger.com,1999:blog-7757118243803522394.post-29297465898446023322016-10-26T00:32:00.000-07:002016-10-26T04:03:46.511-07:00Structural variation in the genomes <div dir="ltr" style="text-align: left;" trbidi="on">
<div style="background-color: white; letter-spacing: -0.5px; line-height: 1.173; margin: 0px 0px 20px; padding: 0px; text-align: left;">
<span style="font-size: large;"><span style="color: #222222; font-family: "times" , "times new roman" , serif; font-weight: normal;">Structural variation: a structural variation is a change in the structure of an organism's chromosome. Structural variants include insertions, deletions, duplications, inversions and translocations. In human genomics, a variant is generally treated as structural when it spans more than about 50 base pairs. It is believed that some genetic diseases are caused by structural variations.</span><br />
<span style="color: #222222; font-family: "times" , "times new roman" , serif; font-weight: normal;">What is the difference between SNPs and structural variation?</span><br />
<span style="color: #222222; font-family: "times" , "times new roman" , serif; font-weight: normal;">SNPs are single-nucleotide mutations, where a single base differs between two genomes, <span style="background-color: white; color: #242729;">which have been validated to be present in more than 1% of the population.</span> <span style="background-color: white; color: #242729;">Structural variants are any mutations which cause a change in the organism's chromosome structure, such as insertions, deletions, copy number variations, duplications, inversions and translocations.</span> <span style="background-color: white; color: #222222;">SNPs and INDELs are low-level genomic variation, whereas structural variants affect the genome at larger scales: events like gene duplications, tandem repeats, transposon insertions, inversions and other chromosomal rearrangements.</span></span> <span style="font-family: "times" , "times new roman" , serif; font-weight: normal;"><span style="background-color: white; color: #222222;">Long-read sequencing technology paves the way to understanding structural variants using split-read alignment.</span></span><br />
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white; color: red; font-weight: normal;">[Information from literature: </span><span style="color: #222222; font-weight: normal;">Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome </span><i style="color: #222222; font-weight: normal;">de novo</i><span style="color: #222222; font-weight: normal;"> assembly, </span><span style="color: red;"><b><a class="name" href="http://www.nature.com/nbt/journal/v29/n8/full/nbt.1904.html#auth-1" style="text-decoration: none;"><span class="fn"><span style="color: red;">Yingrui Li</span></span></a><span class="comma">, et al</span></b></span><span class="comma" style="color: #333333; font-weight: bold;">] </span><br />
<span style="color: #333333; font-weight: normal;">Methods that detect structural variations from short sequencing reads are hampered by one or more of the following limitations: (i) the methods may favor a particular length range of structural variations; (ii) they may favor discovery of particular types of structural variations; (iii) they may be unable to resolve the exact structural variation genotypes and/or breakpoints at single nucleotide resolution; and (iv) because of difficulties mapping reads to the genome, they may not be able to accurately identify complex rearrangements. Paired-end mapping, for example, can only predict insertion breakpoints within a few base pairs of the exact breakpoint position, and it can only detect insertions when the entire sequence is contained within the DNA fragment whose ends are being sequenced; thus, the maximum size of an insertion that can be detected by paired-end mapping is limited by the largest insert size present in a library. Split-read methods, on the other hand, can precisely define a breakpoint and genotype of an insertion, but only when it is shorter than the read length. Thus, studies carried out so far have been of limited completeness, accuracy and/or resolution.</span></span></span></div>
<div style="background-color: white; letter-spacing: -0.5px; line-height: 1.173; margin: 0px 0px 20px; padding: 0px; text-align: left;">
<span style="font-family: "times" , "times new roman" , serif;"><span style="color: #333333; font-size: large;"><b>BWA-MEM or BLASR </b></span></span></div>
<div style="background-color: white; line-height: 1.173; margin: 0px 0px 20px; padding: 0px; text-align: left;">
<span style="color: #333333; font-family: "times" , "times new roman" , serif;"><span style="font-size: large; letter-spacing: -0.5px;">http://lh3.github.io/2014/12/10/bwa-mem-for-long-error-prone-reads/ is a very nice blog post that discusses alignment methods suited to PacBio long reads. </span></span></div>
<div style="background-color: white; line-height: 1.173; margin: 0px 0px 20px; padding: 0px; text-align: left;">
<span style="color: #333333; font-family: "times" , "times new roman" , serif;"><span style="font-size: large; letter-spacing: -0.5px;">https://www.biostars.org/p/63306/ is a forum thread that discusses split-read alignments.</span></span></div>
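<div>
As a small illustration of what a split-read alignment looks like in practice, the sketch below lists the names of reads that carry an SA:Z tag in a SAM file. In the SAM format this optional tag records the other alignments of a chimeric (split) read, and bwa mem writes it for such reads. The function name split_read_names and the file name in the usage line are made up for this example; this is a first look at the data, not a structural variant caller.</div>

```shell
#!/bin/sh
# Sketch only: print the unique names of reads that have an SA:Z tag,
# i.e. reads whose alignment is split across more than one location.
split_read_names() {
  awk -F'\t' '
    /^@/ { next }                          # skip header lines
    {
      # Optional tags start at column 12 in the SAM format.
      for (i = 12; i <= NF; i++)
        if ($i ~ /^SA:Z:/) { print $1; break }
    }' "$1" | sort -u
}

# Usage (hypothetical file name): split_read_names aln.sam | wc -l
```

<div>
Counting these reads around a candidate breakpoint gives a rough idea of how much split-read support it has.</div>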
<div style="background-color: white; line-height: 1.173; margin: 0px 0px 20px; padding: 0px; text-align: left;">
<span style="color: #333333; font-family: "times" , "times new roman" , serif;"><span style="font-size: large; letter-spacing: -0.5px;"><b>Tips for structural variant analysis:</b></span></span></div>
<div style="background-color: white; line-height: 1.173; margin: 0px 0px 20px; padding: 0px; text-align: left;">
<span style="color: #333333; font-family: "times" , "times new roman" , serif;"><span style="font-size: large; letter-spacing: -0.5px;">1. As many reads as possible should map across the chromosome breakpoints, and the coverage should be high.</span></span></div>
<div style="background-color: white; line-height: 1.173; margin: 0px 0px 20px; padding: 0px; text-align: left;">
<span style="color: #333333; font-family: "times" , "times new roman" , serif;"><span style="font-size: large; letter-spacing: -0.5px;">2. To identify translocations, check how many individual reads support the translocation versus how many support the assembly.</span></span></div>
<div style="background-color: white; line-height: 1.173; margin: 0px 0px 20px; padding: 0px; text-align: left;">
<span style="color: #333333; font-family: "times" , "times new roman" , serif;"><span style="font-size: large; letter-spacing: -0.5px;">[I asked some of the developers whether tools built for human data can be applied to a draft PacBio assembly of a plant pathogen; they said the tools can be used for prediction, and I am trying this on one plant pathogen genome.]</span></span></div>
<div style="background-color: white; line-height: 1.173; margin: 0px 0px 20px; padding: 0px; text-align: left;">
<span style="color: #333333; font-family: "times" , "times new roman" , serif;"><span style="font-size: large; letter-spacing: -0.5px;">A 2014 paper reviews all of these approaches:</span></span></div>
<div style="background-color: white; line-height: 1.173; margin: 0px 0px 20px; padding: 0px; text-align: left;">
<span style="color: #333333; font-family: "times" , "times new roman" , serif;"><span style="font-size: large; letter-spacing: -0.5px;">https://bib.oxfordjournals.org/content/early/2014/12/12/bib.bbu047.full#sec-9</span></span></div>
<div style="background-color: white; line-height: 1.173; margin: 0px 0px 20px; padding: 0px; text-align: left;">
<span style="color: #333333; font-family: "times" , "times new roman" , serif;"><span style="font-size: large; letter-spacing: -0.5px;"><br /></span></span></div>
<div style="background-color: white; line-height: 1.173; margin: 0px 0px 20px; padding: 0px; text-align: left;">
<span style="color: #333333; font-family: "times" , "times new roman" , serif;"><span style="letter-spacing: -0.5px;"><br /></span></span></div>
</div>
Anonymoushttp://www.blogger.com/profile/18071472759535434809noreply@blogger.com1tag:blogger.com,1999:blog-7757118243803522394.post-74035930623413714562016-09-27T22:26:00.004-07:002016-09-28T06:16:58.316-07:00Posters from ECCB2016<div dir="ltr" style="text-align: left;" trbidi="on">
I found some interesting posters and thought they would help friends working in the same area; the same type of work is going on in my lab. Here they are:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOTicb9j0KBDJDV-ucm6JsOojDeAaVMgtqLxb12Ad4Y5qdpUSievjrkoyGOKxf39Tv2ubvJ_jCxfl0H1wVCe_I8Lvh-WnrN2syFIsOpcgzeshw_NYbJHpc8pmI6kCl35XhF0CM7Trmx0c/s1600/P_20160907_112134.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOTicb9j0KBDJDV-ucm6JsOojDeAaVMgtqLxb12Ad4Y5qdpUSievjrkoyGOKxf39Tv2ubvJ_jCxfl0H1wVCe_I8Lvh-WnrN2syFIsOpcgzeshw_NYbJHpc8pmI6kCl35XhF0CM7Trmx0c/s640/P_20160907_112134.jpg" width="360" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUYSVf_F0zLIgfQ5XABBewWcDnntt4LPAiKsC_HS9WADZZArqGPjJOjNYk70JL3g45qsElnyT7BIlAkGJv005yaIlqNZZSc8G5YH73FA2VNldLs5D-z6lWwvvX5XT7s9bumdzByDWFTD4/s1600/P_20160907_111814.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUYSVf_F0zLIgfQ5XABBewWcDnntt4LPAiKsC_HS9WADZZArqGPjJOjNYk70JL3g45qsElnyT7BIlAkGJv005yaIlqNZZSc8G5YH73FA2VNldLs5D-z6lWwvvX5XT7s9bumdzByDWFTD4/s640/P_20160907_111814.jpg" width="360" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiV-HyCUC5je8WKVw3ATlFKt0oZxmPJNXbVEEIHTab95irooYevPYew2_ycXi0QpictB2ZDPYtwBTqbHtbxcJikRFO7quwOeaSOgE4EA4qZu95AHg-QMrSYbsZ6MVd9c2gGf6TeMNMq9XA/s1600/P_20160904_110424.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiV-HyCUC5je8WKVw3ATlFKt0oZxmPJNXbVEEIHTab95irooYevPYew2_ycXi0QpictB2ZDPYtwBTqbHtbxcJikRFO7quwOeaSOgE4EA4qZu95AHg-QMrSYbsZ6MVd9c2gGf6TeMNMq9XA/s640/P_20160904_110424.jpg" width="360" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4UqJ8L_wPKRX8QznGZTkR2HTP51aMExwCSbnkg48WT3877bZQk0Zo4I7KDil4wVyTgjM268nIsNC2L-OfwTnAzUPOED65DcvTftT36lTFP1XT1UeJ9XyfVKGXwDc_WfiPciLaUkAMc60/s1600/P_20160904_172236.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4UqJ8L_wPKRX8QznGZTkR2HTP51aMExwCSbnkg48WT3877bZQk0Zo4I7KDil4wVyTgjM268nIsNC2L-OfwTnAzUPOED65DcvTftT36lTFP1XT1UeJ9XyfVKGXwDc_WfiPciLaUkAMc60/s640/P_20160904_172236.jpg" width="360" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjlr3JbX19dc311q-SOpj33xO6IgMBlOZ2LpLE7lwFzNagozvwJZ_7T08OYLcU3vB2GweDSeMNNcG34nHx6doI3qn0gb0MONFDd3jhO4siIk3Jm2Jzf4Mn0BsDyEA-TvZmK394DMF__WgU/s1600/P_20160905_131604.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjlr3JbX19dc311q-SOpj33xO6IgMBlOZ2LpLE7lwFzNagozvwJZ_7T08OYLcU3vB2GweDSeMNNcG34nHx6doI3qn0gb0MONFDd3jhO4siIk3Jm2Jzf4Mn0BsDyEA-TvZmK394DMF__WgU/s640/P_20160905_131604.jpg" width="360" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgE0ro617I6PoyQutNMdDl-IKiYM59TnV9s-DCrvgD8K3itUy58ZTiHxsKrt-JkkaTOGuwvfCZLHHaBBZc6gMYm-BrMUnTQ3Iq6AH4sg3Ji65zr3hy66rp7ch1a8Dr6skZfNmmgEqeIjMo/s1600/P_20160905_131637.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgE0ro617I6PoyQutNMdDl-IKiYM59TnV9s-DCrvgD8K3itUy58ZTiHxsKrt-JkkaTOGuwvfCZLHHaBBZc6gMYm-BrMUnTQ3Iq6AH4sg3Ji65zr3hy66rp7ch1a8Dr6skZfNmmgEqeIjMo/s640/P_20160905_131637.jpg" width="360" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQrWD_e-Es3gAO_hs79OfsOw9Bx3CtuwrQB2zGHUhxtDwN7lqg7Q0UqB975v7yBuEcPOJMB0TFbdJqA93lcXO4BCG8peAhQTorFpR-K-fiGbNjXvKao9ZMo05nlH11PzRqWBwwZF2ahmw/s1600/P_20160905_131728.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQrWD_e-Es3gAO_hs79OfsOw9Bx3CtuwrQB2zGHUhxtDwN7lqg7Q0UqB975v7yBuEcPOJMB0TFbdJqA93lcXO4BCG8peAhQTorFpR-K-fiGbNjXvKao9ZMo05nlH11PzRqWBwwZF2ahmw/s640/P_20160905_131728.jpg" width="360" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiK3GbShJ3vr4t39oFKAECQXhLGmA61doJZM3gmypQn7j8UrwcS4ziKB6Vo1P4H88xIMUdXabxqGXD_Vuuz_p7mak_areVoPho0Atp27lpcAyPrlQQkgCdk7_PIXNPT4mimYt1LNwE2i5g/s1600/P_20160905_131906.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiK3GbShJ3vr4t39oFKAECQXhLGmA61doJZM3gmypQn7j8UrwcS4ziKB6Vo1P4H88xIMUdXabxqGXD_Vuuz_p7mak_areVoPho0Atp27lpcAyPrlQQkgCdk7_PIXNPT4mimYt1LNwE2i5g/s640/P_20160905_131906.jpg" width="360" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcDq38n8WYOkvANgCvSQKs882-XfqHdvlgjdznO4TvI3oMmwlTU_TZXjlR0osTZRvlE5-TBZ6O2xDGa-B9tk3D-HDasleMRVgdcic1cd-hzsGe3bsF9iWh7b0MIydwBUq6EuoTGMpQBys/s1600/P_20160905_132155.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcDq38n8WYOkvANgCvSQKs882-XfqHdvlgjdznO4TvI3oMmwlTU_TZXjlR0osTZRvlE5-TBZ6O2xDGa-B9tk3D-HDasleMRVgdcic1cd-hzsGe3bsF9iWh7b0MIydwBUq6EuoTGMpQBys/s640/P_20160905_132155.jpg" width="360" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjL9APmxnStensDXCAZZYdAyaMU5209YyCQWr7ezfmFHuGOCmP9bkL9vNSkdeulD9kyeUuKxlq7kj-vCDB2-F3o9QDBHagnK6gwjuoKA-QYcynzmOKHzGRQIKHP4ltHCFbMPRAeS1718Xk/s1600/P_20160905_132310.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjL9APmxnStensDXCAZZYdAyaMU5209YyCQWr7ezfmFHuGOCmP9bkL9vNSkdeulD9kyeUuKxlq7kj-vCDB2-F3o9QDBHagnK6gwjuoKA-QYcynzmOKHzGRQIKHP4ltHCFbMPRAeS1718Xk/s640/P_20160905_132310.jpg" width="360" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYlNvUwooM3ql407M7AjxptJCFKOJ5AuyHzibJyLGBZRu9Ph8USLDXSdKsp_gDm19uPclUFYsEuwDSJAas-0ca4Q1Y-XipeC4T1PYYvPyprw04zxo9Bp5xcZFrNvrMoiESvsTebzbwMLk/s1600/P_20160905_132755_HDR.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYlNvUwooM3ql407M7AjxptJCFKOJ5AuyHzibJyLGBZRu9Ph8USLDXSdKsp_gDm19uPclUFYsEuwDSJAas-0ca4Q1Y-XipeC4T1PYYvPyprw04zxo9Bp5xcZFrNvrMoiESvsTebzbwMLk/s640/P_20160905_132755_HDR.jpg" width="360" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdB-BjB4t03mQdgIDozp09xiz9oVnSPQUhp7IPaBC1Lo0xd0uDRZTFhtOvGcW3Y3NAARse9SG0vVE4x0giA2xrejPxyHJABRoE2qDyE6TMpkJulOOyMa-FX5PAB-6A7eDFsvHTvwGMAMo/s1600/P_20160905_132737.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdB-BjB4t03mQdgIDozp09xiz9oVnSPQUhp7IPaBC1Lo0xd0uDRZTFhtOvGcW3Y3NAARse9SG0vVE4x0giA2xrejPxyHJABRoE2qDyE6TMpkJulOOyMa-FX5PAB-6A7eDFsvHTvwGMAMo/s640/P_20160905_132737.jpg" width="360" /></a></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKPs2XUNcG03HtV0wOzOwQaL4eQ1N_KytgAgpp2kmfqPouQ1nkb66t4z0HWt7XSbFQs03UTT5Ic9M34L1YZwuKN-DVDIj8aP_cCFasGJbMtW9a9uuhrT8mDhDRva5TyFSApXfafDicdhU/s1600/P_20160905_183620.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKPs2XUNcG03HtV0wOzOwQaL4eQ1N_KytgAgpp2kmfqPouQ1nkb66t4z0HWt7XSbFQs03UTT5Ic9M34L1YZwuKN-DVDIj8aP_cCFasGJbMtW9a9uuhrT8mDhDRva5TyFSApXfafDicdhU/s640/P_20160905_183620.jpg" width="640" /></a></div>
<div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzO9dvySu_3z7RN5W_7BcnQgQIlyhkUG3Wh9c2STQeTUgxMJQSujPOBVYqfIL8miMS57RS__Q76EtLOM282z6x7IDyxmhYMD3yXIxeeKrQiHDduC0nCrnu7CcSTeLFDptZpIW-f6QwGek/s1600/IMG_20160905_125509.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzO9dvySu_3z7RN5W_7BcnQgQIlyhkUG3Wh9c2STQeTUgxMJQSujPOBVYqfIL8miMS57RS__Q76EtLOM282z6x7IDyxmhYMD3yXIxeeKrQiHDduC0nCrnu7CcSTeLFDptZpIW-f6QwGek/s1600/IMG_20160905_125509.jpg" /></a></div>
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/18071472759535434809noreply@blogger.com0tag:blogger.com,1999:blog-7757118243803522394.post-81696054528334771552016-09-25T09:32:00.002-07:002016-09-25T09:32:46.699-07:00ECCB 2016, The Hague, Netherlands: computational biologists and bioinformaticians gather in a sweet Dutch country!<div dir="ltr" style="text-align: left;" trbidi="on">
I have been to several conferences within India, but ECCB 2016, held in The Hague from September 3 to September 7, 2016, was my first trip outside India, and I had butterflies in my stomach the day before I travelled. The trip went really well. It was a gathering of computational biologists and bioinformaticians from all over the world. I should thank the Department of Science and Technology, Government of India, for providing me with the travel award.<br />
<br />
The meeting started with a workshop on PacBio and Nanopore data. Experts in both technologies discussed the problems with long reads. People complained about the error rates of these reads and the difficulty of assembling genomes from them. I had a great opportunity to talk with the experts. The Nanopore experts suggested that the Canu assembler does better at handling problematic regions of the genome, and that Miniasm and Racon can also be tried. There were sessions on using Irys to create a genome map and aligning that map back to the genome assembly to improve the assembly; structural variants and comparative genomics can also be studied this way. The next topic was Iso-Seq from Pacific Biosciences, which produces full-length transcripts without assembly, followed by the PromethION and squiggle-based sequencing from Oxford Nanopore Technologies and the "Read Until" approach, which enables selection of individual DNA molecules for sequencing from a pool of DNA molecules. Then there was a session on minoTour, where base calling of nanopore reads was done without cloud base calling, since the cloud route depends on high-speed internet. The developers of the tools and technologies were very friendly and gave suggestions on working with long reads.<br />
<br />
After the workshop the scientific sessions of the conference started, and there were many interesting talks. I was most interested in error correction algorithm development, genome assembly tools and new ortholog prediction tools. Most of the sessions and posters were about cancer (a devil) and ENCODE. By my rough estimate, about 60% of the presentations were on cancer transcriptomics and genomics and about 30% on ENCODE; the rest covered plants, bacteria, database development, docking and simulations. The talks and discussions can be retrieved from Twitter via #ECCB2016. I liked the theme of the conference: not only PhD students and scientists presented their work, but people working in companies also showcased their ongoing work. Some people were very happy with and showed interest in the poster of my PhD work, since it is on a plant pathogen. I am more interested in studying environmental organisms and human pathogens, and in discovering new species from the environment.<br />
<br />
The food was good, and I had a variety of cheeses. I had time to visit Amsterdam; it is a very nice place with polite people. I visited churches and a museum, and had a good canal boat ride: I joined a few friends from the conference, and we rented a boat and rode through the city. The next conference, ECCB 2017, will be held in Prague.</div>
Anonymoushttp://www.blogger.com/profile/18071472759535434809noreply@blogger.com2tag:blogger.com,1999:blog-7757118243803522394.post-64355914978074958932016-09-20T05:49:00.001-07:002016-09-20T05:49:38.230-07:00Analyzing Differential expression analysis data using the tuxedo suite (cummeRbund)<div dir="ltr" style="text-align: left;" trbidi="on">
The Tuxedo suite comprises Bowtie, TopHat, Cufflinks, cummeRbund and several accessory tools.<br />
<br />
First get your genome FASTA file (the final genome assembly file).<br />
1. Map your RNA-seq FASTQ files using TopHat (if all is well, the run will be seamless).<br />
2. Run Cufflinks on your TopHat output file (<b>cufflinks accepted_hits.bam</b>). This run will take a while, since Cufflinks assembles the mapped reads into transcripts, isoforms, genes and so on. If your files are large, expect it to run for 8-12 hours even on a reasonably good server.<br />
3. Run Cuffmerge: <b>cuffmerge list.txt</b>, where list.txt lists the <b>*_transcripts.gtf</b> files, one per line. This runs very fast and merges the assemblies so that gene_ids are the same across all your samples. The output is a <b>merged.gtf</b> file.<br />
4. To run the differential expression analysis, use the following:<br />
/cuffdiff merged.gtf tophat_HTI1-vs-HTI4/accepted_hits.bam tophat_HTI2-vs-HTI4/accepted_hits.bam tophat_HTI3-vs-HTI4/accepted_hits.bam<br />
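The list file for step 3 can be generated instead of typed by hand. Here is a minimal shell sketch, assuming all the *_transcripts.gtf files have been collected in one directory; the build_gtf_list helper name is made up for illustration and is not part of the Tuxedo suite.<br />

```shell
#!/bin/sh
# build_gtf_list DIR OUT
# Write the path of every *_transcripts.gtf file in DIR to OUT, one per line.
# (Hypothetical helper; DIR is wherever you collected the Cufflinks GTF files.)
build_gtf_list() {
    dir=$1
    out=$2
    # find + sort gives the same order on every run
    find "$dir" -maxdepth 1 -type f -name '*_transcripts.gtf' | LC_ALL=C sort > "$out"
    # stop early if nothing was found, rather than handing over an empty list
    if [ ! -s "$out" ]; then
        echo "no *_transcripts.gtf files found in $dir" >&2
        return 1
    fi
}
```

It would be used as <b>build_gtf_list . list.txt</b> followed by <b>cuffmerge list.txt</b>, exactly as in step 3.<br />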
<div>
<br /></div>
<div>
This will create a plethora of files, but the ones listed below are what cummeRbund needs for visualizing the results and generating publication-quality images.</div>
<br />
To run cummeRbund, copy all of these files to your working directory:<br />
<table border="0" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 1184;">
<tbody>
<tr><td style="padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="295"><div class="MsoNormal">isoforms.fpkm_tracking</div></td></tr>
<tr><td style="padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="295"><div class="MsoNormal">isoform_exp.diff</div></td></tr>
<tr><td style="padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="295"><div class="MsoNormal">genes.fpkm_tracking</div></td></tr>
<tr><td style="padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="295"><div class="MsoNormal">gene_exp.diff</div></td></tr>
<tr><td style="padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="295"><div class="MsoNormal">tss_groups.fpkm_tracking</div></td></tr>
<tr><td style="padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="295"><div class="MsoNormal">tss_group_exp.diff</div></td></tr>
<tr><td style="padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="295"><div class="MsoNormal">cds.fpkm_tracking</div></td></tr>
<tr><td style="padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="295"><div class="MsoNormal">cds_exp.diff</div></td></tr>
<tr><td style="padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="295"><div class="MsoNormal">cds.diff</div></td></tr>
<tr><td style="padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="295"><div class="MsoNormal">promoters.diff</div></td></tr>
<tr><td style="padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="295"><div class="MsoNormal">splicing.diff</div></td></tr>
</tbody></table>
The best option is to put all 11 of these files into a separate directory inside your working directory, say '<b>diff_exp</b>'.<br />
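Collecting the files can be scripted so that a missing one is noticed before R is started. This is a small shell sketch under the assumption that cuffdiff wrote its output into a single directory; the collect_cuffdiff function and the CUFFDIFF_FILES variable are illustrative names, not part of any tool.<br />

```shell
#!/bin/sh
# The 11 cuffdiff output files listed in the table above.
CUFFDIFF_FILES="isoforms.fpkm_tracking isoform_exp.diff genes.fpkm_tracking gene_exp.diff tss_groups.fpkm_tracking tss_group_exp.diff cds.fpkm_tracking cds_exp.diff cds.diff promoters.diff splicing.diff"

# collect_cuffdiff SRC DEST
# Copy the files from SRC into DEST and report any that are missing.
# (Hypothetical helper written for this post.)
collect_cuffdiff() {
    src=$1
    dest=$2
    missing=0
    mkdir -p "$dest"
    for f in $CUFFDIFF_FILES; do
        if [ -f "$src/$f" ]; then
            cp "$src/$f" "$dest/"
        else
            echo "missing: $src/$f" >&2
            missing=$((missing + 1))
        fi
    done
    # non-zero exit status if anything was missing
    [ "$missing" -eq 0 ]
}
```

For example, <b>collect_cuffdiff . diff_exp</b> run from the cuffdiff output directory would fill the diff_exp directory used further down.<br />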
You can run RStudio on your Windows machine if you like, or run R on your server. To run cummeRbund you will need the following packages, which you can install up front:<br />
<br />
<ul style="background-color: white; color: #555555; font-family: Verdana, Arial, sans-serif; font-size: 12.8px; line-height: 20.48px; list-style-type: circle; margin: 0px 0px 0px 24px; padding: 0px;">
<li style="margin: 0px 0px 0px 24px; padding: 0px;"><span style="font-family: "helvetica"; margin: 0px; padding: 0px;">RSQLite</span><div class="p" style="margin: 0px; padding: 0px;">
</div>
</li>
<li style="margin: 0px 0px 0px 24px; padding: 0px;"><span style="font-family: "helvetica"; margin: 0px; padding: 0px;">ggplot2 v0.9.2</span><div class="p" style="margin: 0px; padding: 0px;">
</div>
</li>
<li style="margin: 0px 0px 0px 24px; padding: 0px;"><span style="font-family: "helvetica"; margin: 0px; padding: 0px;">reshape2</span><div class="p" style="margin: 0px; padding: 0px;">
</div>
</li>
<li style="margin: 0px 0px 0px 24px; padding: 0px;"><span style="font-family: "helvetica"; margin: 0px; padding: 0px;">plyr</span><div class="p" style="margin: 0px; padding: 0px;">
</div>
</li>
<li style="margin: 0px 0px 0px 24px; padding: 0px;"><span style="font-family: "helvetica"; margin: 0px; padding: 0px;">fastcluster</span><div class="p" style="margin: 0px; padding: 0px;">
</div>
</li>
<li style="margin: 0px 0px 0px 24px; padding: 0px;"><span style="font-family: "helvetica"; margin: 0px; padding: 0px;">rtracklayer</span><div class="p" style="margin: 0px; padding: 0px;">
</div>
</li>
<li style="margin: 0px 0px 0px 24px; padding: 0px;"><span style="font-family: "helvetica"; margin: 0px; padding: 0px;">Gviz</span></li>
<li style="margin: 0px 0px 0px 24px; padding: 0px;"><span style="font-family: "helvetica"; margin: 0px; padding: 0px;">BiocGenerics (>=0.3.2)</span></li>
<li style="margin: 0px 0px 0px 24px; padding: 0px;"><span style="font-family: "helvetica"; margin: 0px; padding: 0px;">Hmisc</span></li>
</ul>
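In case you have forgotten how to install Bioconductor packages, this is the way:<br />
source('http://www.bioconductor.org/biocLite.R')<br />
biocLite('cummeRbund')<br />
(On Bioconductor 3.8 and later, biocLite has been retired; use install.packages('BiocManager') followed by BiocManager::install('cummeRbund') instead.)<br />
Follow the same protocol to install the other packages. Once done, set your working directory with the setwd() command.<br />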
For example: setwd("C:/Users/Sucheta/Documents/MyLabIICB/AllCollaborations/NahidAliCollaboration/companion")<br />
Then load the library:<br />
<br />
library(cummeRbund)
<br />
<br />
Now read your 11 files using this command<br />
data <-readCufflinks("diff_exp")<br />
<br />
This takes a while the first time, but it creates a SQLite database file (cuffData.db by default) inside that directory. This is your database file.<br />
<br />
Now you can plot gene density using the following command:<br />
<br />
csDensity(genes(data))<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjo99T1B8f8JbZYCT3DXiHalCJdEmtWHFZAuTnKrvvsU6fIp9rXqISpb4xjvhYuiuhMRU3onga0slbUlBCfb4FKkeR8dy_ol3Vf87CnM-Pah9imBnUZO7hDGtWjqnBpAyB6TRdxBL2RyRs/s1600/GeneDensityPlot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="256" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjo99T1B8f8JbZYCT3DXiHalCJdEmtWHFZAuTnKrvvsU6fIp9rXqISpb4xjvhYuiuhMRU3onga0slbUlBCfb4FKkeR8dy_ol3Vf87CnM-Pah9imBnUZO7hDGtWjqnBpAyB6TRdxBL2RyRs/s320/GeneDensityPlot.png" width="320" /></a></div>
<br />
Or you can draw a volcano plot matrix of differentially expressed genes using:<br />
<br />
v<-csVolcanoMatrix(genes(data))<br />
v<br />
<br />
As the plot shows, the different conditions differ very little among themselves.<br />
<br />
This will continue in the next post...</div>
Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.com1tag:blogger.com,1999:blog-7757118243803522394.post-10760856563854342262016-06-17T09:35:00.000-07:002016-06-17T09:35:03.233-07:00#OMGN2016 Malmo, Sweden - Between Then and Now...<div dir="ltr" style="text-align: left;" trbidi="on">
Many things have changed over the years since I started attending OMGN meetings. My first meeting was in 2005, when the first oomycete genomes were being sequenced and analyzed at their own pace (read: a very slow pace). We used to get excited even when we got SSRs or repeats predicted. I distinctly remember the 2004 Joint Genome Institute sequence jamboree, where we gathered in the evenings to discuss what had been done during the day. On the second or third day of the jamboree, Brett came up with a multiple sequence alignment that suggested there was an RXLR motif in the effector proteins. It was a huge deal then. In all the subsequent meetings everybody discussed these proteins. Initially this small four-letter motif appeared too good to be true, but a lot of work was done, especially in Brett's lab, to prove that it was indeed a significant motif. The effector prediction algorithms were published in high-profile journals, and everybody was excited. Up to 2010, many more papers gradually came out on RXLRs, their prediction methods and their characterization. Now, in 2016, I see that the level of the science has gone much higher. Genome sequencing using PacBio or Illumina is no big deal, and neither is analyzing the data. Effector prediction has become just a day's job (thanks to all the hard work done by the pioneers). Genome analyses are now carried out by single individuals in a few months. This meeting was skewed towards miRNAs, CRISPR technology for genome editing, PacBio sequencing and RNA silencing. Effector biology has moved many, many steps up. Many more things are now known, and many more proteins have been characterized. It is an exciting time in the history of oomycete biology, with many things happening right in front of our eyes. For those who could not attend, please check #OMGN16 for more details. For me, it is now bye bye, lovely Malmo!!<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgvWf_E38fR77GdGXtKnE9ADzsxxIRESbBMYixW2-SdiyuSOoX-7wUGI4bOr77O5dQoKw8ejF5hQNF08B7IPCAD1kbqV7R7u_NYbajNGDnE_SEEyd1byVc5rUYi_yfogg1f53vSSOcMwgo/s1600/ClEDgTZWQAAeMv_.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="217" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgvWf_E38fR77GdGXtKnE9ADzsxxIRESbBMYixW2-SdiyuSOoX-7wUGI4bOr77O5dQoKw8ejF5hQNF08B7IPCAD1kbqV7R7u_NYbajNGDnE_SEEyd1byVc5rUYi_yfogg1f53vSSOcMwgo/s320/ClEDgTZWQAAeMv_.jpg" width="320" /></a></div>
<br /></div>
Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.com0tag:blogger.com,1999:blog-7757118243803522394.post-81219308163171690992015-12-14T02:36:00.001-08:002015-12-14T02:36:54.938-08:00Variant Calling - The bowtie - picard - samtools - gatk pipeline....<div dir="ltr" style="text-align: left;" trbidi="on">
Nextgen sequencing has caused a sudden data deluge, and the informatics pipelines and algorithms are struggling to keep pace. Most exome sequencing analyses end up focusing on SNP calling, and there are various ways of doing this; I decided to discuss one pipeline that is widely accepted as one of the most sophisticated methods: the bowtie - picard - gatk pipeline.<br />
<br />
When you are dealing with colorspace data, the choice of mappers is limited. However, my favorite mapper is still bowtie, for several reasons. LifeScope has its own in-house mapper, which claims an all-round better approach to mapping colorspace data, but the lack of transparency about what happens inside puts me off using it. Once bowtie has mapped the reads with default parameters, the next thing to do is convert the SAM file into a BAM file, sort it and index it. All of this can be done using samtools. However, if the file is large, you could do the sorting with the sort command from the Unix command line.<br />
<br />
1. sort sam file<br />
<b>export TMPDIR=DIR_WITH_LOTS_OF_SPACE</b><br />
<b>LC_ALL="C" sort -k 3,3 -k 4,4n input_sam > output_sam</b> # This step will take a long time<br />
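One thing to watch with a plain sort is that the SAM header lines (those starting with @) get sorted in among the alignments. The sketch below is one way around that, assuming a tab-separated SAM text file; sort_sam_by_coordinate is a made-up function name, and the sort keys are the same ones used above.<br />

```shell
#!/bin/sh
# sort_sam_by_coordinate INPUT OUTPUT
# Keep the @ header lines on top, then sort the alignments by reference
# name (column 3) and numerically by position (column 4).
# (Hypothetical helper; sort honours TMPDIR for its temporary files.)
sort_sam_by_coordinate() {
    input=$1
    output=$2
    tab=$(printf '\t')
    # header lines first, in their original order
    grep '^@' "$input" > "$output"
    # then the alignment records, sorted
    grep -v '^@' "$input" | LC_ALL=C sort -t "$tab" -k 3,3 -k 4,4n >> "$output"
}
```

Note that this orders reference names alphabetically in the C locale, which may not match the order of the @SQ lines in the header; that mismatch may be one reason downstream tools still complain.<br />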
<br />
samtools sort works, but only on BAM files, and in many instances downstream analysis software still complains that the coordinates are not sorted...<br />
<br />
Or use Picard:<br />
<br />
java -jar /share/apps/picard-tools-1.56/SortSam.jar I=bowtie.sam O=bowtie.bam SO=coordinate # This took one hour in a HPC with 48 GB RAM on each node for a file size of 30 GB<br />
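When a downstream tool complains about sorting, it can help to look at what the file's own header claims. The SAM format records the sort order in the SO tag of the @HD line; the sketch below reads it from a SAM-format text file. The sam_sort_order name is invented for this example, and for a BAM file the header would first have to be dumped to text, for instance with samtools view -H.<br />

```shell
#!/bin/sh
# sam_sort_order FILE
# Print the SO value from the @HD header line of a SAM-format text file,
# or "unknown" if the header does not record one.
# (Hypothetical helper written for this post.)
sam_sort_order() {
    awk -F '\t' '
        /^@HD/ {
            # look through the tags on the @HD line for SO:
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1; break }
            exit
        }
        END { if (!found) print "unknown" }
    ' "$1"
}
```

A file sorted with the plain Unix sort keeps whatever its original header said, so a correctly ordered file can still be labelled unsorted; Picard's SortSam with SO=coordinate is expected to update the header as well.<br />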
<br />
2. Make an index file of bam file<br />
<b>samtools index bowtie.bam bowtie.bai</b><br />
<br />
3. MarkDuplicates using picard<br />
<b>java -jar /share/apps/picard-tools-1.56/MarkDuplicates.jar I=bowtie.bam M=metrics.bam O=duplicateMarked.bam</b><br />
<b><br /></b>
4. sort this bam file and make index using samtools<br />
<b>samtools sort duplicateMarked.bam duplicateMarked.sorted</b><br />
<b>samtools index duplicateMarked.sorted.bam duplicateMarked.sorted.bai</b><br />
<div>
<br /></div>
5. Then run RealignerTargetCreator using GATK<br />
<b>java -jar /share/apps/GenomeAnalysisTK-2.4-9-g532efad/resources/GenomeAnalysisTK.jar -T RealignerTargetCreator -I duplicateMarked.sorted.bam -R /share/reference/human/samtools/hg19.fa -o dM.bam.list </b><br />
# The output file dM.bam.list returns 0 output. Check it later.<br />
<div>
<br /></div>
<div>
6. Now run this Picard tool to add or update the read group (RG), since IndelRealigner complains without it.</div>
<b>java -jar /share/apps/picard-tools-1.56/AddOrReplaceReadGroups.jar I=duplicateMarked.bam O=readGroupReplaced.bam RGLB="LINK_TO_FASTA" RGPL=SOLID RGPU="run barcode" RGSM=9111 SORT_ORDER=coordinate CREATE_INDEX=TRUE VALIDATION_STRINGENCY=LENIENT</b><br />
<div>
<br /></div>
</div>
Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.com0tag:blogger.com,1999:blog-7757118243803522394.post-48933219500784315172015-12-14T02:36:00.000-08:002015-12-14T02:36:18.318-08:00Posters I could take pictures of Beyond Genome 2014<div dir="ltr" style="text-align: left;" trbidi="on">
Here are a few of the posters from the Beyond Genome meeting that I could take pictures of. There were many more, but I had limited access to photograph them...<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCP2YKVDRzYv17NwvfFGcQECK38xBX04Af4l_MFsdTDLL3bOH_2A1YXgM5v-86iqtLvANFlXrJmgBLwixhyytBhO_J7XFAu5xn2nPDpXoPbl1Y6hx0pdPKzWARfqIk-ZFHaDo5aVGohio/s1600/IMG_20141009_203444.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCP2YKVDRzYv17NwvfFGcQECK38xBX04Af4l_MFsdTDLL3bOH_2A1YXgM5v-86iqtLvANFlXrJmgBLwixhyytBhO_J7XFAu5xn2nPDpXoPbl1Y6hx0pdPKzWARfqIk-ZFHaDo5aVGohio/s640/IMG_20141009_203444.jpg" width="480" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Our Poster<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiB1xalZmLq6Va-w9VopzlEhqk08j2Y24qIP5iTSTBtf5O2w62umAHVwZgNjJ811wSvMecQRikGUqZIqTjuZ7lXaKn5x6U3vN3U62pr1MT6znHn_afpTqUnGkwMvbYrjOZC1SeuwEch_Y/s1600/IMG_20141010_201425.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiB1xalZmLq6Va-w9VopzlEhqk08j2Y24qIP5iTSTBtf5O2w62umAHVwZgNjJ811wSvMecQRikGUqZIqTjuZ7lXaKn5x6U3vN3U62pr1MT6znHn_afpTqUnGkwMvbYrjOZC1SeuwEch_Y/s640/IMG_20141010_201425.jpg" width="480" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSHGhtbVyU1vcjDy8pAxkqb_Svul7vflmVXATniru9QDrYAud8nFqj5f6Aa0mOWJ1KiF2RkWSJrHjSzm3aL7j2lv2GWYfAnPS5u9wYEPWsTVXsdpiZ_3uaM39thn3DPiFsXJcU5dXZdQo/s1600/IMG_20141010_201437.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSHGhtbVyU1vcjDy8pAxkqb_Svul7vflmVXATniru9QDrYAud8nFqj5f6Aa0mOWJ1KiF2RkWSJrHjSzm3aL7j2lv2GWYfAnPS5u9wYEPWsTVXsdpiZ_3uaM39thn3DPiFsXJcU5dXZdQo/s640/IMG_20141010_201437.jpg" width="480" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilDjLnCC-HVyieluYoVS_7gPfbtIqLH9Et-L1dvfLQU9XuKzfRdMOtcaNTaCfZYmjOosEyI3draR7cwlRhcGmc0sg9RNNEAjlVtfUkSEkevahNlAqNk4z8aq9i2LgO9CyS93kEDVa-g3o/s1600/IMG_20141010_201547.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilDjLnCC-HVyieluYoVS_7gPfbtIqLH9Et-L1dvfLQU9XuKzfRdMOtcaNTaCfZYmjOosEyI3draR7cwlRhcGmc0sg9RNNEAjlVtfUkSEkevahNlAqNk4z8aq9i2LgO9CyS93kEDVa-g3o/s640/IMG_20141010_201547.jpg" width="480" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtQ_KIh4ItBSOVaB7b59BaEGh5OLEnxNkvZ_h3MtpL3bGxSpJFzpSUUqG4jTJuvwNITYfnWpF-GrBgnmvIGCTjR8-BsLo7ocscb7txFgLg0uOAhQsFV7L-PRTt0AmuaP-iGUYwxgKhj6w/s1600/IMG_20141010_201316.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtQ_KIh4ItBSOVaB7b59BaEGh5OLEnxNkvZ_h3MtpL3bGxSpJFzpSUUqG4jTJuvwNITYfnWpF-GrBgnmvIGCTjR8-BsLo7ocscb7txFgLg0uOAhQsFV7L-PRTt0AmuaP-iGUYwxgKhj6w/s640/IMG_20141010_201316.jpg" width="480" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCP2YKVDRzYv17NwvfFGcQECK38xBX04Af4l_MFsdTDLL3bOH_2A1YXgM5v-86iqtLvANFlXrJmgBLwixhyytBhO_J7XFAu5xn2nPDpXoPbl1Y6hx0pdPKzWARfqIk-ZFHaDo5aVGohio/s1600/IMG_20141009_203444.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCP2YKVDRzYv17NwvfFGcQECK38xBX04Af4l_MFsdTDLL3bOH_2A1YXgM5v-86iqtLvANFlXrJmgBLwixhyytBhO_J7XFAu5xn2nPDpXoPbl1Y6hx0pdPKzWARfqIk-ZFHaDo5aVGohio/s640/IMG_20141009_203444.jpg" width="480" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhi36LwFVU_I2uazlwiDlJEToZCKjswspsm-Er5oA_1T-4Esc4PD2d5jNtIJppWAoB32tJyA8r3F2IycuIN-9yAAxNw0fOfBh0EqffMytD5pqzMQVHfJ_7qzgdqKwi-tqLASvtSc8q5lfU/s1600/IMG_20141010_200945.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhi36LwFVU_I2uazlwiDlJEToZCKjswspsm-Er5oA_1T-4Esc4PD2d5jNtIJppWAoB32tJyA8r3F2IycuIN-9yAAxNw0fOfBh0EqffMytD5pqzMQVHfJ_7qzgdqKwi-tqLASvtSc8q5lfU/s640/IMG_20141010_200945.jpg" width="480" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_g47L_BBZp3DnbDQF9ZxhSrGGrEBE8qpdEXlBYxFim1rcAM_FAHsZuuuoUvmmjeNzbCb6oQAKmuIvJdk3w6PMkGlnCsJXzWwXZ98izawotwz49HOs_IcgQahfCTu7R3duPa1iXRzpFEc/s1600/IMG_20141010_200954.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_g47L_BBZp3DnbDQF9ZxhSrGGrEBE8qpdEXlBYxFim1rcAM_FAHsZuuuoUvmmjeNzbCb6oQAKmuIvJdk3w6PMkGlnCsJXzWwXZ98izawotwz49HOs_IcgQahfCTu7R3duPa1iXRzpFEc/s640/IMG_20141010_200954.jpg" width="480" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZX_CQ4uZXLE84lxtOoFLasWxJdquE5HVXl0zt5vQT1jkm49AwEqd-5XdPoqE1hLqAd8YS9EV1RXLE3FGD7xF5Zwh_NGdZ0I8Igz6M_jU3EI9Jm9lKmMsogD6MPXmhkw8YMiWFMXfTP38/s1600/IMG_20141010_201015.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZX_CQ4uZXLE84lxtOoFLasWxJdquE5HVXl0zt5vQT1jkm49AwEqd-5XdPoqE1hLqAd8YS9EV1RXLE3FGD7xF5Zwh_NGdZ0I8Igz6M_jU3EI9Jm9lKmMsogD6MPXmhkw8YMiWFMXfTP38/s640/IMG_20141010_201015.jpg" width="480" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmiqmJvLgq2gukYTCY7rS_5LhtLPCbrPbbhV2-R51GAyt0tJt9DTPt66B0fl1iRPOxhRqELRLKO91zPUS-Tk611UhNopVl8waBtCUfPCpqLQFk5FXB1XDPmwPOIYqWKUE7gOirylWvUyk/s1600/IMG_20141010_201046.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmiqmJvLgq2gukYTCY7rS_5LhtLPCbrPbbhV2-R51GAyt0tJt9DTPt66B0fl1iRPOxhRqELRLKO91zPUS-Tk611UhNopVl8waBtCUfPCpqLQFk5FXB1XDPmwPOIYqWKUE7gOirylWvUyk/s640/IMG_20141010_201046.jpg" width="480" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhS60LKeMoR90rDl6YhghqTrdglbflGm8mRngtr7_JAReG1GDg_jeeE2BxJ2JSD6sM_-vPls-HrdK7Cyl75tzgSedpZoo18rFwKJ23pxfN-7s0bnlQqokS8rS4fygqBRcp4N05tDM0q2CM/s1600/IMG_20141010_201051.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhS60LKeMoR90rDl6YhghqTrdglbflGm8mRngtr7_JAReG1GDg_jeeE2BxJ2JSD6sM_-vPls-HrdK7Cyl75tzgSedpZoo18rFwKJ23pxfN-7s0bnlQqokS8rS4fygqBRcp4N05tDM0q2CM/s640/IMG_20141010_201051.jpg" width="480" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtkGQ5BjQxxQDAr6U-ciDebylRtSk3b-cEH3YlIcdAwU5CzXpT0zeLUxlKWHkH6pLijbEupUhWGmmZH56ZFGUt5fInPqMnrXp4m203MEuYQqJW9fds9GEDbu0pAE1wRmkLlnVW7pT9AYs/s1600/IMG_20141010_201240.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtkGQ5BjQxxQDAr6U-ciDebylRtSk3b-cEH3YlIcdAwU5CzXpT0zeLUxlKWHkH6pLijbEupUhWGmmZH56ZFGUt5fInPqMnrXp4m203MEuYQqJW9fds9GEDbu0pAE1wRmkLlnVW7pT9AYs/s640/IMG_20141010_201240.jpg" width="480" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvHZ3qqNHpCrtRm10MW6Fb6_XKD2abnVw58wESg33phYRKYqV-sYigt3xtmCbNzeSoIrsh7nIwIs6krzyL8Xt4E59EQliCZgn41ZMYMMKK9B8j90yf12WbxyZLFa4ltaecmJI7LmZ-kXs/s1600/IMG_20141010_201225.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvHZ3qqNHpCrtRm10MW6Fb6_XKD2abnVw58wESg33phYRKYqV-sYigt3xtmCbNzeSoIrsh7nIwIs6krzyL8Xt4E59EQliCZgn41ZMYMMKK9B8j90yf12WbxyZLFa4ltaecmJI7LmZ-kXs/s640/IMG_20141010_201225.jpg" width="480" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj59gA-6EXYVzmJcXMoaU3sFUYFcyMdqL1Ltq8J-jmBUGw1D9osJFTbDB5Hz63OvFSGY9mSeOUGM-TQpWtqLs6AffM4mV7qW7W5E20P7_Y-ozW6ikKwa1PYEYGEeLVL0ayLNVWKTX4ji5Y/s1600/IMG_20141010_201217.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj59gA-6EXYVzmJcXMoaU3sFUYFcyMdqL1Ltq8J-jmBUGw1D9osJFTbDB5Hz63OvFSGY9mSeOUGM-TQpWtqLs6AffM4mV7qW7W5E20P7_Y-ozW6ikKwa1PYEYGEeLVL0ayLNVWKTX4ji5Y/s640/IMG_20141010_201217.jpg" width="480" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwuFtnMnAGqxFIPPnzCuAacsXpsNb2CjCtM8Ya9mwfjtOrwFJFpg3ngN1HHzZNRU4t0UKR30V0PBhBb9u2JMarDO5WYz01_4iEpbCEVjLMWyJ53zKPw0_d3j9byWCO6eE5L14X8fqf8go/s1600/IMG_20141010_201143.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwuFtnMnAGqxFIPPnzCuAacsXpsNb2CjCtM8Ya9mwfjtOrwFJFpg3ngN1HHzZNRU4t0UKR30V0PBhBb9u2JMarDO5WYz01_4iEpbCEVjLMWyJ53zKPw0_d3j9byWCO6eE5L14X8fqf8go/s640/IMG_20141010_201143.jpg" width="480" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEio2f57rM5Po3rjSHAsqvxz-hqQqZPVwwABoQS79VDm8QY_tnRxRFLrnMIuazogxDnQUoVhrCM54yyaLYSQ9Ct55ioBoNIqHeCz7yIOOHsZg_DljMeZ_6QldKHjhUlZfvnvgi9EqjGIvsI/s1600/IMG_20141010_201119.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEio2f57rM5Po3rjSHAsqvxz-hqQqZPVwwABoQS79VDm8QY_tnRxRFLrnMIuazogxDnQUoVhrCM54yyaLYSQ9Ct55ioBoNIqHeCz7yIOOHsZg_DljMeZ_6QldKHjhUlZfvnvgi9EqjGIvsI/s640/IMG_20141010_201119.jpg" width="480" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5oX-yc2B6TRA7j4fSaAVW4-yEsq1eHA405gMKtd7H3gtz1xeOgoBLx7i_ZhsadcFev7-vlHUYsIYmfEvkpfFZsbiPJDP5-4Mb_JhyphenhyphentC_yxKnME7cmLcJgvZQldEiLFLw8_NcGOk9HTM4/s1600/IMG_20141010_201109.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5oX-yc2B6TRA7j4fSaAVW4-yEsq1eHA405gMKtd7H3gtz1xeOgoBLx7i_ZhsadcFev7-vlHUYsIYmfEvkpfFZsbiPJDP5-4Mb_JhyphenhyphentC_yxKnME7cmLcJgvZQldEiLFLw8_NcGOk9HTM4/s640/IMG_20141010_201109.jpg" width="480" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglG9oblQS5eIq0B8c-MoZSxyR3RvhQRfh9SsPqpMn3u6wmlBuNq3lK68-XAUYpdVcRFTV091IwwyPZSwFaHwLAkkDxvvG9w8-fGyhDNTqP3xOzrg0eWpQviXdDNjNwLhswJcW0mWdOK5E/s1600/IMG_20141010_201254.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglG9oblQS5eIq0B8c-MoZSxyR3RvhQRfh9SsPqpMn3u6wmlBuNq3lK68-XAUYpdVcRFTV091IwwyPZSwFaHwLAkkDxvvG9w8-fGyhDNTqP3xOzrg0eWpQviXdDNjNwLhswJcW0mWdOK5E/s640/IMG_20141010_201254.jpg" width="480" /></a></div>
</td></tr>
</tbody></table>
</div>
Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.com0tag:blogger.com,1999:blog-7757118243803522394.post-32431811597683927462015-12-07T22:46:00.000-08:002015-12-14T02:31:51.892-08:00SNP calling using GATK for de novo genome<div dir="ltr" style="text-align: left;" trbidi="on">
I got a chance to work on a Leishmania genome for which I have a genome assembly but no deposited dbSNP or other known-variants file to use for variant calling. I got stuck at several steps and posted on the GATK forums; they replied to some of my queries but stopped replying at one point, since people were away on their Thanksgiving holiday, so I figured out the rest of the variant calling myself. I think this post will be useful for newcomers like me. The workflow is below; please refer to the GATK documentation for an explanation of each step.<br />
#first build the index for the reference genome<br />
/share/apps/bowtie2-2.1.0/bowtie2-build after_removing_2k.fasta leishmania.index.bt2<br />
#after indexing, map the reads to the reference<br />
/share/apps/bowtie2-2.1.0/bowtie2 -x leishmania.index.bt2 -1 /data/results/STLab/NahidAli/141218_SND393_A_L005_HTI-5_trim_R1_filtered.fastq -2 /data/results/STLab/NahidAli/141218_SND393_A_L005_HTI-5_trim_R2_filtered.fastq -S bowtie_aligned.sam<br />
#convert the sam file to bam<br />
samtools view -S bowtie_aligned.sam -b -o bowtie_aligned.bam<br />
#sort the bam file<br />
samtools sort bowtie_aligned.bam bowtie_aligned_sorted<br />
#create a pileup and call raw variants into a bcf file<br />
samtools mpileup -uf after_removing_2k.fasta bowtie_aligned_sorted.bam|/share/apps/samtools-0.1.18/bcftools/bcftools view -bvcg - > leishmania.raw.bcf<br />
#convert bcf to vcf<br />
/share/apps/samtools-0.1.18/bcftools/bcftools view leishmania.raw.bcf > leishmania.raw.vcf<br />
<span style="color: blue;">*********************************************************************************</span><br />
<div>
<span style="color: red;"> The above commands are only the initial mapping of the reads to the reference; the real GATK pipeline starts below. Since I don't have any known sites, I have skipped base recalibration.</span> </div>
<div>
#create the dictionary<br />
#java -jar /share/apps/picard-tools-1.56/CreateSequenceDictionary.jar R=after_removing_2k.fasta O=after_removing_2k.dict<br />
#add or replace read groups<br />
#java -jar /share/apps/picard-tools-1.56/AddOrReplaceReadGroups.jar I=bowtie_aligned.sam O=group_added_read.bam SO=coordinate RGID=1 RGLB=library1 RGPL=illumina RGPU=1 RGSM=leishmania VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=TRUE<br />
#mark duplicates<br />
#java -jar /share/apps/picard-tools-1.56/MarkDuplicates.jar I=group_added_read.bam O=mapped_reads_dup.bam METRICS_FILE=metricsFile CREATE_INDEX=true<br />
#index the bam file<br />
#java -jar /share/apps/picard-tools-1.56/BuildBamIndex.jar INPUT=mapped_reads_dup.bam<br />
#create realigner targets<br />
#java -jar /share/apps/GenomeAnalysisTK.jar -T RealignerTargetCreator -R after_removing_2k.fasta -o target_interval.intervals -I mapped_reads_dup.bam<br />
#indel realigner<br />
#java -jar /share/apps/GenomeAnalysisTK.jar -T IndelRealigner -R after_removing_2k.fasta -I mapped_reads_dup.bam -targetIntervals target_interval.intervals -o Indel_realigned.bam<br />
#haplotype caller<br />
#java -jar /share/apps/GenomeAnalysisTK.jar -T HaplotypeCaller -R after_removing_2k.fasta -I Indel_realigned.bam -stand_call_conf 30 -stand_emit_conf 10 -o raw_variants.vcf<br />
#select the SNPs from the raw vcf file<br />
#java -jar /share/apps/GenomeAnalysisTK.jar -T SelectVariants -R after_removing_2k.fasta -V raw_variants.vcf -selectType SNP -o raw_snps.vcf<br />
#filter the SNPs<br />
#java -jar /share/apps/GenomeAnalysisTK.jar -T VariantFiltration -R after_removing_2k.fasta -V raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "my_snp_filter" -o filtered_snps.vcf<br />
#Extract the indels<br />
#java -jar /share/apps/GenomeAnalysisTK.jar -T SelectVariants -R after_removing_2k.fasta -V raw_variants.vcf -selectType INDEL -o raw_indels.vcf<br />
#filter the indels<br />
#java -jar /share/apps/GenomeAnalysisTK.jar -T VariantFiltration -R after_removing_2k.fasta -V raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "my_indel_filter" -o filtered_indels.vcf</div>
<div>
<br /></div>
<div>
</div>
<div>
From the SNPs and indels predicted above, extract the regions of interest, then annotate them and work on them further. Happy variant calling!</div>
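<div>
One possible way to extract the regions, as a rough sketch only: keep the records whose FILTER column (column 7) is PASS, which is how VariantFiltration marks records that did not trip a filter, and write them out as BED intervals. The function name vcf_pass_to_bed and the output file name in the usage comment are illustrative assumptions, not part of GATK.</div>

```shell
#!/usr/bin/env bash
# Convert passing VCF records to BED intervals (chrom, start, end, REF>ALT).
# Reads the files given as arguments, or standard input if none are given.
vcf_pass_to_bed() {
    awk -F'\t' '
        /^#/ { next }                    # skip VCF header lines
        $7 == "PASS" {
            start = $2 - 1               # VCF POS is 1-based; BED start is 0-based
            end = start + length($4)     # interval spans the REF allele
            printf "%s\t%d\t%d\t%s>%s\n", $1, start, end, $4, $5
        }' "$@"
}

# Hypothetical usage with the file produced by the SNP filtering step:
# vcf_pass_to_bed filtered_snps.vcf > pass_snps.bed
```

<div>
The resulting BED file can then be handed to whichever annotation tool you prefer.</div>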
</div>
Anonymoushttp://www.blogger.com/profile/18071472759535434809noreply@blogger.com0tag:blogger.com,1999:blog-7757118243803522394.post-86128541228353144202015-11-30T21:08:00.002-08:002015-11-30T22:33:25.665-08:005C bed file data format<div dir="ltr" style="text-align: left;" trbidi="on">
5C and 3C are newer sequencing-based technologies from which chromatin interaction data can be obtained. If you are looking for such data and happen to download it from the UCSC Genome Browser, it can be hard to find a description of the fields. We asked the authors, and here is the explanation:<br />
<br />
The data may be downloaded from a page such as this one: https://www.encodeproject.org/experiments/ENCSR000CYD/<br />
<br />
The BED file format description can be found at: <a href="https://genome.ucsc.edu/FAQ/FAQformat.html#format1" style="color: #1155cc; font-family: arial, sans-serif; font-size: 12.8px;" target="_blank">https://genome.ucsc.edu/FAQ/<wbr></wbr>FAQformat.html#format1</a><span style="background-color: white; color: #222222; font-family: "arial" , sans-serif; font-size: 12.8px;"> </span><br />
<span style="background-color: white; color: #222222; font-family: "arial" , sans-serif; font-size: 12.8px;"><br /></span>
<span style="background-color: white; color: #222222; font-family: "arial" , sans-serif; font-size: 12.8px;">Here is some sample data for the GM12878 cell line:</span><br />
<span style="background-color: white; color: #222222; font-family: "arial" , sans-serif; font-size: 12.8px;"><br /></span>
<span style="background-color: white; color: #222222; font-family: "arial" , sans-serif; font-size: 12.8px;"><br /></span>
<br />
<div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
chr22 31998728 33247041 5C_301_ENm004_FOR_292.5C_301_ENm004_REV_32 1000 . 31998728 33247041 0 2 12744,4098, 0,1244215,</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
chr5 131346229 132145236 5C_299_ENm002_FOR_241.5C_299_ENm002_REV_33 1000 . 131346229 132145236 0 2 2609,2105, 0,796902,</div>
</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
col1: Chromosome name</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
col2: Chromosome start</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
col3: chromosome end</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
col4: Name of the interacting sites (primer names)</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
col5: score (1000 in these records)</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
col7: chromosome start</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
col8: chromosome end</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
col11: block sizes in comma separated list</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
col12: block offset in comma separated list</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
Now I will explain what col11 and col12 mean...</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
The first interacting site begins at the chromosome start, so its offset is 0.</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
So, the interacting site begins at 31998728 + 0 and the interacting block length is 12744.</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
The beginning position of interacting site 2 is: 31998728 + 1244215 = 33242943</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
The size of interacting block 2 is 4098, so the end of interacting site 2 is 33242943 + 4098 = 33247041.</div>
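<div>
The same arithmetic applies to every record: each block starts at col2 plus its offset from col12 and ends one block size (col11) later. A minimal sketch with awk, assuming whitespace-separated columns in the order listed above; the function name bed12_blocks is only illustrative.</div>

```shell
#!/usr/bin/env bash
# Print one line per block of each BED record: chrom, block start, block end.
# Reads the files given as arguments, or standard input if none are given.
bed12_blocks() {
    awk '{
        n = split($11, size, ",")     # block sizes; the trailing comma yields an empty last item
        split($12, offset, ",")       # block offsets, relative to col2 (chromosome start)
        for (i = 1; i <= n; i++) {
            if (size[i] == "") continue
            start = $2 + offset[i]
            printf "%s\t%d\t%d\n", $1, start, start + size[i]
        }
    }' "$@"
}

# The chr22 record from the sample data
printf 'chr22 31998728 33247041 5C_301_ENm004_FOR_292.5C_301_ENm004_REV_32 1000 . 31998728 33247041 0 2 12744,4098, 0,1244215,\n' | bed12_blocks
```

<div>
For the chr22 record this prints the two interacting sites, 31998728 to 32011472 and 33242943 to 33247041.</div>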
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
Here is a diagrammatic representation:</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUGfL9dYt5oF8MydFcWeJj266UrZd-3sDkeExI5p-3bAAnsI2ZtzSEdsQGdXCeu4NPmLgOCt8cudJxdS-nRxOAjxrJ1RoF4KGu0kVyHxXrDkud4hOcp_n5p4Oa8KYmmrRURnCpwOCpw80/s1600/New+Microsoft+Office+PowerPoint+Presentation.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="480" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUGfL9dYt5oF8MydFcWeJj266UrZd-3sDkeExI5p-3bAAnsI2ZtzSEdsQGdXCeu4NPmLgOCt8cudJxdS-nRxOAjxrJ1RoF4KGu0kVyHxXrDkud4hOcp_n5p4Oa8KYmmrRURnCpwOCpw80/s640/New+Microsoft+Office+PowerPoint+Presentation.jpg" width="640" /></a></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">
<br /></div>
</div>
Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.com1tag:blogger.com,1999:blog-7757118243803522394.post-36293137344663701512015-11-23T01:02:00.006-08:002015-11-23T22:24:06.125-08:00Algal Biotechnology Workshop at IIT Mumbai on 21st Nov 2015<div dir="ltr" style="text-align: left;" trbidi="on">
It was an insightful workshop on algal biotechnology at the VMCC hall, IIT Mumbai, on 21st November 2015. The organizers managed to bring in world leaders in this area as speakers. The workshop started with the handing over of materials to the participants, followed by a welcome address by Dr. Wangikar from IIT Mumbai and an insightful talk by Dr. Santanu Dasgupta from Reliance Industries.<br />
<br />
<b>Summaries of some of the interesting talks are discussed here:</b><br />
<br />
<b>Dr. Duu-long Jee from Department of Chemical Engineering, National Taiwan University:</b><br />
Lutein, one of the 600 naturally occurring carotenoids, is abundantly found in marigold flowers as well as in micro-algae. Dr. Jee presented an overview of the cost-effectiveness of lutein production from microalgae versus marigold flowers. Although microalgae produce about 3-4 times more lutein than marigold flowers, the energy required to extract it from microalgae makes this an expensive option. Marigold, on the other hand, needs less nutrient and less power but more space. So there is a need to engineer micro-algae with enhanced lutein production and lower energy requirements for extraction.<br />
<br />
<b>John Beardall, Monash University:</b><br />
<br />
Extremophiles will play a major role in algal biotechnology, since they have altered metabolism. It is well known that CO2 is sequestered in algae to enhance growth. But growth and lipid accumulation are at odds: they do not happen at the same time. His group has explored the growth medium as a way of determining what favors optimized fatty acid production. Their observations indicate that some micro-algae grow really well in media with an altered carbon source (glycerol and xylose) and also produce fatty acids optimally. Mixotrophic growth is favored for higher fatty acid production.<br />
<br />
<b>Jo-Shu Chang, Department of Chemical Engineering, National Cheng Kung University, Tainan, Taiwan:</b><br />
<br />
He talked about CO2 sequestration by micro-algae and the production of economically important compounds. He discussed the major energy-related products from micro-algae: butanol, ethanol, H2, diols, lactic acid and succinic acid. Effluent gas composed of 23.1% CO2, 85 ppm SOx and 75 ppm NOx, at a temperature of 230C, can be used for growing microalgae. Burkholderia (a proteobacterium) can be used for lipase production.<br />
<br />
<b>Min S. Park, Advanced Biomass R & D Center, BioEnergy Engineering and Research Laboratories, Dept. of chemical and Biomolecular Engineering, Daejeon, Republic of Korea:</b><br />
<br />
Nannochloropsis is the microalga of choice for studying bioenergy production. These organisms have lipid droplets in their chloroplast. His group has done a series of signalling studies on Nannochloropsis and concluded that a JNK-type MAPK is highly activated under osmotic stress. NaCl induces osmotic stress -> acts upon MAPKK -> acts on MAPK -> represses a transcription factor -> induces lipid production. They also observed that lipid production is inhibited by treatment with a MEK-specific inhibitor. The microbial community in the treatment plants mostly contained Scenedesmus, Golenkinia, Microspora, Micractinium etc.<br />
<br />
<b>Jong Moon Park, Department of Chemical Engineering, School of Environmental Science and Engineering, Division of Advanced Nuclear Engineering, POSTECH, Republic of Korea:</b><br />
<br />
He presented 2 different aspects of bio-energy production: enhanced fatty acid production from microalgae, and ethanol production in Cyanobacteria.<br />
In Cyanobacteria, they have used several approaches for enhancing ethanol production directly by manipulating a few enzymes. One is glucose-6-phosphate 1-dehydrogenase, encoded by <i>zwf</i>, and the other is Pdc. He noted that ethanol from these engineered bacteria is released out of the cell and hence is not dangerous for the organism itself.<br />
His other notable work is on microalgae, where they have used food waste water and municipal sludge as one of the combinations for optimal growth of microalgae. He also suggested that municipal waste or food waste water can be diluted 20 times for growing micro-algae in it.<br />
Chlorella was used for bio-diesel production.<br />
Articles to look up: Dexter and Fu, 2009; Li, C. 2015 for ethanol from Cyanobacteria. <br />
<br />
Apart from these, there were many more interesting talks that I am not going into here. All in all, everyone is looking for a breakthrough in growing these organisms faster and producing fatty acids quickly.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhRkW54hIXuhMdCW3npcF29SbjHSwl99pO1405ol_Ms4-D2ws38WqGDh88XzeRLB-83zRxz1hRfu8sD9Wxxa90KCina3gvWG5EiqQfUpgq4bNtZ9c4HMo4BJTiLluS-l3zyTx8FxVB6ylI/s1600/IMG_20151121_101719.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhRkW54hIXuhMdCW3npcF29SbjHSwl99pO1405ol_Ms4-D2ws38WqGDh88XzeRLB-83zRxz1hRfu8sD9Wxxa90KCina3gvWG5EiqQfUpgq4bNtZ9c4HMo4BJTiLluS-l3zyTx8FxVB6ylI/s320/IMG_20151121_101719.jpg" width="240" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3c_WnYffbqYzix2m2AJl9SMD7Afl92pIX6K8zIj5CI7TrFStfEG8qFW4bQ4kFLci9r11AxAUknT2VgyoCQiHD40r2oRjHazFjss1X1slT4aaMvNtVOGkDWgYpbPPo-0vLxW2lij4tPvQ/s1600/IMG_20151121_103017.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3c_WnYffbqYzix2m2AJl9SMD7Afl92pIX6K8zIj5CI7TrFStfEG8qFW4bQ4kFLci9r11AxAUknT2VgyoCQiHD40r2oRjHazFjss1X1slT4aaMvNtVOGkDWgYpbPPo-0vLxW2lij4tPvQ/s320/IMG_20151121_103017.jpg" width="240" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxvFFC7rHbgVLPwExp6YcmOkhBS2zqSfq-lQpsZtgiSH3xnGKwkMocUoY0RCSvshHW42YeDvYPibRxp7DwovsQl1eI6WarnAmHk6fUzIZJ0qEUemKw0D6PHHJvitIkmHN81ap1QeLQhDY/s1600/IMG_20151121_103531.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxvFFC7rHbgVLPwExp6YcmOkhBS2zqSfq-lQpsZtgiSH3xnGKwkMocUoY0RCSvshHW42YeDvYPibRxp7DwovsQl1eI6WarnAmHk6fUzIZJ0qEUemKw0D6PHHJvitIkmHN81ap1QeLQhDY/s320/IMG_20151121_103531.jpg" width="240" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9SOamseX2wPG1niOgjBO-eea1oK0Ft4A9OAXVV8sojgfNYp6Mt07yf6YyEo-v_ikX1tvWnl2-x8WwIe6BixCcm4F8uTD4VlB6aJRa5jFXy4sNJWh1jb6YxraN5HV6AzWEnAQHyEFj2GY/s1600/IMG_20151121_104228.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9SOamseX2wPG1niOgjBO-eea1oK0Ft4A9OAXVV8sojgfNYp6Mt07yf6YyEo-v_ikX1tvWnl2-x8WwIe6BixCcm4F8uTD4VlB6aJRa5jFXy4sNJWh1jb6YxraN5HV6AzWEnAQHyEFj2GY/s320/IMG_20151121_104228.jpg" width="240" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdN-Jx5BNlKXRYvxiN4Sr8iGeV6F6Q41upqiwMHGRff6K1Grohm19qTvOs6QD_sTEHKbjcZe7XBk_e1ZHDH5sTY9dzGnsG8X_UerVVgeJsGXlTtZokG6n94ZPeZAepwy7S2Ejk-C4HjGs/s1600/IMG_20151121_114158.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdN-Jx5BNlKXRYvxiN4Sr8iGeV6F6Q41upqiwMHGRff6K1Grohm19qTvOs6QD_sTEHKbjcZe7XBk_e1ZHDH5sTY9dzGnsG8X_UerVVgeJsGXlTtZokG6n94ZPeZAepwy7S2Ejk-C4HjGs/s320/IMG_20151121_114158.jpg" width="240" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitXysjmbD0v-MO2e72qnnEbJ-Y7YCbvIZa0EJu24M8CF8b6Ts3gJ_TeCA-lq4GQSvfpAPiNzRMIIckWc_BceJgn0MOPYRvceDE-BC9cwiGZKyCk_HRHiJ1ze67wX8cDHtVKaBfvrvZrhE/s1600/IMG_20151121_114353.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitXysjmbD0v-MO2e72qnnEbJ-Y7YCbvIZa0EJu24M8CF8b6Ts3gJ_TeCA-lq4GQSvfpAPiNzRMIIckWc_BceJgn0MOPYRvceDE-BC9cwiGZKyCk_HRHiJ1ze67wX8cDHtVKaBfvrvZrhE/s320/IMG_20151121_114353.jpg" width="240" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_iPj4qoB5gs3zA7Ue9jfYFB6Twi2X1540a45dMNqNMLZay4UKl_PudWQFGKESSBdqsWtWfnh3tzbAsYSaIB_kkX3z2cvbmUBnIKWgPUaIsfgp_LfojnTiFVzl_OPD9b4EaqbT2c3Dcmc/s1600/IMG_20151121_115649.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_iPj4qoB5gs3zA7Ue9jfYFB6Twi2X1540a45dMNqNMLZay4UKl_PudWQFGKESSBdqsWtWfnh3tzbAsYSaIB_kkX3z2cvbmUBnIKWgPUaIsfgp_LfojnTiFVzl_OPD9b4EaqbT2c3Dcmc/s320/IMG_20151121_115649.jpg" width="240" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEht7ncV5DW3NMP43zJuKHmKoBNaHO_0n_nf9LPuFGfYXXktSzvGpOxyiAw3JoHb7FRjki3Q482JwHSs9k65ei0Fnv-hjTMjRRWNmN_BlRIwcMJ7I7kAfBN8w_pniGdTdzHlzwfFeNaRSdg/s1600/IMG_20151121_123140.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEht7ncV5DW3NMP43zJuKHmKoBNaHO_0n_nf9LPuFGfYXXktSzvGpOxyiAw3JoHb7FRjki3Q482JwHSs9k65ei0Fnv-hjTMjRRWNmN_BlRIwcMJ7I7kAfBN8w_pniGdTdzHlzwfFeNaRSdg/s320/IMG_20151121_123140.jpg" width="240" /></a></div>
</div>
Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.com4tag:blogger.com,1999:blog-7757118243803522394.post-60076370433190781152015-11-05T23:20:00.000-08:002015-11-05T23:41:43.408-08:00Installing R packages that use shared library in Linux<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: left;">
<span style="font-family: "times" , "times new roman" , serif;">Many R packages use code (or libraries) written in other languages such as C or FORTRAN, linked in from shared libraries. Normally the main source files (and their dependencies, such as .h header files) are kept in the <i>src</i> directory inside the package. When the package is installed from its source file (<i>.gz</i>) using R CMD INSTALL <i>somepackage.tar.gz</i>, the sources are compiled into shared objects in the local directory, and these are dynamically linked to the shared library (a lib<i>somename.so.some_number</i> file), which is generally in <i>/usr/local/lib</i>. The dynamic linker finds that library through a configuration file (<i>/etc/ld.so.conf</i>) and environment variables (e.g. LD_LIBRARY_PATH). Often neither the configuration file nor the environment variable contains the path of the shared library (this mostly happens when users use their own shared library instead of the default), and so installation fails with the </span><span style="font-family: "arial" , "helvetica" , sans-serif;"><b>error: </b><b> "shared object not found.... no such file or directory"</b></span><span style="font-family: "times" , "times new roman" , serif;">.</span></div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;">One way to solve this problem is to run the ldconfig (or /sbin/ldconfig) command, usually as root and preferably in verbose mode (-v). This program creates </span><span style="background-color: white; text-align: left;"><span style="font-family: "times" , "times new roman" , serif;">the required links and cache to the most recent shared libraries.</span></span></div>
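<div style="text-align: justify;">
Before and after running ldconfig, it can help to check whether a library can be resolved by name at all. The following is only a minimal sketch in Python, not part of the original fix: it relies on the standard ctypes.util.find_library function, which searches in a way similar to the runtime loader, and the helper name library_visible is made up for illustration.</div>

```python
import ctypes.util


def library_visible(name):
    """Return True if a shared library lib<name> can be resolved by name."""
    # find_library("fftw3") searches for libfftw3.so.* in a way similar
    # to what the runtime loader does and returns None when nothing is found.
    return ctypes.util.find_library(name) is not None


if __name__ == "__main__":
    # "fftw3" is the library from the example in this post; "m" is the
    # standard math library, used here only as a comparison.
    for lib in ("fftw3", "m"):
        status = "found" if library_visible(lib) else "NOT found"
        print(f"lib{lib}: {status}")
```

<div style="text-align: justify;">
If the library is reported as not found although the file exists on disk, the linker cache or search path is the likely culprit.</div>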
<div style="text-align: justify;">
<div style="text-align: justify;">
<span style="background-color: white; text-align: left;"><span style="font-family: "times" , "times new roman" , serif;"><br /></span></span></div>
</div>
<div style="text-align: justify;">
<div style="text-align: justify;">
<span style="background-color: white; text-align: left;"><span style="font-family: "times" , "times new roman" , serif;">example:</span></span></div>
</div>
<div style="text-align: justify;">
<div style="text-align: justify;">
<span style="background-color: white; text-align: left;"><span style="font-family: "times" , "times new roman" , serif;"><br /></span></span></div>
</div>
<div style="text-align: justify;">
<div style="text-align: justify;">
<span style="background-color: white; text-align: left;"><span style="font-family: "times" , "times new roman" , serif;">I faced this type of error </span></span><b style="font-family: Arial, Helvetica, sans-serif;">"shared object not found.... no such file or directory" </b><span style="font-family: "times" , "times new roman" , serif;">during the installation of the package fftwtools (R CMD INSTALL <i>fftwtools.tar.gz</i>). The steps I followed to fix the problem are:</span></div>
</div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"><b>1. Error observed:</b> cannot open .../fftwtools/src/fftwtools.so: ... libfftw3.so.3... no such file or directory.</span></div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"><b>2. located the file using:</b> locate libfftw3.so.3 (to be sure that the file exists)</span></div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;">output: </span></div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"></span><br />
<div>
<span style="font-family: "times" , "times new roman" , serif;">/usr/local/lib/libfftw3.so.3</span></div>
<span style="font-family: "times" , "times new roman" , serif;">
</span>
<br />
<div>
<span style="font-family: "times" , "times new roman" , serif;">/usr/local/lib/libfftw3.so.3.3.2</span></div>
<span style="font-family: "times" , "times new roman" , serif;">
</span>
<br />
<div>
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
<span style="font-family: "times" , "times new roman" , serif;">
</span></div>
<div style="text-align: justify;">
<div style="text-align: justify;">
<span style="background-color: white; text-align: left;"><span style="font-family: "times" , "times new roman" , serif;">3. <b>run /sbin/ldconfig -v</b></span></span></div>
</div>
<div style="text-align: left;">
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white;">output:</span></span></div>
</div>
<div style="text-align: left;">
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white;"></span></span><br /></div>
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white;">/sbin/ldconfig: Path `/lib64' given more than once</span></span></div>
</div>
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white;">
</span></span>
<br />
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white;">/sbin/ldconfig: Path `/usr/lib64' given more than once</span></span></div>
</div>
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white;">
</span></span>
<br />
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white;">........................</span></span></div>
</div>
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white;">
</span></span>
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white;">/opt/bio/EMBOSS/lib:</span></span></div>
</div>
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white;">
<div>
<div style="text-align: justify;">
libajax.so.6 -> libajax.so.6.0.3</div>
</div>
<div>
<div style="text-align: justify;">
libnucleus.so.6 -> libnucleus.so.6.0.3</div>
</div>
<div>
<div style="text-align: justify;">
.................</div>
</div>
<div>
<div>
<div style="text-align: justify;">
/lib:</div>
</div>
<div>
<div style="text-align: justify;">
libdevmapper-event.so.1.02 -> libdevmapper-event.so.1.02</div>
</div>
<div>
<div style="text-align: justify;">
libiw.so.28 -> libiw.so.28</div>
</div>
</div>
<div>
<div style="text-align: justify;">
.......................</div>
</div>
</span></span></div>
<div>
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;">/lib64:</span></div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"> libdevmapper-event.so.1.02 -> libdevmapper-event.so.1.02</span></div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"> libiw.so.28 -> libiw.so.28</span></div>
</div>
<div style="font-family: Times, 'Times New Roman', serif;">
<div style="text-align: justify;">
.........................</div>
</div>
</div>
<div>
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;">/usr/local/lib:</span></div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"> libnucleus.so.6 -> libnucleus.so.6.0.5</span></div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"> libeplplot.so.3 -> libeplplot.so.3.2.7</span></div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;">.........</span></div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"> <b><span style="color: red;"> libfftw3.so.3 -> libfftw3.so.3.3.2</span></b></span></div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"> libezlib.so.1 -> libezlib.so.1.1.0</span></div>
</div>
<div style="font-family: Times, 'Times New Roman', serif;">
<div style="text-align: justify;">
.........................</div>
</div>
</div>
<div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"><b>4. Install the package</b>: R CMD INSTALL fftwtools.tar.gz</span></div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;"><br /></span></div>
</div>
<div>
<div style="text-align: justify;">
<span style="font-family: "times" , "times new roman" , serif;">I hope these steps work for you. I will be happy to answer any queries regarding this issue. </span><span style="font-family: "times" , "times new roman" , serif;">Thanks a lot for reading the post.</span></div>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/08852418005276737682noreply@blogger.com2tag:blogger.com,1999:blog-7757118243803522394.post-79343860288858054222015-07-16T03:19:00.000-07:002017-07-19T02:41:13.380-07:00Using Scipio for generating training dataset for Augustus gene modeler<div dir="ltr" style="text-align: left;" trbidi="on">
It is a bit of a hassle if you have a brand new species and would like to train augustus for your dataset. However, there is an incredibly easy way to do this using scipio. Scipio is a wrapper for the BLAT program that takes advantage of having a protein file and a genome file of a reference organism. It is very easy and fast to generate genbank files from these two files using scipio and BLAT.<br />
<br />
Since Scipio only has 3 perl scripts, one can install it inside the augustus installation directory.<br />
<br />
Proceed as follows:<br />
[Make sure BLAT is in your path. Also make sure the YAML module is installed on your system]<br />
<br />
./scipio.1.4.1.pl --blat_output=test.psl genome.fa proteins.aa > test.yaml<br />
cat test.yaml | yaml2gff.1.4.pl > test.scipiogff<br />
scipiogff2gff.pl --in=test.scipiogff --out=scipio.gff -> this script comes with augustus distribution<br />
cat test.yaml | yaml2log.1.4.pl > scipio.log<br />
<br />
# Convert gff into Genbank format for training purposes.<br />
# Here 1000 means intergenic distance is minimum 1000<br />
gff2gbSmallDNA.pl scipio.gff genome.fa 1000 genes.raw.gb -> This script also comes with augustus package<br />
<br />
Generate the train.err file first using the following command:<br />
etraining --species=myspecies --stopCodonExcludedFromCDS=true genes.raw.gb 2> train.err<br />
<br />
# Filter the crude gb file into a cleaner gb file<br />
cat train.err | perl -pe 's/.*in sequence (\S+): .*/$1/' > badgenes.lst<br />
filterGenes.pl badgenes.lst genes.raw.gb > genes.gb<br />
grep -c "LOCUS" genes.raw.gb genes.gb<br />
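<br />
The perl one-liner above rewrites every line of train.err that contains "in sequence NAME:" to just NAME; lines that do not match pass through unchanged. As a rough sketch of the same idea in Python (an illustration, not part of the augustus distribution), the function below keeps only the matching lines and drops repeated names. The function name and the sample error lines are made up for this example.<br />

```python
import re

# Same pattern as the perl one-liner: capture the name after "in sequence"
PATTERN = re.compile(r".*in sequence (\S+): .*")


def bad_gene_names(lines):
    """Return sequence names found in error lines, in order, without repeats."""
    seen = []
    for line in lines:
        match = PATTERN.match(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Hypothetical error lines, invented purely for illustration
sample = [
    "gene g12 in sequence Scaffold_3_100-2500: some problem reported here",
    "a line that mentions no sequence at all",
    "gene g40 in sequence Scaffold_7_900-4100: another problem",
]
print(bad_gene_names(sample))
```

Writing the returned names one per line gives a list in the same form as badgenes.lst.<br />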
<div>
<br /></div>
# Running Training for gene prediction<br />
etraining --species=myspecies --stopCodonExcludedFromCDS=true genes.gb 2> train.err<br />
<br />
[<b>NOTE: </b>You have to remember a few things here: 1) AUGUSTUS_CONFIG_PATH should be set to the config directory inside the augustus installation path. 2) Make a directory named 'myspecies' inside the config/species directory and place two files in it: 'myspecies_parameters.cfg' and 'generic_weightmatrix.txt'. Both files can be copied from ../generic/generic_parameters.cfg (rename this file) and ../generic/generic_weightmatrix.txt (leave it as such). Then run etraining. Several files will be created under the config/species/myspecies directory. Now you are ready to roll]<br />
<div>
<br /></div>
<div>
# Now run Augustus for gene prediction</div>
<div>
# It can be run with these simple commandline steps:</div>
<div>
<br /></div>
<div>
augustus --species=myspecies genome.fa > test.gff</div>
<div>
<br /></div>
<div>
Augustus 3.1 now produces very neat gff files that are both easy to visualize and easy to understand</div>
<div>
<br /></div>
<div>
<div>
# start gene g1</div>
<div>
Scaffold_1 AUGUSTUS gene 2248 2811 0.62 - . g1</div>
<div>
Scaffold_1 AUGUSTUS transcript 2248 2811 0.62 - . g1.t1</div>
<div>
Scaffold_1 AUGUSTUS stop_codon 2248 2250 . - 0 transcript_id "g1.t1"; gene_id "g1";</div>
<div>
Scaffold_1 AUGUSTUS CDS 2248 2811 0.62 - 0 transcript_id "g1.t1"; gene_id "g1";</div>
<div>
Scaffold_1 AUGUSTUS start_codon 2809 2811 . - 0 transcript_id "g1.t1"; gene_id "g1";</div>
<div>
# protein sequence = [MLLTTPNRIAIYSGLDTAMATFSFEVSLRQSSYYELFFAHSVCFLRKSGERDADFWQYCGGRADVFYLHQWCEHRGAD</div>
<div>
# REFCSANIYSDGEDDSQKKGTSRAKKGNRKRRGSSQAEVLATLAESVSAITAANNTREAQEVAWKDQALLLHDKRISHRLGMLTEIFDRTNCYIFDLK</div>
<div>
# KHMKMTMATTS]</div>
<div>
# end gene g1</div>
</div>
<div>
<br /></div>
<div>
<br /></div>
</div>
Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.com0tag:blogger.com,1999:blog-7757118243803522394.post-2972945043379874212015-07-08T23:32:00.001-07:002015-07-08T23:32:48.074-07:00Hypocrisy of scientific journals<div dir="ltr" style="text-align: left;" trbidi="on">
I don't want to sound like a bitter, dejected, negative person; however, this has been my unbiased view of the publication policies of some well-known, prestigious journals.<br />
<br />
When I consider publishing somewhere, I quickly browse to the section that lists the publication cost. While the cost of publication is usually very high, journals sometimes offer concessions for List A and List B countries, which are economically underdeveloped. India appears in neither list, which means we have to pay the full publication cost! But open a newspaper in any western country, or look at the economic rating given to India by western rating agencies, and it is always at junk level. So, why this hypocrisy? On one hand you rate India way below many developing countries, and on the other hand you consider India developed enough to pay the full publication fee...<br />
<br />
My experiences with pre-publication inquiries to some of the journals I published in before (when I was in the US) are also very alarming. The same paper will get published if sent by a US lab, but a lab from India will face rejection at the pre-publication inquiry stage itself. I wonder if anyone else has noticed this. But we don't want this to deter us from publishing good articles. We will rise above the cloud and publish our work in higher journals.</div>
Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.com1tag:blogger.com,1999:blog-7757118243803522394.post-52894803103267676802015-07-07T00:22:00.001-07:002015-07-07T00:22:37.313-07:00Trend line and regression analysis. How to do it easily<div dir="ltr" style="text-align: left;" trbidi="on">
Today I came across the thesis of a student who used trend lines, a long-forgotten topic for me. Nevertheless, it was refreshing to revisit this very simple statistical topic that we encounter in day-to-day life. So what is a trend line?<br />
<br />
We all, knowingly or unknowingly, try to predict the future from our past experience. For example, if a student scored good grades in previous years, we tend to predict what his/her grades are going to be in the coming years. There are many more such examples, such as predicting the stock market, predicting the weather and so on. What we do internally is generate a trend and try to fit the future to that trend.<br />
<br />
So, let us take a simple example of the scores of a student:<br />
<br />
1 300<br />
2 340<br />
3 320<br />
4 400<br />
5 420<br />
9 500<br />
<br />
Here the left column is the month and the right column is the score the student obtained in that month. Eyeballing this data, we see a gradual improvement in scores. We also observe that some points are missing (months 6, 7 and 8). So, can we predict the score for the 10th month? Can we say what the score would have been in the 6th, 7th and 8th months?<br />
<br />
In this case, we may like to fit the data to a trendline. The easiest way to do it is to use Excel. So let's fill in the data in an Excel sheet as shown below: [figure 1]<br />
<br />
So, now the formula is Y = 25.5X + 278 with R2 = 0.928. Using this equation, we can predict the score for the 6th month: Y = 25.5 × 6 + 278 = 431<br />
For the 7th month it is 25.5 × 7 + 278 = 456.5<br />
For the 8th month it is 25.5 × 8 + 278 = 482<br />
For the 10th month it is going to be 25.5 × 10 + 278 = 533<br />
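<br />
For readers who would rather check the trendline without Excel, here is a minimal sketch in Python of an ordinary least-squares fit on the six data points above. It is an illustration, not what Excel runs internally; the function name fit_line is an assumption made for this example. Under these assumptions it reproduces the slope of 25.5 and the intercept of 278 quoted above.<br />

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept, with R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


months = [1, 2, 3, 4, 5, 9]
scores = [300, 340, 320, 400, 420, 500]
slope, intercept, r2 = fit_line(months, scores)
print(f"Y = {slope:.1f}X + {intercept:.0f}, R2 = {r2:.4f}")

# Predicted scores for the missing months and for month 10
for month in (6, 7, 8, 10):
    print(month, slope * month + intercept)
```

The same function can be called again on the extended series to see how the equation and R2 respond to the filled-in points.<br />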
<br />
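Now, putting these values back into Excel, we get the following figure (Figure - 2)<br />
So, what do we see there? The equation and R2 have changed, though only slightly. What does that tell us? Values predicted from a least-squares line lie exactly on that line, so adding them back should leave the equation unchanged and can only raise R2. A change like this therefore most likely comes from rounding or arithmetic slips in the filled-in values, and it is worth checking them before concluding that a linear trendline does not suit this data series.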
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOrdWO00OdopKN-3t3MUBL8bYKJpaCZxaIRC_enjZsHcHGzPl-3rzhyphenhyphenOL6GMHO1iAjQ1MMp7DxwdrXcsQP_PFcbMH9GhfeiQTRsBALstDkc4JFKOYP01op_uk3COmY5iJLmMSWJJtxt0I/s1600/blog.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOrdWO00OdopKN-3t3MUBL8bYKJpaCZxaIRC_enjZsHcHGzPl-3rzhyphenhyphenOL6GMHO1iAjQ1MMp7DxwdrXcsQP_PFcbMH9GhfeiQTRsBALstDkc4JFKOYP01op_uk3COmY5iJLmMSWJJtxt0I/s400/blog.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure 1: Excel screen shot of how to select trendline in excel and display the formula.<br />
<br /></td></tr>
</tbody></table>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUk2ROCMHwzCwF5IGxpLTuCWJVENmoCC16It7GDIvTnsMgjWRJxQLBpayXexCgQpy0IRwRrkwlAd9rHvjvt9cKmvNM28ikxppvuKopX2quHG7qCKQg-c2P_kVQtTbfvnesmeLl2F9dR1I/s1600/fig2-blog.jpg" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUk2ROCMHwzCwF5IGxpLTuCWJVENmoCC16It7GDIvTnsMgjWRJxQLBpayXexCgQpy0IRwRrkwlAd9rHvjvt9cKmvNM28ikxppvuKopX2quHG7qCKQg-c2P_kVQtTbfvnesmeLl2F9dR1I/s400/fig2-blog.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure -2: Changed R value as well as trend line formula</td></tr>
</tbody></table>
<span id="goog_82464295"></span><span id="goog_82464296"></span><br />
<br />
<br />
<br />
<br />
<br />
<br /></div>
Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.com2tag:blogger.com,1999:blog-7757118243803522394.post-42226922283099525252015-06-09T23:03:00.000-07:002015-06-09T23:03:53.370-07:00Installing Bio Python in local Ubuntu server<div dir="ltr" style="text-align: left;" trbidi="on">
I was trying to install Biopython on my Ubuntu server but hit a wall several times with this error message:<br />
Cannot fetch index base URL https://pypi.python.org/simple/<br />
<br />
I checked my pip version using the following command:<br />
<br />
$ pip --version, and the output was 1.5.4.<br />
<br />
I googled a little and found this nice post at http://stackoverflow.com/questions/17416938/pip-cannot-install-anything<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBzYLKMAvpeseGH56wTtetxr9gi50GWtPK2iMIlcZqQa_hDWisY6Su12V6_JT5WXQRW5SBM0s0-2rXQ5lcOSHwsypgiwFW1i01FR2i_D8swPdtW0V5hTf1aC1EjsnrCSqYxZm60lroLho/s1600/blog.bmp" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBzYLKMAvpeseGH56wTtetxr9gi50GWtPK2iMIlcZqQa_hDWisY6Su12V6_JT5WXQRW5SBM0s0-2rXQ5lcOSHwsypgiwFW1i01FR2i_D8swPdtW0V5hTf1aC1EjsnrCSqYxZm60lroLho/s640/blog.bmp" width="640" /></a></div>
<br />
I followed this exactly and installed pip version 1.2.1, then checked the path where it was installed. If the earlier pip version is not overwritten and the new one is installed in a separate place, calling pip will still run the old one from its previous location.<br />
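<br />
To see which copies of pip exist and which one a bare pip command would pick up, one can walk the PATH in lookup order. The snippet below is only a sketch in Python using the standard library; the function name find_on_path is made up for this illustration.<br />

```python
import os


def find_on_path(name, path=None):
    """List every executable called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        # Skip empty PATH entries; keep only regular files that are executable
        if directory and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # The first entry printed is the one a bare `pip` command would run
    for location in find_on_path("pip"):
        print(location)
```

If the newly installed copy is not the first entry, it has to be called by its full path.<br />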
<br />
So, I did the following:<br />
<br />
/usr/local/bin/pip install biopython<br />
and it worked like a charm.</div>
Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.com2tag:blogger.com,1999:blog-7757118243803522394.post-75641661880513313142015-06-01T03:10:00.000-07:002015-06-01T03:10:08.096-07:00downloading a research articles even if your institute dont have access you can download ...<div dir="ltr" style="text-align: left;" trbidi="on">
Guys,<br />
there is no need to post requests on Facebook to get papers downloaded.<br /> Just follow this procedure to download any research article:<br />
Step 1: Go to <a href="http://freestuffy.in/" target="_blank">freestuffy.in</a><br /> Step 2: Then go to Research Papers <span><br /> Step 3: Then go to Scientific Journals<br /> Step 4: Type the DOI of your article there, which usually starts with 10.10.....<br /> Step 5: Hit the Search button to download it</span><br />
<div>
If you are unable to download it this way,<br /> just follow these steps:<br /> Step 1: Go to <a href="http://freestuffy.in/" target="_blank">freestuffy.in</a><br /> Step 2: Then go to Research Papers <br /> Step 3: Then go to server 1<br /> Step 4: Just type the URL of the paper and hit Search, and the paper is ready to download <br />
<span>If Method 1 does not work, go to the server 1 option and, instead of typing a URL, DOI, PMID etc. there, paste the full title of the paper. It works.</span></div>
</div>
Anonymoushttp://www.blogger.com/profile/18071472759535434809noreply@blogger.com6tag:blogger.com,1999:blog-7757118243803522394.post-52912623741215873622015-02-27T03:53:00.000-08:002015-02-27T03:53:45.936-08:00Quantifying Microbiome of self<div dir="ltr" style="text-align: left;" trbidi="on">
Here is a very interesting paper that I came across today in the journal Genome Biology. [ http://genomebiology.com/2014/15/7/R89 ]<br />
<br />
The authors took time series data of their microbiota at 10,000 longitudinal data points and concluded that the gut microbiota remains more or less the same over time unless there is a real change in lifestyle or a disease condition.<br />
<br />
For example, in the two individuals on whom this study was undertaken, the gut microbiome was visibly upset when a subject travelled from the developed world to the developing world. In the case of subject 2, there was a clear change in the gut microbiome pattern when he had bouts of diarrhea [Image below].<br />
<img src="http://genomebiology.com/content/figures/gb-2014-15-7-r89-1-l.jpg" /><br />
<br />
The authors also reported that the variation between individuals is much larger than the variation within an individual [Figure below]<br />
<br />
<img src="http://genomebiology.com/content/figures/gb-2014-15-7-r89-2.jpg" /><br />
<br />
</div>
Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.com0tag:blogger.com,1999:blog-7757118243803522394.post-68664227341106877142015-01-01T03:59:00.001-08:002015-01-01T03:59:24.959-08:00Determining Outlier<div dir="ltr" style="text-align: left;" trbidi="on">
Reference: http://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm<br />
<br />
<b>Boxplot Construction:</b><br />
<br />
The box plot is a useful graphical display for describing the behavior of the data in the middle as well as at the ends of the distributions. The box plot uses the <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda351.htm">median</a> and the lower and upper quartiles (defined as the 25th and 75th <a href="http://www.itl.nist.gov/div898/handbook/prc/section2/prc252.htm">percentiles</a>). If the lower quartile is Q1 and the upper quartile is Q3, then the difference (Q3 - Q1) is called the interquartile range or IQ.<br />
<br />
<b>Box plot with fences:</b><br />
<br />
A box plot is constructed by drawing a box between the upper and lower quartiles with a solid line drawn across the box to locate the median. The following quantities (called <i>fences</i>) are needed for identifying extreme values in the tails of the distribution:<br />
<ol>
<li>lower inner fence: Q1 - 1.5*IQ</li>
<li>upper inner fence: Q3 + 1.5*IQ</li>
<li>lower outer fence: Q1 - 3*IQ</li>
<li>upper outer fence: Q3 + 3*IQ</li>
</ol>
<b>Outlier Detection:</b><br />
<br />
A point beyond an inner fence on either side is considered a <b>mild outlier</b>. A point beyond an outer fence is considered an <b>extreme outlier</b>.<br />
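<br />
The fence rules above can be sketched compactly in Python. This is only an illustration under one stated assumption: the quartiles are taken as the medians of the lower and upper halves of the sorted data, with the overall median counted in both halves when the number of values is odd. Other quartile conventions give slightly different fences. The function names fences and classify are made up for this example.<br />

```python
from statistics import median


def fences(values):
    """Compute Q1, Q3 and the inner and outer fences of a data set."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # for odd n the median belongs to both halves
    q1 = median(data[:half])
    q3 = median(data[n - half:])
    iq = q3 - q1  # interquartile range
    return {
        "q1": q1,
        "q3": q3,
        "inner": (q1 - 1.5 * iq, q3 + 1.5 * iq),
        "outer": (q1 - 3 * iq, q3 + 3 * iq),
    }


def classify(values):
    """Split the values lying outside the fences into mild and extreme outliers."""
    f = fences(values)
    low_inner, high_inner = f["inner"]
    low_outer, high_outer = f["outer"]
    mild, extreme = [], []
    for v in values:
        if v < low_outer or v > high_outer:
            extreme.append(v)
        elif v < low_inner or v > high_inner:
            mild.append(v)
    return mild, extreme


# Small made-up data set with one mild and one extreme outlier
mild, extreme = classify([1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 100])
print("mild:", mild, "extreme:", extreme)
```

For this made-up data set Q1 is 3.5 and Q3 is 8.5, so the upper inner fence is 16 and the upper outer fence is 23.5.<br />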
<br />
<b>My script to detect outlier:</b><br />
<b><br /></b>
#!/usr/bin/perl -w<br />
<br />
# This script is to detect outliers in a dataset<br />
<br />
<br />
`sort -g $ARGV[0] > 'tmpSorted'`;<br />
<br />
open FH , "tmpSorted" or die "Cant open file for reading $! \n";<br />
<br />
my $count=1;<br />
my @arr;<br />
<br />
while(<FH>){<br />
chomp;<br />
$arr[$count]=$_;<br />
$count++;<br />
}<br />
close(FH);<br />
<br />
my ($median, $lowQ, $upQ, $low1B, $up1B, $low2B, $up2B);<br />
<br />
my $length = $#arr; # @arr is filled from index 1, so the last index equals the number of values<br />
<br />
my $m = findMidpoint($length);<br />
<br />
my @med = @$m;<br />
<br />
if(scalar(@med) > 1){<br />
$median = ($arr[$med[0]] + $arr[$med[1]])/2;<br />
print "Inside outer if and $med[0] and $med[1] and median is $median\n";<br />
$m = findMidpoint($med[0]);<br />
my @tmp = @$m;<br />
$m = findMidpoint($length - $med[1] + 1);<br />
my @tmp1 = @$m;<br />
print "printing...",join("|",@tmp),"and ", join("=",@tmp1) , "\n";<br />
<br />
if(scalar(@tmp) > 1){<br />
$lowQ = ($arr[$tmp[0]] + $arr[$tmp[1]]) /2;<br />
}<br />
else{<br />
$lowQ = $arr[$tmp[0]];<br />
}<br />
if(scalar(@tmp1) > 1){<br />
$upQ = ($arr[ $med[1] + $tmp1[0] - 1] + $arr[ $med[1] + $tmp1[1] - 1]) / 2; # offsets come from the upper half (@tmp1)<br />
<br />
}<br />
else{<br />
$upQ = $arr[$med[1] + $tmp1[0] - 1];<br />
}<br />
<br />
<br />
}<br />
<br />
else{<br />
$median = $arr[$med[0]];<br />
print "Inside outer else and median is $median\n";<br />
$m = findMidpoint($med[0]);<br />
my @tmp = @$m;<br />
$m = findMidpoint($length - $med[0] + 1);<br />
my @tmp1 = @$m;<br />
print "printing",@tmp, @tmp1 , "\n";<br />
<br />
if(scalar(@tmp) > 1){<br />
$lowQ = ($arr[$tmp[0]] + $arr[$tmp[1]]) /2;<br />
}<br />
else{<br />
$lowQ = $arr[$tmp[0]];<br />
}<br />
if(scalar(@tmp1) > 1){<br />
$upQ = ($arr[ $med[0] + $tmp1[0] - 1] + $arr[ $med[0] + $tmp1[1] - 1]) / 2; # offsets come from the upper half (@tmp1)<br />
<br />
}<br />
else{<br />
$upQ = $arr[$med[0] + $tmp1[0] - 1];<br />
}<br />
<br />
}<br />
<br />
<br />
my $innerRange = ($upQ - $lowQ) * 1.5;<br />
my $outerRange = ($upQ - $lowQ) * 3;<br />
<br />
$low1B = $lowQ - $innerRange;<br />
$low2B = $lowQ - $outerRange;<br />
$up1B = $upQ + $innerRange;<br />
$up2B = $upQ + $outerRange;<br />
<br />
print "Median,lowq,upq,innerRange, outerRange, up1b,up2b, low1b, low2b is $median, $lowQ, $upQ, $innerRange, $outerRange, $up1B, $up2B, $low1B, $low2B \n";<br />
<br />
sub findMidpoint{<br />
<br />
my $length = $_[0];<br />
my @arr;<br />
<br />
# Even number return 2 values<br />
if($length % 2 == 0){<br />
$arr[0] = $length / 2;<br />
$arr[1] = $arr[0] + 1;<br />
}<br />
else{<br />
$arr[0] = ($length + 1)/2;<br />
}<br />
<br />
return \@arr;<br />
}<br />
<br />
<br />
</div>
Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.com0tag:blogger.com,1999:blog-7757118243803522394.post-70865672205432522202014-10-24T14:01:00.000-07:002014-10-24T14:01:27.362-07:00Upgrading ubuntu and the consequences<div dir="ltr" style="text-align: left;" trbidi="on">
When you upgrade Ubuntu, there may be many unpleasant side effects. For instance, I got an email saying that our server was not accessible for citation purposes. I checked the web document roots and changed some permissions (which seemed to have changed during the upgrade), but the site still came up blank.<br />
<br />
<pre style="background-color: #eeeeee; border: 0px; color: #333333; font-family: 'Ubuntu Mono', 'Ubuntu Beta Mono A', Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace; font-size: 14px; line-height: 18.2000007629395px; margin-bottom: 10px; max-height: 600px; overflow: auto; padding: 5px; vertical-align: baseline; width: auto; word-wrap: normal;"><span style="color: black; font-family: 'Times New Roman'; font-size: small; line-height: normal; white-space: normal;">To check ubuntu version do the following:</span></pre>
<pre style="background-color: #eeeeee; border: 0px; margin-bottom: 10px; max-height: 600px; overflow: auto; padding: 5px; vertical-align: baseline; width: auto; word-wrap: normal;"><span style="font-family: Times New Roman;"><span style="white-space: normal;">lsb_release -a</span></span></pre>
<pre style="background-color: #eeeeee; border: 0px; margin-bottom: 10px; max-height: 600px; overflow: auto; padding: 5px; vertical-align: baseline; width: auto; word-wrap: normal;"><span style="font-family: Times New Roman;"><span style="white-space: normal;">Mine was 14.04</span></span></pre>
<br />
<br />
So I went ahead and restarted Apache; the commands are slightly different from those on Red Hat Linux.<br />
<br />
<pre style="background-color: #eeeeee; border: 0px; color: #333333; font-family: 'Ubuntu Mono', 'Ubuntu Beta Mono A', Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace; font-size: 14px; line-height: 18.2000007629395px; margin-bottom: 10px; max-height: 600px; overflow: auto; padding: 5px; vertical-align: baseline; width: auto; word-wrap: normal;"><code style="border: 0px; color: #222222; font-family: 'Ubuntu Mono', 'Ubuntu Beta Mono A', Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace; margin: 0px; padding: 0px; vertical-align: baseline; white-space: inherit;">sudo /etc/init.d/apache2 restart</code></pre>
<br />
<pre style="background-color: #eeeeee; border: 0px; color: #333333; font-family: 'Ubuntu Mono', 'Ubuntu Beta Mono A', Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace; font-size: 14px; line-height: 18.2000007629395px; margin-bottom: 10px; max-height: 600px; overflow: auto; padding: 5px; vertical-align: baseline; width: auto; word-wrap: normal;"><code style="border: 0px; color: #222222; font-family: 'Ubuntu Mono', 'Ubuntu Beta Mono A', Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace; margin: 0px; padding: 0px; vertical-align: baseline; white-space: inherit;">Restarting web server apache2
apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1 for ServerName
... waiting apache2:
Could not reliably determine the server's fully qualified domain name, using 127.0.1.1 for ServerName</code></pre>
<pre style="background-color: #eeeeee; border: 0px; color: #333333; font-family: 'Ubuntu Mono', 'Ubuntu Beta Mono A', Consolas, 'Bitstream Vera Sans Mono', 'Courier New', Courier, monospace; font-size: 14px; line-height: 18.2000007629395px; margin-bottom: 10px; max-height: 600px; overflow: auto; padding: 5px; vertical-align: baseline; width: auto; word-wrap: normal;"></pre>
After browsing several web sites, I did the following:
<br />
<br />
Created a file servername.conf inside /etc/apache2/conf-available:<br />
sudo vim /etc/apache2/conf-available/servername.conf<br />
<br />
Inside this file, entered the line:<br />
<br />
ServerName MyDomainName<br />
<br />
Enabled the new configuration (the argument is the name of the file created):<br />
sudo a2enconf servername<br />
<br />
Then reloaded Apache:<br />
service apache2 reload<br />
Then restarted Apache using:<br />
<br />
/etc/init.d/apache2 restart<br />
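<br />
The ServerName steps above can be collected into a small shell sketch. This is only an illustration under assumptions: write_servername_conf is a hypothetical helper name (not an Apache tool), MyDomainName is the placeholder used above, and the directory defaults to the Ubuntu location used in this post.<br />

```shell
#!/bin/sh
# Sketch: write the one-line ServerName config file described above.
# write_servername_conf is a hypothetical helper, not part of Apache.
write_servername_conf() {
    name="$1"                                      # e.g. MyDomainName
    conf_dir="${2:-/etc/apache2/conf-available}"   # assumed Ubuntu default
    printf 'ServerName %s\n' "$name" > "$conf_dir/servername.conf"
}

# Typical use as root on the server (left commented out so that
# sourcing this file has no side effects):
#   write_servername_conf MyDomainName
#   a2enconf servername
#   service apache2 reload
```

Passing the directory as a second argument lets you try the helper against a scratch directory before touching /etc/apache2.<br />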
<br />
The warning message disappeared, but the web page was still not up.<br />
<br />
In that case you may have to change the document root directory. In our case, it was /var/www earlier, but it is now /var/www/html.<br />
<br />
If you depend on BioPerl modules, you may also find that most of your Perl modules seem to have disappeared. You can search for a particular module using the command:<br />
<br />
find / -name Session.pm<br />
<br />
You may find that your @INC path has changed. In that case, place your BioPerl files in a directory that is on the @INC path.<br />
<br />
Change the permissions of the affected files where needed, and then it will start working!!<br />
</div>
Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.com0tag:blogger.com,1999:blog-7757118243803522394.post-91177159823293520552014-10-16T15:29:00.000-07:002014-10-16T15:29:33.306-07:00Day 2 and 3 Beyond Genome 2014<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Talk 2: Genetics of
Glioblastoma:<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Different populations
exist in glioblastoma, which fits the cancer stem cell model<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">ChIP-seq and functional
elements. In vitro model. Differentiated glioblastoma cells. Xenograft model.
Introduced TFs in vitro to induce tumors.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Core TFs bind to active
TPC regulatory elements<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Single cell RNAseq for
glioblastoma. There is receptor diversity inside the glioblastoma tumor. 430
cells were sequenced. Around 6000 genes were detected
in each cell. PDGFRA and TGFR are negatively co-regulated.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Core TFs are highly
correlated with stemness. Negative correlation with MES. Cells are then classified
into high stemness or low stemness. For cells there is a dominant transcription
signature. Cells can switch from one subclass to another. Tumors are more
heterogeneous than was previously thought<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">--<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Master regulators of
tumor initiation, progression…<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">SynGen algorithm for
predicting synergism of molecules. The ARACNe algorithm was used for
reconstructing the genetic network.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">--<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Transcription profile:
regulatory network and functional network<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">LincsProject.org<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Cell types,
perturbations and phenotypic assays. Cell types can be cancer cell lines or any
other cell lines; perturbations can be drug related. Assays can be transcriptomic or
proteomic. LINCS L1000 data (CMapIII). 22119 genetic reagents, 77 cellular
contexts, 20413 chemical reagents<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">--<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Genomics, epigenomic…<o:p></o:p></span></div>
<div style="border-bottom: solid windowtext 1.0pt; border: none; mso-border-bottom-alt: solid windowtext .75pt; mso-element: para-border-div; padding: 0in 0in 1.0pt 0in;">
<div class="MsoNormal" style="border: none; mso-border-bottom-alt: solid windowtext .75pt; mso-padding-alt: 0in 0in 1.0pt 0in; padding: 0in;">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Myc: cell cycle, apoptosis, cellular transformation, cell proliferation.
Myc is overexpressed in many cancers.<o:p></o:p></span></div>
</div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Drugging the cancer
interactions<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Multifaceted target
assessment for druggability<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Cancer drug targets form
distinct subnetworks inside a network. Cansar.icr.ac.uk<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">--<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">A complete catalogue,
identification of drivers. Fewer data sets are available for epigenetic
modifications.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">There are 40 epigenetic
marks<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Understand the chromatin states between Normal ->
Tumor -> Metastasis<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">ChIP-seq for 35 chromatin
marks. Generated a lot of data. Chromatin state prediction with ChromHMM<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Relative changes in
chromatin marks.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Loss of acetylation in tumorigenic
cells. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Six billion reads have
been generated. Epigenomic plasticity<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">--<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Personalized medicine<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">BRAF is mutated in human
melanoma<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Everolimus -> A person who responded well
had 17000 somatic mutations. Map2k1 15 bp deletion.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">EGFR BRAF MEK AKT mTORC1 Chemo<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">For all patients with
solid tumors, the kind of mutation they carry needs to be determined before deciding
their treatment type.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">341 genes are listed for
assay<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">--<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Ten trillion bacterial
cells, ten times more bacterial cells than human cells<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">100 times more
genes<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">--<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Circulating tumor DNA.
About 90% of cell-free DNA comes from hematopoietic stem cells. Cell-free DNA increases in
cancer patients. Plasma-Seq. The coverage is 0.1X depth<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">GRAIL is a text mining
tool<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">--<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Finding cancer driver
genes<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Blair, Cell 2013:
co-morbidity studies. 15 cancers in TCGA have somatic mutations.
Gain, mutated, loss. OMIM has germline mutations. Genetic links, network, pathways.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Cancer is co-morbid with
another genetic disease that is caused by mutation. Albinism shares
some genes with melanoma.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">1/3 of the Mendelian
diseases have co-morbidity with cancer.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">--<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Bacterial-human somatic
lateral gene transfer for cancer. <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">The fourth chromosome of
Drosophila has 20% of its genome from bacteria.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">--<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Day 3<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Talk 1: Anchored
Assembly: You can try at </span><a href="mailto:trial@spiralgenetics.com"><span style="background: white; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">trial@spiralgenetics.com</span></a><span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">--<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Bioinformatics
challenge:<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">BTG Informatics
challenge: Single cell Copy Number analysis<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="http://schatzlab.cshl.edu/btg14/"><span style="background: white; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">http://Schatzlab.cshl.edu/btg14/</span></a><span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;"> <o:p></o:p></span></div>
<div class="MsoNormal">
<a href="mailto:Btgcg2014@gmail.com"><span style="background: white; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Btgcg2014@gmail.com</span></a><span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;"><o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Baslan et al., Nature Protocols
2012 <o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">--<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Visualization of
multidimensional cancer data, Genome Medicine 2013<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Copy number prediction
using Titan, Ha et al. Genome Research
2014<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">--<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Genomic media
and clinical cancer medicine: Dana-Farber Institute:<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">--<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">IGB<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">--<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Guided visualization
exploration of cancer genomics<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">--<o:p></o:p></span></div>
<br />
<div class="MsoNormal">
<span style="background: white; color: #1f1f1f; font-family: "Verdana","sans-serif"; font-size: 10.0pt; line-height: 115%;">Jian Ma<o:p></o:p></span></div>
</div>
Sucheta Tripathy PI @ Computational Genomics Group at IICB, Kolkatahttp://www.blogger.com/profile/17433426304045795341noreply@blogger.com0