
Sunday, March 26, 2017

Analysing the alleles from haplotypes on PacBio data

I have been working with PacBio data, and when I tried to identify the alleles from the haplotypes of a diploid assembly I ran into many errors at the very first step. I had been following the method used for Illumina datasets, but the tools developed for short reads behave strangely with long-read data. I was stuck for 3-4 days, googled as much as I could and tried various approaches. Finally I posted in the forums and interacted with the GATK developers, who suggested a simple solution to my errors. So, for those who are working with long reads and want to identify haplotypes, here is my verified command line.

[Any aligner can be used, even BLASR. Initially I thought there was a problem with my aligner, but there really was not.] There is no need to mark duplicates for long reads; the developers recommend that step only for Illumina reads. I have reached the HaplotypeCaller step and so far it is running smoothly with no errors. If I change the commands or face any problems I will update this post, and once the output is ready I may paste some of it here.

bwa index 2017_V6_Pr102_assembly.fa
bwa mem -x pacbio 2017_V6_Pr102_assembly.fa /data/results1/STLab/Takao_data/Raw_data/ND886/all_ND886.fastq > aln.sam
samtools view -b -S aln.sam -o aln.bam
samtools sort aln.bam > aln_sorted.bam
samtools index aln_sorted.bam
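samtools mpileup -uf 2017_V2_ND886_assembly.fa aln_sorted.bam | /share/apps/bcftools-1.2/bcftools call -cv - > out.vcf
[Use bcftools 1.2; with other versions I did not get the genotype information in the output. The reference given to samtools mpileup must be the same FASTA file the reads were aligned to.]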
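To confirm that out.vcf really carries genotype information, a quick check along these lines may help. This is only a minimal Python sketch and not part of the pipeline above; the function name count_genotyped_records is made up for illustration, and it assumes an uncompressed, tab-separated VCF with at least one sample column.

```python
def count_genotyped_records(vcf_lines):
    """Return (total_records, records_with_GT) for an iterable of VCF lines.

    Assumption: plain-text VCF; a record "has genotype information" when
    its FORMAT column (9th field) lists GT and a sample column follows.
    """
    total = 0
    with_gt = 0
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        total += 1
        # Fields 9 and 10 (FORMAT and the first sample) only exist when
        # the caller wrote genotype columns.
        if len(fields) >= 10 and "GT" in fields[8].split(":"):
            with_gt += 1
    return total, with_gt
```

Opening out.vcf and passing the file handle to count_genotyped_records should give two equal numbers if every variant record has a genotype; a second number of zero would point to the missing-genotype problem mentioned above.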

Use your favourite haplotype phaser (WhatsHap or HapCUT) together with the BAM and VCF files produced above.

Now you have the phased alleles for each haplotype. You can compare them and use them for downstream analysis.
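As a rough illustration of comparing the phased alleles, the Python sketch below pulls the allele carried by each haplotype out of a phased VCF. It is an assumption-laden example rather than a method from the phasers themselves: the function name haplotype_alleles is hypothetical, the sample is taken to be diploid and stored in the first sample column, and phased genotypes are expected in the standard VCF form with a "|" separator (for example 0|1).

```python
def haplotype_alleles(vcf_lines):
    """Yield (chrom, pos, hap1_allele, hap2_allele) for phased diploid sites.

    Assumptions: plain-text VCF, first sample column only, phased
    genotypes written with '|' (unphased '/' genotypes are skipped).
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns, nothing to read
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        alleles = [ref] + alt.split(",")  # allele index 0 is REF
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        gt = fields[9].split(":")[keys.index("GT")]
        parts = gt.split("|")
        # Keep only fully called, phased, diploid genotypes.
        if len(parts) != 2 or "." in parts:
            continue
        yield chrom, pos, alleles[int(parts[0])], alleles[int(parts[1])]
```

Collecting the results into a list gives one row per phased site, with the allele on the first haplotype and the allele on the second, which can then be compared site by site or written out for downstream tools.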



Friday, March 24, 2017

Fancy genomics: "I am taking you all to the world of the two-speed genome concept"



My PhD project covers various approaches to solving genome assembly problems. While working on an oomycete project I was drawn to effector proteins, evolution, pathogenicity, synteny, transposons and repeat regions. Then a fancy idea came to mind after I read an interesting paper from bioRxiv on the Verticillium genome: a group from the Netherlands sequenced it and studied the two-speed genome concept among strains. http://genome.cshlp.org/content/early/2016/07/12/gr.204974.116.full.pdf+html I was impressed by the work and showed it to my PI, and she was impressed by two-speed genomes too. I work in a collaborative program, and my collaborator was also fascinated by the two-speed genome work.
Let me explain: what is a two-speed genome?
It was already known that the genomes of fungi and other plant pathogens encode effector proteins, which play an important role in causing disease in the host. These effector genes are not randomly distributed across the genome; they tend to be associated with compartments enriched in repeat sequences and transposons. This led to the 'two-speed genome' model, in which filamentous pathogen genomes have a bipartite architecture with gene-sparse, repeat-rich compartments that support adaptive evolution. The unusual genome architecture and the occurrence of effector genes in specific genome compartments is a feature that has evolved repeatedly in independent phylogenetic lineages of filamentous pathogens. Genome analyses of P. infestans and three of its sister species revealed uneven evolutionary rates across the genomes, with genes in repeat-rich regions showing higher rates of structural polymorphism and positive selection. In a two-speed genome architecture, the effector genes populate the more rapidly evolving sections of the genome.
Lineages that acquired two-speed genomes have increased survivability: they are less likely to go extinct than lineages with less adaptable genomes, which are more likely to be purged from the biota as their hosts develop full resistance or become extinct. In this 'jump or die' model, pathogen lineages that have an increased likelihood of producing virulent genotypes on resistant hosts and non-hosts benefit from a macroevolutionary advantage and end up dominating the biota. Several filamentous plant pathogens have evolved by shifting or jumping from one host plant to another.
This information comes from a great, detailed review by Sophien and Raffaele et al., available here: http://www.sciencedirect.com/science/article/pii/S0959437X15000945
For those who don't have access to ScienceDirect, the same paper is available in the bioRxiv repository: http://biorxiv.org/content/early/2015/07/01/021774