Wednesday, October 26, 2016
Structural variation in the genomes
Structural variation: Structural variation is a change that alters the structure of an organism's chromosome. Structural variants can be insertions, duplications, inversions and translocations. People who work in genomics generally treat a variant as structural when it changes more than about 50 base pairs of the genome. It is believed that some genetic diseases are caused by structural variations.

What is the difference between SNPs and structural variation? SNPs are single-nucleotide mutations, positions where a single base differs between two genomes, that have been validated to be present in more than 1% of the population. Structural variations are any mutations that change the organism's chromosome structure, such as insertions, deletions, copy number variations, duplications, inversions and translocations. SNPs and indels describe small-scale genomic variation, whereas structural variants affect the genome at larger scales: events like gene duplications, tandem repeats, transposon insertions, inversions and other chromosomal rearrangements. Long-read sequencing technology paves the way to understanding structural variants through split-read alignment.

[Information from the literature: "Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly", Yingrui Li, et al.] Methods for detecting structural variations from short sequencing reads are hampered by one or more of the following limitations: (i) the methods may favor a particular length range of structural variations; (ii) they may favor discovery of particular types of structural variations; (iii) they may be unable to resolve the exact structural variation genotypes and/or breakpoints at single nucleotide resolution; and (iv) because of difficulties mapping reads to the genome, they may not be able to accurately identify complex rearrangements.
Paired-end mapping, for example, can only predict insertion breakpoints within a few base pairs of the exact breakpoint position, and it can only detect insertions when the entire sequence is contained within the DNA fragment whose ends are being sequenced; thus, the maximum size of an insertion that can be detected by paired-end mapping is limited by the largest insert size present in a library. Split-read methods, on the other hand, can precisely define a breakpoint and genotype of an insertion, but only when it is shorter than the read length. Thus, studies carried out so far have been of limited completeness, accuracy and/or resolution.
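The size limits described above can be turned into a rough rule of thumb. The sketch below is only an illustration of that reasoning, not how any real caller works: the function name `detection_methods` and the example library sizes are made up here, and real tools also need anchoring sequence on both sides of the insertion, so the true limits are somewhat smaller.

```python
def detection_methods(insertion_len, library_insert_sizes, read_len):
    """Return which short-read approaches could, in principle, see an insertion.

    Simplified assumption: paired-end mapping needs the whole insertion to fit
    inside the largest fragment in the library, and split-read methods need
    the insertion to be shorter than a single read.
    """
    methods = []
    if insertion_len < max(library_insert_sizes):
        methods.append("paired-end")
    if insertion_len < read_len:
        methods.append("split-read")
    return methods


# Hypothetical library: 100 bp reads, 500 bp and 2 kb insert sizes.
for size in (30, 300, 5000):
    print(size, detection_methods(size, [500, 2000], 100))
```

With these example numbers, an insertion larger than the biggest insert size is invisible to both approaches, which is the gap that long reads and de novo assembly try to fill.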
BWA-MEM or BLASR
http://lh3.github.io/2014/12/10/bwa-mem-for-long-error-prone-reads/ This is a very nice blog post that discusses alignment methods useful for PacBio long reads.
https://www.biostars.org/p/63306/ This forum thread discusses split-read alignments.
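To get a feel for what a split-read alignment looks like in practice, here is a minimal sketch that reads the `SA:Z:` tag from one SAM record. The SAM specification defines this tag as a semicolon-separated list of `rname,pos,strand,CIGAR,mapQ,NM` entries describing the other pieces of a chimeric alignment. The function name `parse_sa_tag` and the example record are made up for illustration; for real work a library such as pysam is a better choice than hand parsing.

```python
def parse_sa_tag(sam_line):
    """Return the supplementary alignments listed in the SA tag of one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns of a SAM record are mandatory; optional tags follow.
    for tag in fields[11:]:
        if tag.startswith("SA:Z:"):
            pieces = []
            for entry in tag[5:].rstrip(";").split(";"):
                rname, pos, strand, cigar, mapq, nm = entry.split(",")
                pieces.append({
                    "chrom": rname,
                    "pos": int(pos),
                    "strand": strand,
                    "cigar": cigar,
                    "mapq": int(mapq),
                    "nm": int(nm),
                })
            return pieces
    return []


# Hypothetical record: first half of the read maps to chr1, second half to chr5.
record = "read1\t0\tchr1\t1000\t60\t50M50S\t*\t0\t0\t*\t*\tNM:i:0\tSA:Z:chr5,2000,+,50S50M,60,0;"
primary_chrom = record.split("\t")[2]
for piece in parse_sa_tag(record):
    if piece["chrom"] != primary_chrom:
        print("candidate translocation:", primary_chrom, "->", piece["chrom"], piece["pos"])
```

A read whose pieces land on different chromosomes is only a candidate; it still needs support from other reads before it can be called a translocation.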
Tips for structural variant analysis:
1. As many reads as possible should map across the chromosome breakpoints, and the coverage there should be high.
2. To identify translocations, check how many individual reads support the translocation versus how many support the assembly.
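The second tip can be sketched as a simple count at one breakpoint. This is a toy illustration under my own assumptions: the alignment dictionaries with `chrom`, `start`, `end` and `clipped` fields, the function name `count_support`, and the `window` and `min_flank` values are all hypothetical, not taken from any tool. Reads that run straight through the breakpoint support the assembly as it is; clipped reads that stop at the breakpoint support a rearrangement.

```python
def count_support(alignments, chrom, breakpoint, window=10, min_flank=20):
    """Count reads spanning a breakpoint versus clipped reads ending at it.

    Returns (spanning, split): spanning reads cover the breakpoint with at
    least `min_flank` bases on each side; split reads are clipped and start
    or end within `window` bases of the breakpoint.
    """
    spanning = 0
    split = 0
    for aln in alignments:
        if aln["chrom"] != chrom:
            continue
        if aln["start"] + min_flank <= breakpoint <= aln["end"] - min_flank:
            spanning += 1
        elif aln["clipped"] and (
            abs(aln["start"] - breakpoint) <= window
            or abs(aln["end"] - breakpoint) <= window
        ):
            split += 1
    return spanning, split


# Hypothetical alignments around a breakpoint at chr1:1000.
reads = [
    {"chrom": "chr1", "start": 900, "end": 1100, "clipped": False},
    {"chrom": "chr1", "start": 800, "end": 1003, "clipped": True},
    {"chrom": "chr1", "start": 998, "end": 1200, "clipped": True},
    {"chrom": "chr2", "start": 900, "end": 1100, "clipped": False},
]
spanning, split = count_support(reads, "chr1", 1000)
print("support assembly:", spanning, "support rearrangement:", split)
```

Comparing the two counts gives a quick sanity check on whether a predicted translocation is backed by the reads or is more likely an assembly or mapping artifact.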
[I spoke with some of the developers and asked whether tools built for human structural variants can be used on a draft PacBio assembly of a plant pathogen. They said I can use the tools for prediction, and I am trying this on one plant pathogen genome.]
One paper from 2014 discusses all of these approaches.