The Tuxedo suite comprises Bowtie, TopHat, Cufflinks, cummeRbund and several accessory tools.
First get your genome FASTA file (the final genome assembly).
1. Map your RNA-seq FASTQ files to the genome with TopHat (if all is well, the run will be seamless).
2. Run Cufflinks on each TopHat output file (cufflinks accepted_hits.bam). This takes a while, since Cufflinks assembles the aligned reads into transcripts and groups them into isoforms and genes. For large files, expect 8-12 hours on a reasonably good server.
3. Run Cuffmerge: cuffmerge list.txt, where list.txt lists the paths of the *_transcripts.gtf files, one per line. This runs very quickly and merges the per-sample assemblies so that the same gene gets the same gene_id across all your samples. The output is a merged.gtf file.
4. For differential expression analysis, run Cuffdiff on merged.gtf and the accepted_hits.bam file of each sample:
/cuffdiff merged.gtf tophat_HTI1-vs-HTI4/accepted_hits.bam tophat_HTI2-vs-HTI4/accepted_hits.bam tophat_HTI3-vs-HTI4/accepted_hits.bam
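Steps 1 to 4 can be sketched as a single shell script. This is only a rough outline under several assumptions that are not from the post: the sample names, the FASTQ file names, the thread count, the genome index prefix and the diff_out output directory are all hypothetical, and it assumes TopHat2 with a Bowtie2 index. By default it only prints the commands (dry run), so you can check them before anything is executed.

```shell
#!/bin/sh
# Dry-run sketch of steps 1-4. Sample names, file names, thread count and
# output directories are hypothetical; set DRY_RUN=0 to execute for real.
DRY_RUN="${DRY_RUN:-1}"
GENOME="genome"            # assumed Bowtie2 index prefix, built from genome.fa
THREADS=8
SAMPLES="HTI1 HTI2 HTI3"   # hypothetical sample names

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run bowtie2-build "$GENOME.fa" "$GENOME"

: > list.txt               # start with an empty assembly list for cuffmerge
bams=""
for s in $SAMPLES; do
    # Step 1: align reads; step 2: assemble transcripts per sample.
    run tophat -p "$THREADS" -o "tophat_$s" "$GENOME" "${s}_1.fastq" "${s}_2.fastq"
    run cufflinks -p "$THREADS" -o "cufflinks_$s" "tophat_$s/accepted_hits.bam"
    echo "cufflinks_$s/transcripts.gtf" >> list.txt
    bams="$bams tophat_$s/accepted_hits.bam"
done

# Step 3: merge assemblies (cuffmerge writes to merged_asm/ unless -o is given).
run cuffmerge -s "$GENOME.fa" -p "$THREADS" list.txt
# Step 4: $bams is intentionally unquoted so each BAM is its own argument.
run cuffdiff -p "$THREADS" -o diff_out merged_asm/merged.gtf $bams
```

The only file the dry run writes is list.txt, which is the input cuffmerge expects. If your transcript files are named differently (for example the *_transcripts.gtf names used above), adjust the echo line accordingly.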
This creates a plethora of files, but the following 11 are the ones cummeRbund needs for visualizing the results and generating publication-quality images.
To run cummeRbund, copy all of these files to your working directory:
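- isoforms.fpkm_tracking
- isoform_exp.diff
- genes.fpkm_tracking
- gene_exp.diff
- tss_groups.fpkm_tracking
- tss_group_exp.diff
- cds.fpkm_tracking
- cds_exp.diff
- cds.diff
- promoters.diff
- splicing.diff
The best option is to put all 11 files into a separate directory inside your working directory, say 'diff_exp'.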
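Collecting these files by hand is error-prone, so here is a small shell sketch that copies them into a diff_exp directory (the name passed to readCufflinks further down) and reports any that are missing. The source directory diff_out and the function name collect_cuffdiff_files are hypothetical; point SRC at wherever your cuffdiff run wrote its output.

```shell
#!/bin/sh
# Copy the 11 cuffdiff outputs listed above into DEST for cummeRbund.
# SRC is a hypothetical cuffdiff output directory; adjust it to your own.
SRC="${SRC:-diff_out}"
DEST="${DEST:-diff_exp}"

# Copies each expected file; returns the number of files that were missing.
collect_cuffdiff_files() {
    mkdir -p "$DEST"
    missing=0
    for f in isoforms.fpkm_tracking isoform_exp.diff \
             genes.fpkm_tracking gene_exp.diff \
             tss_groups.fpkm_tracking tss_group_exp.diff \
             cds.fpkm_tracking cds_exp.diff cds.diff \
             promoters.diff splicing.diff; do
        if [ -f "$SRC/$f" ]; then
            cp "$SRC/$f" "$DEST/"
        else
            echo "missing: $SRC/$f" >&2
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

Call collect_cuffdiff_files from your working directory after cuffdiff has finished; a non-zero exit status means at least one of the 11 files was not found.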
You can run RStudio on your Windows machine if you like, or run R on your server. cummeRbund needs the following packages, which you can install up front:
- RSQLite
- ggplot2 v0.9.2
- reshape2
- plyr
- fastcluster
- rtracklayer
- Gviz
- BiocGenerics (>=0.3.2)
- Hmisc
Next, set your working directory to the folder that contains 'diff_exp'. For example: setwd("C:/Users/Sucheta/Documents/MyLabIICB/AllCollaborations/NahidAliCollaboration/companion")
Then load the library:
library(cummeRbund)
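Now read your 11 files using this command:
data <- readCufflinks("diff_exp")
The first read takes a while because it builds a SQLite database file (cuffData.db by default) inside the 'diff_exp' directory. This is your database file, and later calls reuse it instead of parsing the files again.
Now you can plot the density of gene expression values (FPKM) for each condition using the following command: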
csDensity(genes(data))
Or you can draw a matrix of volcano plots of differentially expressed genes, one panel per pair of conditions, using:
v <- csVolcanoMatrix(genes(data))
v
As you can see from the plot, the conditions differ very little from one another.
This will continue in the next blog post...