When you are dealing with colorspace data the choice of mappers get limited. Howver, my favorite mapper is still bowtie for several reasons. Lifescope has its own inhouse mapper; which claims to have a all round better approach in mapping colorspace data, but the lack of transparency on what happens within puts me off using this tool. Once bowtie maps the reads by default parameter, the next thing to do is to convert the sam file into bam file, sort it and index it. All these can be done using samtools. However, if the file size is large, you could do the sorting job using sort operations from unix commandline.
1. sort sam file
LC_ALL="C" sort -k 3,3 -k 4,4n input_sam > output_sam # This step will take a long time
sort options for samtools works but only on bam files and on many instances downstream analysis softwares complain about co-ordinates not being sorted...
or Use picard:
java -jar /share/apps/picard-tools-1.56/SortSam.jar I=bowtie.sam O=bowtie.bam SO=coordinate # This took one hour in a HPC with 48 GB RAM on each node for a file size of 30 GB
2. Make an index file of bam file
samtools index bowtie.bam bowtie.bai
3. MarkDuplicates using picard
java -jar /share/apps/picard-tools-1.56/MarkDuplicates.jar I=bowtie.bam M=metrics.bam O=duplicateMarked.bam
4. sort this bam file and make index using samtools
samtools sort duplicateMarked.bam duplicateMarked.sorted
samtools index duplicateMarked.sorted.bam duplicateMarked.sorted.bai
java -jar /share/apps/GenomeAnalysisTK-2.4-9-g532efad/resources/GenomeAnalysisTK.jar -T RealignerTargetCreator -I duplicateMarked.sorted.bam -R /share/reference/human/samtools/hg19.fa -o dM.bam.list
# The output file dM.bam.list returns 0 output. Check it later.