The following commands create a sorted bam file that can be viewed in IGV, mapping the lambda PacBio raw reads against the reference:
/tools/bwa/bwa index lambda.fasta
nice -n 15 /tools/bwa/bwa bwasw -t 16 lambda.fasta pacbio.filtered_subreads.fasta > pacbio.filtered_subreads.sam
/tools/samtools/samtools view -bS pacbio.filtered_subreads.sam > pacbio.filtered_subreads.bam
/tools/samtools/samtools sort pacbio.filtered_subreads.bam pacbio.filtered_subreads.sorted
/tools/samtools/samtools index pacbio.filtered_subreads.sorted.bam
The software can be downloaded here:
The bam files can be viewed using this software:
Mauve needs to be called from within its root directory, so first let’s change directories:
cd /tools/mauve
Now we can run Mauve:
java -cp Mauve.jar org.gel.mauve.assembly.ScoreAssembly -reference /data/assembly/lambda.fasta -assembly ~/training/assembly/celera/asm/9-terminator/asm.ctg.fasta -reorder -outputDir ~/training/assembly/mauve_celera/
Note that we need to specify both a -reference and an -assembly, as well as the output directory -outputDir. -reorder allows Mauve to reorder and flip the strand of the contigs to best align the contigs to the reference. A recent build of Mauve is required to use this functionality.
More details about scoring genome assemblies with Mauve can be found here: http://code.google.com/p/ngopt/wiki/How_To_Score_Genome_Assemblies_with_Mauve
Mauve includes a Java-based browser of the genome alignments, downloadable at http://asap.ahabs.wisc.edu/mauve/ . To browse an alignment, open one of files in the subdirectories in the output directory, e.g. alignment5/alignment5
Using nucmer from the MUMmer package, we can calculate the per base accuracy of contigs against a reference:
/tools/MUMmer/nucmer --maxmatch -c 100 -p celera lambda.fasta celera-assembly.fasta
/tools/MUMmer/show-coords -c celera.delta > celera.coords
Notes:
celera.coords contains the results:
/data/assembly/lambda.fasta /home/lhon/training/assembly/celera/asm/9-terminator/asm.ctg.fasta
NUCMER
[S1] [E1] | [S2] [E2] | [LEN 1] [LEN 2] | [% IDY] | [COV R] [COV Q] | [TAGS]
==========================================================================================================
42 48496 | 48452 1 | 48455 48452 | 99.93 | 99.90 100.00 | ref000001|lambda_NEB3011 ctg7180000000001
In this case, the single contig that Celera Assembler generated had a 99.93% accuracy with 99.9% coverage of the genome.
Now let’s visualize the alignment, using celera.delta generated by the nucmer command above:
/tools/MUMmer/mummerplot --png -p celera celera.delta -R lambda.fasta -Q celera-assembly.fasta