Evaluating assemblies

Aligning reads against a reference

The following commands create a sorted bam file that can be viewed in IGV, mapping the lambda PacBio raw reads against the reference:

/tools/bwa/bwa index lambda.fasta
nice -n 15 /tools/bwa/bwa bwasw -t 16 lambda.fasta pacbio.filtered_subreads.fasta > pacbio.filtered_subreads.sam
/tools/samtools/samtools view -bS pacbio.filtered_subreads.sam > pacbio.filtered_subreads.bam
/tools/samtools/samtools sort pacbio.filtered_subreads.bam pacbio.filtered_subreads.sorted
/tools/samtools/samtools index pacbio.filtered_subreads.sorted.bam

The software can be downloaded here:

The bam files can be viewed using this software:

Running Mauve

Mauve needs to be called from within its root directory, so first let’s change directories:

cd /tools/mauve

Now we can run Mauve:

java -cp Mauve.jar org.gel.mauve.assembly.ScoreAssembly -reference /data/assembly/lambda.fasta -assembly ~/training/assembly/celera/asm/9-terminator/asm.ctg.fasta -reorder -outputDir ~/training/assembly/mauve_celera/

Note that we need to specify both a -reference and an -assembly, as well as the output directory -outputDir. -reorder allows Mauve to reorder and flip the strand of the contigs to best align the contigs to the reference. A recent build of Mauve is required to use this functionality.

More details about scoring genome assemblies with Mauve can be found here: http://code.google.com/p/ngopt/wiki/How_To_Score_Genome_Assemblies_with_Mauve

Mauve includes a Java-based browser of the genome alignments, downloadable at http://asap.ahabs.wisc.edu/mauve/ . To browse an alignment, open one of files in the subdirectories in the output directory, e.g. alignment5/alignment5

Nucmer: Contig accuracy and dotplots

Using nucmer from the MUMmer package, we can calculate the per base accuracy of contigs against a reference:

/tools/MUMmer/nucmer --maxmatch -c 100 -p celera lambda.fasta celera-assembly.fasta
/tools/MUMmer/show-coords -c celera.delta > celera.coords

Notes:

  • --maxmatch -c 100 tunes nucmer to look exhaustively for longer matches.
  • -p celera specifies celera to be the output prefix, in this case generating celera.delta
  • Here we placed the reference fasta before the assembly fasta.

celera.coords contains the results:

/data/assembly/lambda.fasta /home/lhon/training/assembly/celera/asm/9-terminator/asm.ctg.fasta
NUCMER

    [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  |  [COV R]  [COV Q]  | [TAGS]
==========================================================================================================
      42    48496  |    48452        1  |    48455    48452  |    99.93  |    99.90   100.00  | ref000001|lambda_NEB3011        ctg7180000000001

In this case, the single contig that Celera Assembler generated had a 99.93% accuracy with 99.9% coverage of the genome.

Now let’s visualize the alignment, using celera.delta generated by the nucmer command above:

/tools/MUMmer/mummerplot --png -p celera celera.delta -R lambda.fasta -Q celera-assembly.fasta

Table Of Contents

This Page