Evaluating assemblies ===================== Aligning reads against a reference ---------------------------------- The following commands create a sorted bam file that can be viewed in IGV, mapping the lambda PacBio raw reads against the reference:: /tools/bwa/bwa index lambda.fasta nice -n 15 /tools/bwa/bwa bwasw -t 16 lambda.fasta pacbio.filtered_subreads.fasta > pacbio.filtered_subreads.sam /tools/samtools/samtools view -bS pacbio.filtered_subreads.sam > pacbio.filtered_subreads.bam /tools/samtools/samtools sort pacbio.filtered_subreads.bam pacbio.filtered_subreads.sorted /tools/samtools/samtools index pacbio.filtered_subreads.sorted.bam The software can be downloaded here: * samtools: http://samtools.sourceforge.net/ * bwa: http://bio-bwa.sourceforge.net/ The bam files can be viewed using this software: * IGV: http://www.broadinstitute.org/software/igv/download * Tablet: http://bioinf.scri.ac.uk/tablet/ Running Mauve ------------- Mauve needs to be called from within its root directory, so first let's change directories:: cd /tools/mauve Now we can run Mauve:: java -cp Mauve.jar org.gel.mauve.assembly.ScoreAssembly -reference /data/assembly/lambda.fasta -assembly ~/training/assembly/celera/asm/9-terminator/asm.ctg.fasta -reorder -outputDir ~/training/assembly/mauve_celera/ Note that we need to specify both a ``-reference`` and an ``-assembly``, as well as the output directory ``-outputDir``. ``-reorder`` allows Mauve to reorder and flip the strand of the contigs to best align the contigs to the reference. A recent build of Mauve is required to use this functionality. More details about scoring genome assemblies with Mauve can be found here: http://code.google.com/p/ngopt/wiki/How_To_Score_Genome_Assemblies_with_Mauve Mauve includes a Java-based browser of the genome alignments, downloadable at http://asap.ahabs.wisc.edu/mauve/ . To browse an alignment, open one of files in the subdirectories in the output directory, e.g. alignment5/alignment5 Nucmer: Contig accuracy and dotplots ------------------------------------ Using ``nucmer`` from the MUMmer package, we can calculate the per base accuracy of contigs against a reference:: /tools/MUMmer/nucmer --maxmatch -c 100 -p celera lambda.fasta celera-assembly.fasta /tools/MUMmer/show-coords -c celera.delta > celera.coords Notes: * ``--maxmatch -c 100`` tunes nucmer to look exhaustively for longer matches. * ``-p celera`` specifies ``celera`` to be the output prefix, in this case generating ``celera.delta`` * Here we placed the reference fasta before the assembly fasta. ``celera.coords`` contains the results:: /data/assembly/lambda.fasta /home/lhon/training/assembly/celera/asm/9-terminator/asm.ctg.fasta NUCMER [S1] [E1] | [S2] [E2] | [LEN 1] [LEN 2] | [% IDY] | [COV R] [COV Q] | [TAGS] ========================================================================================================== 42 48496 | 48452 1 | 48455 48452 | 99.93 | 99.90 100.00 | ref000001|lambda_NEB3011 ctg7180000000001 In this case, the single contig that Celera Assembler generated had a 99.93% accuracy with 99.9% coverage of the genome. Now let's visualize the alignment, using ``celera.delta`` generated by the nucmer command above:: /tools/MUMmer/mummerplot --png -p celera celera.delta -R lambda.fasta -Q celera-assembly.fasta