AHA

AHA is the hybrid scaffolding approach in SMRT Analysis: it uses PacBio long reads to join contigs from a draft assembly generated by a different assembler. In SMRT Analysis 1.4, AHA accepts draft genomes of up to 160 Mbases and 20,000 contigs. On the SMRT Pipe command line, FASTA files can be used directly as the source of long reads. If the PacBio long reads are available as a mixture of FASTA and bas.h5 files, export filtered_subreads from the bas.h5 files and combine them with the other FASTA files so that the input.xml file shown below can be used.
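
Before running AHA, it may help to check a draft assembly against the limits quoted above. The following is a minimal sketch, not part of SMRT Analysis: the function name check_draft_limits is hypothetical, and it assumes that "160 Mbases" means 160,000,000 bases and that every line starting with ">" is one contig header.

```shell
# Sketch: count contigs and bases in a draft FASTA file and compare them
# with the SMRT Analysis 1.4 limits for AHA (assumed: 1 Mbase = 10^6 bases).
check_draft_limits() {
    awk -v max_bases=160000000 -v max_contigs=20000 '
        /^>/ { contigs++; next }            # a header line starts a new contig
             { gsub(/[ \t\r]/, "")          # ignore whitespace and CR characters
               bases += length($0) }
        END {
            printf "contigs=%d bases=%d\n", contigs, bases
            if (contigs > max_contigs || bases > max_bases) {
                print "draft exceeds the AHA limits" > "/dev/stderr"
                exit 1                      # non-zero status signals a failure
            }
        }' "$1"
}
```

For example, check_draft_limits contigs.fasta prints the two counts and returns a non-zero status if either limit is exceeded.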

AHA requires a params.xml and an input.xml.

params.xml

Here is an example params.xml for long reads:

<?xml version="1.0"?>
<smrtpipeSettings>
        <!-- HybridAssembly 1.2.0 parameter file for long reads -->
        <module name="HybridAssembly">
        <!-- General options -->
        <!-- Parameter schedules are used for iterative hybrid assembly. They are
                given as comma-delimited tuples separated by semicolons, one
                tuple per iteration. The fields, in order, are:

                - Minimum alignment score (aka Z-score). Higher is more stringent.
                - Minimum number of reads needed to link two contigs. (Redundancy)
                - Minimum subread length to participate in alignment.
                - Minimum contig length to participate in alignment.

                If a tuple contains fewer than 4 fields, defaults will be used for
                the remaining fields. -->
        <paramSchedule>6,3,75;6,3,75;6,2,75;6,2,75</paramSchedule>

        <!-- Untangling occurs after the main scaffolding step. Valid values
                are "bambus" and "pacbio" (recommended and the default). -->
        <untangler>pacbio</untangler>

        <!-- Gap fill-in can be turned on by setting this to True, or off by setting it to False -->
        <fillin>False</fillin>

        <!-- These options enable the use of long reads -->
        <longReadsAsStrobe>True</longReadsAsStrobe>
        <blasrOpts>-minMatch 10 -minPctIdentity 70 -bestn 10 \
        -noSplitSubreads</blasrOpts>

        <!-- Parallelization options -->
        <numberProcesses>4</numberProcesses>
        </module>
</smrtpipeSettings>
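
To make the paramSchedule format easier to read, the sketch below splits a schedule string into its per-iteration tuples and labels each field in the order given in the comment above. It is only an illustration: describe_schedule and the labels (min_score, min_reads, and so on) are hypothetical names, not AHA options, and the word "default" simply marks a field that the tuple leaves out.

```shell
# Sketch: print one line per tuple of a paramSchedule string.
# Fields missing from a tuple are shown as "default".
describe_schedule() {
    printf '%s\n' "$1" | awk -F';' '{
        for (i = 1; i <= NF; i++) {
            n = split($i, f, ",")           # n = number of fields in tuple i
            printf "iteration %d: min_score=%s min_reads=%s min_subread_len=%s min_contig_len=%s\n",
                i,
                (n >= 1 ? f[1] : "default"),
                (n >= 2 ? f[2] : "default"),
                (n >= 3 ? f[3] : "default"),
                (n >= 4 ? f[4] : "default")
        }
    }'
}

# The schedule from the params.xml above: four iterations, three fields each.
describe_schedule '6,3,75;6,3,75;6,2,75;6,2,75'
```

Read this way, the example schedule runs four iterations with a minimum score of 6 and a minimum subread length of 75, lowers the required number of linking reads from 3 to 2 after the second iteration, and leaves the minimum contig length at its default.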

input.xml

The input.xml specifies the contigs and the long reads used to scaffold them:

<?xml version="1.0"?>
<pacbioAnalysisInputs>
        <dataReferences>

        <!-- High-confidence sequences fasta file -->
        <url ref="assembled_contigs:/home/lhon/analysis/aha/contigs.fasta"/>

        <!-- PacBio reads, either in fasta or in bas.h5 format. -->
        <url ref="file:/home/lhon/analysis/aha/pacbio.filtered_subreads.fasta" />

        </dataReferences>
</pacbioAnalysisInputs>
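
If the long reads are spread over several FASTA files (for example, subreads exported from bas.h5 files plus other FASTA files), they can be merged into the single file referenced in input.xml. The sketch below is one way to do that with standard tools. The function name combine_fasta and the file names in the usage note are hypothetical, and it assumes the subreads have already been exported to FASTA.

```shell
# Sketch: concatenate several FASTA files into one output file.
# Usage: combine_fasta OUTPUT INPUT1 [INPUT2 ...]
combine_fasta() {
    out="$1"
    shift
    # awk prints every input line followed by a newline, so a file that
    # lacks a final newline cannot fuse its last sequence line with the
    # header of the next file (plain cat would not protect against that).
    awk '{ print }' "$@" > "$out"
}
```

For example, combine_fasta pacbio.filtered_subreads.fasta exported_subreads.fasta other_reads.fasta writes both inputs to the first file name. The output file must not also be listed as an input.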

job.sh

job.sh contains the command that runs the AHA hybrid assembly module using SMRT Pipe:

smrtpipe.py --params=params.xml xml:input.xml

Running AHA

First, ensure that you have set up the SMRT Analysis environment:

source /opt/smrtanalysis/etc/setup.sh

Then run job.sh:

source job.sh
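
Before starting the job, it can be worth confirming that the files referenced in input.xml are readable. The following is a rough sketch, not a SMRT Analysis tool: check_inputs is a hypothetical helper, and it assumes that each url element sits on its own line with a ref of the form prefix:/path, as in the input.xml above.

```shell
# Sketch: verify that every path referenced by a <url ref="prefix:/path">
# element in an input.xml file is readable.
check_inputs() {
    missing=0
    # Strip the prefix (e.g. "file:" or "assembled_contigs:") and keep the path.
    paths=$(sed -n 's/.*<url ref="[^":]*:\([^"]*\)".*/\1/p' "$1")
    while IFS= read -r path; do
        [ -n "$path" ] || continue          # skip the empty line if no url was found
        if [ ! -r "$path" ]; then
            echo "missing or unreadable: $path" >&2
            missing=1
        fi
    done <<EOF
$paths
EOF
    return "$missing"
}
```

With this helper defined, check_inputs input.xml && source job.sh starts the job only if all referenced files are present.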
