P_ErrorCorrection
=================

P_ErrorCorrection is a module in SMRT Pipe versions 1.3.1 to 1.3.3 that performs error correction on PacBio long reads by mapping shorter, high-accuracy reads onto the long reads. The error-corrected reads can then be assembled using a long-read assembler, such as Celera Assembler, Mira, or Allora. If you are using SMRT Analysis 1.4, please consider using the HGAP approach instead.

Unlike the PacBioToCA module in Celera Assembler, P_ErrorCorrection can keep long reads together in regions where there is no short-read coverage.

P_ErrorCorrection requires a ``params.xml`` and an ``input.xml``.

params.xml
----------

P_ErrorCorrection can use fasta/fastq files as input if the options ``useFastqAsShortReads`` and ``useFastaAsLongReads`` are set to true. The following shows this configuration::

    Error Correction
    True
    True
    False
    False
    -advanceHalf -noSplitSubreads -ignoreQuality -minMatch 10 -minPctIdentity 70 -bestn 20
    --overlapTolerance=25

``bas.h5`` files can also be used directly if either ``useFastqAsShortReads`` or ``useFastaAsLongReads`` is set to false. The P_Fetch and P_Filter modules are needed in this scenario, and the params.xml needs to be adjusted accordingly; this is detailed in the `SMRT Pipe Reference Guide`_.

.. _SMRT Pipe Reference Guide: http://www.smrtcommunity.com/SMRT-Analysis/Software/software/smrtanalysis/1.3.0/doc/SMRT%20Pipe%20Reference%20Guide.pdf

Additional parameters are described in the `SMRT Pipe Reference Guide`_.

input.xml
---------

You can specify the actual data to be error corrected in the ``input.xml`` file.

As the params.xml options suggest, the long-read data need to be in fasta format, and the short-read data need to be in fastq format. If referring to ``bas.h5`` files, you can use the following format::

    /path/to/bas.h5

The run id needs to be unique. If there are a number of ``bas.h5`` files, the input.xml can be autogenerated using the ``fofnToSmrtpipeInput.py`` script.
This script takes a fofn (file of file names), which lists one filename per line, and outputs a properly formatted input.xml.

job.sh
------

``job.sh`` contains the command that runs the P_ErrorCorrection module using SMRT Pipe::

    smrtpipe.py --params=params.xml xml:input.xml

Running P_ErrorCorrection
-------------------------

First, ensure that you have set up the SMRT Analysis environment::

    source /opt/smrtanalysis/etc/setup.sh

Then run ``job.sh``::

    source job.sh
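
The fofn handling described in the input.xml section can be sketched in Python. This is a minimal illustration, not the implementation of ``fofnToSmrtpipeInput.py``: the helper names ``read_fofn`` and ``assign_run_ids`` and the ``run-0001`` id pattern are hypothetical, and the sketch stops short of writing any XML. It only shows how a list of filenames, one per line, might be read and paired with run ids that are guaranteed to be unique.

```python
def read_fofn(text):
    """Return the non-empty, whitespace-stripped lines of a fofn."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def assign_run_ids(paths, prefix="run"):
    """Pair each path with a distinct run id such as 'run-0001'.

    The id pattern is an assumption made for illustration; the only
    requirement taken from the text is that each run id is unique.
    """
    seen = set()
    pairs = []
    for index, path in enumerate(paths, start=1):
        if path in seen:
            # A repeated file would otherwise be processed twice.
            raise ValueError(f"duplicate entry in fofn: {path}")
        seen.add(path)
        pairs.append((f"{prefix}-{index:04d}", path))
    return pairs


if __name__ == "__main__":
    example = "/path/to/first.bas.h5\n\n/path/to/second.bas.h5\n"
    for run_id, path in assign_run_ids(read_fofn(example)):
        print(run_id, path)
```

Rejecting duplicate entries up front is a design choice of this sketch; it keeps the same ``bas.h5`` file from being listed under two different run ids.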