This module is currently in beta release. The module and/or documentation may be incomplete.
Aligns long sequences (> 200 bp) to a sequence database using BWA 0.7.4 bwasw.
Author: Heng Li, Broad Institute
Contact: gp-help@broadinstitute.org
Algorithm Version: BWA 0.7.4
Summary
BWA.bwasw is a fast, lightweight tool that aligns long reads (>200 bp). It performs heuristic Smith-Waterman-like alignment to find high-scoring local hits, and can therefore detect chimeric reads. BWA.bwasw can also be used to align ~100 bp reads, but for reads of that length it is slower than the BWA.aln module.
This document is adapted from the BWA documentation for release 0.7.4. For more information about BWA.bwasw, see the BWA project site. BWA.bwasw was developed at the Wellcome Trust Sanger Institute and the Broad Institute.
Note: Index files created with BWA version 0.5.x or earlier are not compatible with the aligners of version 0.6.x and newer. Likewise, the BWA 0.6.x and newer index files are not compatible with the 0.5.x aligners. The BWA 0.7.x aligners are able to use index files created with 0.6.x, however.
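The compatibility rule in the note above can be sketched as a small check. This is only an illustration: the function names are hypothetical and are not part of BWA or GenePattern, and the sketch assumes version strings of the form "0.7.4".

```python
def index_generation(version: str) -> str:
    """Classify a BWA version string such as '0.7.4' by its index format."""
    major, minor = (int(part) for part in version.split(".")[:2])
    # Per the note above: 0.5.x and earlier use one index format,
    # 0.6.x and newer use another.
    return "old" if (major, minor) <= (0, 5) else "new"


def index_is_compatible(index_version: str, aligner_version: str) -> bool:
    """True when the index and the aligner belong to the same generation."""
    return index_generation(index_version) == index_generation(aligner_version)
```

For example, `index_is_compatible("0.6.2", "0.7.4")` is true, while `index_is_compatible("0.5.9", "0.7.4")` is false.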
Speed
Alignment speed is largely determined by the error rate of the query sequences: alignment is faster for near-perfect hits and slower at higher error rates. Pairing is slower for shorter reads, mostly because shorter reads produce more spurious hits.
References
BWA manual page: http://bio-bwa.sourceforge.net/bwa.shtml.
Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26:589-595. [PMID: 20080505] (http://bioinformatics.oxfordjournals.org/content/26/5/589.long)
Parameters
Name | Description |
---|---|
BWA index * | A BWA index. You can select from a list of hosted indexes or provide a custom index in the form of a ZIP bundle (as generated by the BWA.indexer module). |
read file * | Single-end or first paired-end reads file in FASTA or FASTQ format. For paired-end data, this should be the forward ("*_1" or "left") input file. |
mate read file * | The reverse ("*_2" or "right") reads file for paired-end reads in FASTA or FASTQ format. |
match score | Specifies the score awarded for a match. |
mismatch penalty | Specifies the mismatch penalty. |
gap open penalty | Gap open penalty. This is the score deducted when a gap is initiated in a sequence. To make matches more significant, try increasing the gap open penalty. |
gap extension penalty | Gap extension penalty. This penalty is added to the gap open penalty for each base or residue in the gap. To reduce long gaps, increase the gap extension penalty. A few long gaps are usually expected rather than many short gaps, so the gap extension penalty should be lower than the gap open penalty. (The exception is where one or both sequences are single reads with possible sequencing errors, in which case many single-base gaps are expected. To cope with this, try setting the gap open penalty very low and using the gap extension penalty to control gap scoring.) |
band width | Band width used in the banded alignment. |
min score threshold | Minimum score threshold divided by the match score |
threshold coefficient | Coefficient for threshold adjustment according to query length |
z best heuristics | Z-best heuristics. Specifying a higher number increases accuracy at the cost of speed |
max sa interval size | Maximum SA interval size for initiating a seed. Specifying a higher number increases accuracy at the cost of speed. |
min num seeds | The minimum number of seeds contained in the best alignment from the forward-forward alignment process that allows the algorithm to skip performing the reverse alignment. BWA.bwasw tends to be faster and more accurate if the alignment is supported by more seeds. |
num threads | Number of threads. BWA.bwasw will run more quickly with multiple threads; however, there have been reports that results will not always be identical. |
output prefix * | Prefix to use for the output file name |
* - required
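To show how the parameters above relate to the underlying tool, here is a minimal sketch that assembles a `bwa bwasw` command line. The flag letters follow the synopsis in the BWA manual page; the correspondence between this module's parameter names and those flags is an assumption, and `build_bwasw_command` is a hypothetical helper, not the module's actual wrapper. The file names in the usage note are placeholders.

```python
from typing import Dict, List, Optional

# Assumed mapping from the module's parameter names to `bwa bwasw` flags.
FLAG_FOR_PARAMETER = {
    "match score": "-a",
    "mismatch penalty": "-b",
    "gap open penalty": "-q",
    "gap extension penalty": "-r",
    "band width": "-w",
    "min score threshold": "-T",
    "threshold coefficient": "-c",
    "z best heuristics": "-z",
    "max sa interval size": "-s",
    "min num seeds": "-N",
    "num threads": "-t",
}


def build_bwasw_command(
    index_prefix: str,
    read_file: str,
    mate_read_file: Optional[str] = None,
    options: Optional[Dict[str, object]] = None,
) -> List[str]:
    """Return an argument list for `bwa bwasw`; nothing is executed here."""
    command = ["bwa", "bwasw"]
    for name, value in (options or {}).items():
        if name not in FLAG_FOR_PARAMETER:
            raise KeyError(f"unknown parameter: {name}")
        command += [FLAG_FOR_PARAMETER[name], str(value)]
    # Positional arguments: index prefix, reads, then the optional mate reads.
    command += [index_prefix, read_file]
    if mate_read_file is not None:
        command.append(mate_read_file)
    return command
```

Calling `build_bwasw_command("ref", "reads_1.fq", "reads_2.fq", {"num threads": 4})` yields an argument list that could be passed to `subprocess.run`, with standard output redirected to a SAM file.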
Input Files
- BWA index
  A set of BWA index files bundled as a ZIP archive, as produced by the BWA.indexer module. The GenePattern FTP site also hosts a number of index bundles, available in a dropdown selection (requires GenePattern 3.7.0+).
  Note: these index files must have been produced by BWA version 0.6.x or 0.7.x.
- read file
  Single-end or first paired-end reads file in FASTA or FASTQ format. For paired-end data, this should be the forward ("*_1" or "left") input file.
- mate read file
  The reverse ("*_2" or "right") reads file for paired-end reads in FASTA or FASTQ format.
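Before submitting a custom index bundle, it can help to confirm that the ZIP archive contains a complete set of index files. The sketch below assumes that a BWA 0.6.x/0.7.x index consists of files ending in `.amb`, `.ann`, `.bwt`, `.pac`, and `.sa`; the helper name is hypothetical, and the exact layout that BWA.indexer produces is not specified in this document.

```python
import zipfile
from pathlib import PurePosixPath

# Assumption: file suffixes that make up a BWA 0.6.x/0.7.x index.
EXPECTED_SUFFIXES = {".amb", ".ann", ".bwt", ".pac", ".sa"}


def missing_index_suffixes(zip_source) -> set:
    """Return the expected index suffixes that are absent from the bundle.

    `zip_source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as bundle:
        present = {PurePosixPath(name).suffix for name in bundle.namelist()}
    return EXPECTED_SUFFIXES - present
```

An empty result means that every expected suffix was found; a non-empty result lists what appears to be missing.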
Output Files
- SAM file
  The aligned sequences are output in SAM format. For more details on this alignment file, see the SAM format specification at http://samtools.sourceforge.net/SAM-1.3.pdf.
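As a quick sanity check on the output, the following sketch tallies mapped and unmapped alignment records in a SAM file. It relies only on the SAM specification: header lines start with `@`, fields are tab-separated, and bit `0x4` of the FLAG field (the second column) marks an unmapped record. The function name is hypothetical. Note that it counts records rather than reads, so a read reported on several lines would be counted more than once.

```python
def count_mapped(sam_lines):
    """Return (mapped, unmapped) counts of alignment records."""
    mapped = unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines such as @HD, @SQ, and @PG.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # 0x4: segment unmapped
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped
```

Because an open file iterates line by line, the function can be used as `count_mapped(open("output.sam"))`, where `output.sam` is a placeholder file name.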
Platform Dependencies
Task Type: RNA-seq
CPU Type: any
Operating System: any
Language: C; Perl
Version Comments
Version | Release Date | Description |
---|---|---|
2 | 2014-07-17 | Beta: Updated to BWA 0.7.4, changed to use dynamic FTP-hosted index files, and switched to HTML-based doc |
1 | 2011-05-02 | |