The Hisat2.indexer module generates genome indexes for the Hisat2Aligner module. HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes, as well as to a single reference genome.
Author: Ted Liefeld
Contact: Ted Liefeld, jliefeld@cloud.ucsd.edu
Algorithm Version: 2.1.0
The Hisat2.indexer uses HISAT2's hisat2-build script to build a HISAT2 index from a set of DNA sequences. It outputs a set of 8 files with the suffixes .1.ht2, .2.ht2, .3.ht2, .4.ht2, .5.ht2, .6.ht2, .7.ht2, and .8.ht2; for a large index, the suffixes end in .ht2l instead (for example, .1.ht2l). Together these files constitute the index: they are all that is needed to align reads to that reference. Once the index is built, HISAT2 no longer uses the original FASTA sequence files.
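Because a usable index is exactly the eight files that share a prefix, a quick completeness check before handing an index to an aligner can be sketched as follows. The helper names are hypothetical (they are not part of GenePattern or HISAT2); they only encode the .ht2/.ht2l naming convention described above.

```python
from pathlib import Path
from typing import List, Optional

# hisat2-build writes eight files per index: <prefix>.1.<ext> ... <prefix>.8.<ext>
NUM_INDEX_FILES = 8


def expected_index_files(prefix: str, large: bool = False) -> List[str]:
    """List the file names an index with this prefix should consist of.

    Small indexes use the .ht2 extension; large indexes use .ht2l.
    """
    ext = "ht2l" if large else "ht2"
    return [f"{prefix}.{i}.{ext}" for i in range(1, NUM_INDEX_FILES + 1)]


def detect_index_flavor(prefix: str) -> Optional[str]:
    """Return "ht2" or "ht2l" if all eight files of that flavor exist, else None."""
    for large in (False, True):
        if all(Path(name).is_file() for name in expected_index_files(prefix, large)):
            return "ht2l" if large else "ht2"
    return None
```

For example, `expected_index_files("genome")` runs from `genome.1.ht2` to `genome.8.ht2`, and `detect_index_flavor` returns `None` when any of the eight files is missing, which is a cheap way to catch an incomplete download or an interrupted build.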
The HISAT2 index is based on the FM Index of Ferragina and Manzini, which in turn is based on the Burrows-Wheeler transform. hisat2-build constructs the index with the blockwise algorithm of Karkkainen, which lets it trade off running time against memory usage. By default, hisat2-build automatically searches for the settings that yield the best running time without exhausting memory.
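To make the FM Index idea concrete, the toy sketch below computes a Burrows-Wheeler transform by naively sorting all rotations of a short string and then counts pattern occurrences with FM-index backward search. It is an illustration of the underlying data structure only: it does not use Karkkainen's blockwise algorithm and does not reflect how hisat2-build actually builds or stores its index. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict


def bwt(text: str) -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    `text` must end with a unique sentinel ("$") that sorts before every other
    character. This is O(n^2 log n) and only suitable for toy inputs.
    """
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def first_column_offsets(last: str) -> Dict[str, int]:
    """C[c] = number of characters in the text that sort strictly before c."""
    counts = Counter(last)
    offsets, total = {}, 0
    for ch in sorted(counts):
        offsets[ch] = total
        total += counts[ch]
    return offsets


def count_occurrences(last: str, pattern: str) -> int:
    """FM-index backward search: number of times `pattern` occurs in the text."""
    c_table = first_column_offsets(last)
    lo, hi = 0, len(last)  # half-open range of sorted rotations matching so far
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        # Occ(ch, i) is computed by scanning here; real FM indexes store
        # sampled checkpoints so this lookup is fast.
        lo = c_table[ch] + last[:lo].count(ch)
        hi = c_table[ch] + last[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `bwt("banana$")` returns `"annb$aa"`, and `count_occurrences("annb$aa", "ana")` returns 2. The point of the structure is that the search only ever touches the transformed string and a small table of character counts, never the original text, which is why the FASTA files are no longer needed once the index exists.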
https://ccb.jhu.edu/software/hisat2/index.shtml
Name | Description |
---|---|
index name prefix* | The name prefix of the resulting index files and of the zip file which contains them. |
fasta file | One or more FASTA files (or a zip file containing one or more FASTA files) with the reference sequences that reads will be aligned to. For example, hisat2-build's <reference_in> argument might be chr1.fa,chr2.fa,chrX.fa,chrY.fa. |
gtf file | Optional GTF file with information about exons. If present, the module runs extract_exons.py and extract_splice_sites.py on the GTF file and then adds the resulting exons and splice sites to the index. |
dry run* | When true, the module only writes the hisat2-build command line it would run to the standard output file (stdout.txt) and does not build the index. Useful for testing, or for generating a command line to run HISAT2 outside of GenePattern. |
* - required
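The parameters above map onto a small number of shell commands. The sketch below shows one plausible way to assemble them; the helper names, the intermediate file names `<prefix>.ss` and `<prefix>.exon`, and the overall flow are assumptions, not the module's actual implementation. It relies on documented hisat2-build usage (options first, then the comma-separated <reference_in> list, then the index base name; `--ss` and `--exon` take splice-site and exon files). The extraction script names follow the gtf file description above; depending on the HISAT2 release, the installed scripts may carry an `hisat2_` prefix. Any zip archive is assumed to have been unpacked into individual FASTA paths already.

```python
import shlex
import subprocess
from typing import List, Optional


def build_command_lines(
    fasta_files: List[str], prefix: str, gtf_file: Optional[str] = None
) -> List[str]:
    """Return the shell command lines needed to build the index (hypothetical)."""
    if not fasta_files:
        raise ValueError("at least one FASTA file is required")
    q = shlex.quote
    lines: List[str] = []
    build = ["hisat2-build"]
    if gtf_file is not None:
        ss_file, exon_file = f"{prefix}.ss", f"{prefix}.exon"
        # Both helper scripts print to stdout, so redirect into files.
        lines.append(f"extract_splice_sites.py {q(gtf_file)} > {q(ss_file)}")
        lines.append(f"extract_exons.py {q(gtf_file)} > {q(exon_file)}")
        build += ["--ss", ss_file, "--exon", exon_file]
    # Positional arguments: comma-separated FASTA list, then the index prefix.
    build += [",".join(fasta_files), prefix]
    lines.append(" ".join(q(arg) for arg in build))
    return lines


def run_or_print(lines: List[str], dry_run: bool) -> None:
    """Mimic the dry run parameter: print the commands or execute them in order."""
    for line in lines:
        if dry_run:
            print(line)
        else:
            subprocess.run(line, shell=True, check=True)
```

For example, `build_command_lines(["chr1.fa", "chr2.fa"], "genome")` yields the single line `hisat2-build chr1.fa,chr2.fa genome`; passing `gtf_file="genes.gtf"` prepends the two extraction commands and adds `--ss genome.ss --exon genome.exon` to the build line.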
This module is implemented using a Docker container to provide the environment.
Version | Release Date | Description |
---|---|---|
2 | 2021-10-11 | Rename HISAT2Indexer to HISAT2.indexer |
1 | 2018-10-25 | Initial production release |