TopHat (v9) BETA

This module is currently in beta release. The module and/or documentation may be incomplete.

TopHat 2.0.11 is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. Note that we now recommend using HISAT2Aligner instead of TopHat.

Author: Marc-Danie Nazaire, gp-help@broadinstitute.org, Broad Institute

Contact:

Marc-Danie Nazaire, gp-help@broadinstitute.org

Algorithm Version: 2.0.11

Summary

Note that we now recommend using HISAT2Aligner instead of TopHat.
 
TopHat is a fast splice junction mapper. TopHat uses Bowtie to map RNA-seq reads to a reference genome, then analyzes the mapping results to identify splice junctions between exons. The software is optimized for reads 75bp or longer. TopHat was created at the University of Maryland Center for Bioinformatics and Computational Biology. This document is adapted from the TopHat documentation for release 2.0.9. 

Usage

TopHat takes one or more reads files in either FASTQ or FASTA format, optionally compressed to gzip or bzip2 format.  For paired-end reads, provide the "*_1" ("left") file through the reads.pair.1 parameter and the "*_2" ("right") file through the reads.pair.2 parameter.  For single-end reads, use the reads.pair.1 parameter only.
 
The TopHat tool provides a number of additional options and switches that are not directly available through this module's parameters.  The additional.tophat.options parameter is provided to pass these through if you feel that you need them.  To use it, specify the extra option(s) along with any arguments in the input text field, separated by spaces.  At this time, this parameter does not easily support options that require a file argument.  Check the TopHat manual for more details of the available options.  Also note that there may be additional undocumented options; manually running the tophat executable at the command line with no arguments may show even more options.  If you feel that a particular missing option would be of broad general interest, please contact the GenePattern team and we will look into adding it.  Use of this parameter is recommended for experts only; use it at your own discretion.  The GenePattern team does not explicitly test all of the possible options that may be passed through using this parameter and can only provide limited support.
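
As a rough illustration of how a space-separated options string breaks down into individual command-line arguments, here is a minimal Python sketch. The function name split_extra_options is hypothetical, and the module's own parsing may differ. The options in the usage line (--no-mixed, --segment-length) are taken from the option list later in this document.

```python
import shlex


def split_extra_options(text):
    """Split a space-separated option string into an argument list.

    Illustrative only: this is not the module's own parser.
    """
    return shlex.split(text)


# Each option and each option argument becomes its own list entry.
args = split_extra_options("--no-mixed --segment-length 20")
print(args)
```

Seeing the string as a list makes it easier to check that every option that takes an argument is immediately followed by its value.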

References

Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013 Apr 25;14(4):R36. 
Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols. 2012;7:562–578.
Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105-11.
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. 
 

Parameters

Name Description
bowtie index * A zip file or directory containing a Bowtie 2 index.
GTF file A GTF file (v. 2.2 or higher) or GFF3 file containing a list of gene model annotations. TopHat will first extract the transcript sequences and align the reads to this virtual transcriptome. Only the reads that do not fully map to the transcriptome will then be mapped on the genome. The reads that did map on the transcriptome will be converted to genomic mappings (spliced as needed) and merged with the novel mappings.
transcriptome index  A directory containing a transcriptome index generated from a previous run of TopHat. TopHat can be run to generate the transcriptome index only. To do this, run TopHat with only a bowtie index and a GTF file.
reads pair 1  Unpaired reads file or first mate for paired reads. One or more files containing reads in FASTA or FASTQ format (bz2 and gz compressed files are supported).
reads pair 2  Second mate for paired reads. Zero or more files in FASTA or FASTQ format (bz2 and gz compressed files are supported).
mate inner dist  The expected mean inner distance between mate pairs.
mate std dev  The standard deviation for the distribution of inner distances between mate pairs.
library type  Library type for strand specific reads.
Bowtie preset options  A combination of pre-packaged options for Bowtie 2 based on speed and sensitivity/accuracy.
transcriptome only  Whether to align the reads to the virtual transcriptome (provided in the GTF file parameter) and report only those mappings as genomic mappings.
max transcriptome hits  The maximum number of mappings allowed for a read when it is aligned to the virtual transcriptome (provided in the GTF file parameter). Any reads found with more than this number of mappings will be discarded.
prefilter multihits  When mapping reads on the virtual transcriptome (provided in the GTF file parameter), some repetitive or low complexity reads that would be discarded in the context of the genome may appear to align to the transcript sequences and thus may end up reported as mapped to those genes only. This option directs TopHat to first align the reads to the whole genome, then exclude such multi-mapped reads.
raw junctions file  A file containing raw junctions. Junctions are specified one per line in a tab-delimited format.
find novel junctions  If you select no, then the module will only look for junctions indicated in the GTF file supplied in the GTF file parameter. (This parameter is ignored when no GTF file is specified.)
min anchor length  The anchor length. This value must be at least 3.
max splice mismatches  The maximum number of mismatches that may appear in the "anchor" region of a spliced alignment.
min intron length  The minimum intron length. TopHat will ignore donor/acceptor pairs closer than this many bases apart.
max intron length  The maximum intron length. When searching for junctions ab initio, TopHat will ignore donor/acceptor pairs farther than this many bases apart, except when such a pair is supported by a split segment alignment of a long read.
max insertion length  The maximum insertion length
max deletion length  The maximum deletion length
quality value scale  Whether to use the Solexa, Phred 33, or Solexa v. 1.3 (Phred 64) quality value scale.
max multihits  The maximum number of times a read can be aligned to the reference genome. If a read is aligned more than this number of times, then TopHat will choose the alignments based on their alignment scores, reporting the alignments with the best alignment scores. If there are more than this number of alignments with the same score for a read, TopHat will randomly report only this many alignments.
read mismatches  Final read alignments having more than this many mismatches are discarded.
coverage search  Enables or disables the coverage based search for junctions. Use when coverage search is disabled by default (such as for reads 75bp or longer), for maximum sensitivity.
microexon search  Attempts to find alignments incident to microexons. Works only for reads 50bp or longer.
fusion mapping  Whether to enable fusion mapping
fusion anchor length  A "supporting" read must map to both sides of a fusion by at least these many bases. Only applies when fusion mapping is set to yes.
fusion read mismatches  Reads support fusions if they map across fusion with at most these many mismatches. Only applies when fusion mapping is set to yes.
output prefix  The prefix to use for the output file
read edit dist  Final read alignments with an edit distance greater than this value are discarded.
read gap length  Final read alignments with a total gap length greater than this value are discarded.
additional tophat options  Additional options to be passed along to the TopHat program at the command line. This parameter gives you a means to specify otherwise unavailable TopHat options and switches not supported by the module; check the TopHat manual for details. Recommended for experts only; use this at your own discretion.

* - required
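
The mate inner dist value follows from the library's fragment size and read length: it is the fragment length minus the combined length of both reads. A minimal sketch of that arithmetic is shown here. The function name is hypothetical, and the sketch assumes both mates have the same length.

```python
def mate_inner_distance(fragment_length, read_length):
    """Expected inner distance: fragment length minus the two read lengths."""
    return fragment_length - 2 * read_length


# Fragments selected at 300bp, sequenced with 50bp reads from each end.
print(mate_inner_distance(300, 50))
```

With 300bp fragments and 50bp reads, the two reads cover 100 bases, leaving an inner distance of 200.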

Notes

TopHat has a number of parameters and options, and their default values are tuned for processing mammalian RNA-Seq reads. If you would like to use TopHat for another class of organism, we recommend setting some of the parameters with more strict, conservative values than their defaults. Usually, setting the maximum intron size to 4 or 5 Kb is sufficient to discover most junctions while keeping the number of false positives low.

Input Files

  1. RNA-seq reads files in FASTA/FASTQ format (can be gzip or bzip2 compressed) For more information on the FASTA format, see the NIH description here: 
  2. Custom Bowtie index (optional, if the prebuilt indexes do not include the genome you need) 
    This file is a genome reference index. You must create this file using Bowtie (Bowtie version 2.0 or higher) and can use the Bowtie.indexer GenePattern module for this.   A large and growing number of hosted genomes are selectable from the parameter, possibly allowing you to avoid this step.
  3. GTF file (optional) 
    A GTF/GFF file containing exon annotations, to provide a virtual transcriptome. The values in the first column of this file (the column that indicates the chromosome or contig on which the feature is located) MUST MATCH the sequence names in the reference sequence in the Bowtie index you are using with TopHat. Note that this is also case sensitive. For more information on GTF format, see the specification: http://mblab.wustl.edu/GTF22.html. For more information on GFF format, see the specification: http://www.sequenceontology.org/gff3.shtml
  4. Raw junctions file (optional) 
    Junctions are specified one per line, in a tab-delimited format, like so: 
    <chrom> <left> <right> <+/-> 
    <left> specifies the last character of the left sequence to be spliced to the first character of the <right> sequence, inclusive: that is, the last and first positions of the exons that flank the junction site.
    You can take the junctions.bed output file from TopHat and convert it to this format. 
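
The conversion from a junctions.bed record to the raw junctions format can be sketched in Python as follows. This is an illustration under stated assumptions, not TopHat's own converter (TopHat ships a bed_to_juncs utility for this purpose). It assumes each record is a two-block BED12 line with zero-based coordinates, block sizes in column 11 and block starts in column 12. The function name and the example record are made up for illustration.

```python
def bed_line_to_raw_junction(line):
    """Convert one two-block BED12 junction line to '<chrom> <left> <right> <+/->'.

    Assumes zero-based BED coordinates, blockSizes in column 11 and
    blockStarts in column 12 (both comma-separated).
    """
    cols = line.rstrip("\n").split("\t")
    chrom, chrom_start, strand = cols[0], int(cols[1]), cols[5]
    block_sizes = [int(x) for x in cols[10].split(",") if x]
    block_starts = [int(x) for x in cols[11].split(",") if x]
    # Last base of the left block (zero-based, inclusive).
    left = chrom_start + block_starts[0] + block_sizes[0] - 1
    # First base of the right block (zero-based).
    right = chrom_start + block_starts[1]
    return "\t".join([chrom, str(left), str(right), strand])


# Hypothetical record: blocks of 30 and 40 bases at offsets 0 and 260.
example = "chr1\t100\t400\tJUNC1\t12\t+\t100\t400\t255,0,0\t2\t30,40\t0,260"
print(bed_line_to_raw_junction(example))
```

For the example record, the left block covers positions 100 to 129 and the right block begins at 360, so the raw junction is chr1, 129, 360, +.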

Output Files

  1. <output.prefix>.accepted_hits.bam 
    A list of read alignments in BAM format. This file can be used as input for Cufflinks. BAM is the binary equivalent of SAM, a compact short read alignment format. For more information on the SAM/BAM formats, see the specification at: 
    http://samtools.sourceforge.net.
  2. <output.prefix>.junctions.bed 
    A BED file of junctions reported by TopHat (for more information on the BED format, see: http://genome.ucsc.edu/FAQ/FAQformat.html). Each junction consists of two connected BED blocks, where each block is as long as the maximal overhang of any read spanning the junction. The score is the number of alignments spanning the junction.
  3. <output.prefix>.insertions.bed 
    UCSC BED tracks of insertions reported by TopHat.
    insertions.bed - chromLeft refers to the last genomic base before the insertion. 
  4. <output.prefix>.deletions.bed 
    UCSC BED tracks of deletions reported by TopHat. 
    deletions.bed - chromLeft refers to the first genomic base of the deletion.
  5. <output.prefix>.unmapped.bam 
    A list of reads left unaligned in a BAM file.
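
Because the score column of junctions.bed holds the number of alignments spanning each junction, a simple downstream step is to keep only well-supported junctions. The following Python sketch shows one way to do that. The function name, the threshold, and the sample records are hypothetical, and the sketch assumes tab-delimited BED lines with an integer score in column 5.

```python
def junctions_with_min_score(lines, min_score):
    """Yield (chrom, start, end, score) for BED records with score >= min_score."""
    for line in lines:
        # Skip blank lines and BED track/browser/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        cols = line.rstrip("\n").split("\t")
        score = int(cols[4])
        if score >= min_score:
            yield cols[0], int(cols[1]), int(cols[2]), score


# Made-up records for illustration only.
sample = [
    'track name=junctions description="TopHat junctions"',
    "chr1\t100\t400\tJUNC1\t12\t+\t100\t400\t255,0,0\t2\t30,40\t0,260",
    "chr1\t500\t900\tJUNC2\t3\t-\t500\t900\t255,0,0\t2\t25,35\t0,365",
]
print(list(junctions_with_min_score(sample, 5)))
```

With a threshold of 5, only the first sample junction (score 12) is kept.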

Command Line Arguments

Usage:
    tophat [options] <bowtie_index> <reads1[,reads2,...]> [reads1[,reads2,...]] [quals1,[quals2,...]] [quals1[,quals2,...]]
 
Options:
-v/--version Prints the TopHat version number and exits.
-o/--output-dir Sets the name of the directory in which TopHat will write all of its output. <string>  [ default: ./tophat_out ]
--bowtie1  Uses Bowtie1 instead of Bowtie2. If you use colorspace reads, you need to use this option as Bowtie2 does not support colorspace reads.   [ default: bowtie2 ]
-N/--read-mismatches Final read alignments having more than these many mismatches are discarded.  <int> [ default: 2 ]
--read-gap-length Final read alignments having more than these many total length of gaps are discarded. <int> [ default: 2 ]
--read-edit-dist Final read alignments having more than these many edit distance are discarded.  <int> [ default: 2 ]
--read-realign-edit-dist Some of the reads spanning multiple exons may be mapped incorrectly as a contiguous alignment to the genome even though the correct alignment should be a spliced one - this can happen in the presence of processed pseudogenes that are rarely (if at all) transcribed or expressed. This option can direct TopHat to re-align reads for which the edit distance of an alignment obtained in a previous mapping step is above or equal to this option value. If you set this option to 0, TopHat will map every read in all the mapping steps (transcriptome if you provided gene annotations, genome, and finally splice variants detected by TopHat), reporting the best possible alignment found in any of these mapping steps. This may greatly increase the mapping accuracy at the expense of an increase in running time.  <int> [ default: "read-edit-dist" + 1 ]
-a/--min-anchor The "anchor length". TopHat will report junctions spanned by reads with at least this many bases on each side of the junction. Note that individual spliced alignments may span a junction with fewer than this many bases on one side. However, every junction involved in spliced alignments is supported by at least one read with this many bases on each side. This must be at least 3. <int> [ default: 8 ]
-m/--splice-mismatches The maximum number of mismatches that may appear in the "anchor" region of a spliced alignment. <0-2> [ default: 0 ]
-i/--min-intron-length The minimum intron length. TopHat will ignore donor/acceptor pairs closer than this many bases apart. <int> [ default: 50 ]
-I/--max-intron-length The maximum intron length. When searching for junctions ab initio, TopHat will ignore donor/acceptor pairs farther than this many bases apart, except when such a pair is supported by a split segment alignment of a long read. <int> [ default: 500000 ]
-g/--max-multihits Instructs TopHat to allow up to this many alignments to the reference for a given read, and choose the alignments based on their alignment scores if there are more than this number. The default is 20 for read mapping. Unless you use --report-secondary-alignments, TopHat will report the alignments with the best alignment score. If there are more alignments with the same score than this number, TopHat will randomly report only this many alignments. In case of using --report-secondary-alignments, TopHat will try to report alignments up to this option value, and TopHat may randomly output some of the alignments with the same score to meet this number. <int> [ default: 20 ]
--suppress-hits      
-x/--transcriptome-max-hits Maximum number of mappings allowed for a read, when aligned to the transcriptome (any reads found with more than this number of mappings will be discarded). <int> [ default: 60 ]
-M/--prefilter-multihits When mapping reads on the transcriptome, some repetitive or low complexity reads that would be discarded in the context of the genome may appear to align to the transcript sequences and thus may end up reported as mapped to those genes only. This option directs TopHat to first align the reads to the whole genome in order to determine and exclude such multi-mapped reads (according to the value of the -g/--max-multihits option).  
( for -G/--GTF option, enable an initial bowtie search against the genome )
--max-insertion-length The maximum insertion length. <int> [ default: 3 ]
--max-deletion-length  The maximum deletion length. <int> [ default: 3 ]
--solexa-quals Use the Solexa scale for quality values in FASTQ files.    
--solexa1.3-quals As of the Illumina GA pipeline version 1.3, quality scores are encoded in Phred-scaled base-64. Use this option for FASTQ files from pipeline 1.3 or later.   ( same as phred64-quals )
--phred64-quals     ( same as solexa1.3-quals )
-Q/--quals Separate quality value files - colorspace read files (CSFASTA) come with separate qual files.    
--integer-quals Quality values are space-delimited integer values, this becomes default when you specify -C/--color.    
-C/--color
Colorspace reads, note that it uses a colorspace bowtie index and requires Bowtie 0.12.6 or higher.
Common usage: tophat --color --quals [other options]* <colorspace_index_base> <reads1_1[,...,readsN_1]> [reads1_2,...readsN_2] <quals1_1[,...,qualsN_1]> [quals1_2,...qualsN_2]
  ( Solid - color space )
--color-out      
--library-type

The default is unstranded (fr-unstranded). If either fr-firststrand or fr-secondstrand is specified, every read alignment will have an XS attribute tag as explained below.

fr-unstranded - Standard Illumina: Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand.
 
fr-firststrand - dUTP, NSR, NNSR: Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced.
 
fr-secondstrand - Ligation, Standard SOLiD: Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
 
<string>
( fr-unstranded, fr-firststrand, fr-secondstrand )
 
-p/--num-threads Use this many threads to align reads. <int> [ default: 1 ]
-R/--resume
In case a TopHat run was terminated prematurely (process failure due to external factors, e.g. running out of memory because of other processes running on the same machine, or the disk getting full), users can attempt to resume the interrupted TopHat run by just providing this option with the output directory for that run. TopHat sets a checkpoint after every lengthy operation in the pipeline, and when this option is provided, it will attempt to resume the pipeline from the last successful checkpoint. This special usage of TopHat only requires this option, e.g. the command line could simply be:
    tophat -R tophat_out (or your TopHat output directory if you used the -o/--output-dir option)
Note that none of the options used for the original TopHat run should be provided; TopHat will find all the original options (and the checkpoint info) in the logs/run.log file found in the specified directory.
<out_dir> ( try to resume execution )
-G/--GTF
Supply TopHat with a set of gene model annotations and/or known transcripts, as a GTF 2.2 or GFF3 formatted file. If this option is provided, TopHat will first extract the transcript sequences and use Bowtie to align reads to this virtual transcriptome first. Only the reads that do not fully map to the transcriptome will then be mapped on the genome. The reads that did map on the transcriptome will be converted to genomic mappings (spliced as needed) and merged with the novel mappings and junctions in the final tophat output.
Please note that the values in the first column of the provided GTF/GFF file (column which indicates the chromosome or contig on which the feature is located), must match the name of the reference sequence in the Bowtie index you are using with TopHat. You can get a list of the sequence names in a Bowtie index by typing:
 
bowtie-inspect --names your_index
 
So before using a known annotation file with this option please make sure that the 1st column in the annotation file uses the exact same chromosome/contig names (case sensitive) as shown by the bowtie-inspect command above.
<filename> ( GTF/GFF with known transcripts )
--transcriptome-index When providing TopHat with a known transcript file (-G/--GTF option above), a transcriptome sequence file is built and a Bowtie index has to be created for it in order to align the reads to the known transcripts. Creating this Bowtie index can be time consuming and in many cases the same transcriptome data is being used for aligning multiple samples with TopHat. A transcriptome index and the associated data files (the original GFF file) can be thus reused for multiple TopHat runs with this option, so these files are only created for the first run with a given set of transcripts. If multiple TopHat runs are planned with the same transcriptome data, TopHat should be first run with the -G/--GTF option together with the --transcriptome-index option pointing to a directory and a name prefix which will indicate where the transcriptome data files will be stored. Then subsequent TopHat runs using the same --transcriptome-index option value will directly use the transcriptome data created in the first run (no -G option needed after the first run).  <bwtidx> (transcriptome bowtie index)
-T/--transcriptome-only Only align the reads to the transcriptome and report only those mappings as genomic mappings.   (map only to the transcriptome)
-j/--raw-juncs
Supply TopHat with a list of raw junctions. Junctions are specified one per line, in a tab-delimited format. Records look like:
 
<chrom> <left> <right> <+/->
left and right are zero-based coordinates, and specify the last character of the left sequence to be spliced to the first character of the right sequence, inclusive. That is, the last and the first positions of the flanking exons. Users can convert junctions.bed (one of the TopHat outputs) to this format using bed_to_juncs < junctions.bed > new_list.juncs, where bed_to_juncs can be found in the same folder as tophat.
<filename>  
--insertions
Supply TopHat with a list of insertions or deletions with respect to the reference. Indels are specified one per line, in a tab-delimited format, identical to that of junctions. Records look like:
 
For insertion,
<chrom> <left> <dummy> <inserted sequence>
left is a zero-based coordinate, and dummy can be set to the same value as left. For instance, "chr1 17491 17491 CA", where two base pairs "CA" are inserted between 17490 and 17491 of the reference genome.
<filename>  
--deletions
Supply TopHat with a list of insertions or deletions with respect to the reference. Indels are specified one per line, in a tab-delimited format, identical to that of junctions. Records look like:
For deletion,
<chrom> <left> <right>
left and right are zero-based coordinates, and specify the last character of the left sequence to be spliced to the first character of the right sequence, inclusive.
For instance, "chr1 20564 20567", where two base pairs located at 20565 and 20566 are deleted in the sequenced genome.
<filename>  
-r/--mate-inner-dist This is the expected (mean) inner distance between mate pairs. For example, for paired-end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. <int> [ default: 50 ]
--mate-std-dev The standard deviation for the distribution of inner distances between mate pairs.  <int> [ default: 20 ]
--no-novel-juncs Only look for reads across junctions indicated in the supplied GFF or junctions file. (ignored without -G/-j)    
--no-novel-indels Only look for reads across indels in the supplied indel file, or disable indel detection when no file has been provided.    
--no-gtf-juncs      
--no-coverage-search Disables the coverage based search for junctions.    
--coverage-search Enables the coverage based search for junctions. Use when coverage search is disabled by default (such as for reads 75bp or longer), for maximum sensitivity.    
--microexon-search With this option, the pipeline will attempt to find alignments incident to micro-exons. Works only for reads 50bp or longer.    
--keep-tmp Causes TopHat to preserve its intermediate files produced during the run (mostly useful for debugging). The default is to delete these temporary files.    
--tmp-dir    <dirname> [ default: <output_dir>/tmp ]
-z/--zpacker Manually specify the program used for compression of temporary files; default is gzip; use -z0 to disable compression altogether. Any program that is option-compatible with gzip can be used (e.g. bzip2, pigz, pbzip2). <program> [ default: gzip ]
-X/--unmapped-fifo     
[use mkfifo to compress more temporary files for color space reads]
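
For the -G/--GTF option described above, the requirement that annotation sequence names exactly match the names in the Bowtie index can be checked before a run. The Python sketch below shows one way to do so. The function name and sample data are hypothetical, and the sketch assumes the index names have already been collected into a list, one name per entry (for example from the output of bowtie-inspect --names).

```python
def unmatched_annotation_names(gtf_lines, index_names):
    """Return first-column names from GTF/GFF lines that are not in index_names.

    The comparison is exact and case sensitive.
    """
    known = set(index_names)
    missing = set()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        seqname = line.split("\t", 1)[0]
        if seqname not in known:
            missing.add(seqname)
    return sorted(missing)


# Made-up data for illustration only.
index_names = ["chr1", "chr2", "chrM"]
gtf_lines = [
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'Chr2\tsrc\texon\t5\t90\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
    'MT\tsrc\texon\t5\t90\t.\t+\t.\tgene_id "g3"; transcript_id "t3";',
]
print(unmatched_annotation_names(gtf_lines, index_names))
```

In this example "Chr2" is flagged because of its capitalization and "MT" because the index calls the same sequence "chrM"; an empty result would mean every annotation name was found in the index.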

 

Advanced Options:

--report-secondary-alignments By default TopHat reports best or primary alignments based on alignment scores (AS). Use this option if you want to output additional or secondary alignments  (up to 20 alignments will be reported this way, this limit can be changed by using the -g/--max-multihits option above).    
--no-discordant For paired reads, report only concordant mappings.    
--no-mixed For paired reads, only report read alignments if both reads in a pair can be mapped (by default, if TopHat cannot find a concordant or discordant alignment for both reads in a pair, it will find and report alignments for each read separately; this option disables that behavior).    
--segment-mismatches Read segments are mapped independently, allowing up to this many mismatches in each segment alignment. <int> [ default: 2 ]
--segment-length Each read is cut up into segments, each at least this long. These segments are mapped independently. <int> [ default: 25 ]
--bowtie-n TopHat uses "-v" in Bowtie for initial read mapping (the default), but with this option, "-n" is used instead. Read segments are always mapped using "-v" option.   [ default: bowtie -v ]
--min-coverage-intron The minimum intron length that may be found during coverage search. <int> [ default: 50 ]
--max-coverage-intron The maximum intron length that may be found during coverage search. <int> [ default: 20000 ]
--min-segment-intron The minimum intron length that may be found during split-segment search. <int> [ default: 50 ]
--max-segment-intron The maximum intron length that may be found during split-segment search. <int> [ default: 500000 ]
--no-sort-bam Output BAM is not coordinate-sorted.
--no-convert-bam Do not convert to bam format. Output is <output_dir>/accepted_hit.sam. Implies --no-sort-bam.
--keep-fasta-order In order to sort alignments in the same order in the genome fasta file, the option can be used. But this option will make the output SAM/BAM file incompatible with those from the previous versions of TopHat (1.4.1 or lower).    
--allow-partial-mapping      

Bowtie2 specific options:

Bowtie 2 provides many options so that users can have more flexibility as to how reads are mapped. TopHat 2 allows users to pass many of these options to Bowtie 2 by preceding the Bowtie 2 option name with the --b2- prefix.

Preset options in --end-to-end mode (local alignment is not used in TopHat2)
TopHat 2 option Corresponding Bowtie 2 option
--b2-very-fast --very-fast
--b2-fast --fast
--b2-sensitive --sensitive
--b2-very-sensitive --very-sensitive

 

Alignment options

--b2-N <int> [ default: 0 ]
--b2-L <int> [ default: 20 ]   
--b2-i  <func> [ default: S,1,1.25 ]
--b2-n-ceil  <func> [ default: L,0,0.15 ]
--b2-gbar <int> [ default: 4 ]

 

Scoring options

--b2-mp <int>,<int> [ default: 6,2 ]
--b2-np <int> [ default: 1 ]
--b2-rdg <int>,<int> [ default: 5,3 ]
--b2-rfg <int>,<int> [ default: 5,3 ]
--b2-score-min <func> [ default: L,-0.6,-0.6 ]


Effort options

--b2-D <int> [ default: 15 ]
--b2-R <int> [ default: 2 ]
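
The --b2- naming rule described in this section can be sketched in a few lines of Python. The function name is hypothetical, and the sketch only rewrites the option name; it does not check whether TopHat actually accepts the resulting option, so compare the result against the lists above.

```python
def to_tophat_b2_option(bowtie2_option):
    """Turn a Bowtie 2 option name such as '--very-sensitive' or '-N' into '--b2-...'."""
    name = bowtie2_option.lstrip("-")
    if not name:
        raise ValueError("empty option name")
    return "--b2-" + name


for opt in ["--very-sensitive", "-N", "--score-min"]:
    print(opt, "->", to_tophat_b2_option(opt))
```

Both short (-N) and long (--score-min) Bowtie 2 option names map to a double-dash TopHat option with the b2- prefix.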

 

Fusion related options:

Please note that tophat-fusion-post is a separate program from tophat-fusion and was not incorporated into the TopHat algorithm. More information can be found in the TopHat Manual under "Fusion Mapping Options".

--fusion-search Turn on fusion mapping    
--fusion-anchor-length A "supporting" read must map to both sides of a fusion by at least these many bases. <int> [ default: 20 ]
--fusion-min-dist For intra-chromosomal fusions, TopHat-Fusion tries to find fusions separated by at least this distance. <int> [ default: 10000000 ]
--fusion-read-mismatches Reads support fusions if they map across fusion with at most these many mismatches. <int> [ default: 2 ]
--fusion-multireads Reads that map to more than these many places will be ignored. It may be possible that a fusion is supported by reads (or pairs) that map to multiple places. <int> [ default: 2 ]
--fusion-multipairs Pairs that map to more than these many places will be ignored. <int> [ default: 2 ]
--fusion-ignore-chromosomes Ignore some chromosomes such as chrM when detecting fusion break points. Please check the correct names for chromosomes; for example, mitochondrial DNA is represented as chrM or M depending on the annotation you use. <list> [ e.g., chrM,chrX ]
--fusion-do-not-resolve-conflicts      [ this is for test purposes ]

 

SAM Header Options (for embedding sequencing run metadata in output):

--rg-id Read group ID <string>
--rg-sample Sample ID <string>
--rg-library Library ID <string>
--rg-description Descriptive string, no tabs allowed <string>
--rg-platform-unit e.g. Illumina lane ID <string>
--rg-center Sequencing center name <string>
--rg-date ISO 8601 date of the sequencing run <string>
--rg-platform Sequencing platform descriptor <string>

 

Platform Dependencies

Task Type:
RNA-seq

CPU Type:
any

Operating System:
any

Language:
C++;Perl;Python

Version Comments

Version Release Date Description
9 2014-07-09 Beta: Updated to TopHat 2.0.11. Added the transcriptome index parameter and an Advanced Parameters UI.
8 2014-04-22 Updated to TopHat 2.0.9. Added a parameter to allow the user to pass through extra TopHat options, added read.edit.dist and read.gap.length, and changed to use hosted index files.
7 2014-06-10 Modified to handle spaces in the reads input file names
6 2013-05-07 Updated to TopHat 2.0.8b
5 2011-12-19 Updated to TopHat version 1.3.3
4 2014-06-09 updated to TopHat version 1.3.0 and SAMtools version 0.1.14
3 2011-02-25 improved error message for when invalid input file is specified and also fixed check for paired reads
2 2011-02-17 updated description for inner mate distance and also fixed check for inner mate distance when running paired reads
1 2011-02-04