GenePattern offers a set of tools to support a wide variety of RNA-seq analyses, including short-read mapping, identification of splice junctions, transcript and isoform detection, quantitation, differential expression, quality control metrics, visualization, and file utilities. The tools released as GenePattern modules are widely used. We continue to release new and updated tools as they become available. To find out when new capabilities are added, check this page or sign up for our Twitter feed.
We recommend that you run these modules on a local GenePattern server, because their input files are typically large. You can upload your data and use the new file management features in GenePattern 3.6, but large files take a while to upload, depending on your connection speed, data size, and currently available bandwidth. Alternatively, if you have a GenomeSpace account and already have data stored there, you can link it to your GenePattern account on the public GenePattern server and use the improved file management features in GenePattern 3.6.
COMPATIBILITY NOTE: The Tuxedo suite tools (Bowtie, TopHat, Cufflinks, Cuffmerge, Cuffcompare, and Cuffdiff) and BWA are built for Unix-based systems (Mac and Linux) and will not run on Windows machines.
You can install a local GenePattern server by doing the following:
Broad Institute members and collaborators can use the GPBroad server to send RNA-seq files directly to analysis modules. Community members can contact email@example.com to enable access to their RNA-seq files.
The TopHat, Bowtie, and BWA GenePattern modules provide pre-built reference genome indexes for a number of species. If you need an index for a species that is not hosted, email us at firstname.lastname@example.org. See this FAQ for more information on how to find other reference genome indexes.
Several of the modules accept reference genome annotation files (GTF files) and/or whole genome FASTA files. A list of these is available on our FTP site:
To use one of these files in a GenePattern module, click the Specify URL radio button under the input box for the GTF file parameter, and paste in the URL for the annotation file you want to use.
GenePattern supports the Tuxedo suite of Bowtie, TopHat, and Cufflinks, as described in Trapnell et al. (2012), "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks".
Bowtie is a short-read aligner geared toward quickly aligning large sets of short DNA sequences (reads) to large genomes. For more information, please refer to the Bowtie documentation. The GenePattern Bowtie modules consist of the following tools:
TopHat is a fast splice junction mapper. TopHat uses Bowtie to map RNA-seq reads to a reference genome, then analyzes the mapping results to identify splice junctions between exons. For more information about the algorithm, please refer to the TopHat documentation.
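To make the idea of a splice junction concrete, the following minimal sketch reads junction coordinates out of an already-aligned read. It is not TopHat's algorithm; it only relies on the SAM convention that an N operation in a CIGAR string marks a region skipped on the reference, which in RNA-seq alignments typically corresponds to an intron. The function name junctions_from_alignment is hypothetical and chosen for illustration.

```python
import re

# CIGAR operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_alignment(pos, cigar):
    """Return (start, end) pairs, 1-based inclusive, one per skipped region.

    pos   -- 1-based leftmost mapping position (the SAM POS field)
    cigar -- CIGAR string of the alignment (the SAM CIGAR field)
    """
    junctions = []
    ref = pos  # next reference position to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # The skipped region spans `length` reference bases from `ref`.
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return junctions


# A read whose first 50 bases align at position 100, followed by a
# 200-base skipped region and another 50 aligned bases.
print(junctions_from_alignment(100, "50M200N50M"))  # [(150, 349)]
```

In this example the read covers reference positions 100-149 and 350-399, so the skipped region between the two exons is 150-349.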
Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-seq samples. It accepts aligned RNA-seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one. For more information, please refer to the Cufflinks documentation. Cufflinks contains several accessory tools:
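As a rough illustration of what a relative abundance value means, the sketch below computes FPKM (fragments per kilobase of transcript per million mapped fragments), the unit in which Cufflinks reports abundances. This is only the arithmetic definition of the unit, assuming the number of fragments assigned to a transcript is already known; Cufflinks itself uses a statistical model to assign reads to transcripts, which this sketch does not attempt. The function and parameter names are hypothetical.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    # Dividing by (length / 1e3) and by (total / 1e6) gives a factor of 1e9.
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# 500 fragments on a 2,000 bp transcript in a library of 10 million
# mapped fragments.
print(fpkm(500, 2000, 10_000_000))  # 25.0
```

Normalizing by both transcript length and library size is what makes values comparable across transcripts of different lengths and samples of different sequencing depth.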
For more information, please refer to the BWA documentation. The GenePattern BWA modules consist of the following tools:
Scripture is a method for transcriptome reconstruction that relies solely on RNA-seq reads and an assembled genome to build a transcriptome ab initio. Scripture has been implemented in GenePattern as a pipeline containing several of the functions wrapped as individual modules. Please note: the modules must be executed as part of the Scripture pipeline. For more information, please refer to the Scripture documentation. Available Scripture pipelines are:
This module calculates useful metrics for determining the quality of RNA-seq data such as depth of coverage, rRNA contamination, continuity of coverage, and GC bias. For more information, including a suggested workflow for preprocessing your data files, see the in-depth article about RNA-seq QC in GenePattern.
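Two of the quantities mentioned above can be illustrated with simple arithmetic. The sketch below is not the module's implementation; it assumes reads are available as plain sequence strings and that all of them fall within a single region of known length. The function names gc_fraction and mean_depth are hypothetical.

```python
def gc_fraction(sequence):
    """Fraction of called bases (A, C, G, T) that are G or C."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0  # no called bases, e.g. an empty or all-N sequence
    return (seq.count("G") + seq.count("C")) / called


def mean_depth(read_lengths, region_length):
    """Average number of aligned bases covering each position of a region."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return sum(read_lengths) / region_length


print(gc_fraction("ACGT"))               # 0.5
print(mean_depth([50, 50, 100], 100))    # 2.0
```

Comparing the GC fraction of reads with that of the reference, or the depth across a transcript, is one simple way to spot the kinds of bias these metrics are meant to reveal.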
IGV is a visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types including sequence alignments, microarrays, and genomic annotations. For more information, please refer to the IGV documentation.
The Picard tools are widely used utilities for manipulating SAM/BAM files, and we have wrapped a number of them for GenePattern. For more information on the SAM/BAM file format, see the SAMtools page. For more information about the Picard command-line tools, see the Picard site.
SAMtools are widely used utilities for manipulating alignments in the SAM format, including sorting, merging, indexing, and generating alignments in a per-position format. We have started to wrap these tools for GenePattern and will continue to add to the SAMtools modules. For more information on the SAM/BAM file format or about the SAMtools utilities, see the SAMtools site.
ExprToGct: This module converts a file in EXPR format to GCT format. The EXPR file format is a tab-delimited format produced by Cufflinks version 1 (deprecated in Cufflinks version 2 and higher).
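The kind of conversion involved can be sketched as follows. This is not the ExprToGct module's code. It assumes the EXPR file has a header row containing gene_id and FPKM columns (an assumption about the Cufflinks version 1 layout; adjust the column names to match your file) and writes a single-sample GCT file, using the gene identifier for both the Name and Description columns. The function name expr_to_gct and the sample name are hypothetical.

```python
import csv
import io


def expr_to_gct(expr_text, sample_name, id_column="gene_id", value_column="FPKM"):
    """Convert tab-delimited EXPR text into single-sample GCT text.

    The column names are assumptions about the EXPR header, not a
    documented guarantee.
    """
    reader = csv.DictReader(io.StringIO(expr_text), delimiter="\t")
    rows = [(rec[id_column], float(rec[value_column])) for rec in reader]

    # GCT layout: version line, "<rows>\t<samples>", header row, data rows.
    lines = ["#1.2", f"{len(rows)}\t1", f"Name\tDescription\t{sample_name}"]
    for gene_id, value in rows:
        lines.append(f"{gene_id}\t{gene_id}\t{value:g}")
    return "\n".join(lines) + "\n"


expr = (
    "gene_id\tbundle_id\tchr\tleft\tright\tFPKM\n"
    "G1\t1\tchr1\t100\t900\t12.5\n"
    "G2\t2\tchr1\t2000\t3000\t0\n"
)
print(expr_to_gct(expr, "sampleA"), end="")
```

The resulting text starts with the #1.2 version line, followed by the row and sample counts, the header row, and one line per gene.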