Merge multiple Cufflinks assemblies
Author: Cole Trapnell et al., University of Maryland Center for Bioinformatics and Computational Biology
Algorithm Version: Cufflinks 2.0.2
The main purpose of Cufflinks.cuffmerge is to merge together several Cufflinks assemblies, making it easier to produce an assembly GTF file suitable for use with Cufflinks.cuffdiff. Cufflinks.cuffmerge also runs Cuffcompare in the background and automatically filters out transcribed fragments (transfrags) that are likely to be artifacts.
Cufflinks.cuffmerge is essentially a "meta-assembler": it treats the assembled transfrags from Cufflinks the way that Cufflinks treats reads, merging them together parsimoniously to produce the smallest number of transcripts that explain the data. Furthermore, when a reference genome annotation is available, Cufflinks.cuffmerge can integrate reference transcripts into the merged assembly. It can also perform a reference annotation-based transcript (RABT) assembly to merge reference transcripts with sample transfrags and produce a single annotation file for use in downstream differential analysis.
Cufflinks.cuffmerge was created at the University of Maryland Center for Bioinformatics and Computational Biology. This document is adapted from the Cufflinks documentation for release 2.0.2.
Cufflinks.cuffmerge takes one or more GTF files containing individual Cufflinks assemblies, a genome reference, and, optionally, a reference genome annotation GTF, and merges the information into a single assembly GTF file. For more information on the GTF file format, see the Input Files section.
If you have a reference genome GTF file available, you can provide it in order to gracefully merge novel isoforms and known isoforms and maximize overall assembly quality.
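As a rough illustration of how these inputs fit together, the sketch below builds an argument list for the underlying `cuffmerge` command-line tool. It assumes the standard `cuffmerge` options `-o` (output directory), `-s` (reference sequence), and `-g` (reference GTF), with a text manifest of assembly GTF paths as the positional argument. The helper name `build_cuffmerge_command`, the file names, and the default output directory are hypothetical, and this is not necessarily how the GenePattern module invokes the tool internally.

```python
from typing import List, Optional


def build_cuffmerge_command(
    manifest: str,
    genome_fasta: str,
    reference_gtf: Optional[str] = None,
    output_dir: str = "merged_asm",  # hypothetical default for this sketch
) -> List[str]:
    """Return an argument list for a cuffmerge run (illustrative only)."""
    # The genome file is required by the module, so it is always passed.
    cmd = ["cuffmerge", "-o", output_dir, "-s", genome_fasta]
    # The reference annotation is optional; add it only when supplied.
    if reference_gtf is not None:
        cmd += ["-g", reference_gtf]
    # The manifest lists one assembly GTF path per line.
    cmd.append(manifest)
    return cmd


if __name__ == "__main__":
    cmd = build_cuffmerge_command("assemblies.txt", "genome.fa", "genes.gtf")
    print(" ".join(cmd))
    # prints: cuffmerge -o merged_asm -s genome.fa -g genes.gtf assemblies.txt
```

The list could be passed to `subprocess.run(cmd, check=True)` on a system where `cuffmerge` is installed.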
For more information on using RNA-seq modules in GenePattern, see the RNA-seq Analysis page.
Cufflinks.cuffmerge version 2+ can no longer accept a .txt input file list on GenePattern versions 3.6.0+. Instead, you may specify multiple files using the drag-and-drop interface.
Legacy information: In earlier versions, if there were more than two GTF Cufflinks assembly files, they had to be specified as a list in a text file passed via the input list file parameter. The files listed had to be available on the same file system as the server, and each filename in the text file had to include its full path. In GenePattern 3.6.0 and above, server-hosted files can instead be supplied directly through the drag-and-drop file parameter interface.
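For the legacy list-file workflow, the text file can be generated rather than typed by hand. The following is a minimal sketch, assuming one path per line is the expected layout. The function name `write_assembly_list` and the example file names are hypothetical. Note that `os.path.abspath` resolves paths on the machine where the script runs, so the script would need to run where the server's file system is visible.

```python
import os
import tempfile
from typing import Iterable


def write_assembly_list(gtf_paths: Iterable[str], list_path: str) -> None:
    """Write one absolute GTF path per line to list_path."""
    with open(list_path, "w") as handle:
        for path in gtf_paths:
            # Each entry must carry its full path, so expand relative paths.
            handle.write(os.path.abspath(path) + "\n")


if __name__ == "__main__":
    # Hypothetical assembly files, written to a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        list_file = os.path.join(tmp, "assemblies.txt")
        write_assembly_list(
            ["sample1/transcripts.gtf", "sample2/transcripts.gtf"], list_file
        )
        with open(list_file) as handle:
            print(handle.read(), end="")
```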
Trapnell C, Hendrickson D, Sauvageau S, Goff L, Rinn JL, Pachter L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature Biotechnology. 2013;31:46-53.
Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols. 2012;7:562-578.
Roberts A, Pimentel H, Trapnell C, Pachter L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011 Sep 1;27(17):2325-9.
Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28:511-515.
Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105-1111.
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.
|input file||GTF Cufflinks assembly files to be merged.|
|reference GTF||An optional reference annotation GTF. The input assemblies are merged together with the reference GTF and included in the final output. Cufflinks.cuffmerge will use this to attach gene names and other metadata to the merged catalog.|
|genome file *||A file containing the genomic DNA sequences for the reference. This should be a multi-FASTA file with all contigs present.|
* - required
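Because the genome file should contain every contig, it can be useful to check before submitting a job that each sequence name used in the assembly GTFs has a matching FASTA header. The sketch below is one way to do that, assuming the sequence name is the first tab-separated GTF column and the contig name is the first whitespace-delimited token after `>` in the FASTA header. The function names and sample data are hypothetical, and nothing here implies that the module performs this check itself.

```python
from typing import Iterable, Set


def fasta_contigs(fasta_lines: Iterable[str]) -> Set[str]:
    """Collect contig names from FASTA header lines."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                # The name is the first token after '>'.
                names.add(tokens[0])
    return names


def gtf_contigs(gtf_lines: Iterable[str]) -> Set[str]:
    """Collect sequence names from the first column of GTF records."""
    names = set()
    for line in gtf_lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_contigs(gtf_lines: Iterable[str], fasta_lines: Iterable[str]) -> Set[str]:
    """Return GTF sequence names that have no FASTA header."""
    return gtf_contigs(gtf_lines) - fasta_contigs(fasta_lines)


if __name__ == "__main__":
    # Tiny in-memory stand-ins for real files.
    gtf = [
        "chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id \"G1\";",
        "chr3\tCufflinks\texon\t50\t90\t.\t-\t.\tgene_id \"G2\";",
    ]
    fasta = [">chr1 first contig", "ACGT", ">chr2", "GGCC"]
    print(sorted(missing_contigs(gtf, fasta)))  # prints: ['chr3']
```

With real data, pass open file handles in place of the lists, since both helpers read their input line by line.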
|2||2013-09-25||Added hosted GTF and genome file selectors and HTML-based docs|