This module is currently in beta release. The module and/or documentation may be incomplete.
Merges HTSeq read count data files into one gct file
Author: Marc-Danie Nazaire, Ted Liefeld
Contact:
gp-help@broadinstitute.org
Algorithm Version:
Summary
This module takes the read count output from HTSeq for multiple samples and creates a single GCT file. Optionally, you can include a sampleInfo file that contains one column with the count file basenames, one column used to differentiate the samples for a CLS file, and, optionally, a column with a sample name to use in the GCT file (instead of using the count filename as the sample name).
Parameters
Name | Description |
---|---|
input files * | A directory or one or more text files containing HTSeq read counts. |
output prefix * | The base name of the output file. File extensions will be added automatically. |
sampleinfo file | A sample info file containing the HTSeq filenames (minus the extension) and a column on which to base a CLS file. |
filenames column | The column in the sample info file that contains the filenames. This is used to match the class division column to the appropriate input file. This can be either a column index (starting at 0) or a string matching the header of a column in the sample info file. |
class division column | The column in the sample info file that specifies a phenotype to use to assign classes and create a class file for the input samples. This is only relevant if a sample info file is provided. |
sample name column | The column in the sample info file that specifies a sample name to use in the generated GCT file. If no sample info file is provided or no value is specified, the columns in the GCT file will match the file names of the input files. |
* - required
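
To illustrate how the sample info parameters fit together, here is a minimal sketch of resolving a column given as an index or a header name and deriving a CLS file from the class division column. It is not the module's code: the function names (`resolve_column`, `read_sample_info`, `cls_text`), the tab-delimited layout with a header row, and the sample and phenotype values are all assumptions made for illustration. It is also written for current Python 3, not the Python 2.6 listed under Requirements.

```python
import csv
import io

def resolve_column(header, spec):
    """Map a column spec (0-based index or header name) to a list index."""
    spec = str(spec)
    if spec.isdigit():
        # Assumption: an all-digit spec is an index, never a header name.
        return int(spec)
    return header.index(spec)

def read_sample_info(handle, filenames_col, class_col):
    """Return {file basename: class label} from a tab-delimited stream.

    Assumption: the first row is a header row.
    """
    rows = list(csv.reader(handle, delimiter="\t"))
    header, body = rows[0], rows[1:]
    f_idx = resolve_column(header, filenames_col)
    c_idx = resolve_column(header, class_col)
    return {row[f_idx]: row[c_idx] for row in body if row}

def cls_text(class_labels):
    """Build CLS content from one label per sample, in GCT column order."""
    names = []
    for label in class_labels:
        if label not in names:
            names.append(label)
    indices = [str(names.index(label)) for label in class_labels]
    return "%d %d 1\n# %s\n%s\n" % (
        len(class_labels), len(names), " ".join(names), " ".join(indices))

# Hypothetical sample info content; the names and phenotypes are made up.
info = io.StringIO(
    "file\tphenotype\n"
    "sampleA\ttumor\n"
    "sampleB\tnormal\n"
    "sampleC\ttumor\n"
)
mapping = read_sample_info(info, 0, "phenotype")
gct_order = ["sampleA", "sampleB", "sampleC"]
cls = cls_text([mapping[name] for name in gct_order])
print(cls)
```

The class labels are listed in the same order as the sample columns of the GCT file, which is what lets the two files be used together.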
Input Files
- One or more tab-delimited text files containing HTSeq read counts
  A two-column text file where the first column contains the gene symbol and the second column specifies the read count. For example:

      ENSG00000000003    210
      ENSG00000000005    320
      ENSG00000000419    104

- Sample Info File. See the GenePattern File Formats guide.
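
As a rough illustration of the count file layout described above, the following sketch reads a two-column, tab-delimited stream into a dictionary. The function name `parse_htseq_counts` is hypothetical. Skipping rows whose identifier starts with `__` (the summary counters that htseq-count appends, such as `__no_feature`) is an assumption about what a merge would want, not documented behavior of this module, and the value 57 in the example is made up.

```python
import io

def parse_htseq_counts(handle):
    """Return {feature_id: count} from a two-column, tab-delimited stream."""
    counts = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        feature, value = line.split("\t")
        # Assumption: drop htseq-count summary rows such as "__no_feature".
        if feature.startswith("__"):
            continue
        counts[feature] = int(value)
    return counts

# The three gene rows come from the example above; the last row is invented.
example = io.StringIO(
    "ENSG00000000003\t210\n"
    "ENSG00000000005\t320\n"
    "ENSG00000000419\t104\n"
    "__no_feature\t57\n"
)
counts = parse_htseq_counts(example)
print(counts)
```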
Output Files
- A file in the GenePattern GCT format
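
For readers unfamiliar with GCT, the sketch below shows one way per-sample counts could be laid out in that format: a `#1.2` version line, a line with the row and column counts, a header line, then one row per feature. The function name `write_gct` is hypothetical. Taking the union of feature IDs, filling missing counts with 0, and repeating the feature ID in the Description column are assumptions, since this page does not say how the module handles those cases.

```python
import io

def write_gct(sample_counts, out):
    """Write {sample_name: {feature_id: count}} to `out` in GCT layout."""
    sample_names = list(sample_counts)  # dicts keep insertion order
    # Assumption: use every feature seen in any sample, sorted by ID.
    features = sorted({f for c in sample_counts.values() for f in c})
    out.write("#1.2\n")
    out.write("%d\t%d\n" % (len(features), len(sample_names)))
    out.write("\t".join(["Name", "Description"] + sample_names) + "\n")
    for feature in features:
        # Assumption: reuse the feature ID as the Description value.
        row = [feature, feature]
        # Assumption: a feature absent from a sample counts as 0.
        row += [str(sample_counts[s].get(feature, 0)) for s in sample_names]
        out.write("\t".join(row) + "\n")

# Hypothetical sample names and counts, for illustration only.
buf = io.StringIO()
write_gct({"s1": {"g1": 1, "g2": 2}, "s2": {"g1": 3}}, buf)
gct_lines = buf.getvalue().splitlines()
print(buf.getvalue())
```

Writing to a file-like object keeps the sketch easy to test; a real run would pass an open file named after the output prefix.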
Requirements
Python 2.6
Platform Dependencies
Task Type:
RNA-seq
CPU Type:
any
Operating System:
any
Language:
Python 2.6
Version Comments
Version | Release Date | Description |
---|---|---|
.7 | 2017-12-11 | Bug fix for error in cls file syntax |
.6 | 2017-11-02 | Added sampleInfo and column inputs, used to rename samples and generate a cls file |