MergeHTSeqCounts (v1) BETA

This module is currently in beta release. The module and/or documentation may be incomplete.

Merges HTSeq read count data files into one gct file

Author: Marc-Danie Nazaire, Ted Liefeld

Contact:

gp-help@broadinstitute.org

Algorithm Version:

Summary

This module takes the read count output from HTSeq for multiple samples and creates a single GCT file. Optionally, you can include a sampleInfo file that contains one column with the count file basenames, one column used to assign each sample to a class for a CLS file, and, optionally, a column with a sample name to use in the GCT file instead of the count filename.

Parameters

Name Description
input files * A directory or one or more text files containing HTSeq read counts.
output prefix * The base name of the output file. File extensions will be added automatically.
sampleinfo file A sample info file containing HTSeq filenames (minus the extension) and a column to base a cls file on
filenames column The column in the sample info file that contains the filenames. This is used to match the class division column to the appropriate input file. This can be either a column index (starting at 0) or a string matching the header of a column in the sample info file.
class division column The column in the sample info file that specifies a phenotype to use to assign classes and create a class file for the input samples. This is only relevant if a sample info file is provided.
sample name column The column in the sample info file that specifies a sample name to use in the generated gct file. If no sample info file is provided or no value is specified, the columns in the gct file will match the file names of the input files.

* - required
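
The column parameters and the class file can be illustrated with a short sketch. This is not the module's source code: the function names resolve_column and cls_text are hypothetical, the choice to try a header match before a numeric index is an assumption, and the code is written for Python 3 rather than the Python 2.6 the module lists. The CLS layout follows the GenePattern categorical CLS format (a counts line, a class-name line starting with "#", and a line of 0-based class indices).

```python
def resolve_column(header, spec):
    """Return the 0-based index of a column given as a header name or an index.

    Assumption: a header match wins over a numeric interpretation.
    """
    spec = str(spec).strip()
    if spec in header:
        return header.index(spec)
    if spec.isdigit() and int(spec) < len(header):
        return int(spec)
    raise ValueError("column %r not found in sample info header" % spec)


def cls_text(class_labels):
    """Format class labels (one per sample, in GCT column order) as CLS text."""
    names = []
    for label in class_labels:
        if label not in names:
            names.append(label)  # classes are numbered in order of first appearance
    indices = [str(names.index(label)) for label in class_labels]
    lines = [
        "%d %d 1" % (len(class_labels), len(names)),  # samples, classes, constant 1
        "# " + " ".join(names),                       # class names
        " ".join(indices),                            # one class index per sample
    ]
    return "\n".join(lines) + "\n"
```

For example, with a header of File, Phenotype, Sample, both "1" and "Phenotype" resolve to the same column. Class labels are assumed to contain no spaces, since the CLS format separates them with whitespace.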

Input Files

  1. One or more tab-delimited text files containing HTSeq read counts
    A two-column text file where the first column contains the gene identifier (for example, an Ensembl gene ID or a gene symbol) and the second column specifies the read count.
    ENSG00000000003 210
    ENSG00000000005 320
    ENSG00000000419 104

  2. Sample Info File. See the GenePattern File Formats guide.
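
As an illustration of the count file layout described in item 1, the following minimal sketch reads one file's text into a dictionary. The function name parse_htseq_counts is hypothetical and this is not the module's own parser. It assumes that the summary rows htseq-count appends (names starting with "__", such as __no_feature) should be skipped; whether the module does the same is not stated here.

```python
def parse_htseq_counts(text):
    """Parse the text of one HTSeq count file into {gene_id: count}."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last run of whitespace (a tab in htseq-count output).
        gene, value = line.rsplit(None, 1)
        if gene.startswith("__"):
            continue  # assumption: drop htseq-count summary rows
        counts[gene] = int(value)
    return counts
```

Reading a file from disk would then be a matter of passing its contents, for example parse_htseq_counts(open(path).read()).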

 

Output Files

  1. A file in the GenePattern GCT format
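
For orientation, a GCT file starts with the version line "#1.2", then a line with the number of rows and the number of samples, then a header row (Name, Description, one column per sample), then one row per gene. The sketch below builds that text from per-sample counts. The function name gct_text is hypothetical, and two choices are assumptions rather than documented module behavior: the Description column repeats the gene identifier, and a gene missing from a sample is written as 0.

```python
def gct_text(counts_by_sample):
    """Build GCT text from {sample_name: {gene_id: count}}.

    Sample columns follow the dictionary's insertion order (Python 3.7+).
    """
    samples = list(counts_by_sample)
    # Union of gene identifiers across all samples, sorted for a stable row order.
    genes = sorted(set().union(*[set(c) for c in counts_by_sample.values()]))
    lines = [
        "#1.2",
        "%d\t%d" % (len(genes), len(samples)),
        "\t".join(["Name", "Description"] + samples),
    ]
    for gene in genes:
        # Assumption: missing genes count as 0; Description repeats the gene id.
        values = [str(counts_by_sample[s].get(gene, 0)) for s in samples]
        lines.append("\t".join([gene, gene] + values))
    return "\n".join(lines) + "\n"
```

The returned string can be written to a file named with the output prefix plus a .gct extension.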

Requirements

Python 2.6

Platform Dependencies

Task Type:
RNA-seq

CPU Type:
any

Operating System:
any

Language:
Python 2.6

Version Comments

Version Release Date Description
.7 2017-12-11 Bug fix for error in cls file syntax
.6 2017-11-02 Added sampleInfo and column inputs, used to rename samples and generate a cls file