Genomic Identification of Significant Targets in Cancer (version 2.0.22)
Author: Steven Schumacher, Jen Dobson, Rameen Beroukhim, Gad Getz
Contact:
Gad Getz, Rameen Beroukhim, Craig Mermel, Steven Schumacher, and Jen Dobson, GISTIC-Forum
Algorithm Version: 2.0.22
The GISTIC module identifies regions of the genome that are significantly amplified or deleted across a set of samples. Each aberration is assigned a G-score that considers the amplitude of the aberration as well as the frequency of its occurrence across samples. False Discovery Rate q-values are then calculated for the aberrant regions, and regions with q-values below a user-defined threshold are considered significant.
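The q-value thresholding step described above can be sketched as follows. This is a minimal illustration, not the module's implementation: it assumes each region already has a p-value (a hypothetical input), and it uses the Benjamini-Hochberg procedure as a stand-in, since this documentation does not state which False Discovery Rate method the module applies. The function names `bh_qvalues` and `significant_regions` are invented for the example.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values for a list of p-values (assumed method)."""
    m = len(pvalues)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that q-values
    # never increase as the p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def significant_regions(region_pvalues, q_threshold):
    """Return the names of regions whose q-value is below q_threshold."""
    names = list(region_pvalues)
    qvals = bh_qvalues([region_pvalues[n] for n in names])
    return [n for n, q in zip(names, qvals) if q < q_threshold]
```

Here `q_threshold` plays the role of the user-defined threshold: only regions whose q-value falls below it are reported as significant.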
For each significant region, a “peak region” is identified, which is the part of the aberrant region with greatest amplitude and frequency of alteration. In addition, a “wide peak” is determined using a leave-one-out algorithm to allow for errors in the boundaries in a single sample. The “wide peak” boundaries are more robust for identifying the most likely gene targets in the region.
Each significantly aberrant region is also tested to determine whether it results primarily from broad events (longer than half a chromosome arm), focal events, or significant levels of both. The GISTIC module reports the genomic locations and calculated q-values for the aberrant regions. It identifies the samples that exhibit each significant amplification or deletion, and it lists genes found in each “wide peak” region.
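The broad versus focal distinction can be illustrated with a short sketch. It assumes an event is described only by its length and the length of the chromosome arm it lies on; the function name `classify_event` and its arguments are hypothetical, and the default cutoff of 0.5 reflects the "longer than half a chromosome arm" rule stated above.

```python
def classify_event(event_length_bp, arm_length_bp, broad_length_cutoff=0.5):
    """Label an event 'broad' or 'focal' by the fraction of the arm it covers."""
    if arm_length_bp <= 0:
        raise ValueError("arm_length_bp must be positive")
    fraction = event_length_bp / arm_length_bp
    # Events strictly longer than the cutoff fraction are treated as broad.
    return "broad" if fraction > broad_length_cutoff else "focal"
```

For example, an event covering 60% of an arm would be labeled broad under the default cutoff, while one covering exactly half would be labeled focal.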
Note: The GISTIC module is memory-intensive.
Mermel C, Schumacher S, et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biology. 2011;12:R41.
Beroukhim R, Mermel C, et al. The landscape of somatic copy-number alteration across human cancers. Nature. 2010;463:899-905.
Name | Flag | Description |
---|---|---|
refgene file * | -refgene | The reference file including cytoband and gene location information. |
seg file * | -seg | The segmentation file contains the segmented data for all the samples identified by GLAD, CBS, or some other segmentation algorithm. (See GLAD file format in the GenePattern file formats documentation.) It is a six column, tab-delimited file with an optional first line identifying the columns. Positions are in base pair units. |
markers file * | -mk | The markers file identifies the marker names and positions of the markers in the original dataset (before segmentation). It is a three column, tab-delimited file with an optional header. If the markers are not already sorted by genomic position, the module sorts them. |
array list file | -alf | The array list file is an optional file identifying the subset of samples to be used in the analysis. It is a one column file with an optional header. The sample identifiers listed in the array list file must match the sample names given in the segmentation file. |
cnv file | -cnv | The cnv file can take one of two forms. In the first, CNVs are identified by marker name. In the second, CNVs are identified by genomic location. |
gene gistic * | -genegistic | Flag indicating that the gene GISTIC algorithm should be used to calculate the significance of deletions at a gene level instead of a marker level. |
amplifications threshold * | -ta | Threshold for copy number amplifications. Regions with a log2 ratio above this value are considered amplified. |
deletions threshold * | -td | Threshold for copy number deletions. Regions with a log2 ratio below the negative of this value are considered deleted. |
join segment size * | -js | Smallest number of markers to allow in segments from the segmented data. Segments that contain a number of markers less than or equal to this number are joined to the neighboring segment that is closest in copy number. |
qv thresh * | -qvt | Threshold for q-values. Regions with q-values below this value are considered significant. |
remove X * | -rx | Flag indicating whether to remove data from the X-chromosome before analysis. |
cap val * | -cap | Minimum and maximum cap values on analyzed data. Regions with a log2 ratio greater than the cap value are set to the cap value; regions with a log2 ratio less than the negative of the cap value are set to the negative of the cap value. |
confidence level * | -conf | Confidence level used to calculate the region containing a driver. |
run broad analysis * | -broad | Flag indicating whether an additional broad-level analysis should be performed. |
broad length cutoff * | -brlen | Threshold used to distinguish broad from focal events, given in units of fraction of chromosome arm. |
max sample segs * | -maxseg | Maximum number of segments allowed for a sample in the input data. Samples with more segments than this threshold are excluded from the analysis. |
arm peel * | -armpeel | Flag indicating whether to perform arm-level peel-off. This helps separate peaks, which cleans up noise. |
sample center * | -scent | Method for centering each sample prior to the GISTIC analysis. |
gene collapse method * | -gcm | Method for reducing marker-level copy number data to the gene-level copy number data in the gene tables. Markers contained in the gene are used when available, otherwise the flanking marker or markers are used. |
output prefix * | -fname | The prefix for the output file names. |
* - required
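As an illustration of how the amplifications threshold, deletions threshold, and cap val parameters act on a single log2 ratio, here is a minimal sketch. The function names are hypothetical, the numeric values in the usage note are made up for the example, and the sketch does not claim to reproduce the order in which the module applies these steps.

```python
def cap_log2_ratio(value, cap):
    """Clamp a log2 ratio to the interval [-cap, cap]."""
    return max(-cap, min(cap, value))


def call_segment(log2_ratio, amp_threshold, del_threshold):
    """Return 'amplified', 'deleted', or 'neutral' for one log2 ratio."""
    # Above the amplifications threshold: amplified.
    if log2_ratio > amp_threshold:
        return "amplified"
    # Below the negative of the deletions threshold: deleted.
    if log2_ratio < -del_threshold:
        return "deleted"
    return "neutral"
```

With an illustrative cap of 1.5, `cap_log2_ratio(3.2, 1.5)` yields 1.5. With illustrative thresholds of 0.1, a log2 ratio of 0.3 is called amplified, -0.3 is called deleted, and 0.05 is left neutral.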
The reference genome file contains information about the location of genes and cytobands on a given build of the genome. Reference genome files are created in MATLAB and are not viewable with a text editor. The GISTIC 2.0 release includes the following reference genomes: hg16.mat, hg17.mat, hg18.mat, and hg19.mat.
The array list file is an optional file identifying the subset of samples to be used in the analysis. It is a one column file with an optional header (array). The sample identifiers listed in the array list file must match the sample names given in the segmentation file.
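The effect of an array list file can be sketched as a simple filter over the segmentation rows. This is an illustration under stated assumptions, not the module's code: it assumes the sample name is the first column of the segmentation file (this page says the file has six columns but does not list their order), and the function names `read_array_list` and `filter_segments` are hypothetical.

```python
import csv
import io


def read_array_list(text, header="array"):
    """Return the set of sample identifiers, skipping an optional header line."""
    names = [line.strip() for line in text.splitlines() if line.strip()]
    # Drop the first line only if it is the optional header.
    if names and names[0].lower() == header:
        names = names[1:]
    return set(names)


def filter_segments(seg_text, keep_samples, sample_column=0):
    """Keep only the tab-delimited segment rows whose sample is in keep_samples."""
    rows = csv.reader(io.StringIO(seg_text), delimiter="\t")
    # A header row, if present, is dropped because its first field
    # is not one of the listed sample identifiers.
    return [row for row in rows if row and row[sample_column] in keep_samples]
```

Because the comparison is an exact string match, a sample whose identifier differs between the two files (for example in case or trailing characters) would be silently left out, which is why the identifiers must match.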
Sample Data
Each of the analyzed samples is represented in one of the columns following the lesion data (columns 10 through end). The data contained in these columns varies slightly by section of the file.
Please see the GenePattern FAQ (http://www.broadinstitute.org/cancer/software/genepattern/doc/faq) for assistance with specific errors.
Task Type:
SNP Analysis
CPU Type:
64-bit
Operating System:
Linux
Language:
MATLAB
Version | Release Date | Description |
---|---|---|
6 | 2012-10-12 | Added description of the all_thresholded.by_genes.txt output file to the documentation. |
5 | 2012-06-20 | GISTIC module v.5 contains the update to GISTIC 2.0.16. The algorithms and result files have changed extensively relative to GISTIC 1.0. See Mermel et al. (2011) for more information. |