Methylation array processing with the R package minfi. Compatible with either 450k or EPIC array data.
Author: Clarence Mah, ckmah@ucsd.edu
Contact: Clarence Mah, ckmah@ucsd.edu
Algorithm Version:
Introduction
The minfi package provides tools for analyzing Illumina’s methylation arrays, specifically the 450k and EPIC (also known as 850k) arrays. This module covers basic preprocessing, QC assessment, and plotting. The input to this module is a zip archive of IDAT files, which contain the probe intensity values for the two color channels (one file per channel for each sample).
In this section, we briefly introduce the 450K array. Each sample is measured on a single array, in two different color channels (red and green). As the name of the platform indicates, each array measures more than 450,000 CpG positions. For each CpG, we have two measurements: a methylated intensity and an unmethylated intensity. Depending on the probe design, the signals are reported in different colors:
For Type I design, both signals are measured in the same color: one probe for the methylated signal and one probe for the unmethylated signal. For Type II design, only one probe is used. The intensity read in the green channel measures the methylated signal, and the intensity read in the red channel measures the unmethylated signal.
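The difference between the two designs can be illustrated with a toy example in base R. This is only a sketch: the function names `type1_signals` and `type2_signals` and all intensity values are hypothetical, and the code mimics the logic described above rather than how minfi stores intensities internally.

```r
# Type I: two probes per CpG, both read in the same color channel.
# `meth_probe` and `unmeth_probe` are numeric vectors named by channel.
type1_signals <- function(meth_probe, unmeth_probe, channel = c("Red", "Grn")) {
  channel <- match.arg(channel)
  c(methylated   = meth_probe[[channel]],
    unmethylated = unmeth_probe[[channel]])
}

# Type II: a single probe per CpG.
# Green intensity is the methylated signal, red the unmethylated signal.
type2_signals <- function(probe) {
  c(methylated   = probe[["Grn"]],
    unmethylated = probe[["Red"]])
}

# Made-up intensities for one Type I probe pair read in the red channel.
type1_signals(meth_probe   = c(Grn = 300, Red = 3900),
              unmeth_probe = c(Grn = 280, Red = 700),
              channel      = "Red")

# Made-up intensities for one Type II probe.
type2_signals(c(Grn = 5100, Red = 1200))
```

In both cases the result is one methylated and one unmethylated intensity per CpG, which is what the downstream measures are computed from.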
Two measures are commonly used to report methylation levels: Beta values and M values.
Beta value:
$$\beta = \frac{M}{M + U + 100}$$
where $M$ and $U$ denote the methylated and unmethylated signals respectively.
M value:
$$\text{Mval} = \log_2\left(\frac{M}{U}\right)$$
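As a worked illustration, both measures can be computed in base R from a pair of intensity vectors. The sketch assumes the commonly used definitions Beta = M / (M + U + offset), with an offset of 100, and M value = log2(M / U); the helper names `beta_value` and `m_value` and the intensities are hypothetical.

```r
# Beta value: fraction of methylated signal, stabilized by a constant offset.
beta_value <- function(meth, unmeth, offset = 100) {
  meth / (meth + unmeth + offset)
}

# M value: log2 ratio of methylated to unmethylated signal.
m_value <- function(meth, unmeth) {
  log2(meth / unmeth)
}

# Made-up intensities for three CpGs.
meth   <- c(cpg1 = 3900, cpg2 = 800,  cpg3 = 5100)
unmeth <- c(cpg1 = 700,  cpg2 = 4200, cpg3 = 1200)

beta_value(meth, unmeth)  # values between 0 and 1
m_value(meth, unmeth)     # positive when the methylated signal dominates
```

Beta values are bounded between 0 and 1 and read naturally as a methylation proportion, while M values are unbounded and symmetric around 0.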
References
- Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD and Irizarry RA (2014). “Minfi: A flexible and comprehensive Bioconductor package for the analysis of Infinium DNA Methylation microarrays.” Bioinformatics, 30(10), pp. 1363–1369. doi: 10.1093/bioinformatics/btu049.
- Maksimovic J, Gordon L and Oshlack A (2012). “SWAN: Subset quantile Within-Array Normalization for Illumina Infinium HumanMethylation450 BeadChips.” Genome Biology, 13(6), pp. R44. doi: 10.1186/gb-2012-13-6-r44.
- Fortin J, Labbe A, Lemire M, Zanke BW, Hudson TJ, Fertig EJ, Greenwood CM and Hansen KD (2014). “Functional normalization of 450k methylation array data improves replication in large cancer studies.” Genome Biology, 15(12), pp. 503. doi: 10.1186/s13059-014-0503-2.
- Triche TJ, Weisenberger DJ, Van Den Berg D, Laird PW and Siegmund KD (2013). “Low-level processing of Illumina Infinium DNA Methylation BeadArrays.” Nucleic Acids Research, 41(7), pp. e90. doi: 10.1093/nar/gkt090.
- Fortin J and Hansen KD (2015). “Reconstructing A/B compartments as revealed by Hi-C using long-range correlations in epigenetic data.” Genome Biology, 16, pp. 180. doi: 10.1186/s13059-015-0741-y.
- Andrews SV, Ladd-Acosta C, Feinberg AP, Hansen KD and Fallin MD (2016). “'Gap hunting' to characterize clustered probe signals in Illumina methylation array data.” Epigenetics & Chromatin, 9(56). doi: 10.1186/s13072-016-0107-z.
- Fortin J, Triche TJ and Hansen KD (2016). “Preprocessing, normalization and integration of the Illumina HumanMethylationEPIC array.” bioRxiv. doi: 10.1101/065490.
Parameters
Name | Description |
---|---|
dataset* | Provide a ZIP or GZ file containing the methylation microarray data in the Illumina Demo Dataset folder structure. Example data can be found here: infinium-methylationepic-demo-dataset.zip. |
normalization* | Specify the normalization method used to preprocess the data. The minfi authors suggest "Functional Normalization" when global methylation differences are expected between samples (e.g. a dataset with cancer and normal samples), and "Quantile Normalization" when no global differences are expected (e.g. a dataset of blood samples or of a single tissue type). "Illumina Preprocessing" performs background subtraction and control normalization as available in GenomeStudio. See the first three entries under References for details of the normalization algorithms. |
output type* | The output can be beta values, m-values, or a MethylSet class R object. The MethylSet output type is only available when the normalization option is None. |
* - required
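The sketch below shows one way the normalization and output type choices could map onto minfi functions. The module's own script is not shown here, so this is an assumption rather than its actual implementation: the helpers `normalize_methylation` and `extract_values` and the option labels are hypothetical, while the `minfi::` calls are existing minfi functions.

```r
# Hypothetical helper: map a normalization choice to a minfi preprocessing call.
# `rg_set` is expected to be an RGChannelSet, e.g. from minfi::read.metharray.exp().
normalize_methylation <- function(rg_set,
                                  method = c("none", "illumina",
                                             "quantile", "funnorm")) {
  method <- match.arg(method)
  if (!requireNamespace("minfi", quietly = TRUE)) {
    stop("The Bioconductor package 'minfi' is required.")
  }
  switch(method,
    none     = minfi::preprocessRaw(rg_set),
    illumina = minfi::preprocessIllumina(rg_set, bg.correct = TRUE,
                                         normalize = "controls"),
    quantile = minfi::preprocessQuantile(rg_set),
    funnorm  = minfi::preprocessFunnorm(rg_set)
  )
}

# Hypothetical helper: extract the requested values from the processed object.
extract_values <- function(processed, output_type = c("beta", "m")) {
  output_type <- match.arg(output_type)
  if (output_type == "beta") minfi::getBeta(processed) else minfi::getM(processed)
}

# Usage (not run; needs minfi and an unzipped dataset directory):
# targets   <- minfi::read.metharray.sheet("unzipped_dataset")
# rg_set    <- minfi::read.metharray.exp(targets = targets)
# processed <- normalize_methylation(rg_set, "funnorm")
# betas     <- extract_values(processed, "beta")
```

In minfi, `preprocessQuantile` and `preprocessFunnorm` return a GenomicRatioSet rather than a MethylSet, which is consistent with the restriction on the MethylSet output type.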
Input Files
- dataset
The ZIP/GZ archive folder structure should look like the following:
- 200144450018
  - 200144450018_R04C01_1_Green.xml
  - 200144450018_R04C01_1_Red.xml
  - ...
  - Effective.cfg
  - Metrics.txt
- 200144450019
- SampleSheet.csv
- ...
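Before uploading, the archive layout can be sanity-checked from its entry names. The following base R sketch is an illustration only: `check_dataset_layout` is a hypothetical helper, and it assumes that slide folders are named with digits only (as in the listing above) and that a file called `SampleSheet.csv` is present.

```r
# Hypothetical helper: inspect archive entry names, e.g. those returned by
# utils::unzip("dataset.zip", list = TRUE)$Name.
check_dataset_layout <- function(entries) {
  entries <- sub("/+$", "", entries)  # drop trailing slashes on directory entries
  has_sheet <- any(basename(entries) == "SampleSheet.csv")

  # Collect every directory component that contains at least one entry.
  parts <- strsplit(entries, "/", fixed = TRUE)
  dirs  <- unique(unlist(lapply(parts, function(p) p[-length(p)])))

  # Assumption: slide folders are all-digit names such as 200144450018.
  slides <- grep("^[0-9]+$", dirs, value = TRUE)

  list(has_sample_sheet = has_sheet, slides = sort(slides))
}

# Example with entry names modeled on the listing above.
entries <- c("200144450018/200144450018_R04C01_1_Green.xml",
             "200144450018/200144450018_R04C01_1_Red.xml",
             "200144450018/Effective.cfg",
             "200144450018/Metrics.txt",
             "200144450019/Metrics.txt",
             "SampleSheet.csv")
check_dataset_layout(entries)
```

Once the archive is unzipped, `minfi::read.metharray.sheet()` can locate the sample sheet and `minfi::read.metharray.exp(targets = ...)` can load the arrays it describes.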
Output Files
- methyl-[output type].txt - Tab-delimited text file containing either the beta values or the m-values of every probe for each sample.
- qcPlots.pdf - A PDF containing plots for quality control.
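The tab-delimited output can be loaded back into R for downstream analysis. This is a minimal sketch that assumes the file has a header row of sample names and probe identifiers in the first column; that layout is an assumption and should be checked against an actual output file. The helper `read_methylation_matrix` and the file name in the usage comment are hypothetical.

```r
# Hypothetical reader: returns a numeric matrix with probes in rows
# and samples in columns.
read_methylation_matrix <- function(path) {
  tab <- read.delim(path, row.names = 1, check.names = FALSE)
  as.matrix(tab)
}

# Usage (not run; the file name depends on the chosen output type):
# betas <- read_methylation_matrix("methyl-beta.txt")
# dim(betas)                      # probes x samples
# colMeans(betas, na.rm = TRUE)   # mean methylation level per sample
```

Setting `check.names = FALSE` keeps sample names unchanged, which matters when they start with digits, as array identifiers often do.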
Requirements
- R v3.4.2
- optparse 1.4.4
- minfi 1.24.0
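To compare a local R installation against these requirements, the installed versions can be listed with base R. This is a small sketch; `installed_version` is a hypothetical helper, and a package that is not installed is reported as `NA`.

```r
# Hypothetical helper: installed version of a package, or NA if it is missing.
installed_version <- function(pkg) {
  tryCatch(as.character(utils::packageVersion(pkg)),
           error = function(e) NA_character_)
}

# Versions listed under Requirements.
required <- c(optparse = "1.4.4", minfi = "1.24.0")

report <- data.frame(
  package   = names(required),
  required  = unname(required),
  installed = vapply(names(required), installed_version, character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)

R.version.string  # compare against R v3.4.2
report
```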
Platform Dependencies
Task Type: Preprocessing
CPU Type: N/A
Operating System: N/A
Language: R
Version Comments
Version | Release Date | Description |
---|---|---|
1 | 2018-04-27 | Initial release |