MinfiPreprocessing (v1)

Methylation array processing with the R package minfi. Compatible with either 450k or EPIC array data.

Author: Clarence Mah, ckmah@ucsd.edu

Contact:

Clarence Mah, ckmah@ucsd.edu

Algorithm Version:

Introduction

The minfi package provides tools for analyzing Illumina’s methylation arrays, specifically the 450k and EPIC (also known as 850k) arrays. This module covers basic preprocessing, QC assessment, and plotting. The input to this module is a zip archive of IDAT files; each sample is represented by two IDAT files, one holding the probe intensities of each color channel.
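As a rough sketch of the kind of workflow this module wraps, the function below reads a folder of IDAT files with minfi, writes a QC report, and extracts methylation values. It assumes minfi is installed and that the folder contains a sample sheet; the function name `preprocess_idats` and its arguments are hypothetical, and raw preprocessing (no normalization) is used only to keep the example short. It is not the module's actual source code.

```r
# Hypothetical helper: read IDATs, write a QC report, return methylation values.
preprocess_idats <- function(idat_dir, qc_pdf = "qcPlots.pdf") {
  if (!requireNamespace("minfi", quietly = TRUE)) {
    stop("The minfi package is required for this sketch.")
  }
  # Locate the sample sheet and read red/green intensities into an RGChannelSet.
  targets <- minfi::read.metharray.sheet(idat_dir)
  rg_set <- minfi::read.metharray.exp(targets = targets)

  # Write minfi's standard QC report to a PDF.
  minfi::qcReport(rg_set, pdf = qc_pdf)

  # Convert channel intensities to methylated/unmethylated signals
  # without any normalization.
  methyl_set <- minfi::preprocessRaw(rg_set)

  list(beta = minfi::getBeta(methyl_set), m = minfi::getM(methyl_set))
}
```

Calling `preprocess_idats("path/to/extracted/dataset")` on an unpacked archive would return one matrix of beta values and one of M values, with probes in rows and samples in columns.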

In this section, we briefly introduce the 450K array. Each sample is measured on a single array, in two different color channels (red and green). As the name of the platform indicates, each array measures more than 450,000 CpG positions. For each CpG, we have two measurements: a methylated intensity and an unmethylated intensity. Depending on the probe design, the signals are reported in different colors:

For Type I design, both signals are measured in the same color: one probe for the methylated signal and one probe for the unmethylated signal. For Type II design, only one probe is used. The intensity read in the green channel measures the methylated signal, and the intensity read in the red channel measures the unmethylated signal.
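The mapping from probe design to methylated (M) and unmethylated (U) signals can be illustrated with toy numbers. This is only a sketch of the description above: the list fields and the `signals` function are hypothetical names, and the intensities are made up.

```r
# Toy Type I measurement: two probes read in the same color channel.
type1 <- list(methylated_probe = 6000, unmethylated_probe = 1500)

# Toy Type II measurement: one probe read in both color channels.
type2 <- list(green = 8000, red = 2000)

# Return the methylated (M) and unmethylated (U) signal for a probe design.
signals <- function(design, x) {
  switch(design,
    "I"  = c(M = x$methylated_probe, U = x$unmethylated_probe),
    "II" = c(M = x$green, U = x$red),  # green = methylated, red = unmethylated
    stop("Unknown design: ", design)
  )
}

print(signals("I", type1))
print(signals("II", type2))
```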

Two measures are commonly used to report methylation levels: Beta values and M values.

Beta value:

\beta = \frac{M}{M + U}

where M and U denote the methylated and unmethylated signals respectively.

M value:

Mval = \log\left(\frac{M}{U}\right)
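Both measures can be computed directly from the methylated and unmethylated intensities. The sketch below uses made-up intensities for three probes. Two details are assumptions rather than statements from this document: the optional `offset` added to the denominator of the Beta value (0 by default, so it has no effect unless set), and the use of base 2 for the logarithm, which is the usual convention for M values.

```r
# Toy methylated (M) and unmethylated (U) intensities for three probes.
M <- c(9000, 500, 3000)
U <- c(1000, 4500, 3000)

# Beta value: fraction of the total signal that is methylated.
# `offset` is an optional stabilizing constant (assumption; 0 leaves the ratio unchanged).
beta_value <- function(M, U, offset = 0) M / (M + U + offset)

# M value: log ratio of methylated to unmethylated signal (base 2 assumed).
m_value <- function(M, U) log2(M / U)

beta <- beta_value(M, U)  # 0.9, 0.1, 0.5
mval <- m_value(M, U)

print(data.frame(M, U, beta, mval))
```

With no offset, the two measures are related by Mval = log2(beta / (1 - beta)), so a Beta value of 0.5 corresponds to an M value of 0.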

References

  1. Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD and Irizarry RA (2014). “Minfi: A flexible and comprehensive Bioconductor package for the analysis of Infinium DNA Methylation microarrays.” Bioinformatics, 30(10), pp. 1363–1369. doi: 10.1093/bioinformatics/btu049.
  2. Maksimovic J, Gordon L and Oshlack A (2012). “SWAN: Subset quantile Within-Array Normalization for Illumina Infinium HumanMethylation450 BeadChips.” Genome Biology, 13(6), pp. R44. doi: 10.1186/gb-2012-13-6-r44.
  3. Fortin J, Labbe A, Lemire M, Zanke BW, Hudson TJ, Fertig EJ, Greenwood CM and Hansen KD (2014). “Functional normalization of 450k methylation array data improves replication in large cancer studies.” Genome Biology, 15(12), pp. 503. doi: 10.1186/s13059-014-0503-2.
  4. Triche TJ, Weisenberger DJ, Van Den Berg D, Laird PW and Siegmund KD (2013). “Low-level processing of Illumina Infinium DNA Methylation BeadArrays.” Nucleic Acids Research, 41(7), pp. e90. doi: 10.1093/nar/gkt090.
  5. Fortin J and Hansen KD (2015). “Reconstructing A/B compartments as revealed by Hi-C using long-range correlations in epigenetic data.” Genome Biology, 16, pp. 180. doi: 10.1186/s13059-015-0741-y.
  6. Andrews SV, Ladd-Acosta C, Feinberg AP, Hansen KD and Fallin MD (2016). “'Gap hunting' to characterize clustered probe signals in Illumina methylation array data.” Epigenetics & Chromatin, 9(56). doi: 10.1186/s13072-016-0107-z.
  7. Fortin J, Triche TJ and Hansen KD (2016). “Preprocessing, normalization and integration of the Illumina HumanMethylationEPIC array.” bioRxiv. doi: 10.1101/065490.

Parameters

Name            Description
dataset*        Provide a ZIP or GZ file containing the methylation microarray data in the Illumina Demo Dataset folder structure. Example data can be found here: infinium-methylationepic-demo-dataset.zip.
normalization*  Specify the normalization method used to preprocess the data. The minfi authors suggest "Functional Normalization" if you expect global methylation differences between your samples (e.g. a dataset with cancer and normal samples), and "Quantile Normalization" if you do not expect global differences between your samples (e.g. a blood-only or single-tissue dataset). "Illumina Preprocessing" performs background subtraction and control normalization as available in GenomeStudio. See references 1, 2, 3 for normalization algorithm details.
output type*    Beta values, M-values, or a MethylSet class R object can be provided as output. The MethylSet output type only works when the normalization option is None.

* - required
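The constraint between the normalization and output type parameters can be checked before any data is processed. The sketch below is illustrative only: the option labels, the `check_options` function, and the mapping from each normalization option to a minfi function name are assumptions about how such a module might dispatch, not its actual implementation. The four function names themselves are real minfi functions.

```r
# Assumed mapping from normalization option to the minfi function it would call.
normalization_functions <- c(
  "None"                     = "preprocessRaw",
  "Illumina Preprocessing"   = "preprocessIllumina",
  "Quantile Normalization"   = "preprocessQuantile",
  "Functional Normalization" = "preprocessFunnorm"
)

# Assumed labels for the output type option.
output_types <- c("beta", "m-values", "MethylSet")

# Validate a parameter pair and return the name of the minfi function to use.
check_options <- function(normalization, output_type) {
  if (!normalization %in% names(normalization_functions)) {
    stop("Unknown normalization: ", normalization)
  }
  if (!output_type %in% output_types) {
    stop("Unknown output type: ", output_type)
  }
  # MethylSet output is only supported without normalization.
  if (output_type == "MethylSet" && normalization != "None") {
    stop("MethylSet output requires normalization 'None'.")
  }
  normalization_functions[[normalization]]
}

print(check_options("Functional Normalization", "beta"))
```

Failing early on an invalid combination avoids reading a large IDAT archive only to discover the requested output cannot be produced.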

Input Files

  1. dataset

    The ZIP/GZ archive folder structure should look like the following:

  • 200144450018
    • 200144450018_R04C01_1_Green.xml
    • 200144450018_R04C01_1_Red.xml
    • ...
    • Effective.cfg
    • Metrics.txt
  • 200144450019
  • SampleSheet.csv
  • ...
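Before submitting an archive, it can help to confirm that the extracted folder matches this layout. The following sketch uses only base R; the `inspect_dataset` function is a hypothetical helper, and the demo folder it builds is an empty stand-in that mirrors the structure above rather than real array data.

```r
# List the sample sheet(s) and per-array subfolders of an extracted dataset.
inspect_dataset <- function(dataset_dir) {
  sheets <- list.files(dataset_dir, pattern = "\\.csv$", ignore.case = TRUE)
  folders <- list.dirs(dataset_dir, full.names = FALSE, recursive = FALSE)
  if (length(sheets) == 0) {
    stop("No sample sheet (.csv) found in ", dataset_dir)
  }
  list(sample_sheets = sheets, array_folders = folders)
}

# Build a throwaway directory that mimics the expected layout.
demo <- file.path(tempdir(), "demo_dataset")
dir.create(file.path(demo, "200144450018"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo, "200144450019"), showWarnings = FALSE)
invisible(file.create(file.path(demo, "SampleSheet.csv")))

layout <- inspect_dataset(demo)
print(layout)
```

For a real archive, `unzip("dataset.zip", exdir = demo)` would replace the stand-in folder creation, and with minfi installed, `minfi::read.metharray.sheet(demo)` is the usual next step for parsing the sample sheet.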

Output Files

  1. methyl-[output type].txt - Tab-delimited text file containing the beta values or M-values of every probe for each sample.
  2. qcPlots.pdf - A PDF containing plots for quality control.
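The shape of the text output can be sketched with a small matrix of made-up values. The probe and sample labels, the `write_methylation_table` function, and the exact file name produced for a given output type are all assumptions made for illustration.

```r
# Toy beta values: probes in rows, samples in columns (labels are placeholders).
beta <- matrix(
  c(0.91, 0.12, 0.88, 0.15),
  nrow = 2,
  dimnames = list(c("probe_1", "probe_2"), c("sample_1", "sample_2"))
)

# Write a probes-by-samples matrix as a tab-delimited file and return its path.
write_methylation_table <- function(values, output_type, out_dir = tempdir()) {
  path <- file.path(out_dir, sprintf("methyl-%s.txt", output_type))
  # col.names = NA writes an empty header cell above the probe ID column.
  write.table(values, file = path, sep = "\t", quote = FALSE, col.names = NA)
  path
}

path <- write_methylation_table(beta, "beta")

# Read the file back with probe IDs as row names.
read_back <- read.delim(path, row.names = 1, check.names = FALSE)
print(read_back)
```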

Requirements

  1. R v3.4.2
  2. optparse 1.4.4
  3. minfi 1.24.0

Platform Dependencies

Task Type:
Preprocessing

CPU Type:
N/A

Operating System:
N/A

Language:
R

Version Comments

Version  Release Date  Description
1        2018-04-27    Initial release