Creates customizable plots of expression data and of values derived from it.
Author: Rachel Melamed, Gordon Hyatt, Benoist-Mathis Lab, Joslin Diabetes Center
Multiplot allows users to create 2-parameter scatter plots from microarray data. The scatter plots created are customizable and interactive. On such plots, each probe (gene) is represented by an individual dot, whose identity and characteristics can be queried. The two axes represent two values chosen by the user, allowing any combination of parameters to be displayed. There are also default parameter subsets for commonly used plots: Fold Change vs Expression (“MvA”), Fold Change vs Significance (“Volcano”), Fold Change vs Fold Change (“FC/FC”).
Conceptually, the representation and user interaction that underlie analyses with Multiplot derive from the software tools used to analyze cell populations in multiparameter flow cytometry. This is particularly true for the flow cytometry “gating” concept: select a group of genes that have particular characteristics for one or more parameters, and display their values for another set of parameters by filtering or with color highlights.
Multiplot is thus most useful for chip data with several classes (experimental conditions). For such data, a user might want to know not only which genes best distinguish between two classes, but also how the gene sets that distinguish two classes behave in the other classes. The tool allows the user to easily create plots comparing expression in different classes, and to add highlights of gene sets of interest from other comparisons. Multiplot is also quite useful for quality control, particularly where there are limited numbers of replicates for each condition and statistical tools better suited to larger datasets cannot be applied. Highlight plots can be used to test whether the aggregate behavior of a gene set applies in all replicate pairs. In addition, scatter plots are quite useful in bringing forth the artifactual changes created by gross outlier values, which can then be filtered out.
The plots are based on values pre-computed for Multiplot by the MultiplotPreprocess module. These include per-class quantifications for every gene (mean expression value, coefficient of variation), as well as pairwise comparisons between all classes in the dataset (i.e., ratio of expression, p-value).
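As an illustration of the kinds of values described above, the following Python sketch computes a class mean, a coefficient of variation, and a fold change for a single probe. It is not the MultiplotPreprocess implementation: the function names, the use of the sample standard deviation, and the choice to take the ratio of class means are all assumptions made for the example, and the expression values are invented.

```python
from statistics import mean, stdev

def class_summary(values):
    """Return (mean, coefficient of variation) for one probe's replicates in one class.

    Assumption: CV is the sample standard deviation divided by the mean.
    """
    m = mean(values)
    if len(values) < 2 or m == 0:
        return m, float("nan")  # CV is undefined for a single replicate or a zero mean
    return m, stdev(values) / m

def fold_change(values_a, values_b):
    """Ratio of class means, A over B (assumed definition of 'ratio of expression')."""
    return mean(values_a) / mean(values_b)

# Invented expression values for one probe in two classes
wt = [100.0, 120.0, 110.0]
ko = [210.0, 230.0, 220.0]

wt_mean, wt_cv = class_summary(wt)
ko_vs_wt = fold_change(ko, wt)
print(wt_mean, round(wt_cv, 4), ko_vs_wt)
```

Plotting `ko_vs_wt` against the class mean for every probe would give the kind of Fold Change vs Expression view described earlier.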
The user can control the style and scale of the plot and can select data points (chip probes). A “dashboard” indicates what values and classes are shown and how they are filtered or highlighted.
The Multiplot module uses data created by the MultiplotPreprocess module as input. It allows the user to make two-dimensional scatter plots of many expression metrics (class mean, fold change, coefficient of variation, Student’s p-value, and Hochberg-adjusted p-value). Features include:
The following screen capture and caption boxes demonstrate how a user could utilize Multiplot. Here, the comparison of “B6 WT” versus “B6 KO” is plotted, but highlights add more information.
In order to run Multiplot, you must first run the MultiplotPreprocess module, found in Analysis -> Preprocess & Utilities -> MultiplotPreprocess. Drag in the data file (.GCT) and .CLS file that you want to plot. Then choose the options for data generation:
Press “Run.” This can take a few minutes. A folder should pop up in the “Results” window. If you open it, you will see a zip file. This will be used to give Multiplot the data to plot.
In GenePattern, select the “Visualization” menu, and then select “Multiplot.” This will bring up a screen similar to the one below. In the “Results” window, open the folder called “MultiplotPreprocess.” It contains the results of the MultiplotPreprocess run: a few files, one of which is a “____.zip” file. Drag the zip file to the “input file name” box. Multiplot also allows you to set:
Press “Run.” It can take a couple of minutes to start. There’s no need to press “Run” more than once.
First choose from the Plot Type options. This will help narrow down the choices of what data types to plot on the X and Y axes. Click on the dropdown and you will see the following choices:
The choices depend on what you selected in MultiplotPreprocess. If you chose to create replicate vs replicate fold changes, then “Replicate vs Replicate Fold Change” will be an option. (Note: Replicate fold changes can only be plotted within this plot subset. These factors do not show up in the Freestyle setting because there are usually very many of them.)
Once you have chosen a plot type, choose which data to plot on the X and Y axes. Data types include:
To select the X and Y factors, first select the data type. If, for example, you choose to run Pvalue vs Fold Change (Volcano) plots, for X you can choose only the Fold Change data type, and for Y you can choose between two measures of statistical difference: either the T-test value or the Hochberg T-test value, which is adjusted for multiple testing. Once you’ve chosen a data type, choose which classes you want that data type for. For example, if you want to plot the ratio of class A to class B, select “Fold Change” from the first dropdown, then select A from the second, and B from the third. Then click the “Plot” button. A plot can only be created if both the X axis and the Y axis are selected.
You can also customize the axis scale options, or you can leave these to the default. They should default to the commonly used settings. For example, it makes sense to plot expression values on a logarithmic scale, so Log is automatically checked once you select to plot expression values.
You must press Plot for the plot to be redrawn. Any changes to any settings will only show up in the plot after pressing this button.
Auto lines are lines that demarcate a level of differential expression. For example, if you were to compare expression values between two classes, drawing lines parallel to the line y=x allows you to see that points beyond these lines have at least a certain level of differential expression.
In addition, by default Multiplot counts the number of points that lie outside of these lines of differential expression.
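The counting described above can be sketched in Python for an expression vs expression plot. The sketch assumes that a fold change level k corresponds to the lines y = k·x and y = x/k, which run parallel to y = x on logarithmic axes; the function name and the data points are hypothetical, and Multiplot's own counting may differ in detail (for example at the boundaries).

```python
def count_outside_auto_lines(points, fold):
    """Count points above y = fold * x and below y = x / fold.

    points: iterable of (x, y) expression values for two classes.
    fold:   fold change level of the auto lines (for example 2.0).
    """
    above = sum(1 for x, y in points if y > fold * x)
    below = sum(1 for x, y in points if y < x / fold)
    return above, below

# Invented (class A, class B) expression values for four probes
points = [(100.0, 250.0), (100.0, 150.0), (200.0, 90.0), (50.0, 50.0)]
above, below = count_outside_auto_lines(points, fold=2.0)
print(above, below)
```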
To add these lines to your plot, click on “Create Auto Lines...”, then select the fold change level you wish to draw, and the color. Press “Add” once you’ve made your selections. When you press “Plot” again, the lines will show up.
Note that the lines drawn depend on the Plot Type you have selected. For a Fold Change vs Fold Change plot, for example, the lines will look like this:
Thus, you must have a Plot Type besides Freestyle selected for these lines to appear on your plot.
You can select a subset of genes to either use as a filter, or as a highlight. Filter means that you are filtering out genes that do not meet your set of criteria. Highlight a set of genes to plot their points in a different color, shape, or size, with a legend item title (specified in the “Name” box).
Create a filter or a highlight by pressing the “Add Filter...” or “Add Highlight...” buttons. A box will pop up that allows you to specify the criteria a gene must meet to be included in this filter or highlight. Criteria you can choose include:
Use this criterion selection to choose genes based on their expression values in your data, or in derived data. For example, you can choose to only include genes where the fold change (ratio) of the value in condition A versus condition B is greater than a certain number or less than a certain number.
To create the specifications for the criterion, choose the value you are interested in, and the thresholds, and press “Add”. Note that you can combine multiple data types to make a more complex criterion. For example, you could look for genes with a high fold change in numerous comparisons by successively choosing selection criteria and pressing the “Add” button. You can further control selection by toggling the “Probes can meet any condition” or “Probes must meet all conditions” buttons. In the example below, only genes where the fold change of 7R2/7R1 is greater than 2 or less than 0.5 AND the fold change of ON/7R1 is greater than 2 or less than 0.5 will be selected. These genes will be highlighted as blue circles.
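The logic of the example criterion can be sketched in Python as follows. This is only an illustration of how "any condition" and "all conditions" differ: the helper names, the dictionary layout, and the probe values are hypothetical, while the comparison labels and thresholds are taken from the example above.

```python
def fold_change_condition(comparison, high=2.0, low=0.5):
    """Build a condition that is true when the fold change is > high or < low."""
    return lambda values: values[comparison] > high or values[comparison] < low

def select_probes(probes, conditions, must_meet_all=True):
    """Return names of probes meeting all conditions, or any condition."""
    combine = all if must_meet_all else any
    return [name for name, values in probes.items()
            if combine(condition(values) for condition in conditions)]

# Invented fold changes for three probes
probes = {
    "probe_1": {"7R2/7R1": 3.0, "ON/7R1": 0.4},
    "probe_2": {"7R2/7R1": 2.5, "ON/7R1": 1.1},
    "probe_3": {"7R2/7R1": 1.0, "ON/7R1": 0.9},
}
conditions = [fold_change_condition("7R2/7R1"), fold_change_condition("ON/7R1")]

meets_all = select_probes(probes, conditions, must_meet_all=True)
meets_any = select_probes(probes, conditions, must_meet_all=False)
print(meets_all, meets_any)
```

Here only `probe_1` changes in both comparisons, so it alone is selected when all conditions must be met, while `probe_2` is also picked up when any condition suffices.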
You can delete one of your conditions by pressing the “X” on the left side. When you are finished, press OK. You can always come back to edit the criterion later.
If you have a list of probes, perhaps from another experiment, you can select these probes within your data. Go to the “Select Genes in List” option within the “Add Filter/Highlight Criteria...” dialog.
To use this feature, you need to create a plain text file with one probe per line. Then use the “Browse...” button to select that file on your computer. Press OK to finish creating the criterion.
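A minimal Python sketch of what such a probe list amounts to is shown below. It assumes that blank lines and surrounding whitespace can be ignored, which may not match how Multiplot reads the file; the function names and probe identifiers are made up, and an in-memory string stands in for the text file.

```python
import io

def read_probe_list(lines):
    """Return the set of probe IDs from a one-probe-per-line text source."""
    return {line.strip() for line in lines if line.strip()}

def select_listed(plotted_probes, wanted):
    """Keep only the plotted probes that appear in the list, preserving order."""
    return [probe for probe in plotted_probes if probe in wanted]

# Stand-in for open("my_probes.txt"); the IDs are invented
probe_file = io.StringIO("1415670_at\n1415671_at\n\n1415999_at\n")
wanted = read_probe_list(probe_file)

selected = select_listed(["1415670_at", "1415672_at", "1415999_at"], wanted)
print(selected)
```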
Filter and Highlight display
Once you have added a criterion, whether a filter or a highlight, and pressed OK to close the pop-up box, it will show up in the main display. An example can be seen in the display snippet below, where the user has chosen to highlight all genes where the Coefficient of Variation of class O71 is less than 0.5.
As you add more highlights or filters, more rows will be added to the Highlight or Filter boxes.
For each row (each specific highlight or filter), there are a number of pieces of information and controls for that criterion.
One last feature of the Filter and Highlight display is the toggle button at the top of the filter display that allows you to choose how multiple filters will be used. If you choose to unite them so that “genes must match: any filter”, then the results can be quite different from choosing that “genes must match: all filters.”
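The difference between the two settings is the difference between a set union and a set intersection, as the following Python sketch illustrates. The function, the filter names, and the probe names are hypothetical; the sketch only shows why the two choices can give quite different results.

```python
def combine_filters(filter_results, mode="all"):
    """Combine the probe sets passing each filter (expects at least one filter).

    mode="all": genes must match all filters (intersection).
    mode="any": genes must match any filter (union).
    """
    if mode == "all":
        return set.intersection(*filter_results)
    if mode == "any":
        return set.union(*filter_results)
    raise ValueError("mode must be 'all' or 'any'")

# Invented results of two filters
passes_fold_change = {"probe_1", "probe_2", "probe_3"}
passes_cv = {"probe_2", "probe_3", "probe_4"}

match_all = combine_filters([passes_fold_change, passes_cv], mode="all")
match_any = combine_filters([passes_fold_change, passes_cv], mode="any")
print(sorted(match_all), sorted(match_any))
```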
Using filters, you can create a subset of genes that you find most interesting. One potential use of this subset is to save this data and use it in other programs. Click on the “Save Data for Plotted Probes” button to do this. This will bring up the dialog below:
Use this dialog to choose what data “columns” you want to save. For example, you could choose to save Fold Changes of condition X versus every experimental condition, and you could choose to save that in your “My Documents” folder. To do this, perform these steps:
Step 1) Select data type to save:
Click on the dropdown, and select “Fold Change” as the data type you want to save.
Step 2) Select which columns of this type:
You can choose to save all columns, i.e., every Fold Change that has been created from your data. If you wish to specify columns, change the toggle to “specified columns of this type.” Then you can choose specific fold changes to save. You can either pick one fold change or pick “ALL CLASSES” for one of the selections. In the example above, you will save 7R1 versus every other condition.
Step 3) “Add these columns to save!”
Press this button to add the selection. You can do this numerous times to create your desired output data. If you want to remove one of your selections, press the “X” button next to it.
Step 4) Choose save folder
Pressing this will bring up a dialog where you can choose your save folder. The files will be automatically named and stored in that folder.
Finally, press the Do Save button to create the files.
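To make the “ALL CLASSES” selection in Step 2 concrete, the Python sketch below expands such a selection into explicit fold change columns. The function name is hypothetical, and skipping the comparison of a class with itself is an assumption; the class names are those used in the earlier examples.

```python
ALL_CLASSES = "ALL CLASSES"

def expand_fold_change_columns(first, second, classes):
    """Expand a fold change selection into explicit (first, second) class pairs.

    Either selection may be the special value "ALL CLASSES".
    Assumption: a class is never compared with itself.
    """
    firsts = classes if first == ALL_CLASSES else [first]
    seconds = classes if second == ALL_CLASSES else [second]
    return [(a, b) for a in firsts for b in seconds if a != b]

classes = ["7R1", "7R2", "ON"]
columns = expand_fold_change_columns("7R1", ALL_CLASSES, classes)
print(columns)
```

With these three classes, selecting 7R1 versus “ALL CLASSES” yields two columns: 7R1 versus 7R2 and 7R1 versus ON.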
The plot created is interactive. Use your mouse to access the following features:
When data points are selected from the graph, the points will show up in the information table. This table shows the Probe Name and Description values from your GCT file. It also shows the X and Y values of the probe at the time that it is added to the table.
There are a few options at the Menu Bar on top of the window.
|input file name *||File created by MultiplotPreprocess (.zip)|
|number of plots *||Smaller one-plot visualizer or larger two-plot|
* - required
|2||2013-10-21||Updated for Java 7|