The QC module step by step and its function description

arrayanalysis.org - affyAnalysisQC module - document version: 1.0.0

Table of Contents

[Description of the execution steps]
[Summary table of the functions]
[References]
[Scripts and functions description]

This section contains a description of the run_affyAnalysisQC.R script, which is the core script of the module. This script is used by both available versions (on-line and R function).

Description of the execution steps

After loading the required Bioconductor libraries, a reload function is defined and called to load all needed scripts into the session. These scripts are in the functions_processing.R and functions_images.R files.
After that step, the dataset is loaded using the ReadAffy (affy) function. Note that throughout the script some checks are done on parameters that may already have been set (e.g. before loading the data there is a check whether it already exists). These checks may seem superfluous, but the same script is also run in automated calls from the arrayanalysis.org webserver form, in which case some values are already set or defined.
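The check-before-load pattern described above can be sketched as follows. This is a minimal base R illustration, not the module's actual code: loadIfMissing and fakeLoader are hypothetical names, and in the real script the loader would be a call such as ReadAffy().

```r
# Hypothetical helper: evaluate `loader` only when no object called `name`
# exists yet in environment `env` (e.g. because the webserver already set it).
loadIfMissing <- function(name, loader, env = globalenv()) {
  if (!exists(name, envir = env, inherits = FALSE)) {
    assign(name, loader(), envir = env)
  }
  invisible(get(name, envir = env))
}

# Stand-in for the real data loader; in the script this would be ReadAffy().
fakeLoader <- function() c(a = 1, b = 2)

e <- new.env()
loadIfMissing("rawData", fakeLoader, env = e)      # nothing set yet: loads the data
assign("rawData", "preset by caller", envir = e)   # simulate a value set beforehand
loadIfMissing("rawData", fakeLoader, env = e)      # already set: left untouched
```

The same script can then be run interactively or from an automated call without reloading data that the caller has already prepared.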

After loading the data, a cdf annotation is loaded for the data, in case this has not already been done when reading the data (c.f. addStandardCDFenv function). Then the type of array (perfect match and mismatch probes, or perfect match probes only) is determined (c.f. the getArrayType function).

Next the description file (as given in arrayGroup parameter) is loaded and relevant variables are set. In case no file is given, default names (the CEL file names) are used, and no grouping is assigned. Also a vector of colours is created for the arrays, and another for the groups (c.f. colorsByFactor function).

After setting the data and relevant variables, a cover sheet png file is created. This can be used as an ‘opening image’ when all images are shown in a presentation. Also one or more images representing the description file, with 35 array descriptions per image, are provided for reference (c.f. coverAndKeyPlot function).
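The number of description images follows directly from the 35-arrays-per-image rule. The sketch below shows that arithmetic in base R; keySheetPages is a hypothetical helper, not a function of the module.

```r
# Split array indices into pages of at most `perPage` arrays each.
keySheetPages <- function(nArrays, perPage = 35) {
  idx <- seq_len(nArrays)
  split(idx, ceiling(idx / perPage))
}

pages <- keySheetPages(80)
length(pages)    # number of description images needed for 80 arrays
lengths(pages)   # number of arrays listed on each image
```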

Then, when any image requiring these data objects is requested, a qc object (created with simpleaffy) and some variables based on a yaqc object (created with yaqcaffy) are computed. This is done in the main script because these computations are relatively intensive; it is more efficient to pass the objects to the functions needing them than to recompute them within each function (though the functions will do so when the objects are not passed). Then a table of some QC statistics is produced and saved to a png file (c.f. plotQCtable function).

Thereafter the script starts plotting the QC images that have been requested by the user (c.f. the several functions described below).
All images generated by affyAnalysisQC are in png format and their dimensions are close to the A4 sheet format. These dimensions are set by two variables, WIDTH and HEIGHT, used by all functions returning images, and set by default to 1000 by 1414. The variable POINTSIZE is set by default to 24 and is adapted to the 1000x1414 format.
Another variable, MAXARRAY, is used to optimise the picture according to the number of arrays in the dataset. It is set by default to 41 and is used in most of the functions.
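As an illustration of how these defaults relate to the A4 format, the sketch below opens a png device with the documented WIDTH, HEIGHT and POINTSIZE values. The output file name and the placeholder plot are assumptions made for the example only.

```r
WIDTH <- 1000
HEIGHT <- 1414
POINTSIZE <- 24

# An A4 sheet has a height/width ratio of sqrt(2), about 1.414.
ratio <- HEIGHT / WIDTH

# Open a png device with these settings, draw a placeholder, and close it.
# The file name is hypothetical; the module chooses its own names.
outFile <- file.path(tempdir(), "example.png")
if (capabilities("png")) {
  png(outFile, width = WIDTH, height = HEIGHT, pointsize = POINTSIZE)
  plot(1:10, main = "placeholder")
  dev.off()
}
```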

After plotting all raw data QC images, the data object is normalized using the method indicated by the user (or if this is not suitable for the array type, using a similar applicable method, in which case a warning is given). Normalization can be done for the whole dataset (generally applicable) or per experimental group (in specific cases, e.g. overall differences expected between the groups). Normalization is run using a data object annotated with either the standard affy cdf file, or an updated custom cdf file by BrainArray (preferred and proposed by default, c.f. normalizeData function). Hereafter, QC images of the normalized data are plotted.

A final step is the saving of the normalized data table. First a table suited for saving is created, to which some annotation is added in specific cases (c.f. createNormDataTable function). Thereafter this table is saved to a tab delimited text file, e.g. for viewing in Excel or as input for further computational or evaluative tools.
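Saving a table as tab-delimited text can be done with base R alone. The following is a minimal sketch with a toy table; the column names, values and file name are made up for the example and do not reflect the output of createNormDataTable.

```r
# Toy stand-in for the normalized data table (probe set IDs plus two arrays).
normDataTable <- data.frame(
  ProbeSetID = c("1007_s_at", "1053_at"),
  array1 = c(7.1, 5.3),
  array2 = c(7.4, 5.0)
)

# Write a tab-delimited text file without row names or quoting.
outFile <- file.path(tempdir(), "normData_example.txt")
write.table(normDataTable, file = outFile, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Read the file back to confirm it can be parsed by other tools.
check <- read.delim(outFile, stringsAsFactors = FALSE)
```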

Summary table of functions and used libraries and calls

Now each of the functions in the functions_processing.R and functions_images.R scripts will be described in a structured format. First, a brief description of what the function does, with possible relevant considerations, is given, e.g. a link to the description of the module for explanations of the plots and their interpretation.
Next, the default call to the function by the run_affyAnalysisQC.R script is given.
Finally, a table listing all INPUT PARAMETERS (name, type, required or not, description, default) and the output values or images is provided.

The functions are divided into six sub-sections. The following tables present the functions of each category and give the main Bioconductor libraries and function calls made by these functions:

1) Preparation of the data

FUNCTION | LIBRARIES | NOTED CALLS
addStandardCDFenv | affy | getCdfInfo
getArrayType | affy | mm
colorsByFactor | / | /
coverAndKeyPlot | / | /
plotQCtable | simpleaffy, yaqcaffy | qc, detection.p.val, yaqc

2) Control of the Sample Quality

FUNCTION | LIBRARIES | NOTED CALLS
samplePrepPlot | simpleaffy, yaqcaffy | detection.p.val, yaqc
ratioPlot | simpleaffy | qc
RNAdegPlot | affy | AffyRNAdeg, plotAffyRNAdeg

3) Hybridization and overall signal quality

FUNCTION | LIBRARIES | NOTED CALLS
hybridPlot | simpleaffy | qc
backgroundPlot | simpleaffy | qc
percPresPlot | simpleaffy | qc
computePMAtable | simpleaffy | detection.p.val
PNdistrPlot | affyQCReport | borderQC1
controlPlots | affy, ArrayTools |

4) Signal comparability and biases diagnostic

FUNCTION | LIBRARIES | NOTED CALLS
scaleFactPlot | simpleaffy | qc
boxplotFun | affy | boxplot
densityFun | affy | hist
densityFunUnsmoothed | affy | exprs
maFun | affy | MAplot
plotArrayLayout | affy | exprs
PNposPlot | affyQCReport | borderQC2
spatialImages | affyPLM | fitPLM, image
array.image | affy | exprs
nuseFun | affyPLM | fitPLM, NUSE
rleFun | affyPLM | fitPLM, Mbox

5) Correlation between arrays

FUNCTION | LIBRARIES | NOTED CALLS
correlFun | affyQCReport | correlationPlot
pcaFun | affy, base | exprs, prcomp
clusterFun | affy, base, bioDist | exprs, hclust, cor.dist, spearman.dist, euc

6) Re-annotation and normalization

FUNCTION | LIBRARIES | NOTED CALLS
deduceSpecies | / | /
normalizeData | affy, gcrma, plier | mas5, rma, gcrma, justPlier
addUpdatedCDFenv | affy | getCdfInfo
createNormDataTable | affy, biomaRt | exprs, useMart, getBM

Sub-sections 2) to 5) are also present in the module description; they describe the functions returning plots and quality control indicators. For these sub-sections, you will find links between the technical documentation and the module description.

References for the packages and databases used by affyAnalysisQC:

Bioconductor
Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, and Zhang J. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5(10):R80

[affy] package contains functions for exploratory oligonucleotide array analysis.
Authors: Rafael A. Irizarry, Laurent Gautier, Benjamin Milo Bolstad and Crispin Miller

[affycomp] package contains functions to compare expression measures for Affymetrix arrays.
Authors: Rafael A. Irizarry and Zhijin Wu with contributions from Simon Cawley

[affypdnn] package contains functions to perform the PDNN method described by Li Zhang et al.
Authors: H. Bjorn Nielsen and Laurent Gautier.

[affyPLM] package extends the base affy package, mainly by implementing methods for fitting probe-level models.
Author: Ben Bolstad

[affyQCReport] package creates a QC report for an AffyBatch.
Authors: Craig Parman, Conrad Halling, Robert Gentleman

[simpleaffy] package provides high level functions for reading CEL files, phenotypic data, and then computing simple things with it.
Author: Crispin J Miller

[yaqcaffy] package performs quality control of Affymetrix GeneChip expression data with the MAQC reference datasets.
Author: Laurent Gatto

[ArrayTools] package provides solutions for quality assessment and detection of differentially expressed genes for Affymetrix arrays.
Authors: Xiwei Wu, Arthur Li

[bioDist] package offers a collection of software tools for calculating distance measures.
Authors: B. Ding, R. Gentleman and Vincent Carey

[biomaRt] package enables easy access to biological databases implementing the BioMart software suite.
Authors: Steffen Durinck, Wolfgang Huber

Brainarray
Manhong Dai, Pinglang Wang, Andrew D. Boyd, Georgi Kostov, Brian Athey, Edward G. Jones, William E. Bunney, Richard M. Myers, Terry P. Speed, Huda Akil, Stanley J. Watson and Fan Meng. (2005) Evolving Gene/Transcript Definitions Significantly Alter the Interpretation of GeneChip Data. Nucleic Acids Research 33 (20), e175

BioMart
Haider S, Ballester B, Smedley D, Zhang J, Rice P, Kasprzyk A. BioMart Central Portal - unified access to biological data. Nucleic Acids Res. 2009 July 1;37(Web Server issue):W23-7.



Preparation of the data

The addStandardCDFenv function

DESCRIPTION
This function (from functions_processing) makes sure that a cdf environment is loaded for the current chip type. In some cases a cdf environment will already be available after reading the data with the ReadAffy function; the function will then detect this and return the object as is (unless overwrite is set to TRUE). In case no cdf environment has been assigned, it will search for a suitable one and add it if found. If no suitable cdf can be found, a warning is generated and the object is returned as is.

USAGE
By default, the script will call:

rawData <- addStandardCDFenv(rawData)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
overwrite | logical | optional | Should the cdfName be overwritten if there is already a value assigned to the object passed to the function? | FALSE

OUTPUT VALUE

TYPE | DESCRIPTION
AffyBatch | The object with a cdf annotation assigned if found

The getArrayType function

DESCRIPTION
This function (from functions_processing) detects whether the chip at hand is a classic chip type with perfect match and mismatch probes, or one with perfect match probes only.

USAGE
By default, the script will call:

aType <- getArrayType(rawData)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION
Data | AffyBatch | required | The raw data object

OUTPUT VALUE

TYPE | DESCRIPTION
character | String indicating the type of the current chip, either “PMMM” for chips with perfect match and mismatch probes, or “PMonly” for chips with perfect match probes only

The colorsByFactor function

DESCRIPTION
This function (from functions_processing) creates a list with two elements: a vector of colors, one for each array and a vector of one representative color for each group within the experiment, for use in legends. The colors are based on groups present in the dataset (as provided by the user in experimentFactor). If there is only one group, colors are chosen randomly over the rainbow palette. Otherwise arrays belonging to the same group get different shades of a similar color.

USAGE
By default, the script will call:

colList <- colorsByFactor(experimentFactor)
plotColors <- colList$plotColors
legendColors <- colList$legendColors
rm(colList)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION
experimentFactor | factor | required | The factor of groups

OUTPUT VALUE

TYPE | DESCRIPTION
list | A list with two fields: plotColors contains a vector of colors, one for each array; legendColors contains a representative color for each group, for use in legends
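The colour assignment described for colorsByFactor could be approximated as in the sketch below. This is not the module's implementation: colorsByFactorSketch is a hypothetical name, and the choice of hues, saturation and brightness range, as well as the legend colour used in the single-group case, are assumptions made for illustration.

```r
colorsByFactorSketch <- function(experimentFactor) {
  experimentFactor <- droplevels(as.factor(experimentFactor))
  groups <- levels(experimentFactor)
  nArrays <- length(experimentFactor)
  plotColors <- character(nArrays)
  if (length(groups) == 1) {
    # Single group: spread the arrays over the rainbow palette in random order.
    plotColors[] <- sample(rainbow(nArrays))
    legendColors <- plotColors[1]   # assumption: any one colour for the legend
  } else {
    # Several groups: one hue per group, varying brightness within a group.
    hues <- seq(0, 1, length.out = length(groups) + 1)[seq_along(groups)]
    for (g in seq_along(groups)) {
      members <- which(experimentFactor == groups[g])
      shades <- seq(1, 0.5, length.out = length(members))
      plotColors[members] <- hsv(h = hues[g], s = 0.9, v = shades)
    }
    legendColors <- hsv(h = hues, s = 0.9, v = 0.8)
  }
  list(plotColors = plotColors, legendColors = legendColors)
}

colList <- colorsByFactorSketch(
  factor(c("ctrl", "ctrl", "treated", "treated", "treated"))
)
```

The returned list has the same two fields as documented above, so it can be unpacked in the same way as in the USAGE example.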

The coverAndKeyPlot function

DESCRIPTION
This function (from functions_images.R) plots a cover sheet and one or more key sheets. The key sheets indicate the links between CEL file names, the array names used in the plots, and the experimental group each array belongs to. This information comes from the description file provided by the user (the arrayGroup parameter), which has been loaded into the description variable earlier in the main script and is passed as an argument. One key sheet is created for every 35 arrays in the experiment.

USAGE
By default, the script will call:

coverAndKeyPlot(description)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
description | data.frame | required | The data.frame containing the description file information (column 1: file names; column 2: names to be used in the plots; column 3: experimental groups the samples belong to) |
refName | character | optional | Dataset name. It is deduced from the name of the zip file containing the CEL files when used from arrayanalysis.org. | ""
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414

OUTPUT IMAGES

TYPE | DESCRIPTION
png file | File called ‘Cover_1’ that can be used as an opening image
png file(s) | File(s) called ‘Description’, if needed followed by an index number, that represent(s) the information as given in the input parameter.

The plotQCtable function

DESCRIPTION
This function (from functions_images.R) computes and plots a table of QC statistics based on the qc (simpleaffy Bioconductor package) and yaqc (yaqcaffy Bioconductor package) functions, which generally only work for chip types with perfect match and mismatch probes, and not even for all of those. Therefore, when these statistics are not provided as parameters, try constructs are used in this function to compute them internally. Values for which the try fails are not computed, but the script continues after giving a warning.

USAGE
By default, the script will call:

computeQCtable(rawData, quality, sprep, lys, samplePrep = samplePrep, ratio = ratio, hybrid = hybrid, percPres = percPres, bgPlot = bgPlot, scaleFact = scaleFact)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
quality | QCStats | optional | Object obtained by calling the qc function (simpleaffy). When not provided, it is computed within the function. | NULL
sprep | matrix | optional | A matrix of 3’ probe intensities for dap, thr, lys, and phe, taken from an object of class YAQCStats (yaqc function, yaqcaffy). When not provided, it is computed within the function. | NULL
lys | matrix | optional | Matrix of A, M, P calls for the 3’ probeset of Lys on each array, based on results from the detection.p.val function (simpleaffy). When not provided, it is computed within the function. | NULL
samplePrep | logical | optional | Does the table have to contain sample preparation QC statistics? | TRUE
ratio | logical | optional | Does the table have to contain 3’/5’ ratio statistics? | TRUE
hybrid | logical | optional | Does the table have to contain hybridisation QC statistics? | TRUE
percPres | logical | optional | Does the table have to contain percentage present QC statistics? | TRUE
bgPlot | logical | optional | Does the table have to contain background signal intensity QC statistics? | TRUE
scaleFact | logical | optional | Does the table have to contain scale factor QC statistics? | TRUE
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24

OUTPUT IMAGE

TYPE | DESCRIPTION
png file | File called ‘QCtable’ with the requested statistics
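The try-based fallback described for this function (use a precomputed object when passed, otherwise attempt the computation and continue with a warning on failure) can be sketched in base R as follows. The helper computeOrSkip and the stand-in computations are hypothetical; in the module the computations would be calls such as qc or yaqc.

```r
# Hypothetical wrapper: return `precomputed` if supplied, otherwise try to
# compute it, and return NULL with a warning when the computation fails.
computeOrSkip <- function(precomputed = NULL, compute, label = "statistic") {
  if (!is.null(precomputed)) return(precomputed)
  result <- try(compute(), silent = TRUE)
  if (inherits(result, "try-error")) {
    warning(label, " could not be computed and is skipped", call. = FALSE)
    return(NULL)
  }
  result
}

# Stand-ins for a computation that works and one that fails.
works <- computeOrSkip(compute = function() 42, label = "demo")
fails <- suppressWarnings(
  computeOrSkip(compute = function() stop("unsupported chip type"),
                label = "quality")
)
```

A caller can then test the result with is.null and simply leave the corresponding statistics out of the table.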



Control of the Sample Quality

The samplePrepPlot function

DESCRIPTION
This function (from functions_images.R) creates an image of sample preparation controls based on the yaqc (yaqcaffy Bioconductor package) function, which generally only works for chip types with perfect match and mismatch probes, and not even for all of those. Therefore, when these statistics are not provided as parameters, try constructs are used in this function to compute them internally. Values for which the try fails are not computed, but the script continues after giving a warning.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, the script will call:

samplePrepPlot(rawData,sprep,lys,plotColors)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
sprep | matrix | optional | A matrix of 3’ probe intensities for dap, thr, lys, and phe, taken from an object of class YAQCStats (yaqc function, yaqcaffy). When not provided, it is computed within the function. | NULL
lys | matrix | optional | Matrix of A, M, P calls for the 3’ probeset of Lys on each array, based on results from the detection.p.val function (simpleaffy). When not provided, it is computed within the function. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | Threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE

TYPE | DESCRIPTION
png image | An image of sample preparation controls, called ‘RawDataSamplePrepControl’

The ratioPlot function

DESCRIPTION
This function (from functions_images.R) creates an image of beta-actin and GAPDH 3'/5' ratios based on the qc (simpleaffy Bioconductor package) function, which generally only works for chip types with perfect match and mismatch probes, and not even for all of those. Therefore, when these statistics are not provided as parameters, try constructs are used in this function to compute them internally. Values for which the try fails are not computed, but the script continues after giving a warning.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, the script will call:

ratioPlot(rawData, quality=quality, experimentFactor, plotColors, legendColors)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
quality | QCStats | optional | Object obtained by calling the qc function (simpleaffy). When not provided, it is computed within the function. | NULL
experimentFactor | factor | required | The factor of groups. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | Threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGES

TYPE | DESCRIPTION
png image | An image of 3’/5’ ratios for beta-actin, called ‘RawData53ratioPlot_beta-actin’
png image | An image of 3’/5’ ratios for GAPDH, called ‘RawData53ratioPlot_GAPDH’

The RNAdegPlot function

DESCRIPTION
This function (from functions_images.R) creates an image of overall RNA degradation for all arrays. It calls the function AffyRNAdeg (affy Bioconductor package).
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, the script will call:

RNAdegPlot(rawData,plotColors=plotColors)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
Data.rnadeg | list | optional | List as obtained by calling the AffyRNAdeg function (affy). When not provided, it is computed internally within this function. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | Threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGES

TYPE | DESCRIPTION
png image | An image of overall RNA degradation, called ‘RawDataRNAdegradationPlot’



Hybridization and overall signal quality

The hybridPlot function

DESCRIPTION
This function (from functions_images.R) creates an image of hybridisation controls based on the qc (simpleaffy Bioconductor package) function, which generally only works for chip types with perfect match and mismatch probes, and not even for all of those. Therefore, when these statistics are not provided as parameters, try constructs are used in this function to compute them internally. Values for which the try fails are not computed, but the script continues after giving a warning.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, the script will call:

hybridPlot(rawData,quality=quality,plotColors)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
quality | QCStats | optional | Object obtained by calling the qc function (simpleaffy). When not provided, it is computed within the function. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | Threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE

TYPE | DESCRIPTION
png image | An image of hybridisation controls, called ‘RawDataSpike-inPlot’

The backgroundPlot function

DESCRIPTION
This function (from functions_images.R) creates an image of the background intensities and deviations for each array based on the qc (simpleaffy Bioconductor package) function, which generally only works for chip types with perfect match and mismatch probes, and not even for all of those. Therefore, when these statistics are not provided as parameters, try constructs are used in this function to compute them internally. Values for which the try fails are not computed, but the script continues after giving a warning.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, the script will call:

backgroundPlot(rawData, quality=quality, experimentFactor, plotColors,legendColors)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
quality | QCStats | optional | Object obtained by calling the qc function (simpleaffy). When not provided, it is computed within the function. | NULL
experimentFactor | factor | required | The factor of groups. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | Threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE

TYPE | DESCRIPTION
png image | An image of background intensities, called ‘RawDataBackgroundPlot’

The percPresPlot function

DESCRIPTION
This function (from functions_images.R) creates an image of the percentage present values of each array based on the qc (simpleaffy Bioconductor package) function, which generally only works for chip types with perfect match and mismatch probes, and not even for all of those. Therefore, when these statistics are not provided as parameters, try constructs are used in this function to compute them internally. Values for which the try fails are not computed, but the script continues after giving a warning.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, the script will call:

percPresPlot(rawData, quality=quality, experimentFactor, plotColors, legendColors)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
quality | QCStats | optional | Object obtained by calling the qc function (simpleaffy). When not provided, it is computed within the function. | NULL
experimentFactor | factor | required | The factor of groups. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | Threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE

TYPE | DESCRIPTION
png image | An image of percent present values, called ‘RawDataPercentPresentPlot’

The computePMAtable function

DESCRIPTION
This function (from functions_processing) computes a table of Absent (A), Marginal (M), Present (P) calls based on the MAS5 function (affy package). This function will only work for chip types that have mismatch probes and for which the detection.p.val (simpleaffy Bioconductor package) function that it calls works. A try construct is used, and if no table can be created a warning is given.
Note that this function will always use the MAS5 algorithm, regardless of the normalization method used in the normalizeData function. In case customCDF is TRUE, annotation is updated using BrainArray custom cdf environments, before proceeding with the normalization (and summarisation of probe expressions into probeset expressions). To update the annotation, a sub call is made to the addUpdatedCDFenv function.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, the script will call, if customCDF is TRUE:

PMAtable <- computePMAtable(rawData,customCDF,species,CDFtype)

or, if customCDF is FALSE:

PMAtable <- computePMAtable(rawData,customCDF)

After this call, the main script will save the result to a tab-delimited text file, called PMAtable.txt

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
customCDF | logical | optional | Should annotation of the chip be updated before computing the calls (and building the probesets out of the separate probes)? If requested, this is done using BrainArray updated cdf environments, c.f. addUpdatedCDFenv. | TRUE
species | character | required when customCDF is TRUE, c.f. addUpdatedCDFenv | The species associated with the chip type. | NULL
CDFtype | character | required when customCDF is TRUE, c.f. addUpdatedCDFenv | The type of custom cdf requested. | NULL

OUTPUT VALUE

TYPE | DESCRIPTION
data.frame | Table called ‘PMAtable’ containing the PMA values for each probeset and array

The PNdistrPlot function

DESCRIPTION
This function (from functions_images.R) creates an image with boxplots of positive and negative control intensities for each array. It calls the borderQC1 function (affyQCReport Bioconductor package).
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, the script will call:

PNdistrPlot(rawData)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24

OUTPUT IMAGE

TYPE | DESCRIPTION
png image | An image of boxplots of positive and negative control intensities, called ‘RawDataPosNegDistribPlot’

The controlPlots function

DESCRIPTION
This function (from functions_images.R) creates images of the expression values of the affx controls, if present. One image shows the expression profiles over all samples, for each control separately. The other image shows the boxplots of all controls (together), for each sample.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, the script will call:

controlPlots(rawData,plotColors)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
plotColors | character | required | Vector of colors assigned to each array. | NULL
experimentFactor | factor | required | The factor of groups. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
affxplots | logical | optional | Do the AFFX expression profiles have to be plotted? | TRUE
boxplots | logical | optional | Do the boxplots of the AFFX and other controls have to be plotted? | TRUE
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | Threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGES

TYPE | DESCRIPTION
png image | An image of the expression profiles of the affx controls, called ‘Profiles_affx_controls’
png image | An image of the boxplots of the affx controls, called ‘Boxplots_affx_controls’



Signal comparability and biases diagnostic

The scaleFactPlot function

DESCRIPTION
This function (from functions_images.R) creates an image of the scale factors of the arrays based on the qc (simpleaffy Bioconductor package) function, which generally only works for chip types with perfect match and mismatch probes, and not even for all of those. Therefore, when these statistics are not provided as parameters, try constructs are used in this function to compute them internally. Values for which the try fails are not computed, but the script continues after giving a warning.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, the script will call:

scaleFactPlot(rawData, quality=quality, experimentFactor, plotColors,legendColors)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
quality | QCStats | optional | Object obtained by calling the qc function (simpleaffy). When not provided, it is computed within the function. | NULL
experimentFactor | factor | required | The factor of groups. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | Threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE

TYPE | DESCRIPTION
png image | An image of scale factors, called ‘RawDataScaleFactorsPlot’

The boxplotFun function

DESCRIPTION
This function (from functions_images.R) creates an image with intensity boxplots for all arrays in the raw or normalized dataset (depending on the object passed). It calls the boxplot function applied to an AffyBatch object.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, before the normalization the script will call:

boxplotFun(Data=rawData, experimentFactor, plotColors, legendColors)

and after normalization:

boxplotFun(Data=normData, experimentFactor, plotColors, legendColors, normMeth=normMeth)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch or ExpressionSet | required | The raw or normalized data object |
experimentFactor | factor | required | The factor of groups. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
normMeth | character | required when Data is a normalized data object | String indicating the normalization method used (see normalizeData function for more information on the possible values). | ""
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | Threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE

TYPE | DESCRIPTION
png image | An image of the raw or normalized boxplots of the arrays, called ‘DataBoxplot’

The densityFun function

DESCRIPTION
This function (from functions_images.R) creates an image with a density curve of the intensities for all arrays in the raw or normalized dataset (depending on the object passed).
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, before the normalization the script will call:

densityFun(Data=rawData, plotColors)

and after normalization:

densityFun(Data=normData, plotColors, normMeth=normMeth)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch or ExpressionSet | required | The raw or normalized data object |
plotColors | character | required | Vector of colors assigned to each array. | NULL
normMeth | character | required when Data is a normalized data object | String indicating the normalization method used (see normalizeData function for more information on the possible values). | ""
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | Threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE

TYPE | DESCRIPTION
png image | An image of the raw or normalized density plots of the arrays, called ‘DensityHistogram’

The densityFunUnsmoothed function

DESCRIPTION
This function (from functions_images.R) creates an image with an unsmoothed density curve of the intensities for all arrays in the raw or normalized dataset (depending on the object passed).
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, the script does not call this function. It could be called as follows before normalization:

densityFunUnsmoothed(Data=rawData, plotColors)

and after normalization:

densityFunUnsmoothed(Data=normData, plotColors, normMeth=normMeth)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch or ExpressionSet | required | The raw or normalized data object |
plotColors | character | required | Vector of colors assigned to each array. | NULL
normMeth | character | required when Data is a normalized data object | String indicating the normalization method used (see normalizeData function for more information on the possible values). | “”
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE

TYPE | DESCRIPTION
png image | An image of the raw or normalized unsmoothed density plots of the arrays, called ‘DensityHistogramUnsmoothed’
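
The difference with the smoothed variant can be sketched in base R by using a finely binned histogram instead of a kernel density estimate. The simulated matrix logInt and the bin width of 0.1 are assumptions chosen for the example, not values taken from the script.

```r
# Simulated log2 intensities: 1000 probes x 3 arrays (illustrative only)
set.seed(2)
logInt <- matrix(rnorm(3000, mean = 8, sd = 1.5), ncol = 3,
                 dimnames = list(NULL, paste0("array", 1:3)))

# Shared fine-grained bin edges (width 0.1) covering the whole data range
lo <- floor(min(logInt))
hi <- ceiling(max(logInt))
breaks <- seq(lo, hi, length.out = (hi - lo) * 10 + 1)

# Per array: histogram density per bin, without any kernel smoothing
h <- lapply(seq_len(ncol(logInt)),
            function(i) hist(logInt[, i], breaks = breaks, plot = FALSE))

plot(NA, xlim = range(breaks),
     ylim = c(0, max(sapply(h, function(x) max(x$density)))),
     xlab = "log2 intensity", ylab = "density")
for (i in seq_along(h)) lines(h[[i]]$mids, h[[i]]$density, col = i)
```

Because no smoothing is applied, narrow spikes (for example from saturated probes) remain visible instead of being averaged away.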

The maFun function

DESCRIPTION
This function (from functions_images.R) creates MA plots for each array versus the median array for the raw or normalized dataset. The median array is computed for the whole data set (if perGroup is FALSE) or per experimental group (if perGroup is TRUE). In the script this setting depends on the MAOption1 parameter, which can have the values “dataset” or “group”.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, before the normalization the script will call:

maFun(Data=rawData, experimentFactor, perGroup=(MAOption1=="group"), aType=aType)

and after normalization:

maFun(Data=normData, experimentFactor, perGroup=(MAOption1=="group"), normMeth=normMeth)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch or ExpressionSet | required | The raw or normalized data object |
experimentFactor | factor | required when perGroup is TRUE | The factor of groups. | NULL
perGroup | logical | optional | Are MA plots to be made for each experimental group separately or not? | FALSE
normMeth | character | required when Data is a normalized data object | String indicating the normalization method used (see normalizeData function for more information on the possible values). | “”
aType | character | optional | String indicating the type of the current chip, either “PMMM” for chips with perfect match and mismatch probes, or “PMonly” for chips with perfect match probes only. Required when Data is a raw data object. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE(S)

TYPE | DESCRIPTION
png image(s) | Images of the MA plots of each array versus the median array; each file contains MA plots for six arrays. The file names contain the string ‘MAplot’ and a number if more than one file is needed; in case of groupwise computation, the name of the group is also included in the filename.
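
To make the role of perGroup concrete, here is a minimal base R sketch that computes M and A values of each array against a median array, taken either over the whole dataset or over the array's own experimental group. The data are simulated and the helper names medianArray and maValues are hypothetical; maFun itself may compute these quantities differently.

```r
# Simulated log2 intensities: 100 probes x 6 arrays in two groups
set.seed(3)
logInt <- matrix(rnorm(600, mean = 8), ncol = 6,
                 dimnames = list(NULL, paste0("array", 1:6)))
experimentFactor <- factor(rep(c("control", "treated"), each = 3))
perGroup <- TRUE

# Median array: probe-wise median over the arrays it is computed from
medianArray <- function(x) apply(x, 1, median)

# M (log ratio) and A (mean log intensity) of one array versus a reference
maValues <- function(x, ref) list(M = x - ref, A = (x + ref) / 2)

ma <- vector("list", ncol(logInt))
names(ma) <- colnames(logInt)
for (j in seq_len(ncol(logInt))) {
  # Reference arrays: the array's own group, or the whole dataset
  cols <- if (perGroup) experimentFactor == experimentFactor[j] else rep(TRUE, ncol(logInt))
  ma[[j]] <- maValues(logInt[, j], medianArray(logInt[, cols, drop = FALSE]))
}

# Six MA plots on one page
par(mfrow = c(2, 3))
for (n in names(ma)) {
  plot(ma[[n]]$A, ma[[n]]$M, pch = ".", main = n, xlab = "A", ylab = "M")
  abline(h = 0, col = "red")
}
```

With perGroup set to TRUE, global differences between groups do not show up as a systematic offset in M, which is useful when such differences are expected.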

The plotArrayLayout function

DESCRIPTION
This function (from functions_images.R) creates an image of the layout of the current chiptype. Thus, this plot does not plot any data, but shows where control and regular perfect match (PM) and (if applicable) mismatch (MM) probes are present on the array. Note: due to resolution issues, banding may seem different from the real situation, e.g. normally on classical chiptypes, PM and MM are present in alternate lines, but patterns may appear due to image resolution. This function tries to load annotation libraries, depending on the chiptype.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, the script will call:

plotArrayLayout(rawData,aType)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
aType | character | required | String indicating the type of the current chip, either “PMMM” for chips with perfect match and mismatch probes, or “PMonly” for chips with perfect match probes only. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24

OUTPUT IMAGE

TYPE | DESCRIPTION
png image | An image of the layout of the current chiptype, called ‘Array_layout_plot’

The PNposPlot function

DESCRIPTION
This function (from functions_images.R) creates an image showing the centres of intensity of the positive and negative border elements for each array. It calls the borderQC2 function (affyQCReport Bioconductor package).
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, the script will call:

PNposPlot(rawData)

INPUT PARAMETER

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24

OUTPUT IMAGE

TYPE | DESCRIPTION
png image | An image of centres of intensity of positive and negative border elements for each array, called ‘RawDataPosNegPositionPlot’

The spatialImages function

DESCRIPTION
This function (from functions_images.R) creates an image per array, containing one to four spatial images. For any but the raw images, an object obtained by calling fitPLM (affyPLM Bioconductor package) is used. If this object is not provided, it will be computed internally within this function.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
There are two default calls by the script:

1/ Compute only the images showing the residuals of the PLM (if spatialImage parameter is TRUE):
spatialImages(rawData, Data.pset=rawData.pset, Resid=TRUE, ResSign=FALSE, Raw=FALSE, Weight=FALSE)

2/ Compute the four images for all arrays (if PLMimage parameter is TRUE):
spatialImages(rawData, Data.pset=rawData.pset)

where rawData.pset has been constructed by calling:

rawData.pset <- fitPLM(rawData)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
Data.pset | PLMset | optional | An object obtained by calling fitPLM (affyPLM), used for each but the raw plot. When not provided, it is computed within the function. | NULL
Resid | logical | optional | Should a residual plot be made? | TRUE
ResSign | logical | optional | Should a residual sign plot be made? | TRUE
Raw | logical | optional | Should a raw plot be made? | TRUE
Weight | logical | optional | Should a weight plot be made? | TRUE
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24

OUTPUT IMAGE(S)

TYPE | DESCRIPTION
png image(s) | Images containing each of the requested plots, one file per array. Naming is ‘virtual_image’ followed by the (sample)name of the array.

The array.image function

DESCRIPTION
This function (from functions_images.R) creates several types of spatial images. It does not make use of a PLMset object and is called when the PLM images cannot be computed due to time or memory constraints. It also offers more flexibility. By default, it creates a relative intensity plot versus the median array on a blue to red colour scale. Note that the median array is always computed on the complete data set, even when plots for a subset of arrays are requested. If the median array has to be computed on subsets (e.g. per experimental group), a data object with only those arrays can be provided to the function (without setting the arrays parameter). The color ranges saturate at pcut percentage(s) of the data range and can be modified by tuning col.mod. By default, for the relative plots, the arrays are first balanced for their overall intensity and a symmetrical color range is used. When there are fewer than six arrays, the median array is not used; instead, intensities are plotted using a virtual symmetric color scale, from blue to red.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
There are two default calls by the script:

1/ relative intensity plot versus the median array on a blue to red colour scale:
array.image(rawData)

2/ absolute intensity plot when there are fewer than six arrays in the dataset:
array.image(rawData,relative=FALSE,col.mod=4,symm=TRUE)

Other calls could be made, such as:

# spatial intensity plot on a virtual colour scale (red to yellow)
array.image(rawData,relative=FALSE)

# signs of the relative intensities versus the median array (e.g. lower or higher) with blue as lower and red as higher.
array.image(rawData,quantitative=FALSE)

# similar to the default plot, but not balanced within arrays
array.image(rawData,balance=FALSE)

# or: per experimental group
for (i in levels(experimentFactor)) {
  # paste0 avoids the space that paste("_", i) would put into the file name
  array.image(rawData[, experimentFactor == i], postfix = paste0("_", i), balance=FALSE)
}

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
pcut | numeric | optional | Either a numeric value or a vector of two numeric values, both in the interval [0, 0.5], used as color saturation limits as a percentage of the data range. If one value is provided, it is taken as both the lower and the upper saturation limit. | NULL. This means that for relative plots a value of 0.001 will be used, and for other plots values of c(0.01, 0.05).
relative | logical | optional | Is the plot to be created a plot of each array relative to the median array, or not (i.e. an absolute plot)? | TRUE
symm | logical | optional | Should a symmetric color scale be used? | TRUE if "relative" is TRUE, FALSE otherwise.
balance | logical | optional | Should the arrays first be balanced for their average intensities? | TRUE if "relative" is TRUE, FALSE otherwise.
quantitative | logical | optional | Should the plot be quantitative or qualitative (i.e. only indicate the sign of the value)? | TRUE if "relative" is TRUE, FALSE otherwise. Setting "quantitative" to TRUE has no effect if "relative" is FALSE; a warning will be produced.
col.mod | numeric | optional | A numeric value used as a modifier for the color range. A value of 1 means no modification (linear), smaller leads to faster saturation, larger to slower saturation. | 1
postfix | character | optional | String to be attached to the file names produced. | ""
arrays | numeric | optional | Which arrays are to be plotted. NOTE: for relative plot types, the median array is still computed using all arrays in the dataset. | NULL (in which case all arrays are plotted)
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24

OUTPUT IMAGE(S)

TYPE | DESCRIPTION
png image(s) | Images containing each of the requested plots, one file per six arrays. Naming is ‘virtual_array_plots’ followed by the postfix given by the user.
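
The idea behind the default relative plot (balance the arrays, subtract the median array, saturate the colour range) can be sketched in base R as follows. The data are simulated, and treating pcut as a quantile of the absolute relative values is an assumption made for this example; array.image may define the saturation limits differently.

```r
# Simulated log2 intensities for a 20 x 20 chip, 6 arrays (illustrative only)
set.seed(4)
nr <- 20; nc <- 20; nArrays <- 6
logInt <- matrix(rnorm(nr * nc * nArrays, mean = 8), ncol = nArrays)

balance <- TRUE
pcut <- 0.001

# Optional balancing: remove each array's overall intensity level
if (balance) logInt <- sweep(logInt, 2, apply(logInt, 2, median))

# Relative values: each array minus the probe-wise median array
rel <- logInt - apply(logInt, 1, median)

# Symmetric saturation limit (assumed here to be a quantile of |rel|)
lim <- unname(quantile(abs(rel), 1 - pcut))
rel <- pmax(pmin(rel, lim), -lim)

# Blue (lower) to red (higher) colour scale, one image for the first array
blueRed <- colorRampPalette(c("blue", "white", "red"))(64)
image(matrix(rel[, 1], nrow = nr, ncol = nc), col = blueRed,
      zlim = c(-lim, lim), axes = FALSE, main = "array 1 vs median array")
```

Spatial artefacts such as scratches or uneven hybridisation would appear as coherent red or blue regions in an image like this.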

The nuseFun function

DESCRIPTION
This function (from functions_images.R) creates an image with boxplots of Normalized Unscaled Standard Errors (NUSE) for each array. It calls the NUSE function (affyPLM Bioconductor package).
An object obtained by calling fitPLM (affyPLM) is needed. If this object is not provided, it will be computed internally within this function.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, the script will call:

nuseFun(rawData, Data.pset=rawData.pset, experimentFactor, plotColors, legendColors)

where rawData.pset has been constructed by calling:

rawData.pset <- fitPLM(rawData)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
Data.pset | PLMset | optional | An object obtained by calling fitPLM (affyPLM). When not provided, it is computed within the function. | NULL
experimentFactor | factor | required | The factor of groups. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE

TYPE | DESCRIPTION
png image | An image containing boxplots of the NUSE values per array, called ‘RawDataNUSEplot’

The rleFun function

DESCRIPTION
This function (from functions_images.R) creates an image with boxplots of Relative Log Expression (RLE) values for each array. It calls the RLE function (affyPLM Bioconductor package).
An object obtained by calling fitPLM (affyPLM) is needed. If this object is not provided, it will be computed internally within this function.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, the script will call:

rleFun(rawData, Data.pset=rawData.pset, experimentFactor, plotColors, legendColors)

where rawData.pset has been constructed by calling:

rawData.pset <- fitPLM(rawData)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
Data.pset | PLMset | optional | An object obtained by calling fitPLM (affyPLM). When not provided, it is computed within the function. | NULL
experimentFactor | factor | required | The factor of groups. | NULL
plotColors | character | required | Vector of colors assigned to each array. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE

TYPE | DESCRIPTION
png image | An image containing boxplots of the RLE values per array, called ‘RawDataRLEplot’
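
For readers unfamiliar with RLE, the quantity can be sketched in base R on a simulated matrix of log2 probeset expressions: each value is taken relative to the median expression of that probeset over all arrays. The matrix expr and the artificial shift of the fifth array are assumptions for the example; the script itself obtains RLE (and NUSE, which additionally needs the standard errors of the fitted PLM) through affyPLM.

```r
# Simulated log2 probeset expressions: 500 probesets x 5 arrays
set.seed(5)
expr <- matrix(rnorm(500 * 5, mean = 8), ncol = 5,
               dimnames = list(NULL, paste0("array", 1:5)))
expr[, 5] <- expr[, 5] + 0.5   # simulate one array with a shifted level

# RLE: per probeset, log expression minus the median over all arrays
rleValues <- expr - apply(expr, 1, median)

# One box per array; well-behaved arrays are centred on zero
boxplot(as.data.frame(rleValues), outline = FALSE, ylab = "RLE", las = 2)
abline(h = 0, lty = 2)

# The per-array median summarises how far each box is from zero
rleMedians <- apply(rleValues, 2, median)
```

In this simulation the box of the fifth array sits visibly above zero, which is the kind of deviation the RLE plot is meant to reveal.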


Correlation between arrays

The correlFun function

DESCRIPTION
This function (from functions_images.R) creates an image of the intensity correlation values of the arrays in the raw or normalized dataset (depending on the object passed). It calls correlationPlot (affyQCReport Bioconductor package).
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, before the normalization the script will call:

correlFun(Data=rawData)

and after normalization:

correlFun(Data=normData, normMeth=normMeth)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch or ExpressionSet | required | The raw or normalized data object |
normMeth | character | required when Data is a normalized data object | String indicating the normalization method used (see normalizeData for more information on the possible values). | ""
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE

TYPE | DESCRIPTION
png image | An image of the intensity correlation values of the arrays, called ‘DataArrayCorrelationPlot’
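
A correlation image of this kind boils down to a pairwise correlation matrix between arrays shown as a heat map. The base R sketch below uses simulated data and Pearson correlation; both are assumptions for the example and not necessarily what correlationPlot does internally.

```r
# Four simulated arrays sharing a common signal plus array-specific noise
set.seed(6)
signal <- rnorm(1000, mean = 8)
logInt <- sapply(1:4, function(i) signal + rnorm(1000, sd = 0.3))
colnames(logInt) <- paste0("array", 1:4)

# Pairwise Pearson correlation between arrays
corMat <- cor(logInt)

# Heat map of the correlation matrix
n <- ncol(corMat)
image(seq_len(n), seq_len(n), corMat,
      col = heat.colors(32), axes = FALSE, xlab = "", ylab = "")
axis(1, at = seq_len(n), labels = colnames(corMat), las = 2)
axis(2, at = seq_len(n), labels = colnames(corMat), las = 2)
```

Arrays from the same experimental group are expected to correlate more strongly with each other than with arrays from other groups.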

The pcaFun function

DESCRIPTION
This function (from functions_images.R) creates a Principal Component Analysis (PCA) plot of the arrays in the raw or normalized dataset (depending on the object passed). When the dataset consists of fewer than three arrays, no PCA plot is generated and a warning is given. Before computing the PCA, each probeset’s expression values are centred on zero. If scaled_pca is TRUE, they are also rescaled to unit variance. When no array (sample)name is longer than ten characters and there are no more than 16 samples, the array (sample)names are put within the plot; otherwise they are put in the legend.
Since computing a PCA (using the prcomp function) can be memory intensive, a try is used. Furthermore, in cases where scaling is not possible because some probesets show no variation, a second attempt is made without scaling (when scaled_pca had been set to TRUE), and a warning is given. When no PCA can be computed, the image is not created and a warning is given.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, before the normalization the script will call:

pcaFun(Data=rawData, experimentFactor=experimentFactor,
plotColors=plotColors, legendColors=legendColors,
namesInPlot=((max(nchar(sampleNames(rawData)))<=10)&&
(length(sampleNames(rawData))<=16)))


and after normalization:

pcaFun(Data=normData, experimentFactor=experimentFactor,
normMeth=normMeth, plotColors=plotColors,
legendColors=legendColors,
namesInPlot=((max(nchar(sampleNames(rawData)))<=10)&&
(length(sampleNames(rawData))<=16)))

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch or ExpressionSet | required | The raw or normalized data object |
experimentFactor | factor | required | The factor of groups. | NULL
normMeth | character | required when Data is a normalized data object | String indicating the normalization method used (see normalizeData for more information on the possible values). | ""
scaled_pca | logical | optional | Should each probeset’s expression be scaled to unit variance before proceeding? Note that the expression is centred on zero in any case. | TRUE
plotColors | character | required | Vector of colors assigned to each array. | NULL
legendColors | character | required | Vector of colors assigned to each experimental group. | NULL
namesInPlot | logical | optional | Should the array (sample)names be put within the plot, or in the legend? | FALSE
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24

OUTPUT IMAGE

TYPE | DESCRIPTION
png image | An image with PCA plot of the arrays. Naming is either ‘Raw’ or the normalization method, followed by ‘DataPCAanalysis’
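
The try-and-retry logic described for the PCA can be sketched in base R as follows. The expression matrix is simulated and deliberately contains one constant probeset so that scaling fails; the variable names mirror the parameters above, but this is not the pcaFun source code.

```r
# Simulated log2 expressions: 200 probesets x 6 samples
set.seed(7)
expr <- matrix(rnorm(200 * 6, mean = 8), ncol = 6,
               dimnames = list(paste0("ps", 1:200), paste0("s", 1:6)))
expr[1, ] <- 5   # a constant probeset: scaling to unit variance is impossible

scaled_pca <- TRUE

# Arrays are the observations, so the matrix is transposed;
# centring on zero is always done, scaling only on request
pca <- try(prcomp(t(expr), center = TRUE, scale. = scaled_pca), silent = TRUE)
if (inherits(pca, "try-error") && scaled_pca) {
  warning("scaling failed, retrying PCA without scaling")
  pca <- try(prcomp(t(expr), center = TRUE, scale. = FALSE), silent = TRUE)
}

# Put names in the plot only for short names and few samples
namesInPlot <- max(nchar(colnames(expr))) <= 10 && ncol(expr) <= 16

if (!inherits(pca, "try-error")) {
  plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2",
       type = if (namesInPlot) "n" else "p")
  if (namesInPlot) text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x))
}
```

Here the first prcomp call fails on the constant probeset, so the unscaled PCA is the one that is plotted.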

The clusterFun function

DESCRIPTION
This function (from functions_images.R) creates a hierarchical clustering dendrogram of the arrays in the raw or normalized dataset (depending on the object passed). When the dataset consists of fewer than three arrays, no dendrogram is generated and a warning is given.
[Description of the OUTPUT IMAGE and its interpretation]

USAGE
By default, before the normalization the script will call:

clusterFun(Data=rawData, clusterOption1=clusterOption1, clusterOption2=clusterOption2)

and after normalization:

clusterFun(Data=normData, clusterOption1=clusterOption1, clusterOption2=clusterOption2, normMeth=normMeth)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch or ExpressionSet | required | The raw or normalized data object |
clusterOption1 | character | optional | String indicating the distance function to be used. Possible values are “pearson”, “spearman”, or “euclidean”. | “Pearson”
clusterOption2 | character | optional | String indicating the hierarchical clustering function to be used. Possible values are "ward", "single", "complete", "average", "mcquitty", "median" or "centroid". | “ward”
normMeth | character | required when Data is a normalized data object | String indicating the normalization method used (see normalizeData for more information on the possible values). | ""
WIDTH | number | optional | png image width | 1000
HEIGHT | number | optional | png image height | 1414
POINTSIZE | number | optional | png image point size | 24
MAXARRAY | number | optional | threshold to adapt the image to the number of arrays | 41

OUTPUT IMAGE

TYPE | DESCRIPTION
png image | An image with the clustering dendrogram of the arrays. Naming is ‘DataCluster’ followed by the name of the distance function used
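
How the two options combine can be illustrated with base R: clusterOption1 selects a distance between arrays and clusterOption2 the linkage passed to hclust. Using 1 minus the correlation as the distance for “pearson” and “spearman” is an assumption made for this sketch, and the data are simulated; clusterFun may rely on other distance helpers.

```r
# Simulated log2 expressions: 300 probesets x 6 samples
set.seed(8)
expr <- matrix(rnorm(300 * 6, mean = 8), ncol = 6,
               dimnames = list(NULL, paste0("s", 1:6)))
clusterOption1 <- "pearson"
clusterOption2 <- "average"

# Distance between arrays: 1 - correlation, or Euclidean distance
d <- switch(tolower(clusterOption1),
            pearson   = as.dist(1 - cor(expr, method = "pearson")),
            spearman  = as.dist(1 - cor(expr, method = "spearman")),
            euclidean = dist(t(expr)),
            stop("unknown distance function: ", clusterOption1))

# Hierarchical clustering with the requested linkage
hc <- hclust(d, method = clusterOption2)
plot(hc, main = paste("Cluster dendrogram:", clusterOption1, "/", clusterOption2))
```

Note that recent versions of R name the classical Ward linkage "ward.D" in hclust; passing "ward" is still accepted there but produces a message about the renaming.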


Re-annotation and normalization

The deduceSpecies function

DESCRIPTION
This function (from functions_processing.R) tries to determine the species related to the current chiptype, if the species has not been provided by the user (and as such is set to ""). If the descr parameter is not provided or is empty, an empty string is returned as species. In other cases the function tries to load an annotation library depending on the chiptype to find the species. If this is not successful, the species is set by hand for some predefined chiptypes. If this is still not successful, the empty string is returned.

USAGE
By default, the script will call (if customCDF is TRUE and species is ""):

species <- deduceSpecies(rawData@annotation)

INPUT PARAMETER

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
descr | character | required | A string indicating the chiptype, which can be obtained by getting the @annotation slot from an AffyBatch object. | NULL

OUTPUT VALUE

TYPE | DESCRIPTION
character | The species associated with the current chiptype, or "" if detection was unsuccessful

The normalizeData function

DESCRIPTION
This function (from functions_processing.R) normalizes the data in the AffyBatch object provided. Currently, GCRMA, RMA, and PLIER normalization are supported. For GCRMA, fast normalization is not used, as this gives unreliable results. For PLIER, justPlier (plier Bioconductor package) is used, with the "together" option for arrays having perfect match and mismatch probes, and the "PMonly" option for arrays with perfect match probes only.
When normalization per experimental group is selected, the function makes sure that still one normalized data object including all arrays is returned. In case customCDF is TRUE, annotation is updated using BrainArray custom cdf environments, before proceeding with the normalization (and summarisation of probe expressions into probeset expressions). To update the annotation, a sub call is made to the addUpdatedCDFenv function.

USAGE
By default, the script will call, if customCDF is TRUE:

normData <- normalizeData(rawData, normMeth,
perGroup=(normOption1=="group"), experimentFactor, customCDF,
species, CDFtype)


or, if customCDF is FALSE:

normData <- normalizeData(rawData, normMeth,
perGroup=(normOption1=="group"), experimentFactor, customCDF)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
normMeth | character | required | String indicating the normalization method used. Possible values: RMA, GCRMA, PLIER or none. | ""
perGroup | logical | optional | Should normalization be performed per experimental group (e.g. when global differences are expected between groups) or for the dataset as a whole? | FALSE
experimentFactor | factor | required if perGroup is TRUE | The factor of groups. | NULL
customCDF | logical | optional | Should annotation of the chip be updated before normalizing the data (and building the probesets out of the separate probes)? If requested, this is done using BrainArray updated cdf environments, c.f. addUpdatedCDFenv. | TRUE
species | character | required when customCDF is TRUE, c.f. addUpdatedCDFenv | The species associated with the chip type. | NULL
CDFtype | character | required when customCDF is TRUE, c.f. addUpdatedCDFenv | The type of custom cdf requested. | NULL
aType | character | optional | String indicating the type of the current chip, either “PMMM” for chips with perfect match and mismatch probes, or “PMonly” for chips with perfect match probes only. Required if normMeth is “PLIER”. | NULL

OUTPUT IMAGE AND VALUE

TYPE | DESCRIPTION
png file | A reference sheet indicating the cdf annotation (cdfName) used, the normalization method used, and, where applicable, stating that normalization has been performed per experimental group
ExpressionSet | Normalized data object containing the normalized (and mostly log2-transformed) values for each probeset and array
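
The statement that per-group normalization still returns one data object with all arrays can be illustrated with a small base R sketch. A plain quantile normalization is used here purely as a stand-in for RMA, GCRMA or PLIER, and quantileNormalize is a hypothetical helper written for this example; only the split, normalize and reassemble pattern is the point.

```r
# Stand-in normaliser: simple quantile normalisation of the columns of a matrix
quantileNormalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

# Simulated raw intensities: 100 probes x 6 arrays, groups interleaved
set.seed(9)
raw <- matrix(rexp(100 * 6), ncol = 6, dimnames = list(NULL, paste0("s", 1:6)))
experimentFactor <- factor(c("a", "b", "a", "b", "a", "b"))
perGroup <- TRUE

if (perGroup) {
  # Normalise each group on its own, writing results back in place so
  # that the original array order is kept in the single result object
  normData <- raw
  for (g in levels(experimentFactor)) {
    idx <- which(experimentFactor == g)
    normData[, idx] <- quantileNormalize(raw[, idx, drop = FALSE])
  }
} else {
  normData <- quantileNormalize(raw)
}
```

After this step, arrays within a group share the same intensity distribution, while distributions may still differ between groups.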

The addUpdatedCDFenv function

DESCRIPTION
This function (from functions_processing.R) tries to find and load an updated cdf environment from BrainArray for the current chiptype. If this is not successful, a warning is generated and the raw data object is returned as is, i.e. with the cdf annotation that it already had, if any. Note that the IDs given to the reporters are artificially created by adding “_at” as a postfix to the ID from the entry in the database that the probeset corresponds to (c.f. also createNormDataTable).
Note also that reannotation takes place at the probe level, not at the probeset level. This means that completely new probesets are constructed, not necessarily equal in size.

USAGE
By default, the script will call, if customCDF is TRUE (from within the normalizeData and computePMAtable functions, leaving the original Data intact and using Data.copy to further process):

Data.copy <- Data
Data.copy <- addUpdatedCDFenv(Data.copy, species, CDFtype)

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
Data | AffyBatch | required | The raw data object |
species | character | required | The species associated with the chip type * | NULL
type | character | optional | The type of custom cdf requested **. This parameter indicates the database based on which an updated cdf environment should be selected. | “ENSG”

OUTPUT VALUE

TYPE | DESCRIPTION
variable of class AffyBatch | The object with an updated cdf annotation assigned, if found

* Possible values are:

- Abbreviated: "Ag", "At", "Bt", "Ce", "Cf", "Dr", "Dm", "Gg", "Hs", "MAmu", "Mm", "Os", "Rn", "Sc", "Sp", "Ss"
- Full names: "Anopheles gambiae", "Arabidopsis thaliana", "Bos taurus", "Caenorhabditis elegans",
"Canis familiaris", "Danio rerio", "Drosophila melanogaster", "Gallus gallus", "Homo sapiens",
"Macaca mulatta", "Mus musculus", "Oryza sativa", "Rattus norvegicus", "Saccharomyces
cerevisiae", "Schizosaccharomyces pombe", "Sus scrofa"

** Possible values are:
"ENTREZG", "REFSEQ", "ENSG", "ENSE", "ENST", "VEGAG", "VEGAE", "VEGAT", "TAIRG", "TAIRT", "UG", "MIRBASEF", "MIRBASEG"

The createNormDataTable function

DESCRIPTION
This function (from functions_processing.R) prepares a table suitable for visualisation and saving to disk from a normalized data object. If an updated cdf annotation has been used (customCDF is TRUE), it removes the artificial “_at” postfixes from all probeset IDs, apart from the affx controls, in order to get the real IDs from the database used to create the update. Also, if an updated cdf environment based on Ensembl Gene ID (“ENSG”) has been used, the function tries to connect to BioMart (using the biomaRt Bioconductor package) in order to add two extra columns of information to the table: the common gene name and a gene description. This is (for now) not done for other updated cdf types, as there is no one-to-one mapping between BioMart entries and these IDs, or it has not been sufficiently tested. Note that the BioMart connection may sometimes not be established (e.g. if the service is down or busy); in such cases it may be worthwhile to try again.

USAGE
By default, the script will call:
normDataTable <- createNormDataTable(normData, customCDF, species, CDFtype)

After this function has been called, the normDataTable object is saved to a tab-delimited text file by the main script.

INPUT PARAMETERS

NAME | TYPE | STATUS | DESCRIPTION | DEFAULT
normData | ExpressionSet | required | The normalized data object |
customCDF | logical | required | Has the normalized data object been created after updating the cdf annotation using BrainArray updated cdf environments, c.f. normalizeData above. | NULL
species | character | required when customCDF is TRUE | The species associated with the chip type. | NULL
CDFtype | character | required when customCDF is TRUE | The type of custom cdf requested. | NULL

OUTPUT VALUE

TYPE | DESCRIPTION
data.frame | Table with normalized data, and possible extra annotation
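
The ID clean-up step can be sketched with a few lines of base R. The example IDs are made up for illustration, and recognising control probesets by an "AFFX" prefix is an assumption of this sketch rather than a statement about how createNormDataTable identifies them.

```r
# Example probeset IDs as they might look after custom cdf reannotation
ids <- c("ENSG00000141510_at", "ENSG00000012048_at",
         "AFFX-BioB-5_at", "AFFX-HUMGAPDH/M33197_3_at")

# Control probesets are recognised here by their "AFFX" prefix (an assumption)
isControl <- grepl("^AFFX", ids)

# Drop the trailing "_at" from all non-control IDs
cleanIds <- ifelse(isControl, ids, sub("_at$", "", ids))
cleanIds
```

The cleaned IDs of the non-control probesets can then be used directly as database identifiers, for instance as Ensembl gene IDs in a BioMart query.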

The following figure shows an example of the table generated for normalized data. It was obtained with a GC-RMA normalization applied to the example_dataset1 (see Download page), using the Ensembl annotation (the CDF was customized). As you can see, the first column (ENSG_ID) contains the Ensembl IDs that were mapped using the biomaRt package; two extra columns (external_gene_id and description) were added to give details on the genes.

NormDataTable
