Technical documentation
R local usage

arrayanalysis.org - affyAnalysisQC workflow - document version: 1.0.0

Table of Contents

[Introduction]
[Install R and the required libraries for local usage]
[Parameter settings and use affyQC as an R function]
[Scripts and functions description]

Introduction

All source code is written in R and is open source, released under the Apache License, version 2.0. It can be obtained from our Download page.

affyAnalysisQC can be run:
  - online via the arrayanalysis.org web portal (follow "Get started").
  - locally as an automated R workflow called via a wrapper function.

The main functions of affyAnalysisQC are:
   - to compute array quality information;
   - to plot images that allow identifying any aberrations present in the dataset;
   - to return pre-processed data and QC reports.

Bug tracking system

If you encounter an issue while using the code, you can report it at any time or, once you have your own account, through our internal tracking system. You can also use this system to post comments or feature suggestions.

Example datasets

Three example datasets have been made available on our Download page. Each includes:
• the raw CEL files of the dataset,
• a description file,
• the affyAnalysisQC output files:
   - execution logfile,
   - report file (pdf),
   - zip archive with images and tables,
   - normalized data (text file).

Install R and the required libraries for local use of affyAnalysisQC

Installing R

The R software can be downloaded from http://www.r-project.org. From this website, follow the link to a local CRAN mirror to download the program. affyAnalysisQC is compatible with R version 2.12.0 and higher.
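
A quick way to confirm that the running R session meets this requirement is to compare its version with 2.12.0. The snippet below is only a convenience check and is not part of affyAnalysisQC itself; the variable names are illustrative.

```r
# Compare the running R version with the minimum version stated above.
min_version <- "2.12.0"
version_ok <- getRversion() >= min_version

if (version_ok) {
  message("R ", getRversion(), " meets the minimum version ", min_version)
} else {
  message("R ", getRversion(), " is older than ", min_version, "; please upgrade.")
}
```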

Installing R libraries

Several libraries (packages) need to be installed before affyAnalysisQC can be executed. For your convenience, we prepared a script that installs almost all of the libraries needed. You can remove from its list any libraries that are already present in your R installation.
Enter the following line in R to execute the script (installing the libraries will take a while):
source("http://svn.bigcat.unimaas.nl/arrayanalysis/tags/version_2.0.0/src/install_libraries.R")

Note that this script does not install some of the annotation packages needed. These are chiptype specific, and installing all of them would take a large amount of disk space. Depending on your system, R may install them automatically when the workflow is run. If not, you will need to install these libraries manually (cf. the instructions that follow).

Installing required R libraries under the R GUI for Windows

In the Windows GUI, open the Packages menu and choose Select repositories... In the dialog window that pops up, select 'BioC software'. Then go back to the Packages menu and choose Install package(s). Select the required packages from the list below and click OK.

Installing required R libraries using the R terminal

In the R terminal, you can do exactly the same as above with the following procedure. Use:
setRepositories()
and make sure that at least 'BioC software' is selected. Next, use:
install.packages("libraryname")
to install a package or:
install.packages(c("libraryname1", "libraryname2", "libraryname3"))
to install - for example - three packages named libraryname1, libraryname2 and libraryname3 respectively.
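
To avoid reinstalling libraries that are already present, you can first work out which packages are missing. The sketch below assumes nothing beyond base R; missing_packages is a hypothetical helper, and the libraryname entries are the same placeholders as above, to be replaced with real package names.

```r
# Placeholder names; replace them with the libraries you actually need.
wanted <- c("libraryname1", "libraryname2", "libraryname3")

# Hypothetical helper: return the subset of 'pkgs' that is not yet installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(wanted)
if (length(to_install) > 0) {
  message("Still to install: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to actually install them
}
```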

Alternatively, the required packages can be installed through Bioconductor, using the following commands:
source("http://www.bioconductor.org/biocLite.R")
biocLite(c("libraryname1", "libraryname2", "libraryname3"))

Parameter settings for usage in R (affyAnalysisQC.R script)

This section describes the settings to be provided in the affyAnalysisQC.R script. Comparable settings and parameters are requested and used by the web portal and the GenePattern module.

Directories description

First of all, three directories need to be defined:

- a directory containing the CEL files (DATA.DIR)
- a directory containing the affyAnalysisQC.R helper scripts (SCRIPT.DIR)
- a directory to which the output tables and images should be written (WORK.DIR)

By default, SCRIPT.DIR is set to http://svn.bigcat.unimaas.nl/arrayanalysis/tags/version_2.0.0/src/ so that the up-to-date script files (mainly functions_processing.R and functions_images.R) are collected from the repository. If you would like to make changes to these functions, you can download them to your local machine and set SCRIPT.DIR to that location.
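
As an illustration, the three directories could be set and checked as follows before the workflow is run. The local paths are hypothetical (a temporary folder is used so that the snippet runs anywhere), the file-name pattern for CEL files is an assumption, and this text does not state whether the workflow creates WORK.DIR itself or expects a trailing path separator.

```r
# Hypothetical local paths; adjust them to your own system.
DATA.DIR <- file.path(tempdir(), "cel_files")   # folder holding the CEL files
WORK.DIR <- file.path(tempdir(), "qc_output")   # folder for output tables and images
# Default location of the helper scripts, as given above.
SCRIPT.DIR <- "http://svn.bigcat.unimaas.nl/arrayanalysis/tags/version_2.0.0/src/"

# Create the output directory if it does not exist yet.
if (!file.exists(WORK.DIR)) dir.create(WORK.DIR, recursive = TRUE)

# List the CEL files (plain or gzipped) found in the data directory, if any.
cel_files <- if (file.exists(DATA.DIR)) {
  list.files(DATA.DIR, pattern = "\\.cel(\\.gz)?$", ignore.case = TRUE)
} else {
  character(0)
}
if (length(cel_files) == 0) message("No CEL files found in ", DATA.DIR)
```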

Parameters description

Description file
affyAnalysisQC can use a description file containing information about the arrays (samples) in the dataset. The file is specified through the arrayGroup parameter, which requires the full path to the file. If arrayGroup is not set, the CEL file names are used as sample names, and no distinguishing group colours are used in the images produced.
The description file is a tab-delimited text file containing three columns with the following layout:

ArrayDataFile    SourceName    FactorValue
Array1.CEL       patient1      patient
Array14.CEL      control1      control
Array23.CEL      patient2      patient
Array7.CEL       patient3      patient
...              ...           ...

The first column contains the names of the CEL files (or any type that can be read by the ReadAffy (affy) function, e.g. CEL.gz) that are in the DATA.DIR. The second column contains the names to be used for each array in the plots and tables produced. The third column contains the names of the groups the samples belong to.
The column headers must be present, but may be named differently (as long as the order is the same and the names contain no spaces). If there are more than three columns, all further ones are ignored.
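
The layout described above can be produced and checked from R. The following is a minimal sketch with hypothetical file and sample names; the uniqueness checks are an assumption about what makes a sensible description file, not a documented requirement of the workflow.

```r
# Write a small example description file (hypothetical names following the layout above).
desc_path <- file.path(tempdir(), "description.txt")
example <- data.frame(
  ArrayDataFile = c("Array1.CEL", "Array14.CEL", "Array23.CEL", "Array7.CEL"),
  SourceName    = c("patient1", "control1", "patient2", "patient3"),
  FactorValue   = c("patient", "control", "patient", "patient"),
  stringsAsFactors = FALSE
)
write.table(example, desc_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back and keep only the first three columns, whatever they are called.
description <- read.delim(desc_path, stringsAsFactors = FALSE)[, 1:3]

# Basic sanity checks before handing the path to arrayGroup.
stopifnot(!any(duplicated(description[[1]])))  # each CEL file listed once
stopifnot(!any(duplicated(description[[2]])))  # unique sample names (assumption)

arrayGroup <- desc_path  # full path to the file, as the parameter requires
```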

"reorder" parameter
The next parameter, reorder, indicates whether the arrays should first be reordered by experimental group in the images and tables produced, which may ease interpretation.
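
To illustrate what reordering by experimental group means, the sketch below sorts a hypothetical description table by its third column. It only demonstrates the idea and is not the code the workflow itself uses.

```r
# Hypothetical description table in its original file order.
description <- data.frame(
  ArrayDataFile = c("Array1.CEL", "Array14.CEL", "Array23.CEL", "Array7.CEL"),
  SourceName    = c("patient1", "control1", "patient2", "patient3"),
  FactorValue   = c("patient", "control", "patient", "patient"),
  stringsAsFactors = FALSE
)

# Sort rows by experimental group; order() keeps ties in their original order.
reordered <- description[order(description$FactorValue), ]
print(reordered$SourceName)
# prints: "control1" "patient1" "patient2" "patient3"
```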

Choice of the plots to be computed
Most of the remaining parameters are Booleans that indicate whether a certain plot or table has to be computed. Information on these plots can be found in the comment lines of the affyAnalysisQC.R file itself and on the arrayanalysis.org website (see: "module description").

Options required for some plots
A few other parameters provide options for the plots:
MAOption1 and normOption1 indicate whether MA plots and normalization should be computed for the whole dataset (“dataset”) or per experimental group (“group”).
The clusterOption1, clusterOption2, and normMeth parameters give settings for clustering and normalization, respectively (cf. the help given in the script itself).

Array re-annotation
Finally, customCDF indicates whether, before normalization, the array annotation has to be updated with a custom cdf environment from the BrainArray lab, which is advisable. This is done by default.
CDFtype and species are two settings needed when an updated cdf is requested. CDFtype indicates the database for which the updated cdf should be chosen (when “ENSG” is selected, the common gene name and description are also added to the normalized data table). species indicates the species; if it is not given, the script will try to deduce it from the chiptype.
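
Put together, a settings block restricted to the values mentioned in this section could look as follows. This is a sketch, not a copy of affyAnalysisQC.R: it assumes that reorder and customCDF are Booleans, which the text suggests but does not state, and it leaves out parameters whose allowed values are only documented in the script's own comments (clusterOption1, clusterOption2, normMeth, species).

```r
# Values documented above; other parameters are described in the script's comments.
reorder     <- TRUE       # reorder arrays by experimental group (assumed Boolean)
MAOption1   <- "dataset"  # MA plots for the whole dataset ("dataset") or per group ("group")
normOption1 <- "group"    # normalization per experimental group
customCDF   <- TRUE       # update the annotation with a BrainArray custom cdf (assumed Boolean)
CDFtype     <- "ENSG"     # database for which the updated cdf is chosen

# Guard against typos in the two options with a documented set of values.
allowed <- c("dataset", "group")
stopifnot(MAOption1 %in% allowed, normOption1 %in% allowed)
stopifnot(is.logical(reorder), is.logical(customCDF))
```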

The last line of the affyAnalysisQC.R script, starting with source, loads the run_affyAnalysisQC.R script, which creates all the images and output tables.

How to run the workflow after adjusting the settings?

After opening R (by either running the R GUI or typing R in a command shell), affyAnalysisQC can be initiated by entering:
> source("affyAnalysisQC.R")
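
If R is not started in the folder that contains the script, the call above fails with a "cannot open file" error. A small guard such as the one below makes that case explicit; run_if_present is a hypothetical helper, and the sketch assumes the edited affyAnalysisQC.R is expected in the current working directory.

```r
# Hypothetical helper: source a script only if it exists.
# Returns TRUE (invisibly) when the script was run, FALSE otherwise.
run_if_present <- function(script) {
  if (!file.exists(script)) {
    message("Script not found: ", script, " (working directory: ", getwd(), ")")
    return(invisible(FALSE))
  }
  source(script)
  invisible(TRUE)
}

ran <- run_if_present("affyAnalysisQC.R")
```

If the script is reported as missing, either change the working directory with setwd() or pass the full path to the script.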

Scripts and functions description

For a detailed description of the run_affyAnalysisQC.R script, which is the core script of the module, and of all the functions it calls, we refer to the function guide at doc_affyQC_func.php.
