Microarray Analysis

Starting from raw data

Data upload

Input

To start the analysis, the user should upload two files:

1. .zip folder with .CEL or .CEL.gz files. The filenames of the CEL files should match the sample names in the metadata file. ArrayAnalysis will automatically look for the column in the metadata file that matches the CEL filenames.

2. Metadata file. The metadata table can be provided as a .tsv/.csv file or as a Series Matrix File available from the GEO website.
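
The column matching described above can be sketched as follows. This is a minimal Python illustration, not the ArrayAnalysis implementation: the helper names `strip_cel_extension` and `find_sample_column`, the dictionary layout of the metadata, and the rule of picking the column with the most matches are all assumptions made for this example.

```python
import os


def strip_cel_extension(filename):
    """Drop the directory and a trailing .CEL or .CEL.gz (case-insensitive)."""
    name = os.path.basename(filename)
    lower = name.lower()
    for ext in (".cel.gz", ".cel"):
        if lower.endswith(ext):
            return name[: -len(ext)]
    return name


def find_sample_column(metadata, cel_files):
    """Return the metadata column whose values cover the most CEL sample names.

    metadata: dict mapping column name -> list of values (hypothetical layout).
    Returns None if no column contains any of the sample names.
    """
    samples = {strip_cel_extension(f) for f in cel_files}
    best_column, best_hits = None, 0
    for column, values in metadata.items():
        hits = len(samples & {str(v) for v in values})
        if hits > best_hits:
            best_column, best_hits = column, hits
    return best_column


# Hypothetical metadata table and CEL filenames.
metadata = {
    "sample_id": ["GSM1", "GSM2", "GSM3"],
    "status": ["case", "control", "case"],
}
cel_files = ["GSM1.CEL", "GSM2.CEL.gz", "GSM3.cel"]
print(find_sample_column(metadata, cel_files))  # sample_id
```

If no column is found in a check like this, the CEL filenames and the sample names in the metadata file most likely differ and should be aligned before uploading.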

You can analyse an example dataset (GSE6955) by clicking on Run example.

Output

A preview of the expression matrix and metadata table will be shown. This way, you can check whether the data has been correctly uploaded.

Pre-processing

Input

For the data preprocessing, you can provide the following inputs:

1. Remove samples (optional). Via this option, you can remove outliers or samples from experimental groups that you would like to exclude from the analysis.

2. Select experimental group. The experimental group is the variable that defines the cases and controls (e.g., disease status). By selecting more than one variable, you can combine multiple variables into a single experimental variable. This can be useful if, for example, you want to compare cases and controls across different tissues (e.g., liver and brain). In that case, you can create four experimental groups: case_brain, control_brain, case_liver, and control_liver. The selected experimental group will be used for the normalization (if normalization per experimental group is selected) and for the statistical analysis in the next tab.

3. Select normalization method. The default normalization method is RMA. However, you can also choose GCRMA or PLIER.

4. Select probeset annotation. By default, ArrayAnalysis will use the custom ENTREZ gene annotation from brainarray. If you select No annotation, the Affymetrix probeset annotation will be used.

NOTE: ArrayAnalysis automatically checks for potential outliers in the data. If potential outliers have been identified, you will receive a warning message. Please always check the QC plots carefully to decide whether samples should be excluded from further analysis.
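
Combining several metadata variables into a single experimental group, as described for the Select experimental group option, can be sketched as follows. The function name `combine_groups`, the column names, and the underscore separator are assumptions for illustration; the labels simply mirror the case_brain/control_liver example above.

```python
def combine_groups(metadata, variables, sep="_"):
    """Join the values of the selected variables into one label per sample.

    metadata: dict mapping column name -> list of values (hypothetical layout).
    variables: column names to combine, in the order they should appear.
    """
    columns = [metadata[v] for v in variables]
    return [sep.join(str(value) for value in row) for row in zip(*columns)]


# Hypothetical metadata for four samples.
metadata = {
    "status": ["case", "control", "case", "control"],
    "tissue": ["brain", "brain", "liver", "liver"],
}
groups = combine_groups(metadata, ["status", "tissue"])
print(groups)  # ['case_brain', 'control_brain', 'case_liver', 'control_liver']
```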

Output

Data and preprocessing quality can be checked in the different output tables and figures:

1. Normalized expression matrix (interactive). The table of normalized expression values is provided and can be downloaded. The expression profile of a gene can be viewed by clicking on the table. Click here for more information on this figure.

2. Boxplots (static). The boxplots show the distribution of expression values for all samples. Click here for more information on this figure.

3. Density plots (interactive). Similar to the boxplots, the density plot shows the distribution of expression values for all samples. Click here for more information on this figure.

4. Correlation plot (interactive). The correlation plot shows the sample-wise correlations. You can choose between different linkage and clustering methods. Click here for more information on this figure.

5. PCA plot (interactive). 2D and 3D PCA plots can be viewed to assess data quality. Click here for more information on this figure.

6. Settings to ensure reproducibility. The overview of the chosen settings and the session info can be downloaded.

Statistical Analysis

Input

Statistical analysis is performed using the limma package. To perform statistical analysis, you can provide three inputs:

1. Statistical comparison(s). You can select one or more statistical comparisons of interest. Note that the direction of the statistical comparison is usually defined as Case - Control.

2. Covariates. You can adjust for continuous (e.g., age) and categorical (e.g., sex, tissue) covariates.

3. Gene annotation. You can add gene annotations from the Ensembl database using the biomaRt package. For example, if you have ENTREZ gene IDs annotations, you can use this option to add gene symbols or Ensembl gene IDs to the statistics output. Please note that if you select this option, the time required for the statistical analysis will increase significantly.
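
To illustrate what the Case - Control direction means for the sign of the logFC, here is a minimal sketch. It only computes a difference of group means on log₂-scale values; limma itself fits a linear model with moderated statistics, which this example does not reproduce. The function name `log_fold_change` and the group labels are hypothetical.

```python
from statistics import mean


def log_fold_change(log2_values, groups, case="case", control="control"):
    """Difference of group means (case minus control) on the log2 scale."""
    case_values = [v for v, g in zip(log2_values, groups) if g == case]
    control_values = [v for v, g in zip(log2_values, groups) if g == control]
    return mean(case_values) - mean(control_values)


# Hypothetical log2 expression values of one gene in four samples.
groups = ["case", "case", "control", "control"]
gene = [9.0, 10.0, 7.0, 8.0]
print(log_fold_change(gene, groups))  # 2.0
```

With this direction, a positive logFC means higher expression in the cases than in the controls; swapping the two groups flips the sign.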

Output

For each of the selected statistical comparisons, the following outputs are provided:

1. Top table (interactive). A table with the relevant statistics as calculated by limma. Click here for more information on this table.

2. Histograms (interactive) of the P value and logFC distributions. Click for more information on the P value histogram and logFC value histogram.

3. Volcano plot (interactive): scatter plot of logFC versus -log P value. Click here for more information on this figure.

4. MA plot (interactive): scatter plot of mean log expression versus logFC. Click here for more information on this figure.

5. Summary of the number of differentially expressed genes for selected P and logFC thresholds.

6. Settings to ensure reproducibility. The overview of the chosen settings and the session info can be downloaded.

Gene Set Analysis

After the statistical analysis, you can perform Gene Set Analysis (GSA) using different gene set collections (i.e., Gene Ontology, KEGG, and WikiPathways). Two different methods can be used for GSA: Overrepresentation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA).

Overrepresentation Analysis (ORA)

Input

1. Statistical comparison. Select the statistical comparison for which you want to perform ORA.

2. Geneset collection. Select from Gene Ontology (GO)-Biological Process (BP), -Cellular Component (CC), and -Molecular Function (MF), as well as WikiPathways geneset collections.

3. Input genes. Indicate whether you want to perform ORA on up- and/or downregulated genes. Furthermore, you can indicate whether the input genes should be selected based on a P value/logFC threshold or whether ORA should simply be performed on the most significantly up/downregulated genes. The background gene list includes all genes that passed QC.

4. Gene identifier. Select which column of the statistics table contains the gene identifiers and what type of identifiers these are (e.g., Entrez Gene, Ensembl Gene, or Gene Symbol). Furthermore, indicate to which organism the gene identifiers belong.
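
The two ways of choosing ORA input genes (threshold-based or top-ranked) can be sketched as follows. The function names, the default cutoffs, the strictness of the comparisons, and the gene names are assumptions for illustration and are not taken from ArrayAnalysis.

```python
def _matches_direction(logfc, direction):
    """True if the logFC sign fits 'up', 'down', or 'both'."""
    if direction == "up":
        return logfc > 0
    if direction == "down":
        return logfc < 0
    return True


def select_by_threshold(stats, p_cutoff=0.05, logfc_cutoff=1.0, direction="both"):
    """Genes with P value below p_cutoff and |logFC| at or above logfc_cutoff."""
    return [
        gene
        for gene, (logfc, p) in stats.items()
        if p < p_cutoff and abs(logfc) >= logfc_cutoff
        and _matches_direction(logfc, direction)
    ]


def select_top_n(stats, n, direction="both"):
    """The n genes with the smallest P values in the requested direction."""
    candidates = [
        (p, gene)
        for gene, (logfc, p) in stats.items()
        if _matches_direction(logfc, direction)
    ]
    return [gene for p, gene in sorted(candidates)[:n]]


# Hypothetical statistics table: gene -> (logFC, P value).
stats = {
    "GENE_A": (2.1, 0.001),
    "GENE_B": (-1.5, 0.01),
    "GENE_C": (0.3, 0.0005),
    "GENE_D": (1.2, 0.2),
    "GENE_E": (-2.4, 0.0001),
}
print(select_by_threshold(stats))                  # ['GENE_A', 'GENE_B', 'GENE_E']
print(select_by_threshold(stats, direction="up"))  # ['GENE_A']
print(select_top_n(stats, 2, direction="up"))      # ['GENE_C', 'GENE_A']
```

Note that the top-ranked selection ignores the effect size, so a gene with a small logFC (GENE_C here) can still be selected.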

Output

The ORA results can be inspected in the following outputs:

1. ORA table. Table with the statistics of the ORA. This table can be downloaded. Click here for more information on this table.

2. Barchart of the most significantly overrepresented genesets. You can select the number of genesets in the chart. Click here for more information on this figure.

3. Network of the most significantly overrepresented genesets. The edge thickness is proportional to the Jaccard Index (i.e., number of shared genes/total number of unique genes in both genesets). You can select the network layout and number of genesets in the network. Click here for more information on this figure.

4. Settings to ensure reproducibility. The overview of the chosen settings and the session info can be downloaded.
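
The Jaccard Index used for the edge thickness in the network can be computed as in the sketch below. The function name `jaccard_index` and the gene set contents are hypothetical; returning 0 for two empty sets is a choice made for this example.

```python
def jaccard_index(geneset_a, geneset_b):
    """Number of shared genes divided by the number of unique genes in both sets."""
    a, b = set(geneset_a), set(geneset_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Two hypothetical gene sets sharing two of five unique genes.
geneset_1 = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
geneset_2 = {"GENE_C", "GENE_D", "GENE_E"}
print(jaccard_index(geneset_1, geneset_2))  # 0.4
```

A value of 0 means the two genesets share no genes, and a value of 1 means they contain exactly the same genes.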

Gene Set Enrichment Analysis (GSEA)

Input

1. Statistical comparison. Select the statistical comparison for which you want to perform GSEA.

2. Geneset collection. Select from Gene Ontology (GO)-Biological Process (BP), -Cellular Component (CC), and -Molecular Function (MF), KEGG, and WikiPathways geneset collections.

3. Ranking variable. Indicate which variable (logFC, -log P value, or signed -log P value) will be used to rank genes for the GSEA.

4. Gene identifier. Select which column of the statistics table contains the gene identifiers and what type of identifiers these are (e.g., Entrez Gene, Ensembl Gene, or Gene Symbol). Furthermore, indicate to which organism the gene identifiers belong.
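
The three ranking variables can be sketched as follows. The base of the logarithm is not stated above, so base 10 is an assumption here, as are the function names, the method labels, and the lower bound that guards against P values of exactly 0.

```python
import math


def ranking_score(logfc, p_value, method="signed_neg_log_p"):
    """Score used to order genes for GSEA (larger scores rank first)."""
    # Assumption: base-10 logarithm; the floor avoids log10(0).
    neg_log_p = -math.log10(max(p_value, 1e-300))
    if method == "logFC":
        return logfc
    if method == "neg_log_p":
        return neg_log_p
    if method == "signed_neg_log_p":
        # Give -log P the sign of the logFC, so downregulated genes rank last.
        return math.copysign(neg_log_p, logfc)
    raise ValueError(f"unknown ranking method: {method}")


def rank_genes(stats, method="signed_neg_log_p"):
    """Return gene names sorted from highest to lowest ranking score."""
    return sorted(stats, key=lambda g: ranking_score(*stats[g], method), reverse=True)


# Hypothetical statistics table: gene -> (logFC, P value).
stats = {
    "GENE_A": (2.0, 0.001),
    "GENE_B": (-1.0, 0.0001),
    "GENE_C": (0.5, 0.1),
}
print(rank_genes(stats))               # ['GENE_A', 'GENE_C', 'GENE_B']
print(rank_genes(stats, "neg_log_p"))  # ['GENE_B', 'GENE_A', 'GENE_C']
```

The unsigned -log P value places the most significant genes at the top regardless of direction, whereas the signed variant and the logFC separate up- and downregulated genes at opposite ends of the ranking.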

Output

The GSEA results can be inspected in the following outputs:

1. Interactive table with GSEA statistics. This table can be downloaded. Click here for more information on this table.

2. Barchart of the most significantly enriched genesets. You can select the number of genesets in the chart. Click here for more information on this figure.

3. Network of the most significantly enriched genesets. The edge thickness is proportional to the Jaccard Index (i.e., number of shared genes/total number of unique genes in both genesets). You can select the network layout and number of genesets in the network. Click here for more information on this figure.

4. Settings to ensure reproducibility. The overview of the chosen settings and the session info can be downloaded.

Starting from processed data

Data upload

Input

To start the analysis, the user should upload two files:

1. Intensity table in .csv/.tsv or Series Matrix File format. Series Matrix Files can be downloaded from the GEO website.

2. Metadata file. The metadata table can be provided as a .tsv/.csv file or as a Series Matrix File.

You can analyse an example dataset (GSE6955) by clicking on Run example.

Output

A preview of the expression matrix and metadata table will be shown. This way, you can check whether the data has been correctly uploaded.

Pre-processing

Input

For the data preprocessing, you can provide the following inputs:

1. Remove samples (optional). Via this option, you can remove outliers or samples from experimental groups that you would like to exclude from the analysis.

2. Select experimental group. The experimental group is the variable that defines the cases and controls (e.g., disease status). By selecting more than one variable, you can combine multiple variables into a single experimental variable. This can be useful if, for example, you want to compare cases and controls across different tissues (e.g., liver and brain). In that case, you can create four experimental groups: case_brain, control_brain, case_liver, and control_liver. The selected experimental group will be used for the normalization (if normalization per experimental group is selected) and for the statistical analysis in the next tab.

3. Select transformation method (optional). You can perform log₂-transformation to make the data more normally distributed.

4. Select normalization method (optional). You can perform quantile normalization to normalize the expression values across the samples.

NOTE: ArrayAnalysis automatically checks for potential outliers in the data. If potential outliers have been identified, you will receive a warning message. Please always check the QC plots carefully to decide whether samples should be excluded from further analysis.
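
The optional log₂-transformation and quantile normalization can be sketched as follows. This is a plain-Python illustration under stated assumptions: the matrix is a list of gene rows with one column per sample, the `offset` parameter is hypothetical, and ties are ranked in order of appearance rather than averaged, which may differ from what ArrayAnalysis does.

```python
import math


def log2_transform(matrix, offset=0.0):
    """Apply log2 to every value; a positive offset can be used if zeros occur."""
    return [[math.log2(value + offset) for value in row] for row in matrix]


def quantile_normalize(matrix):
    """Give every sample (column) the same distribution of values.

    Each value is replaced by the mean, across samples, of the values
    that share its rank within their own sample.
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    sorted_columns = [sorted(col) for col in columns]
    rank_means = [
        sum(col[i] for col in sorted_columns) / n_samples for i in range(n_genes)
    ]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, gene_index in enumerate(order):
            normalized[gene_index][j] = rank_means[rank]
    return normalized


# Hypothetical intensities: 4 genes (rows) x 3 samples (columns).
matrix = [
    [5, 4, 3],
    [2, 1, 4],
    [3, 6, 6],
    [4, 2, 8],
]
normalized = quantile_normalize(matrix)
print(normalized[1])  # [2.0, 2.0, 3.0]
```

After quantile normalization, the sorted values of every sample are identical, which is why the boxplots and density plots of the samples overlap.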

Output

Data and preprocessing quality can be checked in the different output tables and figures:

1. Normalized expression matrix (interactive). The table of normalized expression values is provided and can be downloaded. The expression profile of a gene can be viewed by clicking on the table. Click here for more information on this figure.

2. Boxplots (static). The boxplots show the distribution of expression values for all samples. Click here for more information on this figure.

3. Density plots (interactive). Similar to the boxplots, the density plot shows the distribution of expression values for all samples. Click here for more information on this figure.

4. Correlation plot (interactive). The correlation plot shows the sample-wise correlations. You can choose between different linkage and clustering methods. Click here for more information on this figure.

5. PCA plot (interactive). 2D and 3D PCA plots can be viewed to assess data quality. Click here for more information on this figure.

6. Settings to ensure reproducibility. The overview of the chosen settings and the session info can be downloaded.

Statistical Analysis

Input

Statistical analysis is performed using the limma package. To perform statistical analysis, you can provide three inputs:

1. Statistical comparison(s). You can select one or more statistical comparisons of interest. Note that the direction of the statistical comparison is usually defined as Case - Control.

2. Covariates. You can adjust for continuous (e.g., age) and categorical (e.g., sex, tissue) covariates.

3. Gene annotation. You can add gene annotations from the Ensembl database using the biomaRt package. For example, if you have ENTREZ gene IDs annotations, you can use this option to add gene symbols or Ensembl gene IDs to the statistics output. Please note that if you select this option, the time required for the statistical analysis will increase significantly.

Output

For each of the selected statistical comparisons, the following outputs are provided:

1. Top table (interactive). A table with the relevant statistics as calculated by limma. Click here for more information on this table.

2. Histograms (interactive) of the P value and logFC distributions. Click for more information on the P value histogram and logFC value histogram.

3. Volcano plot (interactive): scatter plot of logFC versus -log P value. Click here for more information on this figure.

4. MA plot (interactive): scatter plot of mean log expression versus logFC. Click here for more information on this figure.

5. Summary of the number of differentially expressed genes for selected P and logFC thresholds.

6. Settings to ensure reproducibility. The overview of the chosen settings and the session info can be downloaded.

Gene Set Analysis

After the statistical analysis, you can perform Gene Set Analysis (GSA) using different gene set collections (i.e., Gene Ontology, KEGG, and WikiPathways). Two different methods can be used for GSA: Overrepresentation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA).

Overrepresentation Analysis (ORA)

Input

1. Statistical comparison. Select the statistical comparison for which you want to perform ORA.

2. Geneset collection. Select from Gene Ontology (GO)-Biological Process (BP), -Cellular Component (CC), and -Molecular Function (MF), as well as WikiPathways geneset collections.

3. Input genes. Indicate whether you want to perform ORA on up- and/or downregulated genes. Furthermore, you can indicate whether the input genes should be selected based on a P value/logFC threshold or whether ORA should simply be performed on the most significantly up/downregulated genes. The background gene list includes all genes that passed QC.

4. Gene identifier. Select which column of the statistics table contains the gene identifiers and what type of identifiers these are (e.g., Entrez Gene, Ensembl Gene, or Gene Symbol). Furthermore, indicate to which organism the gene identifiers belong.

Output

The ORA results can be inspected in the following outputs:

1. ORA table. Table with the statistics of the ORA. This table can be downloaded. Click here for more information on this table.

2. Barchart of the most significantly overrepresented genesets. You can select the number of genesets in the chart. Click here for more information on this figure.

3. Network of the most significantly overrepresented genesets. The edge thickness is proportional to the Jaccard Index (i.e., number of shared genes/total number of unique genes in both genesets). You can select the network layout and number of genesets in the network. Click here for more information on this figure.

4. Settings to ensure reproducibility. The overview of the chosen settings and the session info can be downloaded.

Gene Set Enrichment Analysis (GSEA)

Input

1. Statistical comparison. Select the statistical comparison for which you want to perform GSEA.

2. Geneset collection. Select from Gene Ontology (GO)-Biological Process (BP), -Cellular Component (CC), and -Molecular Function (MF), KEGG, and WikiPathways geneset collections.

3. Ranking variable. Indicate which variable (logFC, -log P value, or signed -log P value) will be used to rank genes for the GSEA.

4. Gene identifier. Select which column of the statistics table contains the gene identifiers and what type of identifiers these are (e.g., Entrez Gene, Ensembl Gene, or Gene Symbol). Furthermore, indicate to which organism the gene identifiers belong.

Output

The GSEA results can be inspected in the following outputs:

1. Interactive table with GSEA statistics. This table can be downloaded. Click here for more information on this table.

2. Barchart of the most significantly enriched genesets. You can select the number of genesets in the chart. Click here for more information on this figure.

3. Network of the most significantly enriched genesets. The edge thickness is proportional to the Jaccard Index (i.e., number of shared genes/total number of unique genes in both genesets). You can select the network layout and number of genesets in the network. Click here for more information on this figure.

4. Settings to ensure reproducibility. The overview of the chosen settings and the session info can be downloaded.

RNA-seq Analysis

Starting from raw data

Data upload

Input

To start the analysis, the user should upload two files:

1. Expression matrix. This matrix can be provided as a .tsv/.csv file. ArrayAnalysis will automatically look for the column in the metadata file that matches with the column names of the matrix.

2. Metadata file. The metadata table can be provided as a .tsv/.csv file or as a Series Matrix File available from the GEO website.

You can analyse an example dataset (GSE128380) by clicking on Run example.

Output

A preview of the expression matrix and metadata table will be shown. This way, you can check whether the data has been correctly uploaded.

Pre-processing

Input

The data preprocessing is performed using the DESeq2 package. The data undergoes DESeq2 normalization and, for the PCA and sample-sample correlations, an additional unblinded variance stabilizing transformation is performed. For the preprocessing, you can provide the following inputs:

1. Remove samples (optional). Via this option, you can remove outliers or samples from experimental groups that you would like to exclude from the analysis.

2. Select experimental group. The experimental group is the variable that defines the cases and controls (e.g., disease status). By selecting more than one variable, you can combine multiple variables into a single experimental variable. This can be useful if, for example, you want to compare cases and controls across different tissues (e.g., liver and brain). In that case, you can create four experimental groups: case_brain, control_brain, case_liver, and control_liver. The selected experimental group will be used for the normalization (if normalization per experimental group is selected) and for the statistical analysis in the next tab.

3. Select filtering threshold. A gene is kept for the subsequent analysis if it has a count greater than or equal to the selected filtering threshold in at least n samples, where n is the number of samples in the smallest experimental group.

NOTE: ArrayAnalysis automatically checks for potential outliers in the data. If potential outliers have been identified, you will receive a warning message. Please always check the QC plots carefully to decide whether samples should be excluded from further analysis.
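
The filtering rule described under Select filtering threshold can be sketched as follows. The function name `filter_genes`, the data layout, and the example threshold of 10 are assumptions for illustration.

```python
from collections import Counter


def filter_genes(counts, groups, threshold):
    """Keep genes with a count >= threshold in at least n samples.

    counts: dict mapping gene -> list of raw counts, one per sample.
    groups: experimental group label of each sample, in the same order.
    n is the number of samples in the smallest experimental group.
    """
    n = min(Counter(groups).values())
    return [
        gene
        for gene, values in counts.items()
        if sum(1 for value in values if value >= threshold) >= n
    ]


# Hypothetical data: 3 cases and 2 controls, so n = 2.
groups = ["case", "case", "case", "control", "control"]
counts = {
    "GENE_A": [0, 0, 1, 0, 0],
    "GENE_B": [15, 0, 0, 12, 0],
    "GENE_C": [100, 200, 150, 90, 80],
    "GENE_D": [10, 3, 2, 1, 0],
}
print(filter_genes(counts, groups, threshold=10))  # ['GENE_B', 'GENE_C']
```

Using the size of the smallest group for n means that a gene expressed in only one experimental group can still pass the filter.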

Output

Data and preprocessing quality can be checked in the different output tables and figures:

1. Normalized expression matrix (interactive). The table of normalized expression values is provided and can be downloaded. The expression profile of a gene can be viewed by clicking on the table. Click here for more information on this figure.

2. Boxplots (static). The boxplots show the distribution of expression values for all samples. Click here for more information on this figure.

3. Density plots (interactive). Similar to the boxplots, the density plot shows the distribution of expression values for all samples. Click here for more information on this figure.

4. Raw Read Counts (static). This bar chart displays the total number of raw sequencing reads measured for each sample. Click here for more information on this figure.

5. Correlation plot (interactive). The correlation plot shows the sample-wise correlations. You can choose between different linkage and clustering methods. Click here for more information on this figure.

6. PCA plot (interactive). 2D and 3D PCA plots can be viewed to assess data quality. Click here for more information on this figure.

7. Settings to ensure reproducibility. The overview of the chosen settings and the session info can be downloaded.

Statistical Analysis

Input

Statistical analysis is performed using the DESeq2 package. To perform statistical analysis, you can provide four inputs:

1. Statistical comparison(s). You can select one or more statistical comparisons of interest. Note that the direction of the statistical comparison is usually defined as Case - Control.

2. Covariates. You can adjust for continuous (e.g., age) and categorical (e.g., sex, tissue) covariates.

3. logFC shrinkage. You can shrink imprecise logFCs towards 0 with the apeglm method. This option is recommended to get more accurate estimates of the logFCs.

4. Gene annotation. You can add gene annotations from the Ensembl database using the biomaRt package. For example, if you have ENTREZ gene IDs annotations, you can use this option to add gene symbols or Ensembl gene IDs to the statistics output. Please note that if you select this option, the time required for the statistical analysis will increase significantly.

Output

For each of the selected statistical comparisons, the following outputs are provided:

1. Top table (interactive). A table with the relevant statistics as calculated by DESeq2. Click here for more information on this table.

2. Histograms (interactive) of the P value and logFC distributions. Click for more information on the P value histogram and logFC value histogram.

3. Volcano plot (interactive): scatter plot of logFC versus -log P value. Click here for more information on this figure.

4. MA plot (interactive): scatter plot of mean log expression versus logFC. Click here for more information on this figure.

5. Summary of the number of differentially expressed genes for selected P and logFC thresholds.

6. Settings to ensure reproducibility. The overview of the chosen settings and the session info can be downloaded.

Gene Set Analysis

After the statistical analysis, you can perform Gene Set Analysis (GSA) using different gene set collections (i.e., Gene Ontology, KEGG, and WikiPathways). Two different methods can be used for GSA: Overrepresentation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA).

Overrepresentation Analysis (ORA)

Input

1. Statistical comparison. Select the statistical comparison for which you want to perform ORA.

2. Geneset collection. Select from Gene Ontology (GO)-Biological Process (BP), -Cellular Component (CC), and -Molecular Function (MF), as well as WikiPathways geneset collections.

3. Input genes. Indicate whether you want to perform ORA on up- and/or downregulated genes. Furthermore, you can indicate whether the input genes should be selected based on a P value/logFC threshold or whether ORA should simply be performed on the most significantly up/downregulated genes. The background gene list includes all genes that passed QC.

4. Gene identifier. Select which column of the statistics table contains the gene identifiers and what type of identifiers these are (e.g., Entrez Gene, Ensembl Gene, or Gene Symbol). Furthermore, indicate to which organism the gene identifiers belong.

Output

The ORA results can be inspected in the following outputs:

1. ORA table. Table with the statistics of the ORA. This table can be downloaded. Click here for more information on this table.

2. Barchart of the most significantly overrepresented genesets. You can select the number of genesets in the chart. Click here for more information on this figure.

3. Network of the most significantly overrepresented genesets. The edge thickness is proportional to the Jaccard Index (i.e., number of shared genes/total number of unique genes in both genesets). You can select the network layout and number of genesets in the network. Click here for more information on this figure.

4. Settings to ensure reproducibility. The overview of the chosen settings and the session info can be downloaded.

Gene Set Enrichment Analysis (GSEA)

Input

1. Statistical comparison. Select the statistical comparison for which you want to perform GSEA.

2. Geneset collection. Select from Gene Ontology (GO)-Biological Process (BP), -Cellular Component (CC), and -Molecular Function (MF), KEGG, and WikiPathways geneset collections.

3. Ranking variable. Indicate which variable (logFC, -log P value, or signed -log P value) will be used to rank genes for the GSEA.

4. Gene identifier. Select which column of the statistics table contains the gene identifiers and what type of identifiers these are (e.g., Entrez Gene, Ensembl Gene, or Gene Symbol). Furthermore, indicate to which organism the gene identifiers belong.

Output

The GSEA results can be inspected in the following outputs:

1. Interactive table with GSEA statistics. This table can be downloaded. Click here for more information on this table.

2. Barchart of the most significantly enriched genesets. You can select the number of genesets in the chart. Click here for more information on this figure.

3. Network of the most significantly enriched genesets. The edge thickness is proportional to the Jaccard Index (i.e., number of shared genes/total number of unique genes in both genesets). You can select the network layout and number of genesets in the network. Click here for more information on this figure.

4. Settings to ensure reproducibility. The overview of the chosen settings and the session info can be downloaded.

Starting from processed data

Data upload

Input

To start the analysis, the user should upload two files:

1. Expression matrix. This matrix can be provided as a .tsv/.csv file. ArrayAnalysis will automatically look for the column in the metadata file that matches with the column names of the matrix.

2. Metadata file. The metadata table can be provided as a .tsv/.csv file or as a Series Matrix File available from the GEO website.

You can analyse an example dataset (GSE128380) by clicking on Run example.

Output

A preview of the expression matrix and metadata table will be shown. This way, you can check whether the data has been correctly uploaded.

Pre-processing

Input

For the data preprocessing, you can provide the following inputs:

1. Remove samples (optional). Via this option, you can remove outliers or samples from experimental groups that you would like to exclude from the analysis.

2. Select experimental group. The experimental group is the variable that defines the cases and controls (e.g., disease status). By selecting more than one variable, you can combine multiple variables into a single experimental variable. This can be useful if, for example, you want to compare cases and controls across different tissues (e.g., liver and brain). In that case, you can create four experimental groups: case_brain, control_brain, case_liver, and control_liver. The selected experimental group will be used for the normalization (if normalization per experimental group is selected) and for the statistical analysis in the next tab.

3. Select transformation method (optional). You can perform log₂-transformation to make the data more normally distributed.

4. Select filtering threshold (optional). If this option is selected, a gene is kept for the subsequent analysis if it has a count greater than or equal to the selected filtering threshold in at least n samples, where n is the number of samples in the smallest experimental group.

5. Select normalization method (optional). You can perform quantile normalization to normalize the expression values across the samples.

NOTE: ArrayAnalysis automatically checks for potential outliers in the data. If potential outliers have been identified, you will receive a warning message. Please always check the QC plots carefully to decide whether samples should be excluded from further analysis.

Output

Data and preprocessing quality can be checked in the different output tables and figures:

1. Normalized expression matrix (interactive). The table of normalized expression values is provided and can be downloaded. The expression profile of a gene can be viewed by clicking on the table. Click here for more information on this figure.

2. Boxplots (static). The boxplots show the distribution of expression values for all samples. Click here for more information on this figure.

3. Density plots (interactive). Similar to the boxplots, the density plot shows the distribution of expression values for all samples. Click here for more information on this figure.

4. Correlation plot (interactive). The correlation plot shows the sample-wise correlations. You can choose between different linkage and clustering methods. Click here for more information on this figure.

5. PCA plot (interactive). 2D and 3D PCA plots can be viewed to assess data quality. Click here for more information on this figure.

6. Settings to ensure reproducibility. The overview of the chosen settings and the session info can be downloaded.

Statistical Analysis

Input

Statistical analysis is performed using the limma package (limma-trend). To perform statistical analysis, you can provide three inputs:

1. Statistical comparison(s). You can select one or more statistical comparisons of interest. Note that the direction of the statistical comparison is usually defined as Case - Control.

2. Covariates. You can adjust for continuous (e.g., age) and categorical (e.g., sex, tissue) covariates.

3. Gene annotation. You can add gene annotations from the Ensembl database using the biomaRt package. For example, if you have ENTREZ gene IDs annotations, you can use this option to add gene symbols or Ensembl gene IDs to the statistics output. Please note that if you select this option, the time required for the statistical analysis will increase significantly.

Output

For each of the selected statistical comparisons, the following outputs are provided:

1. Top table (interactive). A table with the relevant statistics as calculated by limma. Click here for more information on this table.

2. Histograms (interactive) of the P value and logFC distributions. Click for more information on the P value histogram and logFC value histogram.

3. Volcano plot (interactive): scatter plot of logFC versus -log P value. Click here for more information on this figure.

4. MA plot (interactive): scatter plot of mean log expression versus logFC. Click here for more information on this figure.

5. Summary of the number of differentially expressed genes for selected P and logFC thresholds.

6. Settings to ensure reproducibility. The overview of the chosen settings and the session info can be downloaded.

Gene Set Analysis

After the statistical analysis, you can perform Gene Set Analysis (GSA) using different gene set collections (i.e., Gene Ontology, KEGG, and WikiPathways). Two different methods can be used for GSA: Overrepresentation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA).

Overrepresentation Analysis (ORA)

Input

1. Statistical comparison. Select the statistical comparison for which you want to perform ORA.

2. Geneset collection. Select from Gene Ontology (GO)-Biological Process (BP), -Cellular Component (CC), and -Molecular Function (MF), as well as WikiPathways geneset collections.

3. Input genes. Indicate whether you want to perform ORA on up- and/or downregulated genes. Furthermore, you can indicate whether the input genes should be selected based on a P value/logFC threshold or whether ORA should simply be performed on the most significantly up/downregulated genes. The background gene list includes all genes that passed QC.

4. Gene identifier. Select which column of the statistics table contains the gene identifiers and what type of identifiers these are (e.g., Entrez Gene, Ensembl Gene, or Gene Symbol). Furthermore, indicate to which organism the gene identifiers belong.

Output

The ORA results can be inspected in the following outputs:

1. ORA table. Table with the statistics of the ORA. This table can be downloaded. Click here for more information on this table.

2. Barchart of the most significantly overrepresented genesets. You can select the number of genesets in the chart. Click here for more information on this figure.

3. Network of the most significantly overrepresented genesets. The edge thickness is proportional to the Jaccard Index (i.e., number of shared genes/total number of unique genes in both genesets). You can select the network layout and number of genesets in the network. Click here for more information on this figure.

4. Settings to ensure reproducibility. The overview of the chosen settings and the session info can be downloaded.

Gene Set Enrichment Analysis (GSEA)

Input

1. Statistical comparison. Select the statistical comparison for which you want to perform GSEA.

2. Geneset collection. Select from Gene Ontology (GO)-Biological Process (BP), -Cellular Component (CC), and -Molecular Function (MF), KEGG, and WikiPathways geneset collections.

3. Ranking variable. Indicate which variable (logFC, -log P value, or signed -log P value) will be used to rank genes for the GSEA.

4. Gene identifier. Select which column of the statistics table contains the gene identifiers and what type of identifiers these are (e.g., Entrez Gene, Ensembl Gene, or Gene Symbol). Furthermore, indicate to which organism the gene identifiers belong.

Output

The GSEA results can be inspected in the following outputs:

1. Interactive table with GSEA statistics. This table can be downloaded. Click here for more information on this table.

2. Barchart of the most significantly enriched genesets. You can select the number of genesets in the chart. Click here for more information on this figure.

3. Network of the most significantly enriched genesets. The edge thickness is proportional to the Jaccard Index (i.e., number of shared genes/total number of unique genes in both genesets). You can select the network layout and number of genesets in the network. Click here for more information on this figure.

4. Settings to ensure reproducibility. The overview of the chosen settings and the session info can be downloaded.
