Microarray Analysis


CEL files

Data upload

Input

To begin the analysis, upload the following two files:

1. CEL files (.zip)

  • Provide a .zip archive containing .CEL or .CEL.gz files.
  • The filenames of the CEL files must match the sample names in the metadata file.
  • ArrayAnalysis will automatically detect the column in the metadata that corresponds to the CEL filenames.

2. Metadata file

  • The metadata table can be provided as a .tsv or .csv file or as a Series Matrix File available from the GEO website.
  • Ensure that the metadata includes a column of sample names that match the CEL file names without file extension.
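Before uploading, it can help to confirm locally that every CEL filename has a matching sample name in the metadata. The sketch below is an illustration only and not part of ArrayAnalysis; the helper names (`sample_name`, `check_names`) and the example file names are hypothetical.

```python
import re

def sample_name(filename: str) -> str:
    """Strip a trailing .CEL or .CEL.gz extension (case-insensitive)."""
    return re.sub(r"\.cel(\.gz)?$", "", filename, flags=re.IGNORECASE)

def check_names(cel_files, metadata_names):
    """Return (CEL samples missing from the metadata, metadata samples without a CEL file)."""
    cel = {sample_name(f) for f in cel_files}
    meta = set(metadata_names)
    return sorted(cel - meta), sorted(meta - cel)

# Hypothetical file and sample names, for illustration only.
cel_files = ["sample_A.CEL", "sample_B.CEL.gz", "sample_C.cel"]
metadata_names = ["sample_A", "sample_B", "sample_D"]

missing_in_metadata, missing_cel = check_names(cel_files, metadata_names)
print(missing_in_metadata)  # CEL files without a metadata row
print(missing_cel)          # metadata rows without a CEL file
```

If both lists are empty, the filenames and the metadata sample names agree.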

👉 You can analyse an example dataset (GSE6955) by clicking on Run example.


Output

After uploading, a preview of both the expression matrix and the metadata table will be displayed. This allows you to verify that the data has been uploaded correctly.


Pre-processing

Input

For the data preprocessing, you can provide the following inputs:

1. Remove samples (optional) 🗑️ Use this option to exclude outliers or other samples you do not want included in the analysis.

2. Select experimental group 👥 The experimental group is the variable that defines the cases and controls (e.g., disease status). By selecting more than one variable, you can combine multiple variables into a single experimental variable. This can be useful if you, for example, want to compare cases and controls for different tissues (e.g., liver and brain). In that case, you can create four different experimental groups: case_brain, control_brain, case_liver, and control_liver. The selected experimental group is used for the normalization (if normalization per experimental group is selected) and for the statistical analysis in the next tab.

3. Select normalization method 🔧 The default normalization method is RMA, but you can also choose GCRMA or PLIER.

4. Select probeset annotation 🏷️ By default, ArrayAnalysis uses the custom ENTREZ gene annotation from brainarray. If you select No annotation, the standard Affymetrix probeset annotation will be used.
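Combining several metadata variables into one experimental group amounts to joining their values per sample. The following is a minimal sketch of that idea, assuming an underscore separator as in the group names above; the function name `combine_groups` and the metadata columns are hypothetical and do not describe how ArrayAnalysis implements it.

```python
def combine_groups(rows, variables, sep="_"):
    """Join the selected metadata variables into one group label per sample."""
    return [sep.join(str(row[v]) for v in variables) for row in rows]

# Hypothetical metadata table: one dict per sample.
metadata = [
    {"sample": "s1", "status": "case", "tissue": "brain"},
    {"sample": "s2", "status": "control", "tissue": "brain"},
    {"sample": "s3", "status": "case", "tissue": "liver"},
    {"sample": "s4", "status": "control", "tissue": "liver"},
]

groups = combine_groups(metadata, ["status", "tissue"])
print(groups)  # ['case_brain', 'control_brain', 'case_liver', 'control_liver']
```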


👉 ArrayAnalysis automatically checks for potential outliers and will display a warning if any are detected. Please always check the QC plots carefully to decide whether samples should be excluded from further analysis.


Output

Data and preprocessing quality can be checked in the different output tables and figures:

1. Normalized expression matrix. A table of normalized expression values is provided and can be downloaded. You can view the expression profile of individual genes by clicking on the table. Click here for more information on this figure.


2. Boxplots. The boxplots show the distribution of expression values across all samples. Click here for more information on this figure.

3. Density plots. Similar to the boxplots, the density plot shows the distribution of expression values across all samples. Click here for more information on this figure.

4. Sample-sample correlation heatmap. The heatmap shows the sample-wise correlations based on the normalized expression profiles. Click here for more information on this figure.

5. PCA plot. 2D and 3D PCA plots can be viewed to assess data quality and identify potential outliers. Click here for more information on this figure.

6. Settings. Download the overview of pre-processing settings and the session info to ensure reproducibility.


Statistical Analysis

Input

Statistical analysis is performed using the limma package. To run the analysis, you can provide three inputs:

1. Statistical comparison(s) 📊 Select one or more statistical comparisons of interest. The direction of the statistical comparison is usually defined as Case - Control.

2. Covariates ⚙️ Adjust for continuous (e.g., age) and categorical (e.g., sex, tissue) covariates.

3. Gene annotation 🔖 Add gene annotations from the Ensembl database using the biomaRt package. For example, if your data uses ENTREZ gene IDs, you can use this option to add gene symbols or Ensembl gene IDs to the statistics output. Please note that enabling this option increases the runtime of the analysis.


Output

For each of the selected statistical comparisons, the following outputs are provided:

1. Top table. A table with the statistics from the differential expression analysis. Click here for more information on this table.

2. p-value and log2FC histograms. These plots show the distribution of p-values and log2FCs. Click for more information on the p-value histogram and log2FC histogram.

3. Volcano plot. Scatter plot of log2FC versus -log10 p-value. Click here for more information on this figure.

4. MA plot. Scatter plot of mean log2 expression versus log2FC. Click here for more information on this figure.

5. Summary. Get the number of differentially expressed genes for selected p-value and log2FC thresholds.

6. Settings. Download the overview of statistical analysis settings and the session info to ensure reproducibility.
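To make the Summary output concrete, the sketch below counts up- and downregulated genes for a given p-value and log2FC threshold. It is an illustration under stated assumptions: the column names (`log2FC`, `p`), the use of a strict `<` for the p-value, and the choice between raw and adjusted p-values are hypothetical and may differ from what ArrayAnalysis does.

```python
def summarize(top_table, p_threshold=0.05, lfc_threshold=1.0):
    """Count genes passing both the p-value and the absolute log2FC threshold."""
    up = sum(1 for g in top_table
             if g["p"] < p_threshold and g["log2FC"] >= lfc_threshold)
    down = sum(1 for g in top_table
               if g["p"] < p_threshold and g["log2FC"] <= -lfc_threshold)
    return {"up": up, "down": down, "total": up + down}

# Hypothetical top table rows, for illustration only.
top_table = [
    {"gene": "G1", "log2FC": 2.1, "p": 0.001},   # up
    {"gene": "G2", "log2FC": -1.5, "p": 0.01},   # down
    {"gene": "G3", "log2FC": 0.3, "p": 0.0001},  # significant, small effect
    {"gene": "G4", "log2FC": 1.8, "p": 0.2},     # large effect, not significant
]

summary = summarize(top_table)
print(summary)  # {'up': 1, 'down': 1, 'total': 2}
```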


Gene Set Analysis

After completing the statistical analysis, you can perform Gene Set Analysis (GSA) using various gene set collections, including Gene Ontology (GO), KEGG, and WikiPathways. The clusterProfiler package is used for the analysis. Two methods are available:

  • Overrepresentation Analysis (ORA)
  • Gene Set Enrichment Analysis (GSEA)

Overrepresentation Analysis (ORA)

Input

1. Statistical comparison 📊 Select the statistical comparison for which you want to perform ORA.

2. Geneset collection 📋 Select one of the following geneset collections: Gene Ontology-Biological Process (GO-BP), -Cellular Component (GO-CC), and -Molecular Function (GO-MF), KEGG, and WikiPathways geneset collections.

3. Input genes 🧬 Specify whether ORA should be performed on upregulated, downregulated, or both sets of genes. You can define input genes by:

  • Applying a p-value and log2FC threshold, or
  • Selecting the top N most significantly up/downregulated genes.

👉 The background gene list includes all genes that passed QC.

4. Gene identifier 🔖 Select which column in the statistics table contains the gene identifiers, and specify the identifier type (e.g., Entrez Gene, Ensembl Gene, or Gene Symbol) and organism.
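ORA is commonly computed as a one-sided hypergeometric test: given the background, how surprising is the overlap between the input genes and a gene set? The sketch below illustrates that calculation with the Python standard library. It is not the clusterProfiler code; the function name `ora_p_value` and the gene identifiers are hypothetical.

```python
from math import comb

def ora_p_value(n_background, n_in_set, n_input, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric distribution."""
    upper = min(n_in_set, n_input)
    total = comb(n_background, n_input)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_input - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 20 background genes, a gene set of 5, 4 input genes.
background = {f"g{i}" for i in range(20)}
gene_set = {"g0", "g1", "g2", "g3", "g4"}
input_genes = {"g0", "g1", "g2", "g10"}

overlap = len(input_genes & gene_set)
p = ora_p_value(len(background), len(gene_set & background),
                len(input_genes), overlap)
print(overlap, round(p, 4))
```

A small p-value indicates that the gene set contains more input genes than expected by chance, given the background.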


Output

The ORA results can be inspected in the following outputs:

1. ORA table. Table with the statistics of the overrepresentation analysis. Click here for more information on this table.

2. Barchart. The plot displays statistics of the top overrepresented genesets. Click here for more information on this figure.

3. Network. The network visualizes the relationship between the most significantly overrepresented genesets. The edge thickness is proportional to the Jaccard Index (i.e., the number of shared genes divided by the total number of unique genes across both genesets). Click here for more information on this figure.

4. Settings. Download the overview of the ORA settings and session info to ensure reproducibility.
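The Jaccard Index used for the edge thickness in the network can be written out in a few lines. This is a minimal sketch of the standard definition (intersection size divided by union size); the gene names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard Index: shared genes divided by unique genes across both sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical genesets, for illustration only.
geneset_1 = {"A", "B", "C", "D"}
geneset_2 = {"C", "D", "E"}

print(jaccard(geneset_1, geneset_2))  # 2 shared / 5 unique = 0.4
```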


Gene Set Enrichment Analysis (GSEA)

Input

1. Statistical comparison 📊 Select the statistical comparison for which you want to perform GSEA.

2. Geneset collection 📋 Select one of the following geneset collections: Gene Ontology-Biological Process (GO-BP), -Cellular Component (GO-CC), and -Molecular Function (GO-MF), KEGG, and WikiPathways geneset collections.

3. Ranking variable 🔢 Specify which variable (log2FC, -log10 P value, or signed -log10 P value) will be used to rank genes for the GSEA.

4. Gene identifier 🔖 Select which column in the statistics table contains the gene identifiers, and specify the identifier type (e.g., Entrez Gene, Ensembl Gene, or Gene Symbol) and organism.
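The three ranking variables differ in whether they carry direction, significance, or both. The sketch below shows one plausible way to compute them, assuming the signed -log10 P value takes its sign from the log2FC; the function name `ranking_score`, the method labels, and the example values are hypothetical.

```python
import math

def ranking_score(log2fc, p_value, method="signed"):
    """Return the value used to rank a gene (p_value must be > 0)."""
    if method == "log2FC":
        return log2fc
    neg_log_p = -math.log10(p_value)
    if method == "-log10p":
        return neg_log_p
    if method == "signed":
        # Direction from the log2FC, magnitude from the p-value.
        return math.copysign(neg_log_p, log2fc)
    raise ValueError(f"unknown method: {method}")

# Hypothetical statistics: gene -> (log2FC, p-value).
stats = {"G1": (2.0, 1e-4), "G2": (-1.0, 1e-6), "G3": (0.5, 0.05)}

ranked = sorted(stats, key=lambda g: ranking_score(*stats[g]), reverse=True)
print(ranked)  # ['G1', 'G3', 'G2']
```

With the signed variant, strongly downregulated genes such as G2 end up at the bottom of the ranked list, whereas the unsigned -log10 P value would place them at the top.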


Output

The GSEA results can be inspected in the following outputs:

1. GSEA table. Table with the statistics of the gene set enrichment analysis. Click here for more information on this table.

2. Barchart. The plot displays statistics of the top enriched genesets. Click here for more information on this figure.

3. Network. The network visualizes the relationship between the most significantly enriched genesets. The edge thickness is proportional to the Jaccard Index (i.e., the number of shared genes divided by the total number of unique genes across both genesets). Click here for more information on this figure.

4. Settings. Download the overview of the GSEA settings and session info to ensure reproducibility.


Intensity matrix

Data upload

Input

To begin the analysis, upload the following two files:

1. Intensity matrix

  • Provide an intensity matrix as .tsv or .csv file.
  • Alternatively, the intensity matrix can be provided in a Series Matrix File format, which can be downloaded from the GEO website.
  • The column names of the intensity matrix must match the sample names in the metadata file.
  • ArrayAnalysis will automatically detect the column in the metadata that corresponds to the sample names in the intensity matrix.

2. Metadata file

  • The metadata table can be provided as a .tsv or .csv file or as a Series Matrix File available from the GEO website.
  • Ensure that the metadata includes a column of sample names that match the column names in the intensity matrix.
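How ArrayAnalysis detects the matching metadata column is not described here, but the idea can be illustrated by picking the column whose values overlap most with the column names of the intensity matrix. The sketch below is an assumption-based illustration; `detect_sample_column` and the metadata column names are hypothetical.

```python
def detect_sample_column(metadata, sample_names):
    """Return the metadata column sharing the most values with sample_names."""
    targets = set(sample_names)
    best, best_hits = None, 0
    for column, values in metadata.items():
        hits = len(targets & {str(v) for v in values})
        if hits > best_hits:
            best, best_hits = column, hits
    return best

# Hypothetical metadata (column -> values) and matrix column names.
metadata = {
    "title": ["brain 1", "brain 2", "liver 1"],
    "sample_id": ["S1", "S2", "S3"],
    "tissue": ["brain", "brain", "liver"],
}
matrix_columns = ["S1", "S2", "S3"]

print(detect_sample_column(metadata, matrix_columns))  # sample_id
```

If no column shares any value with the matrix column names, the function returns `None`, which signals that the sample names need to be corrected before uploading.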

👉 You can analyse an example dataset (GSE6955) by clicking on Run example.


Output

After uploading, a preview of both the expression matrix and the metadata table will be displayed. This allows you to verify that the data has been uploaded correctly.


Pre-processing

Input

For the data preprocessing, you can provide the following inputs:

1. Remove samples (optional) 🗑️ Use this option to exclude outliers or other samples you do not want included in the analysis.

2. Select experimental group 👥 The experimental group is the variable that defines the cases and controls (e.g., disease status). By selecting more than one variable, you can combine multiple variables into a single experimental variable. This can be useful if you, for example, want to compare cases and controls for different tissues (e.g., liver and brain). In that case, you can create four different experimental groups: case_brain, control_brain, case_liver, and control_liver. The selected experimental group is used for the normalization (if normalization per experimental group is selected) and for the statistical analysis in the next tab.

3. Select transformation method (optional) 🔧 If necessary, perform log2-transformation to make the data more normally distributed.

4. Select normalization method (optional) 🔧 If necessary, perform quantile normalization to normalize the expression values across the samples.
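Quantile normalization forces every sample to share the same distribution of values: each value is replaced by the mean of the values that hold the same rank across all samples. The sketch below is a simplified standard-library illustration (ties are broken by row order rather than averaged) and is not the code used by ArrayAnalysis; the matrix values are hypothetical.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a matrix with rows = genes and columns = samples."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[matrix[r][c] for r in range(n_rows)] for c in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the values at each rank across all samples.
    rank_means = [sum(col[r] for col in sorted_cols) / n_cols
                  for r in range(n_rows)]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for c, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda r: col[r])
        for rank, r in enumerate(order):
            result[r][c] = rank_means[rank]
    return result

# Hypothetical intensity matrix: 4 genes x 3 samples.
matrix = [
    [5, 4, 3],
    [2, 1, 4],
    [3, 4, 6],
    [4, 2, 8],
]

normalized = quantile_normalize(matrix)
for row in normalized:
    print([round(v, 2) for v in row])
```

After normalization, every column contains the same set of values, only assigned to different genes.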


👉 ArrayAnalysis automatically checks for potential outliers and will display a warning if any are detected. Please always check the QC plots carefully to decide whether samples should be excluded from further analysis.


Output

Data and preprocessing quality can be checked in the different output tables and figures:

1. Normalized expression matrix. A table of normalized expression values is provided and can be downloaded. You can view the expression profile of individual genes by clicking on the table. Click here for more information on this figure.


2. Boxplots. The boxplots show the distribution of expression values across all samples. Click here for more information on this figure.

3. Density plots. Similar to the boxplots, the density plot shows the distribution of expression values across all samples. Click here for more information on this figure.

4. Sample-sample correlation heatmap. The heatmap shows the sample-wise correlations based on the normalized expression profiles. Click here for more information on this figure.

5. PCA plot. 2D and 3D PCA plots can be viewed to assess data quality and identify potential outliers. Click here for more information on this figure.

6. Settings. Download the overview of pre-processing settings and the session info to ensure reproducibility.
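The sample-sample correlation heatmap is built from pairwise correlations between expression profiles. The sketch below assumes Pearson correlation, which is an assumption since the correlation measure is not stated here; the function names and expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long profiles (nan if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

def correlation_matrix(samples):
    """samples: dict of sample name -> list of normalized expression values."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names}
            for a in names}

# Hypothetical normalized expression profiles.
samples = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [2.0, 4.0, 6.0, 8.5],
    "s3": [4.0, 3.0, 2.0, 1.0],
}

corr = correlation_matrix(samples)
print(round(corr["s1"]["s2"], 3), round(corr["s1"]["s3"], 3))
```

A sample that correlates poorly with all others in its experimental group is a candidate outlier to inspect in the other QC plots.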


Statistical Analysis

Input

Statistical analysis is performed using the limma package. To run the analysis, you can provide three inputs:

1. Statistical comparison(s) 📊 Select one or more statistical comparisons of interest. The direction of the statistical comparison is usually defined as Case - Control.

2. Covariates ⚙️ Adjust for continuous (e.g., age) and categorical (e.g., sex, tissue) covariates.

3. Gene annotation 🔖 Add gene annotations from the Ensembl database using the biomaRt package. For example, if your data uses ENTREZ gene IDs, you can use this option to add gene symbols or Ensembl gene IDs to the statistics output. Please note that enabling this option increases the runtime of the analysis.


Output

For each of the selected statistical comparisons, the following outputs are provided:

1. Top table. A table with the statistics from the differential expression analysis. Click here for more information on this table.

2. p-value and log2FC histograms. These plots show the distribution of p-values and log2FCs. Click for more information on the p-value histogram and log2FC histogram.

3. Volcano plot. Scatter plot of log2FC versus -log10 p-value. Click here for more information on this figure.

4. MA plot. Scatter plot of mean log2 expression versus log2FC. Click here for more information on this figure.

5. Summary. Get the number of differentially expressed genes for selected p-value and log2FC thresholds.

6. Settings. Download the overview of statistical analysis settings and the session info to ensure reproducibility.


Gene Set Analysis

After completing the statistical analysis, you can perform Gene Set Analysis (GSA) using various gene set collections, including Gene Ontology (GO), KEGG, and WikiPathways. The clusterProfiler package is used for the analysis. Two methods are available:

  • Overrepresentation Analysis (ORA)
  • Gene Set Enrichment Analysis (GSEA)

Overrepresentation Analysis (ORA)

Input

1. Statistical comparison 📊 Select the statistical comparison for which you want to perform ORA.

2. Geneset collection 📋 Select one of the following geneset collections: Gene Ontology-Biological Process (GO-BP), -Cellular Component (GO-CC), and -Molecular Function (GO-MF), KEGG, and WikiPathways geneset collections.

3. Input genes 🧬 Specify whether ORA should be performed on upregulated, downregulated, or both sets of genes. You can define input genes by:

  • Applying a p-value and log2FC threshold, or
  • Selecting the top N most significantly up/downregulated genes.

👉 The background gene list includes all genes that passed QC.

4. Gene identifier 🔖 Select which column in the statistics table contains the gene identifiers, and specify the identifier type (e.g., Entrez Gene, Ensembl Gene, or Gene Symbol) and organism.


Output

The ORA results can be inspected in the following outputs:

1. ORA table. Table with the statistics of the overrepresentation analysis. Click here for more information on this table.

2. Barchart. The plot displays statistics of the top overrepresented genesets. Click here for more information on this figure.

3. Network. The network visualizes the relationship between the most significantly overrepresented genesets. The edge thickness is proportional to the Jaccard Index (i.e., the number of shared genes divided by the total number of unique genes across both genesets). Click here for more information on this figure.

4. Settings. Download the overview of the ORA settings and session info to ensure reproducibility.


Gene Set Enrichment Analysis (GSEA)

Input

1. Statistical comparison 📊 Select the statistical comparison for which you want to perform GSEA.

2. Geneset collection 📋 Select one of the following geneset collections: Gene Ontology-Biological Process (GO-BP), -Cellular Component (GO-CC), and -Molecular Function (GO-MF), KEGG, and WikiPathways geneset collections.

3. Ranking variable 🔢 Specify which variable (log2FC, -log10 P value, or signed -log10 P value) will be used to rank genes for the GSEA.

4. Gene identifier 🔖 Select which column in the statistics table contains the gene identifiers, and specify the identifier type (e.g., Entrez Gene, Ensembl Gene, or Gene Symbol) and organism.


Output

The GSEA results can be inspected in the following outputs:

1. GSEA table. Table with the statistics of the gene set enrichment analysis. Click here for more information on this table.

2. Barchart. The plot displays statistics of the top enriched genesets. Click here for more information on this figure.

3. Network. The network visualizes the relationship between the most significantly enriched genesets. The edge thickness is proportional to the Jaccard Index (i.e., the number of shared genes divided by the total number of unique genes across both genesets). Click here for more information on this figure.

4. Settings. Download the overview of the GSEA settings and session info to ensure reproducibility.


RNA-seq Analysis


Raw count matrix

Data upload

Input

To begin the analysis, upload the following two files:

1. Count matrix

  • Provide a raw count matrix as .tsv or .csv file.
  • The column names of the count matrix must match the sample names in the metadata file.
  • ArrayAnalysis will automatically detect the column in the metadata that corresponds to the sample names in the count matrix.

2. Metadata file

  • The metadata table can be provided as a .tsv or .csv file or as a Series Matrix File available from the GEO website.
  • Ensure that the metadata includes a column of sample names that match the column names in the count matrix.

👉 You can analyse an example dataset (GSE128380) by clicking on Run example.


Output

After uploading, a preview of both the expression matrix and the metadata table will be displayed. This allows you to verify that the data has been uploaded correctly.


Pre-processing

Input

The data preprocessing is performed using the DESeq2 package. The data undergoes DESeq2 normalization, and an additional unblinded variance stabilizing transformation is performed to generate the sample-sample correlation heatmap and PCA plot. For the preprocessing, you can provide the following inputs:

1. Remove samples (optional) 🗑️ Use this option to exclude outliers or other samples you do not want included in the analysis.

2. Select experimental group 👥 The experimental group is the variable that defines the cases and controls (e.g., disease status). By selecting more than one variable, you can combine multiple variables into a single experimental variable. This can be useful if you, for example, want to compare cases and controls for different tissues (e.g., liver and brain). In that case, you can create four different experimental groups: case_brain, control_brain, case_liver, and control_liver. The selected experimental group is used for the normalization (if normalization per experimental group is selected) and for the statistical analysis in the next tab.

3. Select filtering threshold ✂️ A gene is kept for the subsequent analysis if it has a count greater than or equal to the selected filtering threshold in at least n samples, where n is the number of samples in the smallest experimental group.
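The filtering rule above can be sketched as follows: determine n from the smallest experimental group, then keep each gene that reaches the threshold in at least n samples. This is an illustration of the stated rule only; the function name `filter_genes`, the data layout, and the counts are hypothetical.

```python
from collections import Counter

def filter_genes(counts, groups, threshold):
    """Keep genes with count >= threshold in at least n samples,
    where n is the size of the smallest experimental group."""
    n = min(Counter(groups).values())
    return {
        gene: values
        for gene, values in counts.items()
        if sum(v >= threshold for v in values) >= n
    }

# Hypothetical data: 3 cases and 2 controls, so n = 2.
groups = ["case", "case", "case", "control", "control"]
counts = {
    "G1": [0, 0, 12, 15, 0],   # 2 samples reach the threshold: kept
    "G2": [3, 5, 2, 0, 1],     # no sample reaches the threshold: removed
    "G3": [50, 40, 0, 0, 0],   # 2 samples reach the threshold: kept
}

kept = filter_genes(counts, groups, threshold=10)
print(sorted(kept))  # ['G1', 'G3']
```

Using the smallest group size for n means that a gene expressed in only one experimental group can still pass the filter.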


👉 ArrayAnalysis automatically checks for potential outliers and will display a warning if any are detected. Please always check the QC plots carefully to decide whether samples should be excluded from further analysis.


Output

Data and preprocessing quality can be checked in the different output tables and figures:

1. Normalized expression matrix. A table of normalized expression values is provided and can be downloaded. You can view the expression profile of individual genes by clicking on the table. Click here for more information on this figure.


2. Boxplots. The boxplots show the distribution of expression values across all samples. Click here for more information on this figure.

3. Density plots. Similar to the boxplots, the density plot shows the distribution of expression values across all samples. Click here for more information on this figure.

4. Raw Read Counts. The bar chart displays the total number of raw sequencing reads measured for each sample. Click here for more information on this figure.

5. Sample-sample correlation heatmap. The heatmap shows the sample-wise correlations based on the normalized expression profiles. Click here for more information on this figure.

6. PCA plot. 2D and 3D PCA plots can be viewed to assess data quality and identify potential outliers. Click here for more information on this figure.

7. Settings. Download the overview of pre-processing settings and the session info to ensure reproducibility.


Statistical Analysis

Input

Statistical analysis is performed using the DESeq2 package. To run the analysis, you can provide four inputs:

1. Statistical comparison(s) 📊 Select one or more statistical comparisons of interest. The direction of the statistical comparison is usually defined as Case - Control.

2. Covariates ⚙️ Adjust for continuous (e.g., age) and categorical (e.g., sex, tissue) covariates.

3. log2FC shrinkage 📉 Shrink imprecise log2FCs towards 0 with the apeglm method. This option is recommended to obtain more accurate log2FC estimates.

4. Gene annotation 🔖 Add gene annotations from the Ensembl database using the biomaRt package. For example, if your data uses ENTREZ gene IDs, you can use this option to add gene symbols or Ensembl gene IDs to the statistics output. Please note that enabling this option increases the runtime of the analysis.


Output

For each of the selected statistical comparisons, the following outputs are provided:

1. Top table. A table with the statistics from the differential expression analysis. Click here for more information on this table.

2. p-value and log2FC histograms. These plots show the distribution of p-values and log2FCs. Click for more information on the p-value histogram and log2FC histogram.

3. Volcano plot. Scatter plot of log2FC versus -log10 p-value. Click here for more information on this figure.

4. MA plot. Scatter plot of mean log2 expression versus log2FC. Click here for more information on this figure.

5. Summary. Get the number of differentially expressed genes for selected p-value and log2FC thresholds.

6. Settings. Download the overview of statistical analysis settings and the session info to ensure reproducibility.


Gene Set Analysis

After completing the statistical analysis, you can perform Gene Set Analysis (GSA) using various gene set collections, including Gene Ontology (GO), KEGG, and WikiPathways. The clusterProfiler package is used for the analysis. Two methods are available:

  • Overrepresentation Analysis (ORA)
  • Gene Set Enrichment Analysis (GSEA)

Overrepresentation Analysis (ORA)

Input

1. Statistical comparison 📊 Select the statistical comparison for which you want to perform ORA.

2. Geneset collection 📋 Select one of the following geneset collections: Gene Ontology-Biological Process (GO-BP), -Cellular Component (GO-CC), and -Molecular Function (GO-MF), KEGG, and WikiPathways geneset collections.

3. Input genes 🧬 Specify whether ORA should be performed on upregulated, downregulated, or both sets of genes. You can define input genes by:

  • Applying a p-value and log2FC threshold, or
  • Selecting the top N most significantly up/downregulated genes.

👉 The background gene list includes all genes that passed QC.

4. Gene identifier 🔖 Select which column in the statistics table contains the gene identifiers, and specify the identifier type (e.g., Entrez Gene, Ensembl Gene, or Gene Symbol) and organism.


Output

The ORA results can be inspected in the following outputs:

1. ORA table. Table with the statistics of the overrepresentation analysis. Click here for more information on this table.

2. Barchart. The plot displays statistics of the top overrepresented genesets. Click here for more information on this figure.

3. Network. The network visualizes the relationship between the most significantly overrepresented genesets. The edge thickness is proportional to the Jaccard Index (i.e., the number of shared genes divided by the total number of unique genes across both genesets). Click here for more information on this figure.

4. Settings. Download the overview of the ORA settings and session info to ensure reproducibility.


Gene Set Enrichment Analysis (GSEA)

Input

1. Statistical comparison 📊 Select the statistical comparison for which you want to perform GSEA.

2. Geneset collection 📋 Select one of the following geneset collections: Gene Ontology-Biological Process (GO-BP), -Cellular Component (GO-CC), and -Molecular Function (GO-MF), KEGG, and WikiPathways geneset collections.

3. Ranking variable 🔢 Specify which variable (log2FC, -log10 P value, or signed -log10 P value) will be used to rank genes for the GSEA.

4. Gene identifier 🔖 Select which column in the statistics table contains the gene identifiers, and specify the identifier type (e.g., Entrez Gene, Ensembl Gene, or Gene Symbol) and organism.


Output

The GSEA results can be inspected in the following outputs:

1. GSEA table. Table with the statistics of the gene set enrichment analysis. Click here for more information on this table.

2. Barchart. The plot displays statistics of the top enriched genesets. Click here for more information on this figure.

3. Network. The network visualizes the relationship between the most significantly enriched genesets. The edge thickness is proportional to the Jaccard Index (i.e., the number of shared genes divided by the total number of unique genes across both genesets). Click here for more information on this figure.

4. Settings. Download the overview of the GSEA settings and session info to ensure reproducibility.


Processed count matrix

Data upload

Input

To begin the analysis, upload the following two files:

1. Count matrix

  • Provide a processed count matrix as .tsv or .csv file.
  • The column names of the count matrix must match the sample names in the metadata file.
  • ArrayAnalysis will automatically detect the column in the metadata that corresponds to the sample names in the count matrix.

2. Metadata file

  • The metadata table can be provided as a .tsv or .csv file or as a Series Matrix File available from the GEO website.
  • Ensure that the metadata includes a column of sample names that match the column names in the count matrix.

👉 You can analyse an example dataset (GSE128380) by clicking on Run example.


Output

After uploading, a preview of both the expression matrix and the metadata table will be displayed. This allows you to verify that the data has been uploaded correctly.


Pre-processing

Input

For the data preprocessing, you can provide the following inputs:

1. Remove samples (optional) 🗑️ Use this option to exclude outliers or other samples you do not want included in the analysis.

2. Select experimental group 👥 The experimental group is the variable that defines the cases and controls (e.g., disease status). By selecting more than one variable, you can combine multiple variables into a single experimental variable. This can be useful if you, for example, want to compare cases and controls for different tissues (e.g., liver and brain). In that case, you can create four different experimental groups: case_brain, control_brain, case_liver, and control_liver. The selected experimental group is used for the normalization (if normalization per experimental group is selected) and for the statistical analysis in the next tab.

3. Select transformation method (optional) 🔧 If necessary, perform log2-transformation to make the data more normally distributed.

4. Select filtering threshold (optional) ✂️ If this option is selected, a gene is kept for the subsequent analysis if it has a count greater than or equal to the selected filtering threshold in at least n samples, where n is the number of samples in the smallest experimental group.

5. Select normalization method (optional) 🔧 If necessary, perform quantile normalization to normalize the expression values across the samples.


👉 ArrayAnalysis automatically checks for potential outliers and will display a warning if any are detected. Please always check the QC plots carefully to decide whether samples should be excluded from further analysis.


Output

Data and preprocessing quality can be checked in the different output tables and figures:

1. Normalized expression matrix. A table of normalized expression values is provided and can be downloaded. You can view the expression profile of individual genes by clicking on the table. Click here for more information on this figure.


2. Boxplots. The boxplots show the distribution of expression values across all samples. Click here for more information on this figure.

3. Density plots. Similar to the boxplots, the density plot shows the distribution of expression values across all samples. Click here for more information on this figure.

4. Sample-sample correlation heatmap. The heatmap shows the sample-wise correlations based on the normalized expression profiles. Click here for more information on this figure.

5. PCA plot. 2D and 3D PCA plots can be viewed to assess data quality and identify potential outliers. Click here for more information on this figure.

6. Settings. Download the overview of pre-processing settings and the session info to ensure reproducibility.


Statistical Analysis

Input

Statistical analysis is performed using the limma package (limma-trend). To run the analysis, you can provide three inputs:

1. Statistical comparison(s) 📊 Select one or more statistical comparisons of interest. The direction of the statistical comparison is usually defined as Case - Control.

2. Covariates ⚙️ Adjust for continuous (e.g., age) and categorical (e.g., sex, tissue) covariates.

3. Gene annotation 🔖 Add gene annotations from the Ensembl database using the biomaRt package. For example, if your data uses ENTREZ gene IDs, you can use this option to add gene symbols or Ensembl gene IDs to the statistics output. Please note that enabling this option increases the runtime of the analysis.


Output

For each of the selected statistical comparisons, the following outputs are provided:

1. Top table. A table with the statistics from the differential expression analysis. Click here for more information on this table.

2. p-value and log2FC histograms. These plots show the distribution of p-values and log2FCs. Click for more information on the p-value histogram and log2FC histogram.

3. Volcano plot. Scatter plot of log2FC versus -log10 p-value. Click here for more information on this figure.

4. MA plot. Scatter plot of mean log2 expression versus log2FC. Click here for more information on this figure.

5. Summary. Get the number of differentially expressed genes for selected p-value and log2FC thresholds.

6. Settings. Download the overview of statistical analysis settings and the session info to ensure reproducibility.


Gene Set Analysis

After completing the statistical analysis, you can perform Gene Set Analysis (GSA) using various gene set collections, including Gene Ontology (GO), KEGG, and WikiPathways. The clusterProfiler package is used for the analysis. Two methods are available:

  • Overrepresentation Analysis (ORA)
  • Gene Set Enrichment Analysis (GSEA)

Overrepresentation Analysis (ORA)

Input

1. Statistical comparison 📊 Select the statistical comparison for which you want to perform ORA.

2. Geneset collection 📋 Select one of the following geneset collections: Gene Ontology-Biological Process (GO-BP), -Cellular Component (GO-CC), and -Molecular Function (GO-MF), KEGG, and WikiPathways geneset collections.

3. Input genes 🧬 Specify whether ORA should be performed on upregulated, downregulated, or both sets of genes. You can define input genes by:

  • Applying a p-value and log2FC threshold, or
  • Selecting the top N most significantly up/downregulated genes.

👉 The background gene list includes all genes that passed QC.

4. Gene identifier 🔖 Select which column in the statistics table contains the gene identifiers, and specify the identifier type (e.g., Entrez Gene, Ensembl Gene, or Gene Symbol) and organism.


Output

The ORA results can be inspected in the following outputs:

1. ORA table. Table with the statistics of the overrepresentation analysis. Click here for more information on this table.

2. Barchart. The plot displays statistics of the top overrepresented genesets. Click here for more information on this figure.

3. Network. The network visualizes the relationship between the most significantly overrepresented genesets. The edge thickness is proportional to the Jaccard Index (i.e., the number of shared genes divided by the total number of unique genes across both genesets). Click here for more information on this figure.

4. Settings. Download the overview of the ORA settings and session info to ensure reproducibility.


Gene Set Enrichment Analysis (GSEA)

Input

1. Statistical comparison 📊 Select the statistical comparison for which you want to perform GSEA.

2. Geneset collection 📋 Select one of the following geneset collections: Gene Ontology-Biological Process (GO-BP), -Cellular Component (GO-CC), and -Molecular Function (GO-MF), KEGG, and WikiPathways geneset collections.

3. Ranking variable 🔢 Specify which variable (log2FC, -log10 P value, or signed -log10 P value) will be used to rank genes for the GSEA.

4. Gene identifier 🔖 Select which column in the statistics table contains the gene identifiers, and specify the identifier type (e.g., Entrez Gene, Ensembl Gene, or Gene Symbol) and organism.


Output

The GSEA results can be inspected in the following outputs:

1. GSEA table. Table with the statistics of the gene set enrichment analysis. Click here for more information on this table.

2. Barchart. The plot displays statistics of the top enriched genesets. Click here for more information on this figure.

3. Network. The network visualizes the relationship between the most significantly enriched genesets. The edge thickness is proportional to the Jaccard Index (i.e., the number of shared genes divided by the total number of unique genes across both genesets). Click here for more information on this figure.

4. Settings. Download the overview of the GSEA settings and session info to ensure reproducibility.
