Microarray Analysis


Starting from raw data

Data upload

Input

To start the analysis, the user should upload two files:

1. A .zip folder containing .CEL or .CEL.gz files. The filenames of the CEL files should match the sample names in the metadata file. ArrayAnalysis will automatically look for the column in the metadata file that matches the CEL filenames.

2. Metadata file. The metadata table can be provided as a .tsv/.csv file or as a Series Matrix File available from the GEO website.
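The matching of CEL filenames to a metadata column can be pictured roughly as follows. This is a minimal sketch and not the ArrayAnalysis implementation: the function names, the rule of accepting names with or without the file extension, and the sample identifiers are all assumptions made for illustration.

```python
from pathlib import PurePosixPath


def strip_cel_suffix(filename):
    """Drop a trailing .CEL or .CEL.gz (case-insensitive) from a file name."""
    name = PurePosixPath(filename).name
    lower = name.lower()
    for suffix in (".cel.gz", ".cel"):
        if lower.endswith(suffix):
            return name[: -len(suffix)]
    return name


def find_sample_column(metadata, cel_files):
    """Return the first metadata column whose values cover all CEL files.

    `metadata` maps a column name to its list of values (one per sample).
    A column matches if it contains every file name, either with or
    without the .CEL/.CEL.gz extension (an assumed matching rule).
    """
    raw = {PurePosixPath(f).name for f in cel_files}
    stripped = {strip_cel_suffix(f) for f in cel_files}
    for column, values in metadata.items():
        observed = set(values)
        if raw <= observed or stripped <= observed:
            return column
    return None


# Hypothetical metadata table and file names
metadata = {
    "title": ["Control rep1", "Control rep2", "Case rep1"],
    "geo_accession": ["GSM0001", "GSM0002", "GSM0003"],
}
cel_files = ["GSM0001.CEL.gz", "GSM0002.CEL.gz", "GSM0003.CEL"]

sample_column = find_sample_column(metadata, cel_files)
print(sample_column)  # geo_accession
```

If no column matches, the sketch returns None, which is the situation you can avoid by checking the filenames against the metadata before uploading.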

You can analyse an example dataset (GSE6955) by clicking on Run example.


Output

A preview of the expression matrix and metadata table will be shown. This way, you can check whether the data has been correctly uploaded.


Pre-processing

Input

For the data preprocessing, you can provide the following inputs:

1. Remove samples (optional). Via this option, you can remove outliers or samples from experimental groups that you would like to exclude from the analysis.

2. Select experimental group. The experimental group is the variable that defines the cases and controls (e.g., disease status). By selecting more than one variable, you can combine multiple variables into a single experimental variable. This can be useful if you, for example, want to compare cases and controls for different tissues (e.g., liver and brain). In that case, you can create four different experimental groups: case_brain, control_brain, case_liver, and control_liver. The selected experimental group will be used for the normalization (if normalization per experimental group is selected) and for the statistical analysis in the next tab.

3. Select normalization method. The default normalization method is RMA. However, you can also choose GCRMA or PLIER.

4. Select probeset annotation. By default, ArrayAnalysis will use the custom ENTREZ gene annotation from brainarray. If you select No annotation, the Affymetrix probeset annotation will be used.
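Combining several variables into a single experimental group, as described under Select experimental group, amounts to joining the selected metadata columns sample by sample. The sketch below illustrates the idea; the function name, the column names, and the underscore separator are assumptions chosen to reproduce the group labels mentioned above.

```python
def combine_groups(metadata, variables, sep="_"):
    """Join the selected metadata columns row by row into one group label."""
    columns = [metadata[v] for v in variables]
    return [sep.join(str(value) for value in row) for row in zip(*columns)]


# Hypothetical metadata for four samples
metadata = {
    "status": ["case", "control", "case", "control"],
    "tissue": ["brain", "brain", "liver", "liver"],
}

groups = combine_groups(metadata, ["status", "tissue"])
print(groups)  # ['case_brain', 'control_brain', 'case_liver', 'control_liver']
```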


Output

Data and preprocessing quality can be checked in the different output tables and figures:

1. Normalized expression matrix (interactive). The table of normalized expression values is provided and can be downloaded. The expression profile of a gene can be viewed by clicking on the table.

2. Boxplots (static). The boxplots show the distribution of expression values for all samples.

3. Density plots (interactive). Similar to the boxplots, the density plot shows the distribution of expression values for all samples.

4. Correlation plot (interactive). The correlation plot shows the sample-wise correlations. You can choose between different linkage and clustering methods.

5. PCA plot (interactive). 2D and 3D PCA plots can be viewed to assess data quality.

6. Table of pre-processing settings (static). An overview of the selected pre-processing settings is provided and can be downloaded. This ensures reproducibility of your analysis.
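Sample-wise correlations of the kind shown in the correlation plot can be computed along the following lines. This is a minimal sketch that assumes Pearson correlation; the correlation measure actually used by ArrayAnalysis is not stated here, and the sample names and expression values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(expression):
    """Correlate every pair of samples.

    `expression` maps a sample name to its expression values,
    with the genes in the same order for every sample.
    """
    samples = list(expression)
    return {
        s: {t: pearson(expression[s], expression[t]) for t in samples}
        for s in samples
    }


# Hypothetical normalized expression values for four genes
expression = {
    "sample_1": [2.0, 4.0, 6.0, 8.0],
    "sample_2": [2.1, 3.9, 6.2, 7.8],
    "sample_3": [8.0, 6.0, 4.0, 2.0],
}

corr = correlation_matrix(expression)
for sample, row in corr.items():
    print(sample, {other: round(value, 3) for other, value in row.items()})
```

A sample that correlates poorly with all others, like sample_3 in this toy example, is a candidate outlier to inspect before removing it in the pre-processing step.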


Statistical Analysis

Input

Statistical analysis is performed using the limma package. To perform statistical analysis, you can provide three inputs:

1. Statistical comparison(s). You can select one or more statistical comparisons of interest. Note that the direction of the statistical comparison is usually defined as Case - Control.

2. Covariates. You can adjust for continuous (e.g., age) and categorical (e.g., sex, tissue) covariates.

3. Gene annotation. You can add gene annotations from the Ensembl database using the biomaRt package. For example, if your data are annotated with ENTREZ gene IDs, you can use this option to add gene symbols or Ensembl gene IDs to the statistics output. Please note that selecting this option significantly increases the time required for the statistical analysis.
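The Case - Control direction of a comparison determines the sign of the log fold change: a positive value means higher expression in the cases. The sketch below only illustrates this sign convention with a plain difference of group means on log2 values; it does not reproduce the moderated statistics that limma computes, and the group labels and values are hypothetical.

```python
from statistics import mean


def log_fold_change(log2_values, groups, case, control):
    """Mean log2 expression in the case group minus that in the control group."""
    case_values = [v for v, g in zip(log2_values, groups) if g == case]
    control_values = [v for v, g in zip(log2_values, groups) if g == control]
    return mean(case_values) - mean(control_values)


# Hypothetical log2 expression values of one gene in four samples
groups = ["case", "case", "control", "control"]
gene_log2 = [9.0, 9.5, 7.0, 7.5]

logfc = log_fold_change(gene_log2, groups, case="case", control="control")
print(logfc)  # 2.0, i.e. higher in cases than in controls
```

Swapping the two groups flips the sign, so it is worth checking the direction of each comparison before interpreting up- and downregulated genes.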


Output

For each of the selected statistical comparisons, the following outputs are provided:

1. Top table (interactive). A table with the relevant statistics as calculated by limma.

2. Histograms (interactive) of the p-value and logFC distribution.

3. Volcano plot (interactive).


Overrepresentation Analysis

Input

After the statistical analysis, you can perform geneset (i.e., Gene Ontology, KEGG and WikiPathways) overrepresentation analysis (ORA). For this, you can select from various options:

1. Statistical comparison. Select the statistical comparison for which you want to perform ORA.

2. Geneset collection. Select from Gene Ontology (GO)-Biological Process (BP), -Cellular Component (CC), and -Molecular Function (MF), as well as WikiPathways geneset collections.

3. Differentially expressed genes. Indicate whether you want to perform ORA on up- and/or downregulated genes. Furthermore, you can indicate whether the differentially expressed genes should be selected based on a p-value/logFC threshold or whether ORA should simply be performed on the top N most significantly up/downregulated genes.

4. Gene identifier. Select which column contains the gene identifiers and what type of identifiers these are (e.g., ENTREZG, Ensembl, or gene symbol). Furthermore, indicate to which organism the gene identifiers belong.
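The two ways of selecting differentially expressed genes described above (a p-value/logFC threshold versus the top most significant genes) can be sketched as follows. The function names, the default cutoffs, and the example statistics are assumptions for illustration only; whether the raw or the adjusted p-value is used depends on the settings you choose in the tool.

```python
def select_by_threshold(top_table, p_cutoff=0.05, logfc_cutoff=1.0, direction="both"):
    """Select genes passing both a p-value and an absolute logFC cutoff."""
    selected = []
    for gene, stats in top_table.items():
        if stats["p_value"] >= p_cutoff or abs(stats["logFC"]) < logfc_cutoff:
            continue
        if direction == "up" and stats["logFC"] <= 0:
            continue
        if direction == "down" and stats["logFC"] >= 0:
            continue
        selected.append(gene)
    return selected


def select_top_n(top_table, n, direction="up"):
    """Select the n genes with the smallest p-value in the given direction."""
    sign = 1 if direction == "up" else -1
    candidates = [g for g, s in top_table.items() if sign * s["logFC"] > 0]
    return sorted(candidates, key=lambda g: top_table[g]["p_value"])[:n]


# Hypothetical excerpt of a top table
top_table = {
    "gene_a": {"logFC": 2.1, "p_value": 0.001},
    "gene_b": {"logFC": -1.6, "p_value": 0.004},
    "gene_c": {"logFC": 0.4, "p_value": 0.020},
    "gene_d": {"logFC": 1.2, "p_value": 0.300},
}

print(select_by_threshold(top_table))                  # ['gene_a', 'gene_b']
print(select_by_threshold(top_table, direction="up"))  # ['gene_a']
print(select_top_n(top_table, n=2, direction="up"))    # ['gene_a', 'gene_c']
```

Note that the top-N selection can include genes such as gene_c that would not pass a logFC threshold, which is why the two options can give different ORA results.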


Output

The ORA results are provided in the following outputs:

1. ORA table. Table with the statistics of the ORA. This table can be downloaded.

2. Barchart of the most significantly overrepresented genesets. You can select the number of genesets in the chart.

3. Network of the most significantly overrepresented genesets. The edge thickness is proportional to the Jaccard Index (i.e., the number of shared genes divided by the total number of unique genes in both genesets). You can select the network layout and number of genesets in the network.
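The Jaccard Index that determines the edge thickness in the network can be computed as in the following sketch. The function name and the two example genesets are hypothetical.

```python
def jaccard_index(geneset_a, geneset_b):
    """Number of shared genes divided by the number of unique genes in both sets."""
    a, b = set(geneset_a), set(geneset_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical genesets: 2 shared genes, 5 unique genes in total
geneset_1 = {"TP53", "MDM2", "CDKN1A", "BAX"}
geneset_2 = {"TP53", "BAX", "BCL2"}

print(jaccard_index(geneset_1, geneset_2))  # 0.4
```

A value of 0 means the genesets share no genes, and a value of 1 means they are identical.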



Starting from processed data

Data upload

Input

To start the analysis, the user should upload two files:

1. Expression data in Series Matrix File format. Series Matrix Files can be downloaded from the GEO website.

2. Metadata file. The metadata table can be provided as a .tsv/.csv file or as a Series Matrix File.

You can analyse an example dataset (GSE6955) by clicking on Run example.


Output

A preview of the expression matrix and metadata table will be shown. This way, you can check whether the data has been correctly uploaded.


Pre-processing

Input

For the data preprocessing, you can provide the following inputs:

1. Remove samples (optional). Via this option, you can remove outliers or samples from experimental groups that you would like to exclude from the analysis.

2. Select experimental group. The experimental group is the variable that defines the cases and controls (e.g., disease status). By selecting more than one variable, you can combine multiple variables into a single experimental variable. This can be useful if you, for example, want to compare cases and controls for different tissues (e.g., liver and brain). In that case, you can create four different experimental groups: case_brain, control_brain, case_liver, and control_liver. The selected experimental group will be used for the normalization (if normalization per experimental group is selected) and for the statistical analysis in the next tab.

3. Select transformation method (optional). You can perform log2-transformation to make the data more normally distributed.

4. Select normalization method (optional). You can perform quantile normalization to normalize the expression values across the samples.
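The optional log2-transformation and quantile normalization described above can be pictured with the following sketch. It is a plain-Python illustration of the general technique, not the code used by ArrayAnalysis; the function names and the small matrix are hypothetical, and ties within a sample are handled in the simplest possible way (by sort order).

```python
import math


def log2_transform(matrix):
    """Apply log2 to every value; assumes all values are positive."""
    return [[math.log2(v) for v in row] for row in matrix]


def quantile_normalize(matrix):
    """Give every sample (column) the same distribution of values.

    `matrix` is a list of gene rows with one value per sample.
    Each value is replaced by the mean, across samples, of the values
    that share its rank within their own sample.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    # Gene indices ordered from lowest to highest value, per sample
    orders = [sorted(range(n_genes), key=col.__getitem__) for col in columns]
    # Reference distribution: mean value at each rank
    reference = [
        sum(columns[j][orders[j][rank]] for j in range(n_samples)) / n_samples
        for rank in range(n_genes)
    ]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, order in enumerate(orders):
        for rank, gene_index in enumerate(order):
            normalized[gene_index][j] = reference[rank]
    return normalized


# Hypothetical intensities: 3 genes (rows) x 2 samples (columns)
intensities = [
    [2.0, 16.0],
    [4.0, 4.0],
    [8.0, 8.0],
]

normalized = quantile_normalize(log2_transform(intensities))
print(normalized)  # [[1.5, 3.5], [2.5, 1.5], [3.5, 2.5]]
```

After normalization, both samples contain exactly the same set of values (1.5, 2.5, and 3.5), while the ranking of the genes within each sample is preserved.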


Output

Data and preprocessing quality can be checked in the different output tables and figures:

1. Normalized expression matrix (interactive). The table of normalized expression values is provided and can be downloaded. The expression profile of a gene can be viewed by clicking on the table.

2. Boxplots (static). The boxplots show the distribution of expression values for all samples.

3. Density plots (interactive). Similar to the boxplots, the density plot shows the distribution of expression values for all samples.

4. Correlation plot (interactive). The correlation plot shows the sample-wise correlations. You can choose between different linkage and clustering methods.

5. PCA plot (interactive). 2D and 3D PCA plots can be viewed to assess data quality.

6. Table of pre-processing settings (static). An overview of the selected pre-processing settings is provided and can be downloaded. This ensures reproducibility of your analysis.


Statistical Analysis

Input

Statistical analysis is performed using the limma package. To perform statistical analysis, you can provide three inputs:

1. Statistical comparison(s). You can select one or more statistical comparisons of interest. Note that the direction of the statistical comparison is usually defined as Case - Control.

2. Covariates. You can adjust for continuous (e.g., age) and categorical (e.g., sex, tissue) covariates.

3. Gene annotation. You can add gene annotations from the Ensembl database using the biomaRt package. For example, if your data are annotated with ENTREZ gene IDs, you can use this option to add gene symbols or Ensembl gene IDs to the statistics output. Please note that selecting this option significantly increases the time required for the statistical analysis.


Output

For each of the selected statistical comparisons, the following outputs are provided:

1. Top table (interactive). A table with the relevant statistics as calculated by limma.

2. Histograms (interactive) of the p-value and logFC distribution.

3. Volcano plot (interactive).


Overrepresentation Analysis

Input

After the statistical analysis, you can perform geneset (i.e., Gene Ontology, KEGG and WikiPathways) overrepresentation analysis (ORA). For this, you can select from various options:

1. Statistical comparison. Select the statistical comparison for which you want to perform ORA.

2. Geneset collection. Select from Gene Ontology (GO)-Biological Process (BP), -Cellular Component (CC), and -Molecular Function (MF), as well as WikiPathways geneset collections.

3. Differentially expressed genes. Indicate whether you want to perform ORA on up- and/or downregulated genes. Furthermore, you can indicate whether the differentially expressed genes should be selected based on a p-value/logFC threshold or whether ORA should simply be performed on the top N most significantly up/downregulated genes.

4. Gene identifier. Select which column contains the gene identifiers and what type of identifiers these are (e.g., ENTREZG, Ensembl, or gene symbol). Furthermore, indicate to which organism the gene identifiers belong.


Output

The ORA results are provided in the following outputs:

1. ORA table. Table with the statistics of the ORA. This table can be downloaded.

2. Barchart of the most significantly overrepresented genesets. You can select the number of genesets in the chart.

3. Network of the most significantly overrepresented genesets. The edge thickness is proportional to the Jaccard Index (i.e., the number of shared genes divided by the total number of unique genes in both genesets). You can select the network layout and number of genesets in the network.



RNA-seq Analysis


Starting from raw data

Data upload

Input

To start the analysis, the user should upload two files:

1. Expression matrix. This matrix can be provided as a .tsv/.csv file. ArrayAnalysis will automatically look for the column in the metadata file that matches the column names of the matrix.

2. Metadata file. The metadata table can be provided as a .tsv/.csv file or as a Series Matrix File available from the GEO website.

You can analyse an example dataset (GSE128380) by clicking on Run example.


Output

A preview of the expression matrix and metadata table will be shown. This way, you can check whether the data has been correctly uploaded.


Pre-processing

Input

The data preprocessing is performed using the DESeq2 package. For this, you can provide the following inputs:

1. Remove samples (optional). Via this option, you can remove outliers or samples from experimental groups that you would like to exclude from the analysis.

2. Select experimental group. The experimental group is the variable that defines the cases and controls (e.g., disease status). By selecting more than one variable, you can combine multiple variables into a single experimental variable. This can be useful if you, for example, want to compare cases and controls for different tissues (e.g., liver and brain). In that case, you can create four different experimental groups: case_brain, control_brain, case_liver, and control_liver. The selected experimental group will be used for the normalization (if normalization per experimental group is selected) and for the statistical analysis in the next tab.

3. Select filtering threshold. A gene is kept for the subsequent analysis if it has a count greater than or equal to the selected filtering threshold in at least n samples, where n is the number of samples in the smallest experimental group.
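The filtering rule described above can be written out as in the following sketch. The function name, the gene names, and the counts are hypothetical; only the rule itself (a count at or above the threshold in at least n samples, with n the size of the smallest experimental group) is taken from the description.

```python
from collections import Counter


def filter_genes(counts, groups, threshold):
    """Keep genes with a count >= threshold in at least n samples.

    `counts` maps a gene to its list of counts (one per sample) and
    `groups` holds the experimental group of each sample.
    n is the number of samples in the smallest experimental group.
    """
    n = min(Counter(groups).values())
    return {
        gene: values
        for gene, values in counts.items()
        if sum(v >= threshold for v in values) >= n
    }


# Hypothetical design: 3 cases and 2 controls, so n = 2
groups = ["case", "case", "case", "control", "control"]
counts = {
    "gene_a": [0, 0, 12, 0, 1],      # 1 sample at or above 10: removed
    "gene_b": [15, 20, 0, 0, 3],     # 2 samples at or above 10: kept
    "gene_c": [30, 25, 40, 22, 18],  # 5 samples at or above 10: kept
}

kept = filter_genes(counts, groups, threshold=10)
print(list(kept))  # ['gene_b', 'gene_c']
```

Using the size of the smallest group for n means that a gene expressed in only one experimental group, such as gene_b here, can still pass the filter.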


Output

Data and preprocessing quality can be checked in the different output tables and figures:

1. Normalized expression matrix (interactive). The table of normalized expression values is provided and can be downloaded. The expression profile of a gene can be viewed by clicking on the table.

2. Boxplots (static). The boxplots show the distribution of expression values for all samples.

3. Density plots (interactive). Similar to the boxplots, the density plot shows the distribution of expression values for all samples.

4. Correlation plot (interactive). The correlation plot shows the sample-wise correlations. You can choose between different linkage and clustering methods.

5. PCA plot (interactive). 2D and 3D PCA plots can be viewed to assess data quality.

6. Table of pre-processing settings (static). An overview of the selected pre-processing settings is provided and can be downloaded. This ensures reproducibility of your analysis.


Statistical Analysis

Input

Statistical analysis is performed using the DESeq2 package. To perform statistical analysis, you can provide three inputs:

1. Statistical comparison(s). You can select one or more statistical comparisons of interest. Note that the direction of the statistical comparison is usually defined as Case - Control.

2. Covariates. You can adjust for continuous (e.g., age) and categorical (e.g., sex, tissue) covariates.

3. Gene annotation. You can add gene annotations from the Ensembl database using the biomaRt package. For example, if your data are annotated with ENTREZ gene IDs, you can use this option to add gene symbols or Ensembl gene IDs to the statistics output. Please note that selecting this option significantly increases the time required for the statistical analysis.


Output

For each of the selected statistical comparisons, the following outputs are provided:

1. Top table (interactive). A table with the relevant statistics as calculated by DESeq2. The log2FC estimates are shrunken using the apeglm method.

2. Histograms (interactive) of the p-value and logFC distribution.

3. Volcano plot (interactive).


Overrepresentation Analysis

Input

After the statistical analysis, you can perform geneset (i.e., Gene Ontology, KEGG and WikiPathways) overrepresentation analysis (ORA). For this, you can select from various options:

1. Statistical comparison. Select the statistical comparison for which you want to perform ORA.

2. Geneset collection. Select from Gene Ontology (GO)-Biological Process (BP), -Cellular Component (CC), and -Molecular Function (MF), as well as WikiPathways geneset collections.

3. Differentially expressed genes. Indicate whether you want to perform ORA on up- and/or downregulated genes. Furthermore, you can indicate whether the differentially expressed genes should be selected based on a p-value/logFC threshold or whether ORA should simply be performed on the top N most significantly up/downregulated genes.

4. Gene identifier. Select which column contains the gene identifiers and what type of identifiers these are (e.g., ENTREZG, Ensembl, or gene symbol). Furthermore, indicate to which organism the gene identifiers belong.


Output

The ORA results are provided in the following outputs:

1. ORA table. Table with the statistics of the ORA. This table can be downloaded.

2. Barchart of the most significantly overrepresented genesets. You can select the number of genesets in the chart.

3. Network of the most significantly overrepresented genesets. The edge thickness is proportional to the Jaccard Index (i.e., the number of shared genes divided by the total number of unique genes in both genesets). You can select the network layout and number of genesets in the network.



Starting from processed data

Data upload

Input

To start the analysis, the user should upload two files:

1. Expression matrix. This matrix can be provided as a .tsv/.csv file. ArrayAnalysis will automatically look for the column in the metadata file that matches the column names of the matrix.

2. Metadata file. The metadata table can be provided as a .tsv/.csv file or as a Series Matrix File available from the GEO website.

You can analyse an example dataset (GSE128380) by clicking on Run example.


Output

A preview of the expression matrix and metadata table will be shown. This way, you can check whether the data has been correctly uploaded.


Pre-processing

Input

For the data preprocessing, you can provide the following inputs:

1. Remove samples (optional). Via this option, you can remove outliers or samples from experimental groups that you would like to exclude from the analysis.

2. Select experimental group. The experimental group is the variable that defines the cases and controls (e.g., disease status). By selecting more than one variable, you can combine multiple variables into a single experimental variable. This can be useful if you, for example, want to compare cases and controls for different tissues (e.g., liver and brain). In that case, you can create four different experimental groups: case_brain, control_brain, case_liver, and control_liver. The selected experimental group will be used for the normalization (if normalization per experimental group is selected) and for the statistical analysis in the next tab.

3. Select transformation method (optional). You can perform log2-transformation to make the data more normally distributed.

4. Select filtering threshold (optional). If this option is selected, a gene is kept for the subsequent analysis if it has a count greater than or equal to the selected filtering threshold in at least n samples, where n is the number of samples in the smallest experimental group.

5. Select normalization method (optional). You can perform quantile normalization to normalize the expression values across the samples.


Output

Data and preprocessing quality can be checked in the different output tables and figures:

1. Normalized expression matrix (interactive). The table of normalized expression values is provided and can be downloaded. The expression profile of a gene can be viewed by clicking on the table.

2. Boxplots (static). The boxplots show the distribution of expression values for all samples.

3. Density plots (interactive). Similar to the boxplots, the density plot shows the distribution of expression values for all samples.

4. Correlation plot (interactive). The correlation plot shows the sample-wise correlations. You can choose between different linkage and clustering methods.

5. PCA plot (interactive). 2D and 3D PCA plots can be viewed to assess data quality.

6. Table of pre-processing settings (static). An overview of the selected pre-processing settings is provided and can be downloaded. This ensures reproducibility of your analysis.


Statistical Analysis

Input

Statistical analysis is performed using the limma package, including the voom transformation. To perform statistical analysis, you can provide three inputs:

1. Statistical comparison(s). You can select one or more statistical comparisons of interest. Note that the direction of the statistical comparison is usually defined as Case - Control.

2. Covariates. You can adjust for continuous (e.g., age) and categorical (e.g., sex, tissue) covariates.

3. Gene annotation. You can add gene annotations from the Ensembl database using the biomaRt package. For example, if your data are annotated with ENTREZ gene IDs, you can use this option to add gene symbols or Ensembl gene IDs to the statistics output. Please note that selecting this option significantly increases the time required for the statistical analysis.


Output

For each of the selected statistical comparisons, the following outputs are provided:

1. Top table (interactive). A table with the relevant statistics as calculated by limma.

2. Histograms (interactive) of the p-value and logFC distribution.

3. Volcano plot (interactive).


Overrepresentation Analysis

Input

After the statistical analysis, you can perform geneset (i.e., Gene Ontology, KEGG and WikiPathways) overrepresentation analysis (ORA). For this, you can select from various options:

1. Statistical comparison. Select the statistical comparison for which you want to perform ORA.

2. Geneset collection. Select from Gene Ontology (GO)-Biological Process (BP), -Cellular Component (CC), and -Molecular Function (MF), as well as WikiPathways geneset collections.

3. Differentially expressed genes. Indicate whether you want to perform ORA on up- and/or downregulated genes. Furthermore, you can indicate whether the differentially expressed genes should be selected based on a p-value/logFC threshold or whether ORA should simply be performed on the top N most significantly up/downregulated genes.

4. Gene identifier. Select which column contains the gene identifiers and what type of identifiers these are (e.g., ENTREZG, Ensembl, or gene symbol). Furthermore, indicate to which organism the gene identifiers belong.


Output

The ORA results are provided in the following outputs:

1. ORA table. Table with the statistics of the ORA. This table can be downloaded.

2. Barchart of the most significantly overrepresented genesets. You can select the number of genesets in the chart.

3. Network of the most significantly overrepresented genesets. The edge thickness is proportional to the Jaccard Index (i.e., the number of shared genes divided by the total number of unique genes in both genesets). You can select the network layout and number of genesets in the network.