DESeq2 Analysis

The differential expression analysis performed with DESeq2 uses TCGA Gene Expression Quantification data (HTSeq - counts). DESeq2 models the read counts with a negative binomial distribution to identify differentially expressed genes.

All DESeq2 analyses provided by this web server are precomputed, ensuring that users do not need to spend time on computationally intensive tasks. This significantly reduces the time required to obtain results.

To perform the analysis:

1. Select the tumor of interest.
2. Press the "Features" search button. The second drop-down menu is then populated with all the features for which this analysis is available for that specific tumor.
3. Select a feature and press "Submit" to display the results, with graphs and related information for the selected tumor and feature.

A key detail reported in the results is the comparison order, that is, the order in which the two levels of the feature are compared (e.g., Tumor vs. Control). This order determines the sign of the log2 fold change.

Results and Output

The analysis results in a downloadable table that includes:

Log2 Fold Change: The log2-transformed fold change in expression between the two conditions.
p-value: The statistical significance of the differential expression.
Adjusted p-value (padj): The p-value adjusted for multiple testing to control the false discovery rate.
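
By default, DESeq2 computes padj with the Benjamini-Hochberg procedure. The sketch below illustrates that adjustment in plain Python. It is not the server's code: the function name `bh_adjust` and the example p-values are hypothetical, and DESeq2's independent filtering step is not reproduced.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Visit the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        candidate = pvalues[idx] * m / rank
        # Enforce monotonicity: an adjusted value never exceeds
        # the adjusted value of any larger p-value.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for five genes.
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.20]
example_padj = bh_adjust(example_pvalues)
print(example_padj)
```

Genes are usually called significant when padj falls below a chosen false discovery rate threshold, such as 0.05.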

Additionally, several graphical outputs are available to enhance interpretation, including:

Enhanced Volcano Plot: A plot that highlights significantly differentially expressed genes.
Heatmap: A visual representation of gene expression levels across samples.
PCA Plot: A Principal Component Analysis plot to visualize sample clustering.
Top 100 Heatmap: A heatmap showing the top 100 differentially expressed genes.

These plots can be both viewed directly on the page and downloaded for further analysis or presentation purposes.

Differential Expression Analysis Single Tumor

This analysis allows users to explore differences in the expression of a gene, miRNA, or protein between patients in two distinct conditions (based on the selected feature) within a given tumor type. The analysis leverages TCGA Gene Expression Quantification data (FPKM, normalized counts) to assess these differences.

The results are visualized in a box plot, where the x-axis represents the two conditions under comparison (e.g., male vs. female), and the y-axis displays the expression levels of the specified gene, miRNA, or protein.

A p-value, calculated using the Wilcoxon test, quantifies the significance of the observed expression differences between the two groups.
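
As an illustration, the comparison of two independent groups can be sketched with a Wilcoxon rank-sum test. This sketch assumes the rank-sum (Mann-Whitney) variant and uses a tie-corrected normal approximation without continuity correction, so its p-values may differ slightly from those shown on the server. The function name and the expression values are hypothetical.

```python
import math


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic of group_a, two-sided p-value).
    """
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    pooled = [(v, 0) for v in group_a] + [(v, 1) for v in group_b]
    pooled.sort(key=lambda pair: pair[0])

    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                    # size of this group of tied values
        mid_rank = (i + 1 + j) / 2   # average of the 1-based ranks i+1..j
        in_a = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_a += mid_rank * in_a
        tie_term += t ** 3 - t
        i = j

    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0
    z = (u_a - mean_u) / math.sqrt(var_u)
    return u_a, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical expression values for the two conditions.
male = [1.2, 2.3, 2.9, 3.1]
female = [3.8, 4.4, 5.0, 5.6]
u_stat, p_value = rank_sum_test(male, female)
print(u_stat, p_value)
```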

Pan-Cancer Differential Expression Analysis

This analysis compares the expression of a gene, miRNA, or protein between two conditions across all tumors for which data for the selected feature are available.
It gives the user an overview of the expression of that gene, miRNA, or protein in different tumors.
The x-axis shows all tumors for which the analysis is available, and the y-axis shows the expression of the gene, miRNA, or protein.

Overall Survival

This analysis investigates overall survival (OS) using Kaplan-Meier survival curves and log-rank tests to evaluate the association between gene, miRNA, or protein expression levels and patient outcomes. OS data provided by The Cancer Genome Atlas (TCGA) are used as time-to-event information. The survival time, expressed in days, measures the duration from diagnosis or treatment initiation until death (the event of interest) or last follow-up, at which point the patient is censored.

The expression data used are Gene Expression Quantification (HTSeq - FPKM), miRNA Expression Quantification, and Protein Expression Quantification.

Choice of Time Variables: Users can select the time variable for the analysis based on their specific research needs. The two available options are:

OS.time: This variable uses overall survival time data as the primary measure for the time-to-event analysis.
DFI.time: This variable utilizes disease-free interval time data, which reflects the duration from diagnosis to the first occurrence of disease recurrence or progression.

The choice between these two time variables can impact the survival analysis results and their interpretations, allowing users to tailor the analysis to their research objectives.

The analysis consists of several key components:

1. Kaplan-Meier Survival Curves: Kaplan-Meier estimators are used to compute the survival probabilities over time for patients stratified by their expression levels. For each gene, miRNA, or protein, patients are divided into two groups: those with expression levels above the median (higher expression) and those below the median (lower expression). The survival curves for each group are plotted to visualize differences in survival probabilities over time.

2. Log-Rank Test: To statistically compare the survival distributions between the higher and lower expression groups, the log-rank test is employed. This test assesses whether there are significant differences in overall survival between the two groups, with the resulting p-value indicating the level of statistical significance.

3. Survival Time in Days: The x-axis of the Kaplan-Meier plot represents the time in days, reflecting the OS data. The y-axis displays the survival probability, that is, the estimated probability of surviving beyond a given time point.

4. Expression Level Impact on Survival: By linking the OS data to gene, miRNA, and protein expression levels, this analysis aims to identify potential biomarkers of survival, providing insights into the biological mechanisms driving cancer progression and patient prognosis.

This integrative approach combining transcriptomic, proteomic, and survival data allows for a comprehensive assessment of how molecular alterations impact patient outcomes, potentially guiding personalized cancer therapies and improving prognostic predictions.
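
The median split, the Kaplan-Meier estimate, and the log-rank test described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the server's implementation: the patient data are hypothetical, samples exactly at the median are placed in the lower group, and the function names are invented for the example.

```python
import math
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    curve, survival = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(all_times, all_events) if e}):
        n_a = sum(1 for u in times_a if u >= t)    # at risk in group a
        n = sum(1 for u in all_times if u >= t)    # at risk overall
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e)
        d = sum(1 for u, e in zip(all_times, all_events) if u == t and e)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical patients: (expression, time in days, event: 1 = death, 0 = censored).
patients = [
    (2.1, 300, 1), (3.5, 450, 1), (1.8, 200, 1), (4.0, 500, 0),
    (7.2, 900, 1), (8.1, 1200, 0), (6.5, 800, 1), (9.3, 1500, 0),
]
cutoff = median(p[0] for p in patients)
high = [p for p in patients if p[0] > cutoff]
low = [p for p in patients if p[0] <= cutoff]  # ties go to the lower group (assumption)

low_curve = kaplan_meier([p[1] for p in low], [p[2] for p in low])
chi2, p_value = logrank_test(
    [p[1] for p in low], [p[2] for p in low],
    [p[1] for p in high], [p[2] for p in high],
)
print(low_curve, chi2, p_value)
```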

Overall Survival with Pathway Activity Score

This analysis performs an overall survival analysis based on the expression of a set of genes (a pathway) rather than the expression of a single gene.

The analysis for a given tumor consists of dividing samples into two groups based on pathway activity score (PAS) values pre-calculated with GSVA (Gene Set Variation Analysis). The PAS describes how active that pathway is in that given sample. GSVA is a nonparametric, unsupervised method for estimating the variation in gene set enrichment across samples in an expression dataset.
The implemented script uses OS time data provided by TCGA in the clinical data file as time-to-event information. The time data are expressed in days.

Survival Analysis with Gene Mutation Status

This analysis examines the overall survival (OS) of patients by evaluating the impact of specific gene mutations on survival outcomes within a particular tumor type. Utilizing the maftools R package, the analysis leverages mutation data from The Cancer Genome Atlas (TCGA) to provide insights into how the mutation status of a selected gene influences patient prognosis.

Tumor Mutation Analysis

This analysis uses the maftools library to analyze genomic data of a specific tumour from the TCGA database. It generates a mutation summary image (maf summary) and a graph comparing mutation transitions and transversions (TiTv). Each graph provides a visual analysis of the mutations, highlighting key statistics and distributions characteristic of the selected tumour.
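
To make the TiTv comparison concrete: a transition swaps a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T), and every other single-nucleotide substitution is a transversion. The sketch below counts both classes in plain Python. It only illustrates the concept and does not call maftools; the function name and the variant list are hypothetical.

```python
# Transitions: purine <-> purine (A, G) or pyrimidine <-> pyrimidine (C, T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_counts(snvs):
    """Count transitions and transversions in (reference, alternate) pairs."""
    ti = tv = 0
    for ref, alt in snvs:
        if ref == alt:
            continue  # not a substitution
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti, tv


# Hypothetical single-nucleotide variants from one tumor cohort.
variants = [("C", "T"), ("G", "A"), ("C", "A"), ("T", "G"), ("A", "G"), ("C", "G")]
ti, tv = titv_counts(variants)
print(ti, tv, ti / tv)
```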

Oncoplot

This analysis generates an oncoplot displaying the selected number (top 10, 15, 20, or 25) of the most frequently mutated genes in a specific tumor type, using data from the TCGA database. Each column represents a sample, while the rows indicate the genes, with the mutations highlighted in distinctive colours.

This graph provides a clear overview of prevalent mutations, facilitating the identification of key genes associated with the disease.

Somatic Interaction Analysis

This analysis generates a graph of somatic interactions using the somaticInteractions function of the maftools library to analyse the interactions between mutated genes in a selected tumor. You can select the number of genes to be analysed.
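
The somaticInteractions function is based on pairwise Fisher's exact tests, which detect gene pairs that are mutated together (co-occurrence) or rarely mutated together (mutual exclusivity). The sketch below shows the test for a single gene pair in plain Python. It is an illustration only: the function name and the sample counts are hypothetical, and maftools' own filtering and output format are not reproduced.

```python
from math import comb


def fisher_exact_two_sided(both, only_a, only_b, neither):
    """Two-sided Fisher's exact test p-value for a 2x2 co-mutation table."""
    a_mutated = both + only_a   # samples with gene A mutated
    b_mutated = both + only_b   # samples with gene B mutated
    n = both + only_a + only_b + neither

    def prob(x):
        # Hypergeometric probability of x samples carrying both mutations.
        return comb(b_mutated, x) * comb(n - b_mutated, a_mutated - x) / comb(n, a_mutated)

    lowest = max(0, a_mutated + b_mutated - n)
    highest = min(a_mutated, b_mutated)
    p_observed = prob(both)
    # Sum every table that is at most as likely as the observed one.
    total = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical counts for two genes across 20 samples.
both, only_a, only_b, neither = 1, 9, 9, 1
pair_p_value = fisher_exact_two_sided(both, only_a, only_b, neither)
# Odds ratio below 1 suggests mutual exclusivity, above 1 co-occurrence.
odds_ratio = (both * neither) / (only_a * only_b)
print(pair_p_value, odds_ratio)
```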

Gene Mutation Analysis

This analysis produces a lollipop graph using the maftools library to visualise mutations in a specific gene within a tumour selected from the TCGA genomic data. Mutations are annotated according to the Protein_Change column, which indicates the types of amino acid changes. The graph represents the frequency of mutations along the length of the gene, highlighting significant mutations at the corresponding points.

Differential Expression for Mutated Status - DESeq2

This analysis investigates the expression differences between mutated and wild-type samples of a specific gene in a chosen tumor type using TCGA data. It employs the DESeq2 package to perform differential expression analysis, generating results saved as a text file. Visualization includes a heatmap of sample distances, PCA plots, a heatmap of the top 50 most variable genes, and an Enhanced Volcano plot to highlight significant gene expression changes. The results provide insights into the impact of mutations on gene expression within the tumor context.

Differentially Mutated Gene by Clinical Feature

This analysis compares two cohorts of cancer patients, stratified by a clinical characteristic, using genomic data from TCGA. It generates a forest plot highlighting the 10 genes whose mutation frequencies differ most significantly between the groups, showing differences in mutational profiles. It also produces a coBarplot illustrating the distribution of mutations in the two groups.

These graphs provide a clear view of mutational differences, helping to identify key genes associated with different clinical responses. The results are also saved in a CSV file for further analysis.

Cell-mixture Deconvolution

To estimate cellular proportions, we applied a computational deconvolution method using gene expression profiles from bulk tissue samples.

The core of this approach is the use of a pre-defined "basis matrix", which contains gene expression profiles of specific cell types and their corresponding marker genes. These marker genes, which are either uniquely or predominantly expressed by particular cell types, serve to identify and quantify the relative abundance of each cell type in the mixed sample.

We used the ImmunoStates basis matrix from the MetaIntegrator R package, which includes profiles for 20 immune cell types, to estimate the proportions of these cell types in our samples.

Mathematically, the deconvolution process involves solving a linear regression model, where the observed bulk expression values are modeled as a weighted sum of the reference expression profiles in the basis matrix. The weights correspond to the estimated proportions of each cell type, providing a cellular composition profile for each sample.
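
The linear model above can be sketched with ordinary least squares in plain Python. This is a toy illustration, not the MetaIntegrator implementation: the two-cell-type basis matrix and the bulk profile are hypothetical, and clipping negative weights to zero and rescaling them to sum to 1 are assumptions made for the example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def deconvolve(basis, bulk):
    """Least-squares cell-type weights, clipped at 0 and scaled to sum to 1."""
    k = len(basis[0])
    # Normal equations: (B^T B) w = B^T y
    btb = [[sum(row[i] * row[j] for row in basis) for j in range(k)] for i in range(k)]
    bty = [sum(row[i] * y for row, y in zip(basis, bulk)) for i in range(k)]
    weights = [max(w, 0.0) for w in solve_linear_system(btb, bty)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical basis matrix: rows are marker genes, columns are two cell types.
basis = [
    [10.0, 1.0],
    [8.0, 2.0],
    [1.0, 9.0],
    [2.0, 12.0],
]
# Bulk profile simulated as 70% of cell type 1 and 30% of cell type 2.
bulk = [0.7 * row[0] + 0.3 * row[1] for row in basis]
proportions = deconvolve(basis, bulk)
print(proportions)
```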

Correlation Cell Type and Pathways

This analysis calculates the association between cellular abundance and pathway activity within tumor samples.

For cellular composition, we first estimated the relative proportions of immune cell types using cell-mixture deconvolution. This approach relies on the ImmunoStates basis matrix, which contains predefined gene expression profiles for 20 immune cell types.

To assess pathway activity, we applied single-sample Gene Set Enrichment Analysis (ssGSEA), which calculates an activity score for each pathway based on the expression of genes within a predefined set of cancer-related and immune-associated pathways. These pathway activity scores provide a quantitative measure of pathway activation in each tumor sample.

By correlating the cell-type proportions with the ssGSEA scores, we were able to identify significant associations between specific immune cell types and the activity of relevant biological pathways, offering insights into how cellular composition may influence tumor biology.
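
The text does not state which correlation coefficient is used. As an illustration, the sketch below assumes a Spearman rank correlation between one cell type's proportions and one pathway's ssGSEA scores. The function names and the per-sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2
        i = j
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five tumor samples.
cell_proportions = [0.05, 0.10, 0.20, 0.30, 0.40]
pathway_scores = [-0.3, -0.1, 0.0, 0.4, 0.2]
rho = spearman(cell_proportions, pathway_scores)
print(rho)
```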
