Key words
ovarian cancer - immune cell invasion - TCGA - bioinformatics - immunotherapy
TCGA The Cancer Genome Atlas
OS Overall survival
ICI Immune cell infiltration
OC Ovarian cancer
CIBERSORT Cell-type Identification By Estimating Relative
Subsets Of RNA Transcripts
DEGs Differentially expressed genes
PD-L1 Programmed death ligand-1
PD-1 Programmed death-1
TMB Tumor mutation burden
TME Tumor microenvironment
NGS Next-generation sequencing
FPKM Fragments per kilobase million
TPM Kilobase million
CDF Cumulative distribution function
Background
Ovarian cancer (OC) is one of the three most common gynecological malignancies
worldwide. The clinical symptoms of early OC are not obvious, and it is difficult to
diagnose. Most of these are already in an advanced stage at the time of diagnosis.
Their five-year overall survival rate is only approximately 40 to 45% [1], of which it is only 20–30%
for advanced patients [2]. At present, the
main method of treatment is surgery combined with radiotherapy and chemotherapy.
However, some patients experience drug resistance and recurrence after taking
chemotherapy drugs such as cisplatin leading to a poor prognosis [3].
Tumor immunotherapy is an antitumor therapy based on immunotherapy, which has become
a research highlight in tumor therapy. It achieves antitumor effects by activating
and enhancing the ability of the immune system to recognize and eliminate tumor
cells and mobilize autoimmunity [4]
[5]. Between them, programmed death ligand-1
(PD-L1), as a membrane inhibiting molecule on the surface of tumor cells, can
restrict T cells by activating the PD-1/PD-L1 inhibitory signaling pathway.
It creates an immunological microenvironment that favors the growth of tumor cell
growth and induces immune escape from tumors. Checkpoint inhibitor therapy targeting
programmed death-1 (PD-1)/PD-L1 can block this pathway, restore immune
activity of T cells, enhance the immune response and improve the immune system
response to tumor cells with its ability to recognize and kill tumor cells. This
therapy has achieved remarkable results in the treatment of various solid tumors.
Numerous studies have confirmed that immune checkpoint inhibitors have some effect
in patients with early-stage OC patients [6]
[7]. However, the treatment
effect in advanced patients is not satisfactory [8]
[9]. Tumor mutation burden (TMB)
is an immunophenotypic marker that describes the number of non-synonymous mutations
in a tumor sample. This represents the degree of genomic instability and the
probability of neoepitopes appearing on the cell surface [10]. TMB can be used as an independent
predictor of the PD-1/PD-L1 treatment effect, and the effectiveness of
immune checkpoint inhibitors can be predicted by screening suitable biomarkers in
order to filter the effective patient population for this method [11]. However, no study has shown an association
between TMB and the response to immunotherapy in OC.
Solid tumors can escape the surveillance of the immune system through various
mechanisms of immune evasion or resistance, creating an environment suitable for
their growth – the tumor microenvironment (TME) [12]. TME leads to the dysfunction of immune
cells such as T cells and NK cells, inhibits the activation of antigen-presenting
cells and promotes the formation of microvascular beds and matrix barriers, inducing
the formation of immunosuppressive effects and causing tumor immune escape [13]. Correspondingly, the expression of tumor
self-stimulatory molecules (CTLA-4 and PD-1/PDL-1) is also downregulated,
allowing tumor cells to prevent T cells from being killed [14]. It is important to elucidate the TME of
OCs from the functional perspective of immune cells by establishing an antitumor
response of immune cells to prevent occurrence, progression or recurrence of tumors.
Among them, immune cells, metabolic pathways, and intracellular signaling molecules
represent unique properties of the TME and have also become important indicators of
immunotherapeutic response and prognosis [15].
Next-generation sequencing (NGS) technology, also known as high-throughput sequencing
technology, breaks DNA molecules into short fragments of 500 to 800 bp and
simultaneously sequences tens of millions or even hundreds of millions of DNA
fragments on the same chip [16]. With the
advancement of NGS, gene expression profiling has become an effective method for
identifying differentially expressed genes in various diseases, thereby contributing
to the study of pathogenesis and the development of new biomarkers [17]. NGS-related algorithms are also being
successively updated. CIBERSORT is an analytical tool that uses microarray data or
RNA sequencing data to assess the expression of immune cells in a sample and obtain
different proportions of immune cells [18].
The ESTIMATE algorithm is used to describe the ratio of stromal cells to immune
cells in a tumor. This is an algorithm that can predict tumor purity and the ratio
of stromal cells to immune cells in tumor tissue [19].
The OC response to immunotherapy is limited. However, targeted
therapy-sensitive/resistant subgroup analysis based on tumor biomarker
stratification may help improve the predictive power of response to immunotherapy.
These biomarkers mainly include PD-1/PD-L1, TMB, ICI, etc. To find a
biomarker that predicts sensitivity to immunotherapy, we analyzed the gene
expression profiles of multiple tumor samples from the TCGA and GEO databases using
bioinformatics analysis combined with the CIBERSORT and ESTIMATE algorithms,
building a framework that can accurately predict OC patients’ prognosis and
response to immunotherapy.
Materials and Methods
Data collection and processing
The OC patients RNA dataset [fragments per kilobase million (FPKM)] and the
corresponding clinical data and SNP data were obtained from the TCGA database
(https://portal.gdc.cancer.gov), and GSE140082 microarray
sequencing data were obtained from the Gene Expression Omnibus (GEO) database
(https://www.ncbi.nlm.nih.gov/geo/). We
converted the TCGA data to a transcripts per kilobase million (TPM) matrix
because the expression profile (FPKM value) data of the TCGA OC dataset are
different from the microarray sequencing data [20]. The two expression datasets were combined for further analysis
using the limma and sav packages provided by BiocManager.
Consensus clustering of immune cell infiltration
CIBERSORT is an algorithm for deconvoluting the expression matrix of human immune
cell subtypes based on the principle of linear support vector regression. For
microarray expression matrices and sequencing expression matrices, deconvolution
analysis of unknown mixtures and expression matrices containing similar cell
types outperforms other methods. The algorithm provides a default set of gene
expression signatures for 22 immune cell subtypes [18]. ESTIMATE is an algorithm that can
estimate the stromal score and immune score of tumor samples from expression
data and can be used to estimate tumor purity. The CIBERSORT algorithm was used
to estimate the proportion of immune cells in samples from OC patients, and the
ESTIMATE algorithm was used to calculate stromal and immune scores. The
Consensus-ClusterPlus R package was used to perform unsupervised consensus
clustering analyzes based on the ICI pattern of each ovarian cancer sample [21].
Differentially expressed genes (DEGs) associated with the ICI
phenotype
This unsupervised consensus clustering algorithm was used to classify patients
into different ICI subgroups based on the content of immune cells in the OC
samples. The process was to extract a dataset with a certain sample size by
resampling and filter out the optimal K-value. The rationality principal
component analysis (PCA) algorithm was calculated the number of different
clusters [22]. The K-value is equal to the
number of groups. The selection principle of the optimal K-value is as follows:
(1) The cumulative distribution function (CDF) value is small, and the growth of
it is slow; (2) There are no small clusters or cross-clusters between different
subtypes; and (3) The intracluster correlation is high. Using this algorithm,
patients were divided into different ICI subgroups, and each subgroup
represented a different degree of tumor immune cell infiltration. A false
discovery rate<0.05 and absolute fold change<1 were taken as
criteria.
ICI score dimensionality reduction and generation
We performed unsupervised consensus clustering on all DEGs again to obtain the
optimal number of subgroups and then divided clusters according to this number.
The DEG values that were positively or negatively correlated with these clusters
were taken as ICI gene signatures A and B, and the dimensionality reduction of
ICI gene signatures A and B is (was) performed using the Boruta algorithm [23]. PCA extracts principal components as
feature scores, where PC1A represents the first component of signature A and
PC1B represents the first component of signature B. Finally, we used a method
similar to the Gene Expression Grading Index. The ICI score of each patient was
determined using the Boruta package according to Equation 1: ICI
score=ΣPC1A – ΣPC1B [24].
Analysis and processing of mutation data
Mutation data on OC patients were downloaded from the TCGA database
(https://portal.gdc.cancer.gov) to determine the mutational
burden of OCs. We counted the total number of asynchronous mutations in OC and
assessed somatic changes in OC driver genes to determine ICI scores. The driver
genes of OC were identified using the “maftool” software package
[25]. The top 20 driver genes with the
highest mutation frequency are displayed.
Statistical analysis of data
Statistical analysis in this study was performed using R version 4.2.0 software.
The Kruskal–Wallis test was used to compare the differences between more
than two groups, and the Wilcoxon test was used to compare the differences
between the two groups. Find the best cutoff value using X-tile software. The
rationale of it is to use different values as cutoffs for statistical tests. The
result with the smallest p-value can be considered the best cutoff value [24]. A Kaplan–Meier plotter was
used to generate subgroup survival curves for each dataset. The log-rank test
was used to assess significant differences between subgroups. Two-tailed
p<0.05 was considered statistically significant.
Results
Situation of immune cell infiltration in the OC TME
Combining 761 samples from two databases (array expression datasets: GSE140082
and TCGA) into one set, immune cell levels in OC tissues were quantified using
CIBERSORT and ESTIMATE algorithms. Information about the data set can be found
in [Table 1]. All metacohort samples were
found to be systematically clustered by CIBERSORT. Empirical cumulative
distribution function plots and delta area plots to represent the results of
consistent clustering, where K is the number of subgroups. When plotting these
two graphs to analyze the optimal K-value for sample distribution stability, we
chose K=2 as the optimal number of clusters, as shown in [Fig. 1a]. [Fig. 1b] shows the expression heatmap of
the correlation clustering results. The overall survival rates for the 2
independent ICI subtypes were significantly different (log-rank test,
p<0.05), and the results showed that patients from ICI cluster A had a
better prognosis than patients from ICI cluster B ([Fig. 1c]). We found that the expression of
PD-1 and PD-L1, two important immune checkpoints, was lower in ICI cluster B
than in A cluster ([Fig. 1d, e]). We
further defined the intrinsic biological differences leading to different
clinical subtypes by comparing the composition of immune cells in the TME, as
shown in [Fig. 1f] in ICI cluster A with
B: naive B cells, memory B cells, plasma cells, CD8 T cells, CD8 T cells, CD4
memory-activated T cells, follicular helper T cells, γδT cells,
M1 macrophages, M2 macrophages, and resting mast cells having the highest
levels. ICI cluster B had the highest levels of CD4 memory resting T cells,
resting NK cells, M0 macrophages, and activated mast cells. The immune score of
ICI cluster A was significantly higher than that of ICI cluster B
(p***), indicating a higher percentage of immune cells.
The results in [Fig. 1g] show the
correlation of 22 types of immune cells.
Fig. 1 Landscape of ICI in the TME of OC. a: Consensus
clustering was used to divide these samples into 2 ICI subgroups in
accordance with the content of immune cells in these samples. b:
Heatmap of the tumor-infiltrating immune cells in four independent OC
cohorts. Rows indicate tumor-infiltrating immune cells, and columns
indicate the OC samples. The color intensity value indicates the
fraction of different immune cells. c: Kaplan–Meier
survival analysis for overall survival (OS) of all OC patients from
three ICI clusters. PD-1 (d) and PDL-1 (e) expression
among different ICI clusters. f: The difference in the
tumor-infiltrating immune cell types between the different ICI clusters.
g: The proportion and correlation of tumor-infiltrating
immune cells in the three ICI clusters. *p<0.05.
**p<0.01.
***p<0.001.
****p<0.0001.
Table 1 Data sets.
Record
|
Platform
|
Country
|
Year
|
Number of normal samples
|
Number of OV samples
|
TCGA
|
IIIumina HiSeq
|
USA
|
2022
|
0
|
381
|
GSE140082
|
GPL14951 Illumina HumanHT-12 WG-DASL V4.0 R2 expression
beadchip
|
USA
|
2019
|
0
|
380
|
DEGs of immune gene subtypes
We also performed differential immunophenotyping analyses. The gene expression
levels of all samples were obtained by performing DEGs analysis, and after
correcting the p-value, unsupervised consistent clustering analysis was
performed according to DEGs, and obtained 3 gene clusters ([Fig. 2a]). Subsequently, survival curves
of different sample types and genotypes were drawn. We found that the patients
from the C gene cluster had the best prognosis, while the patients from the B
gene cluster had the worst prognosis. The prognosis of the 3 groups was
significantly different for each genotype (p***) ([Fig. 2b]). Analysis of differences between
immune cells by genotype is shown in [Fig.
2c]. Gene cluster A had higher levels of M2-type macrophages. Gene
cluster B has CD4 memory resting T cells, resting NK cells, M0 macrophages,
activated dendritic cells, and activated mast cells, and the C gene cluster had
a higher proportion of plasma cells, CD8+T cells, activated
CD4+memory T cells, γδT cells, M1 macrophages, and
resting dendritic cells. There was no significant difference in the effects of
the three gene clusters on naive B cells, memory B cells, naive CT4+T
cells, T helper follicular cells, Tregs, activated NK cells, monocytes, resting
mast cells and neutrophil learning differences. Furthermore, the C gene cluster
had the highest immune and stromal scores, while the B gene cluster had the
lowest immune and stromal scores. PD-1 and PD-L1 were highly expressed in the C
gene cluster, followed by the A gene cluster and the B gene cluster ([Fig. 2d, e]). The correlation between gene
expression and typing was shown in [Fig.
2f]. If the correlation was positive, it was divided into group A,
while if the correlation was negative, it was divided into group B.
Fig. 2 An Immunophenotyping differential analysis. a: After
differential analysis of all genes, the expression of a gene in all
samples and the corrected p value are obtained, and then the samples are
typed according to the differential genes, which are divided into 3
types here. b: Kaplan–Meier curves for the OS of the
three groups of patients. c: Analysis of genetic types of immune
cell differences. The horizontal coordinates are the names of immune
cells. The vertical coordinate is the content of immune cells. d,
e: Differences in PD1 and PD-L1 expression among different
ICI gene clusters. f: The genetic typical hot diagram, the
horizontal coordinates are samples, the vertical coordinates are gene
expression, the genetic expression and the classification are positively
correlated and divided into group A, and negative correlations are
divided into group B.
Correlation between ICI scores and immune checkpoint signals
Genomes samples A and B were evaluated separately using the PCA algorithm. The
final score of each sample was obtained from the score of group A minus the
score of group B (Supplementary
Table 1S). Patients were divided into high and low scoring groups based
on the selected optimal cutoff. The distribution of clinical information related
the OC among the 3 gene clusters is shown in [Fig. 3a]. CD274, CTLA4, PDCD1, HAVCR2, IDO1, and LAG3 are immune
checkpoint-related signals. By selecting CD8A, CXCL10, CXCL9, GZMA, GZMB, IFNG,
PRF1, TBX2, and TNF as signals associated with immune activity, we compared the
expression of these genes in the low and high ICI score groups ([Fig. 3b]) to assess immune activity and
tolerance. The results showed that all genes except TBX2 were significantly
overexpressed in patients with high ICI scores (p***).
In addition, gene set enrichment analysis revealed that the TOLL-like receptor
pathway, B-cell receptor pathway, antigen processing and presentation pathway,
NK-cell-mediated cytotoxicity pathway, and arginine-proline metabolism pathway
were significantly associated with high ICI scores. All pathways were not
significantly enriched in the low-scoring group ([Fig. 3c]). In the metacohort, patients
with high ICI scores had a better prognosis than patients with low ICI scores
([Fig. 3d]). This result was
statistically different in the survival analysis from GSE140082 ([Fig. 3e]) and TCGA database ([Fig. 3f]).
Fig. 3
a: Alluvial diagram of ICI gene cluster distribution in groups
with different ICI clusters. ICI scores, and survival outcomes.
b: The difference between the immune checkpoint-relevant genes
(CD274, CTLA4, PDCD1, HAVCR2, IDO1, LAG3) and immune activation-relevant
genes (CD8A, CXCL10, CXCL9, GZMA, GZMB, IFNG, PRF1, TBX2, TNF) expressed
in the high and low ICI score subgroups. c: GSEA results of high
and low ICI score subgroups. d: Kaplan–Meier curves for
high and low ICI score groups in the meta-cohort. e and f
represent Kaplan–Meier curves of groups with high and low ICI
scores in GSE140082 and TCGA metacohort, respectively.
ICI scores combined with tumor mutation burden (TMB) was used as a predictor
of prognosis in OC patients
TMB may predict the outcome of immunotherapy in patients with advanced cancer.
The results of Spearman correlation analysis showed that there was no
significant difference between the ICI score and TMB correlation analysis
(Spearman coefficient: R=0.043, p=0.5, [Fig. 4a]). We divided the patients into a
high TMB group and a low TMB group according to the best cutoff value and
analyzed the survival of patients in the high TMB group and the low TMB group.
However, we found that the TMB level was significantly associated with prognosis
(p***) ([Fig.
4b]). The results of TMB subgroup analysis showed that patients with high
ICI scores had a better prognosis than patients with high TMB, while patients
with low TMB and ICI scores had a poor prognosis ([Fig. 4c]). [Fig. 4d, e] showed that we further
analyzed the distribution of somatic variants in OC driver genes between
subgroups with high and low ICI score subgroups. The top 20 driver genes with
the highest mutation rates are displayed. Aside from that, we also found that
the expression of USH2A was significantly different between the high and low ICI
score groups (p<0.05). These results suggest that the ICI score can
serve as an effective predictor and can be combined with the TMB to provide a
new approach to study tumor ICI composition and the mechanisms of actions of
gene mutations in cancer immunotherapy.
Fig. 4
a: Scatter plot used to describe the correlation between the ICI
score and mutation load in the meta-cohort. b:
Kaplan–Meier curves for the high and low TMB groups of the
meta-cohort. c: Kaplan–Meier curves for patients in the
meta-cohort stratified by both TMB and ICI scores. d: Heatmap of
somatic variants for high ICI scores. e: Heatmap of somatic
variants for low ICI scores.
Correlation analysis between ICI scores and survival status of
patients
A correlation analysis between ICI scores and clinical outcome was performed. In
the TCGA and GSE140082 cohorts, a higher proportion of patients with high ICI
scores survived than those with low ICI scores ([Fig 5a]). Furthermore, patients who
survived had higher ICI scores ([Fig.
5b]). Survival analysis of clinical groups showed that the ICI scores was
significantly correlated with the prognosis of patients with OC grades
3–4, indicating that the ICI score was suitable for patients with high
grade ([Fig. 5c]).
Fig. 5 Correlation analysis between ICI scores and survival status
of patients. a: The percent weight of the ICI scores and the
proportion of survival/death of patients. b: The correlation
analysis between the ICI scores and patients’ Fustat. c:
Kaplan–Meier curves for G3-G4 OC patients with high and low ICI
scores.
Discussion
The mechanism of OC and immunity has received considerable attention [7]
[8]
[9]
[26]. It has important clinical value in
immunotherapy because the pathogenesis of OC is very complicated and involves a
variety of factors, such as gene mutations, endocrine changes, and microbial
infections [27]
[28]
[29].
Previous studies have established a transcription-factor-related prognostic model
for OC by consistent cluster analysis and found significant differences between the
risk score and the responsiveness of OC patients to immune checkpoint inhibitor
therapy [30]. Therefore, a better elucidation
of the interaction between tumor cells and immune cells and the molecular mechanisms
may provide new targets for immunotherapy in ovarian cancer patients. We merged the
OC dataset in the TCGA and GEO databases and estimated the immune cells content of
each sample using the CIBERSORT algorithm. By unsupervised consistent cluster
analysis. Two ICI subgroups were obtained, and the subgroup was analyzed for
differences in gene expression, obtaining 3 ICI gene clusters. Finally, the ICI
score was determined by genetic symbol analysis using through PCA. The group with
high ICI score had a good prognosis, which may be related to the TOLL-like body
pathway, B-cell receptor pathway, antigening and presence pathway, NK-cell-mediated
cytotoxic pathway, and arginine-proline metabolic pathway. In addition, patients
with high ICI score are more sensitive to immunotherapy, and patients with high ICI
scores and high TMB have a better prognosis. This means that the higher the immune
score, the better the future. After analyzing the DEGs of different ICI subgroups,
it was found that different ICI gene clusters were also associated with patient
prognosis. In conclusion, ICI score was related to the patient’s prognosis
and the sensitivity of immunotherapy according to scoring the ICI and grouping. Our
studies showed that 14 immune checkpoints and immune activation-related genes had
high expression among OC patients with high ICI scores.
The ICI mode requires individualized treatment due to different individual immune
environments. Tumor subtype-specific biomarkers have been uesd to predict the
prognosis of malignancies such as breast cancer, bladder cancer, and pancreatic
cancer [31]
[32]
[33]. The GSEA results showed
that immune-related signaling pathways were activated in the high ICI ratings score
group, but not in the low ICI scores group. For patients with higher grade, the
prognosis of high ICI rating score group patients were significantly better than
that of the low ICI score group patients. Activation of these pathways is associated
with the treatment of OC.
The TMB describes the number of nonsynonymous mutations in tumor samples and
represents the degree of the unstable genome and the possibility of new surface
positions on the cell surface. Under normal circumstances, TMB is divided into two
types, high and low, of which TMB is defined as Megabase [34]. High TMB phenotypes represent a large
number of mutant proteins expressed on the cell surface in the form of new surfaces
[35]. In physical tumor studies, TMB high
surface types can predict the response to ICI related immune treatment [36]. However, OC has always been considered a
“cold tumor” with low TMB. Numerous studies have confirmed that the
sensitivity of OC to immunotherapy has no significant association with TMB. TMB
cannot be used as an independent indicator to predict the sensitivity of OC
treatment [34]. However, some studies have
confirmed that TMB can be used in combination with PDL-1 expression level to predict
immunotherapy sensitivity of OC patients. High expression of PD-1/PD-L1 can
predict the response to immunotherapy independently of TMB status, but the
combination of the two biomarkers is better than either one alone [36]. The results of our study indicated that
although the expression level of TMB does not have a significant relationship with
the prognosis of patients, the prognosis in high ICI score groups in patients with
high TMB patients is better than the lower TMB group. It is concluded that ICI score
and TMB might play a role in different aspects of OC immunotherapy, and ICI score
can be used as a feasible auxiliary indicator to predict the prognosis of TMB
patients with different states.
These studies are limited to research data coming from the database, and there are no
experimental data to support the results. Therefore, while we need to further expand
the research samples, combine laboratory results to confirm the results of our
study, and further investigate the significance of the ICI score for clinical
patients undergoing immunotherapy, we need to conduct corresponding studies on the
intrinsic regulatory mechanism.