TY - JOUR
T1 - Distinct transcriptional programs stratify ovarian cancer cell lines into the five major histological subtypes
AU - Barnes, Bethany
AU - Nelson, Louisa
AU - Tighe, Anthony
AU - Burghel, George
AU - Lin, I-Hsuan
AU - Desai, Sudha
AU - McGrail, Joanne
AU - Morgan, Robert D.
AU - Taylor, Stephen
N1 - Funding Information:
The research was funded by a Cancer Research UK Programme Grant to S.S.T. (C1422/A19842) with additional support from the Clinical Training Programme funded by the Cancer Research UK Manchester Centre award (C147/A25254). Additional RNAseq was funded by the NIHR Manchester Biomedical Research Centre Precision Medicine Theme Pump Priming Project (R120700/CAA070107).
Funding Information:
We thank the patients for their commitment to research, the MCRC Biobank for the sample collection, the members of the Taylor lab for advice and comments on the manuscript, the Genomic Technologies Core Facility and the Bioinformatics Core Facility at The University of Manchester for RNAseq, and the CRUK MI Histology Facility.
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/9/1
Y1 - 2021/9/1
AB - Background: Epithelial ovarian cancer (OC) is a heterogeneous disease consisting of five major histologically distinct subtypes: high-grade serous (HGSOC), low-grade serous (LGSOC), endometrioid (ENOC), clear cell (CCOC) and mucinous (MOC). Although HGSOC is the most prevalent subtype, representing 70–80% of cases, a 2013 landmark study by Domcke et al. found that the most frequently used OC cell lines are not molecularly representative of this subtype. This raises the question: if not HGSOC, from which subtype do these cell lines derive? Indeed, non-HGSOC subtypes often respond poorly to chemotherapy; therefore, representative models are imperative for developing new targeted therapeutics. Methods: Non-negative matrix factorisation (NMF) was applied to transcriptomic data from 44 OC cell lines in the Cancer Cell Line Encyclopedia, assessing the quality of clustering into 2–10 groups. Epithelial OC subtypes were assigned to cell lines optimally clustered into five transcriptionally distinct classes, confirmed by integration with subtype-specific mutations. A transcriptional subtype classifier was then developed by trialling three machine learning algorithms using subtype-specific metagenes defined by NMF. The ability of classifiers to predict subtype was tested using RNA sequencing of a living biobank of patient-derived OC models. Results: Application of NMF optimally clustered the 44 cell lines into five transcriptionally distinct groups. Close inspection of orthogonal datasets revealed that this five-cluster delineation corresponds to the five major OC subtypes. This NMF-based classification validates the Domcke et al. analysis in identifying lines most representative of HGSOC, and additionally identifies models representing the four other subtypes. However, NMF of the cell lines into two clusters did not align with the dualistic model of OC, suggesting that this classification is an oversimplification. Subtype designation of patient-derived models by a random forest transcriptional classifier aligned with prior diagnosis in 76% of unambiguous cases. Where there was disagreement, this often indicated a potential alternative diagnosis, supported by a review of histological, molecular and clinical features. Conclusions: This robust classification informs the selection of the most appropriate models for all five histotypes. Following further refinement on larger training cohorts, the transcriptional classification may represent a useful tool to support the classification of new model systems of OC subtypes.
KW - Ovarian cancer
KW - Non-negative matrix factorization
KW - RNA sequencing
KW - Subtype classification
KW - Machine learning
KW - Transcriptomics
UR - https://pubmed.ncbi.nlm.nih.gov/34470661/
U2 - 10.1186/s13073-021-00952-5
DO - 10.1186/s13073-021-00952-5
M3 - Article
SN - 1756-994X
VL - 13
JO - Genome Medicine
JF - Genome Medicine
IS - 1
M1 - 140
ER -