1
|
Nourbakhsh M, Degn K, Saksager A, Tiberti M, Papaleo E. Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks. Brief Bioinform 2024; 25:bbad519. [PMID: 38261338 PMCID: PMC10805075 DOI: 10.1093/bib/bbad519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 11/27/2023] [Accepted: 12/11/2023] [Indexed: 01/24/2024] Open
Abstract
The vast amount of available sequencing data allows the scientific community to explore different genetic alterations that may drive cancer or favor cancer progression. Software developers have proposed a myriad of predictive tools, allowing researchers and clinicians to compare and prioritize driver genes and mutations and their relative pathogenicity. However, there is little consensus on the computational approach or a gold standard for comparison. Hence, benchmarking the different tools depends highly on the input data, indicating that overfitting is still a massive problem. One of the solutions is to limit the scope and usage of specific tools. However, such limitations force researchers to walk a tightrope between creating and using high-quality tools for a specific purpose and describing the complex alterations driving cancer. While the knowledge of cancer development increases daily, many bioinformatic pipelines rely on single nucleotide variants or alterations in a vacuum without accounting for cellular compartments, mutational burden or disease progression. Even within bioinformatics and computational cancer biology, the research fields work in silos, risking overlooking potential synergies or breakthroughs. Here, we provide an overview of databases and datasets for building or testing predictive cancer driver tools. Furthermore, we introduce predictive tools for driver genes, driver mutations, and the impact of these based on structural analysis. Additionally, we recommend directions for the field to avoid siloed research and move towards integrative frameworks.
Affiliation(s)
- Mona Nourbakhsh
- Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
| | - Kristine Degn
- Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
| | - Astrid Saksager
- Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
| | - Matteo Tiberti
- Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
| | - Elena Papaleo
- Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
| |
|
2
|
Chen Y, Ma S, Lin C, Zhu Z, Bai J, Yin Z, Sun Y, Mao F, Xue L, Ma S. Integrative analysis of DNA methylomes reveals novel cell-free biomarkers in lung adenocarcinoma. Front Genet 2023; 14:1175784. [PMID: 37396036 PMCID: PMC10311559 DOI: 10.3389/fgene.2023.1175784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 06/07/2023] [Indexed: 07/04/2023] Open
Abstract
Lung cancer is a leading cause of cancer-related deaths worldwide, with a low 5-year survival rate due in part to a lack of clinically useful biomarkers. Recent studies have identified DNA methylation changes as potential cancer biomarkers. The present study identified cancer-specific CpG methylation changes by comparing genome-wide methylation data of cfDNA from lung adenocarcinoma (LUAD) patients and healthy donors in the discovery cohort. A total of 725 cell-free CpGs associated with LUAD risk were identified. The XGBoost algorithm was then applied to identify seven CpGs associated with LUAD risk. In the training phase, the 7-CpG methylation panel was established to classify two different prognostic subgroups and showed a significant association with overall survival (OS) in LUAD patients. We found that the methylation of cg02261780 was negatively correlated with the expression of its corresponding gene GNA11. The methylation and expression of GNA11 were significantly associated with LUAD prognosis. Based on bisulfite PCR, the methylation levels of five CpGs (cg02261780, cg09595050, cg20193802, cg15309457, and cg05726109) were further validated in tumor tissues and matched non-malignant tissues from 20 LUAD patients. Finally, the seven CpGs were validated with RRBS data of cfDNA methylation, which further supported the reliability of the 7-CpG methylation panel. In conclusion, our study identified seven novel methylation markers from cfDNA methylation data which may contribute to better prognosis for LUAD patients.
Affiliation(s)
- Yifan Chen
- Department of Thoracic Surgery, Peking University Third Hospital, Beijing, China
- Institute of Medical Innovation and Research, Peking University Third Hospital, Beijing, China
- Cancer Center of Peking University Third Hospital, Peking University Third Hospital, Beijing, China
- Biobank, Peking University Third Hospital, Beijing, China
| | - Shanwu Ma
- Department of Thoracic Surgery, Peking University Third Hospital, Beijing, China
| | - Chutong Lin
- Department of Thoracic Surgery, Peking University Third Hospital, Beijing, China
| | - Zhipeng Zhu
- Institute of Medical Innovation and Research, Peking University Third Hospital, Beijing, China
- Cancer Center of Peking University Third Hospital, Peking University Third Hospital, Beijing, China
| | - Jie Bai
- Department of Thoracic Surgery, Peking University Third Hospital, Beijing, China
| | - Zhongnan Yin
- Institute of Medical Innovation and Research, Peking University Third Hospital, Beijing, China
- Cancer Center of Peking University Third Hospital, Peking University Third Hospital, Beijing, China
- Biobank, Peking University Third Hospital, Beijing, China
| | - Yan Sun
- Institute of Medical Innovation and Research, Peking University Third Hospital, Beijing, China
- Cancer Center of Peking University Third Hospital, Peking University Third Hospital, Beijing, China
- Biobank, Peking University Third Hospital, Beijing, China
| | - Fengbiao Mao
- Institute of Medical Innovation and Research, Peking University Third Hospital, Beijing, China
- Cancer Center of Peking University Third Hospital, Peking University Third Hospital, Beijing, China
| | - Lixiang Xue
- Institute of Medical Innovation and Research, Peking University Third Hospital, Beijing, China
- Cancer Center of Peking University Third Hospital, Peking University Third Hospital, Beijing, China
- Biobank, Peking University Third Hospital, Beijing, China
| | - Shaohua Ma
- Beijing Cancer Hospital and Institute, Peking University School of Oncology, Beijing, China
| |
|
3
|
Wang T, Liu L, Fan T, Xia K, Sun Z. Shared and divergent contribution of vitamin A and oxytocin to the aetiology of autism spectrum disorder. Comput Struct Biotechnol J 2023; 21:3109-3123. [PMID: 38213898 PMCID: PMC10782014 DOI: 10.1016/j.csbj.2023.05.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 05/15/2023] [Accepted: 05/15/2023] [Indexed: 01/13/2024] Open
Abstract
Rare genetic variations contribute to the heterogeneity of autism spectrum disorder (ASD) and the responses to various interventions for ASD probands. However, the associated molecular underpinnings remain unclear. Herein, we estimated the association between rare genetic variations in 410 vitamin A (VA)-related genes (VARGs) and ASD aetiology using publicly available de novo mutations (DNMs), rare inherited variants, and copy number variations (CNVs) from about 50,000 ASD probands and 20,000 normal controls (discovery and validation cohorts). Additionally, given the functional relevance of VA and oxytocin, we systematically compared the similarities and differences between VA and oxytocin with respect to ASD aetiology and evaluated their potential for clinical applications. Functional DNMs and pathogenic CNVs in VARGs contributed to ASD pathogenesis in the discovery and validation cohorts. Additionally, 324 potential VA-related biomarkers were identified, 243 of which were shared with previously identified oxytocin-related biomarkers, while 81 were unique VA biomarkers. Moreover, multivariable logistic regression analysis revealed that both VA- and oxytocin-related biomarkers were able to predict ASD aetiology for individuals carrying functional DNMs in the corresponding biomarkers, with an average precision of 0.94. Convergent and divergent functions were also identified between VA- and oxytocin-related biomarkers. The findings of this study provide a basis for future studies aimed at understanding the pathophysiological mechanisms underlying ASD while also defining a set of potential molecular biomarkers for adjuvant diagnosis and intervention in ASD.
Affiliation(s)
- Tao Wang
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Kaifu District, Changsha, Hunan 410078, China
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
| | - Liqiu Liu
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
| | - Tianda Fan
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Kaifu District, Changsha, Hunan 410078, China
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China
| | - Kun Xia
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Kaifu District, Changsha, Hunan 410078, China
- CAS Center for Excellence in Brain Science and Intelligences Technology (CEBSIT), Shanghai 200031, China
- Hengyang Medical School, University of South China, Hengyang, Hunan 410078, China
| | - Zhongsheng Sun
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China
- CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing 100049, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Chinese Academy of Sciences, Beijing 100101, China
| |
|
4
|
Wang L, Sun J, Ma S, Xia J, Li X. PredDSMC: A predictor for driver synonymous mutations in human cancers. Front Genet 2023; 14:1164593. [PMID: 37051593 PMCID: PMC10083435 DOI: 10.3389/fgene.2023.1164593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 03/09/2023] [Indexed: 03/29/2023] Open
Abstract
Introduction: Driver mutations play a critical role in the occurrence and development of human cancers. Most studies have focused on missense mutations that function as drivers in cancer. However, accumulating experimental evidence indicates that synonymous mutations can also act as driver mutations. Methods: Here, we proposed a computational method called PredDSMC to accurately predict driver synonymous mutations in human cancers. We first systematically explored four categories of multimodal features, including sequence features, splicing features, conservation scores, and functional scores. Further feature selection was carried out to remove redundant features and improve the model performance. Finally, we utilized the random forest classifier to build PredDSMC. Results: The results of two independent test sets indicated that PredDSMC outperformed the state-of-the-art methods in differentiating driver synonymous mutations from passenger mutations. Discussion: In conclusion, we expect that PredDSMC, as a driver synonymous mutation prediction method, will be a valuable method for gaining a deeper understanding of synonymous mutations in human cancers.
|
5
|
Ren Z, Li Q, Cao K, Li MM, Zhou Y, Wang K. Model performance and interpretability of semi-supervised generative adversarial networks to predict oncogenic variants with unlabeled data. BMC Bioinformatics 2023; 24:43. [PMID: 36759776 PMCID: PMC9909865 DOI: 10.1186/s12859-023-05141-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 01/05/2023] [Indexed: 02/11/2023] Open
Abstract
BACKGROUND It remains an important challenge to predict the functional consequences or clinical impacts of genetic variants in human diseases, such as cancer. An increasing number of genetic variants in cancer have been discovered and documented in public databases such as COSMIC, but the vast majority of them have no functional or clinical annotations. Some databases, such as CIViC, are available with manual annotation of functional mutations, but the size of the database is small due to the use of human annotation. Since the unlabeled data (millions of variants) typically outnumber labeled data (thousands of variants), computational tools that take advantage of unlabeled data may improve prediction accuracy. RESULTS To leverage unlabeled data to predict functional importance of genetic variants, we introduced a method using semi-supervised generative adversarial networks (SGAN), incorporating features from both labeled and unlabeled data. Our SGAN model incorporated features from clinical guidelines and predictive scores from other computational tools. We also performed comparative analysis to study factors that influence prediction accuracy, such as using different algorithms, types of features, and training sample size, to provide more insights into variant prioritization. We found that SGAN can achieve competitive performances with small labeled training samples by incorporating unlabeled samples, which is a unique advantage compared to traditional machine learning methods. We also found that manually curated samples can achieve a more stable predictive performance than publicly available datasets. CONCLUSIONS By incorporating much larger samples of unlabeled data, the SGAN method can improve the ability to detect novel oncogenic variants, compared to other machine-learning algorithms that use only labeled datasets. SGAN can be potentially used to predict the pathogenicity of more complex variants such as structural variants or non-coding variants, with the availability of more training samples and informative features.
Affiliation(s)
- Zilin Ren
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Quan Li
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, ON, M5G2C1, Canada
| | - Kajia Cao
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Marilyn M Li
- Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Yunyun Zhou
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| |
|
6
|
Wang H, Sun J, Liu M, Zheng CH, Xia J, Cheng N. frDSM: An Ensemble Predictor With Effective Feature Representation for Deleterious Synonymous Mutation in Human Genome. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:371-377. [PMID: 35420988 DOI: 10.1109/tcbb.2022.3167468] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
With the discovery of causality between synonymous mutations and diseases, it has become increasingly important to identify deleterious synonymous mutations for better understanding of their functional mechanisms. Although several machine learning methods have been proposed to solve the task, developing an effective feature representation method that can make use of the inner difference and relevance between deleterious and benign synonymous mutations remains challenging given the vast number of synonymous mutations in the human genome. In this work, we developed a robust and accurate predictor called frDSM for deleterious synonymous mutation prediction using logistic regression. More specifically, we introduced an effective feature representation learning method which exploits multiple feature descriptors from different perspectives, including functional scores obtained from previous computational methods, evolutionary conservation, splicing and sequence feature descriptors. These feature descriptors were input into 76 XGBoost classifiers to obtain predicted probability values. These probabilities were concatenated to generate a new 76-dimensional feature vector, and a feature selection method was used to remove redundant and irrelevant features. Experimental results show that frDSM, with 31 optimal features, enables more robust and accurate prediction than the competing prediction methods, which demonstrates the effectiveness of the feature representation learning method. frDSM is freely available at http://frdsm.xialab.info.
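The feature representation step described in this abstract (one classifier per feature descriptor, predicted probabilities concatenated into a new feature vector, logistic regression on top) can be sketched as follows. This is a minimal illustration under stated assumptions, not the frDSM implementation: plain logistic regression stands in for the XGBoost base classifiers, there are two descriptor groups instead of 76, the feature selection step is omitted, base probabilities are computed on the training data itself rather than out-of-fold, and the data and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient-descent logistic regression; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def base_probabilities(base_models, groups, x):
    """One predicted probability per descriptor group: the new feature vector."""
    return [predict_proba(m, [x[c] for c in cols])
            for m, cols in zip(base_models, groups)]

def fit_stack(X, y, groups):
    """Train one base model per descriptor group, then a meta model on their outputs."""
    base_models = [fit_logistic([[row[c] for c in cols] for row in X], y)
                   for cols in groups]
    meta_X = [base_probabilities(base_models, groups, row) for row in X]
    return base_models, fit_logistic(meta_X, y)

def stack_predict(stack, groups, x):
    base_models, meta_model = stack
    return predict_proba(meta_model, base_probabilities(base_models, groups, x))

# Hypothetical toy data: columns 0-2 carry signal, column 3 is pure noise.
rng = random.Random(0)
GROUPS = [[0, 1], [2, 3]]  # two stand-in "feature descriptors"
X, y = [], []
for i in range(200):
    label = i % 2
    shift = 1.5 if label else -1.5
    X.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0),
              rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)])
    y.append(label)

stack = fit_stack(X, y, GROUPS)
predictions = [1 if stack_predict(stack, GROUPS, x) >= 0.5 else 0 for x in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(f"training accuracy of the toy stack: {accuracy:.2f}")
```

The meta model only ever sees one probability per descriptor group, which is what makes the learned representation compact; a real pipeline would evaluate on held-out mutations rather than on the training set.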
|
7
|
Nussinov R, Tsai CJ, Jang H. A New View of Activating Mutations in Cancer. Cancer Res 2022; 82:4114-4123. [PMID: 36069825 PMCID: PMC9664134 DOI: 10.1158/0008-5472.can-22-2125] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/30/2022] [Revised: 08/16/2022] [Accepted: 09/01/2022] [Indexed: 12/14/2022]
Abstract
A vast effort has been invested in the identification of driver mutations of cancer. However, recent studies and observations call into question whether the activating mutations or the signal strength are the major determinant of tumor development. The data argue that signal strength determines cell fate, not the mutation that initiated it. In addition to activating mutations, factors that can impact signaling strength include (i) homeostatic mechanisms that can block or enhance the signal, (ii) the types and locations of additional mutations, and (iii) the expression levels of specific isoforms of genes and regulators of proteins in the pathway. Because signal levels are largely decided by chromatin structure, they vary across cell types, states, and time windows. A strong activating mutation can be restricted by low expression, whereas a weaker mutation can be strengthened by high expression. Strong signals can be associated with cell proliferation, but too strong a signal may result in oncogene-induced senescence. Beyond cancer, moderate signal strength in embryonic neural cells may be associated with neurodevelopmental disorders, and moderate signals in aging may be associated with neurodegenerative diseases, like Alzheimer's disease. The challenge for improving patient outcomes therefore lies in determining signaling thresholds and predicting signal strength.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Cancer Innovation Laboratory, NCI, Frederick, Maryland
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Chung-Jung Tsai
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Cancer Innovation Laboratory, NCI, Frederick, Maryland
| | - Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Cancer Innovation Laboratory, NCI, Frederick, Maryland
| |
|
8
|
Fan J, Shi L, Liu Q, Zhu Z, Wang F, Song R, Su J, Zhou D, Chen X, Li K, Xue L, Sun L, Mao F. Annotation and evaluation of base editing outcomes in multiple cell types using CRISPRbase. Nucleic Acids Res 2022; 51:D1249-D1256. [PMID: 36350608 PMCID: PMC9825451 DOI: 10.1093/nar/gkac967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Revised: 10/09/2022] [Accepted: 10/13/2022] [Indexed: 11/10/2022] Open
Abstract
The CRISPR-Cas base editing (BE) system is a powerful tool to expand the scope and efficiency of genome editing with single-nucleotide resolution. The editing efficiency, product purity, and off-target effect differ among various BE systems. Herein, we developed CRISPRbase (http://crisprbase.maolab.org) by integrating 1 252 935 records of base editing outcomes in more than 50 cell types from 17 species. CRISPRbase helps to evaluate the putative editing precision of different BE systems by integrating multiple annotations, functional predictions and a blasting system for single-guide RNA sequences. We systematically assessed the editing window, editing efficiency and product purity of various BE systems. Intensive efforts were focused on increasing the editing efficiency and product purity of base editors since the byproduct could be detrimental in certain applications. Remarkably, more than half of cancer-related off-target mutations were non-synonymous and extremely damaging to protein functions in most common tumor types. Fortunately, most of these cancer-related mutations were passenger mutations (4840/5703, 84.87%) rather than cancer driver mutations (863/5703, 15.13%), indicating a weak effect of off-target mutations on carcinogenesis. In summary, CRISPRbase is a powerful and convenient tool to study the outcomes of different base editors and help researchers choose appropriate BE designs for functional studies.
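The passenger and driver proportions quoted in this abstract can be checked with a few lines of arithmetic. The counts are taken from the abstract; the variable and function names are illustrative only.

```python
# Counts quoted in the abstract: cancer-related off-target mutations.
passenger = 4840
driver = 863
total = 5703

def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)

passenger_pct = percent(passenger, total)
driver_pct = percent(driver, total)
print(f"passenger: {passenger_pct}%  driver: {driver_pct}%")
```

The two counts sum to the stated total, and the rounded shares match the 84.87% and 15.13% given in the text.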
Affiliation(s)
| | | | | | - Zhipeng Zhu
- Institute of Medical Innovation and Research, Peking University Third Hospital, Beijing 100191, China; Cancer Center, Peking University Third Hospital, Beijing 100191, China
| | - Fan Wang
- College of Animal Science and Technology, Yangzhou University, Yangzhou, Jiangsu Province 225009, China
| | - Runxian Song
- Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, China; State Key Laboratory of Tree Genetics and Breeding, Forestry College, Northeast Forestry University, Harbin 150040, China
| | - Jimeng Su
- College of Animal Science and Technology, Yangzhou University, Yangzhou, Jiangsu Province 225009, China
| | - Degui Zhou
- Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, China; Guangdong Key Laboratory of New Technology in Rice Breeding, Guangzhou 510640, China; Guangdong Rice Engineering Laboratory, Guangzhou 510640, China
| | - Xiao Chen
- Laboratory of Marine Protozoan Biodiversity & Evolution, Marine College, Shandong University, Weihai 264209, China
| | - Kailong Li
- Department of Biochemistry and Biophysics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
| | - Lixiang Xue
- Correspondence may also be addressed to Lixiang Xue.
| | - Lichao Sun
- Correspondence may also be addressed to Lichao Sun.
| | - Fengbiao Mao
- To whom correspondence should be addressed. Tel: +86 10 87132318;
| |
|
9
|
Abstract
AlphaFold has burst into our lives. A powerful algorithm that underscores the strength of biological sequence data and artificial intelligence (AI). AlphaFold has appended projects and research directions. The database it has been creating promises an untold number of applications with vast potential impacts that are still difficult to surmise. AI approaches can revolutionize personalized treatments and usher in better-informed clinical trials. They promise to make giant leaps toward reshaping and revamping drug discovery strategies, selecting and prioritizing combinations of drug targets. Here, we briefly overview AI in structural biology, including in molecular dynamics simulations and prediction of microbiota–human protein–protein interactions. We highlight the advancements accomplished by the deep-learning-powered AlphaFold in protein structure prediction and their powerful impact on the life sciences. At the same time, AlphaFold does not resolve the decades-long protein folding challenge, nor does it identify the folding pathways. The models that AlphaFold provides do not capture conformational mechanisms like frustration and allostery, which are rooted in ensembles, and controlled by their dynamic distributions. Allostery and signaling are properties of populations. AlphaFold also does not generate ensembles of intrinsically disordered proteins and regions, instead describing them by their low structural probabilities. Since AlphaFold generates single ranked structures, rather than conformational ensembles, it cannot elucidate the mechanisms of allosteric activating driver hotspot mutations nor of allosteric drug resistance. However, by capturing key features, deep learning techniques can use the single predicted conformation as the basis for generating a diverse ensemble.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States; Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
| | - Mingzhen Zhang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
| | - Yonglan Liu
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, Maryland 21702, United States
| | - Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
| |
|
10
|
Integrative analysis prioritised oxytocin-related biomarkers associated with the aetiology of autism spectrum disorder. EBioMedicine 2022; 81:104091. [PMID: 35665681 PMCID: PMC9301877 DOI: 10.1016/j.ebiom.2022.104091] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/23/2022] [Revised: 05/17/2022] [Accepted: 05/17/2022] [Indexed: 12/26/2022] Open
Abstract
Background Autism spectrum disorder (ASD) is a neurodevelopmental disorder with high phenotypic and genetic heterogeneity. The common variants of specific oxytocin-related genes (OTRGs), particularly OXTR, are associated with the aetiology of ASD. The contribution of rare genetic variations in OTRGs to ASD aetiology remains unclear. Methods We catalogued publicly available de novo mutations (DNMs) [from 6,511 patients with ASD and 3,391 controls], rare inherited variants (RIVs) [from 1,786 patients with ASD and 1,786 controls], and both de novo copy number variations (dnCNVs) and inherited CNVs (ihCNVs) [from 15,581 patients with ASD and 6,017 controls] in 963 curated OTRGs to explore their respective contributions to ASD pathology. Finally, a combined model was designed to prioritise the contribution of each gene to ASD aetiology by integrating DNMs and CNVs. Findings The rare genetic variations of OTRGs were significantly associated with ASD aetiology, in the order of dnCNVs > ihCNVs > DNMs. Furthermore, 172 OTRGs and their connected 286 ASD core genes were prioritised to positively contribute to ASD aetiology, including top-ranked MAPK3. Probands carrying rare disruptive variations in these genes were estimated to account for 10∼11% of all ASD probands. Interpretation Our findings suggest that rare disruptive variations in 172 OTRGs and their connected 286 ASD core genes are associated with ASD aetiology and may be potential biomarkers predicting the effects of oxytocin treatment. Funding Guangdong Key Project, National Natural Science Foundation of China, Key Laboratory of Clinical Laboratory Diagnosis and Translational Research of Zhejiang Province.
|
11
|
Genetic association and single-cell transcriptome analyses reveal distinct features connecting autoimmunity with cancers. iScience 2022; 25:104631. [PMID: 35800769 PMCID: PMC9254016 DOI: 10.1016/j.isci.2022.104631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2022] [Revised: 05/08/2022] [Accepted: 06/13/2022] [Indexed: 11/20/2022] Open
Abstract
Patients with autoimmune diseases (ADs) are at a significantly higher risk of cancers, through mechanisms that remain unclear. By searching the GWAS Catalog database and Medline, susceptibility genes for five common ADs, including systemic lupus erythematosus (SLE), rheumatoid arthritis, Sjögren syndrome, systemic sclerosis, and idiopathic inflammatory myopathies, were collected and then overlapped with cancer driver genes. Single-cell transcriptome analysis was performed to compare SLE with related cancers. We identified 45 carcinogenic autoimmune disease risk (CAD) genes, which were mainly enriched in the T cell and B cell signaling pathways. Integrated single-cell analysis revealed that immune cell signaling was significantly downregulated in renal cancer compared with SLE, while a stemness signature was significantly enriched in specific subpopulations in renal cancer, lymphoma and SLE. Drugs targeting CAD genes were shared between ADs and cancer. Our study highlights the common and specific features between ADs and related cancers, and sheds light on the discovery of new treatments.
|
12
|
Andrades R, Recamonde-Mendoza M. Machine learning methods for prediction of cancer driver genes: a survey paper. Brief Bioinform 2022; 23:6551145. [PMID: 35323900 DOI: 10.1093/bib/bbac062] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/06/2021] [Revised: 02/06/2022] [Accepted: 02/08/2022] [Indexed: 12/21/2022] Open
Abstract
Identifying the genes and mutations that drive the emergence of tumors is a critical step to improving our understanding of cancer and identifying new directions for disease diagnosis and treatment. Despite the large volume of genomics data, the precise detection of driver mutations and their carrying genes, known as cancer driver genes, from the millions of possible somatic mutations remains a challenge. Computational methods play an increasingly important role in discovering genomic patterns associated with cancer drivers and developing predictive models to identify these elements. Machine learning (ML), including deep learning, has been the engine behind many of these efforts and provides excellent opportunities for tackling remaining gaps in the field. Thus, this survey aims to perform a comprehensive analysis of ML-based computational approaches to identify cancer driver mutations and genes, providing an integrated, panoramic view of the broad data and algorithmic landscape within this scientific problem. We discuss how the interactions among data types and ML algorithms have been explored in previous solutions and outline current analytical limitations that deserve further attention from the scientific community. We hope that by helping readers become more familiar with significant developments in the field brought by ML, we may inspire new researchers to address open problems and advance our knowledge towards cancer driver discovery.
Affiliation(s)
- Renan Andrades
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre/RS, Brazil
  - Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre/RS, Brazil
- Mariana Recamonde-Mendoza
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre/RS, Brazil
  - Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre/RS, Brazil
13
Shi X, Teng H, Shi L, Bi W, Wei W, Mao F, Sun Z. Comprehensive evaluation of computational methods for predicting cancer driver genes. Brief Bioinform 2022; 23:6509048. [PMID: 35037014 PMCID: PMC8921613 DOI: 10.1093/bib/bbab548] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/21/2021] [Revised: 11/19/2021] [Accepted: 11/29/2021] [Indexed: 12/17/2022]
Abstract
Optimal methods could effectively improve the accuracy of predicting and identifying candidate driver genes. Various computational methods based on mutational frequency, network and function approaches have been developed to identify mutation driver genes in cancer genomes. However, a comprehensive evaluation of the performance levels of network-, function- and frequency-based methods is lacking. In the present study, we assessed and compared eight performance criteria for eight network-based, one function-based and three frequency-based algorithms using eight benchmark datasets. Under different conditions, the performance of approaches varied in terms of network, measurement and sample size. The frequency-based driverMAPS and network-based HotNet2 methods showed the best overall performance. Network-based algorithms using protein–protein interaction networks outperformed the function- and the frequency-based approaches. Precision, F1 score and Matthews correlation coefficient were low for most approaches. Thus, most of these algorithms require stringent cutoffs to correctly distinguish driver and non-driver genes. We constructed a website named Cancer Driver Catalog (http://159.226.67.237/sun/cancer_driver/), wherein we integrated the gene scores predicted by the foregoing software programs. This resource provides valuable guidance for cancer researchers and clinical oncologists prioritizing cancer driver gene candidates by using an optimal tool.
Affiliation(s)
- Xiaohui Shi
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing 100080, China
- Huajing Teng
  - Department of Radiation Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education) at Peking University Cancer Hospital and Institute, Beijing 100080, China
- Leisheng Shi
  - Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100080, China
- Wenjian Bi
  - Department of Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing 100080, China
- Wenqing Wei
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100080, China
- Fengbiao Mao
  - Institute of Medical Innovation and Research, Peking University Third Hospital, Beijing 100080, China
- Zhongsheng Sun
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, CAS Center for Excellence in Biotic Interactions and State Key Laboratory of Integrated Management of Pest Insects and Rodents, University of Chinese Academy of Sciences, Institute of Genomic Medicine, Wenzhou Medical University, IBMC-BGI Center, the Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Beijing 100080, China
14
Wang T, Ruan S, Zhao X, Shi X, Teng H, Zhong J, You M, Xia K, Sun Z, Mao F. OncoVar: an integrated database and analysis platform for oncogenic driver variants in cancers. Nucleic Acids Res 2021; 49:D1289-D1301. [PMID: 33179738 PMCID: PMC7778899 DOI: 10.1093/nar/gkaa1033] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 08/28/2020] [Revised: 10/15/2020] [Accepted: 10/19/2020] [Indexed: 12/13/2022]
Abstract
The prevalence of neutral mutations in cancer cell populations makes it difficult to distinguish cancer-causing driver mutations from passenger mutations. To systematically prioritize the oncogenic ability of somatic mutations and cancer genes, we constructed a platform, OncoVar (https://oncovar.org/), which employs published bioinformatics algorithms and incorporates known driver events to identify driver mutations and driver genes. We identified 20 162 cancer driver mutations, 814 driver genes and 2360 pathogenic pathways with high confidence by reanalyzing 10 769 exomes from 33 cancer types in The Cancer Genome Atlas (TCGA) and 1942 genomes from 18 cancer types in the International Cancer Genome Consortium (ICGC). OncoVar provides four points of view, 'Mutation', 'Gene', 'Pathway' and 'Cancer', to help researchers visualize the relationships between cancers and driver variants. Importantly, the identification of actionable driver alterations provides promising druggable targets and repurposing opportunities for combination therapies. OncoVar provides a user-friendly interface for browsing, searching and downloading somatic driver mutations, driver genes and pathogenic pathways in various cancer types. This platform will facilitate the identification of cancer drivers across individual cancer cohorts and help rank mutations or genes for better decision-making among clinical oncologists, cancer researchers and the broad scientific community interested in cancer precision medicine.
Affiliation(s)
- Tao Wang
  - Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan 410083, China
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
- Shasha Ruan
  - Department of Clinical Oncology, Renmin Hospital of Wuhan University, Wuhan, Hubei 430072, China
- Xiaolu Zhao
  - Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing 100191, China
- Xiaohui Shi
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
- Huajing Teng
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
- Jianing Zhong
  - Key Laboratory of Prevention and Treatment of Cardiovascular and Cerebrovascular Diseases of Ministry of Education, Gannan Medical University, Ganzhou 341000, China
- Kun Xia
  - Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan 410083, China
  - CAS Center for Excellence in Brain Science and Intelligences Technology (CEBSIT), Shanghai 200031, China
  - School of Basic Medical Science, Central South University, Changsha, Hunan 410078, China
- Zhongsheng Sun
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
  - CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing 100049, China
  - State Key Laboratory of Integrated Management of Pest Insects and Rodents, Chinese Academy of Sciences, Beijing 100101, China
- Fengbiao Mao
  - Center of Basic Medical Research, Institute of Medical Innovation and Research, Peking University Third Hospital, Beijing 100191, China
15
Li G, Ruan S, Zhao X, Liu Q, Dou Y, Mao F. Transcriptomic signatures and repurposing drugs for COVID-19 patients: findings of bioinformatics analyses. Comput Struct Biotechnol J 2020; 19:1-15. [PMID: 33312453 PMCID: PMC7719282 DOI: 10.1016/j.csbj.2020.11.056] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/07/2020] [Revised: 11/24/2020] [Accepted: 11/28/2020] [Indexed: 12/16/2022]
Abstract
The novel coronavirus SARS-CoV-2 is seriously damaging the world's social and economic fabric. Effective drugs are urgently needed to decrease the high mortality rate of COVID-19 patients. Unfortunately, effective antiviral drugs or vaccines are currently unavailable. Herein, we systematically evaluated the effect of SARS-CoV-2 on gene expression in both lung tissue and blood from COVID-19 patients using transcriptome profiling. Differential gene expression analysis revealed a potential core mechanism of COVID-19-induced pneumonia in which IFN-α, IFN-β, IFN-γ, TNF and IL6 triggered a cytokine storm mediated by neutrophils, macrophages, B cells and dendritic cells. Weighted gene correlation network analysis identified two gene modules that are highly correlated with clinical traits of COVID-19 patients, and confirmed that cytokine release syndrome mediated by over-activation of the immune system is the underlying pathogenic mechanism for the acute phase of COVID-19 infection. This suggested that anti-inflammatory therapies may be promising regimens for COVID-19 patients. Furthermore, drug repurposing analysis of thousands of drugs revealed that the TNFα inhibitor etanercept and the γ-aminobutyric acid-B receptor (GABABR) agonist baclofen showed the most significant reversal power against the COVID-19 gene signature, so we are highly optimistic about their clinical use for COVID-19 treatment. In addition, our results suggested that adalimumab, tocilizumab, rituximab and glucocorticoids, but not chloroquine, hydroxychloroquine or interferons, may also have beneficial effects in restoring a normal transcriptome. Controlled clinical trials of these candidate drugs are needed in the search for effective COVID-19 treatment in the current crisis.
Affiliation(s)
- Guobing Li
  - Center of Basic Medical Research, Institute of Medical Innovation and Research, Peking University Third Hospital, Beijing 100191, China
  - Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
- Shasha Ruan
  - Department of Clinical Oncology, Renmin Hospital of Wuhan University, Wuhan, Hubei 430060, China
  - The First Clinical College of Wuhan University, Wuhan, Hubei 430060, China
- Xiaolu Zhao
  - Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing 100191, China
- Qi Liu
  - Stem Cell Program, Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA 02115, USA
  - Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Yali Dou
  - Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
- Fengbiao Mao
  - Center of Basic Medical Research, Institute of Medical Innovation and Research, Peking University Third Hospital, Beijing 100191, China