1
Sharma D, Rawat P, Greiff V, Janakiraman V, Gromiha MM. Predicting the immune escape of SARS-CoV-2 neutralizing antibodies upon mutation. Biochim Biophys Acta Mol Basis Dis 2024; 1870:166959. [PMID: 37967796 DOI: 10.1016/j.bbadis.2023.166959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 10/25/2023] [Accepted: 11/07/2023] [Indexed: 11/17/2023]
Abstract
COVID-19 has resulted in millions of deaths and a severe impact on economies worldwide. Moreover, the emergence of SARS-CoV-2 variants presented significant challenges in controlling the pandemic, particularly their potential to avoid the immune system and evade vaccine immunity. This has led to a growing need for research to predict how mutations in SARS-CoV-2 reduce the ability of antibodies to neutralize the virus. In this study, we assembled a set of 1813 mutations from the interface of SARS-CoV-2 spike protein's receptor binding domain (RBD) and neutralizing antibody complexes and developed a machine learning model to classify high or low escape mutations using interaction energy, inter-residue contacts and predicted binding free energy change. Our approach achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 0.91 using the Random Forest classifier on the test dataset with 217 mutations. The model was further utilized to predict the escape mutations on a dataset of 29,165 mutations located at the interface of 83 RBD-neutralizing antibody complexes. A small subset of this dataset was also validated based on available experimental data. We found that the top 10% high escape mutations were dominated by charged to nonpolar mutations whereas low escape mutations were dominated by polar to nonpolar mutations. We believe that the present method will allow prioritization of high/low escape mutations in the context of neutralizing antibodies targeting the SARS-CoV-2 RBD region and assist antibody design for current and emerging variants.
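For readers unfamiliar with the AUC metric quoted in this abstract, the following is a minimal sketch of how it can be computed from classifier scores. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical, and the sketch uses the pairwise-ranking definition of AUC (the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half).

```python
def roc_auc(labels, scores):
    """AUC via pairwise ranking: P(score of a positive > score of a negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = high escape, 0 = low escape (not from the paper).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.91 should be read.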
Affiliation(s)
- Divya Sharma
  - Protein Bioinformatics Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
- Puneet Rawat
  - University of Oslo and Oslo University Hospital, Oslo, Norway
- Victor Greiff
  - University of Oslo and Oslo University Hospital, Oslo, Norway
- Vani Janakiraman
  - Infection Biology Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
- M Michael Gromiha
  - Protein Bioinformatics Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India; International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama 226-8501, Japan; Department of Computer Science, National University of Singapore, Singapore.
2
Wang Y, Zhou B, Ru J, Meng X, Wang Y, Liu W. Advances in computational methods for identifying cancer driver genes. Mathematical Biosciences and Engineering: MBE 2023; 20:21643-21669. [PMID: 38124614 DOI: 10.3934/mbe.2023958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Cancer driver genes (CDGs) are crucial in cancer prevention, diagnosis and treatment. This study employed computational methods for identifying CDGs, categorizing them into four groups. The major frameworks for each of these four categories were summarized. Additionally, we systematically gathered data from public databases and biological networks, and we elaborated on computational methods for identifying CDGs using the aforementioned databases. Further, we summarized the algorithms, mainly involving statistics and machine learning, used for identifying CDGs. Notably, the performances of nine typical identification methods for eight types of cancer were compared to analyze the applicability areas of these methods. Finally, we discussed the challenges and prospects associated with methods for identifying CDGs. The present study revealed that the network-based algorithms and machine learning-based methods demonstrated superior performance.
Affiliation(s)
- Ying Wang
  - School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Bohao Zhou
  - School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Jidong Ru
  - School of Textile Garment and Design, Changshu Institute of Technology, Changshu 215500, China
- Xianglian Meng
  - School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
- Yundong Wang
  - School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Wenjie Liu
  - School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
3
Xue X, Sun H, Yang M, Liu X, Hu HY, Deng Y, Wang X. Advances in the Application of Artificial Intelligence-Based Spectral Data Interpretation: A Perspective. Anal Chem 2023; 95:13733-13745. [PMID: 37688541 DOI: 10.1021/acs.analchem.3c02540] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/11/2023]
Abstract
The interpretation of spectral data, including mass, nuclear magnetic resonance, infrared, and ultraviolet-visible spectra, is critical for obtaining molecular structural information. The development of advanced sensing technology has multiplied the amount of available spectral data. Chemical experts must use basic principles corresponding to the spectral information generated by molecular fragments and functional groups. This is a time-consuming process that requires a solid professional knowledge base. In recent years, the rapid development of computer science and its applications in cheminformatics and the emergence of computer-aided expert systems have greatly reduced the difficulty in analyzing large quantities of data. For expert systems, however, the problem-solving strategy must be known in advance or extracted by human experts and translated into algorithms. Gratifyingly, the development of artificial intelligence (AI) methods has shown great promise for solving such problems. Traditional algorithms, including the latest neural network algorithms, have shown great potential for both extracting useful information and processing massive quantities of data. This Perspective highlights recent innovations covering all of the emerging AI-based spectral interpretation techniques. In addition, the main limitations and current obstacles are presented, and the corresponding directions for further research are proposed. Moreover, this Perspective gives the authors' personal outlook on the development and future applications of spectral interpretation.
Affiliation(s)
- Xi Xue
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Hanyu Sun
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Minjian Yang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Xue Liu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Hai-Yu Hu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd. Beijing 100080, China
  - Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaojian Wang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - CarbonSilicon AI Technology Co., Ltd. Beijing 100080, China
4
Meirson T, Bomze D, Schueler-Furman O, Stemmer SM, Markel G. Systemic structural analysis of alterations reveals a common structural basis of driver mutations in cancer. NAR Cancer 2023; 5:zcac040. [PMID: 36683915 PMCID: PMC9846427 DOI: 10.1093/narcan/zcac040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Revised: 10/17/2022] [Accepted: 12/04/2022] [Indexed: 01/19/2023] Open
Abstract
A major effort in cancer research is to organize the complexities of the disease into fundamental traits. Despite conceptual progress in the last decades and the synthesis of hallmark features, no organizing principles governing cancer beyond cellular features exist. We analyzed experimentally determined structures harboring the most significant and prevalent driver missense mutations in human cancer, covering 73% (n = 168178) of the Catalogue of Somatic Mutations in Cancer (COSMIC) tumor samples. The results reveal that a single structural element, the κ-helix (polyproline II helix), lies at the core of driver point mutations, with significant enrichment in all major anatomical sites, suggesting that a small number of molecular traits are shared by most and perhaps all types of cancer. Thus, we uncovered the lowest possible level of organization at which carcinogenesis takes place at the protein level. This framework provides an initial scheme for a mechanistic understanding underlying the development of tumors and pinpoints key vulnerabilities.
Affiliation(s)
- Tomer Meirson
  - Davidoff Cancer Center, Rabin Medical Center-Beilinson Hospital, Petah Tikva, 49100, Israel
- David Bomze
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, 6997801, Israel
- Ora Schueler-Furman
  - Department of Microbiology and Molecular Genetics, Institute for Biomedical Research Israel-Canada, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, 9112001, Israel
- Salomon M Stemmer
  - Davidoff Cancer Center, Rabin Medical Center-Beilinson Hospital, Petah Tikva, 49100, Israel
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, 6997801, Israel
- Gal Markel
  - Davidoff Cancer Center, Rabin Medical Center-Beilinson Hospital, Petah Tikva, 49100, Israel
  - Department of Clinical Microbiology and Immunology, Sackler Faculty of Medicine, Tel Aviv University, Tel-Aviv, 6997801, Israel
5
Ren Z, Li Q, Cao K, Li MM, Zhou Y, Wang K. Model performance and interpretability of semi-supervised generative adversarial networks to predict oncogenic variants with unlabeled data. BMC Bioinformatics 2023; 24:43. [PMID: 36759776 PMCID: PMC9909865 DOI: 10.1186/s12859-023-05141-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 01/05/2023] [Indexed: 02/11/2023] Open
Abstract
BACKGROUND: It remains an important challenge to predict the functional consequences or clinical impacts of genetic variants in human diseases, such as cancer. An increasing number of genetic variants in cancer have been discovered and documented in public databases such as COSMIC, but the vast majority of them have no functional or clinical annotations. Some databases, such as CIViC, are available with manual annotation of functional mutations, but the size of the database is small due to the use of human annotation. Since the unlabeled data (millions of variants) typically outnumber labeled data (thousands of variants), computational tools that take advantage of unlabeled data may improve prediction accuracy. RESULT: To leverage unlabeled data to predict functional importance of genetic variants, we introduced a method using semi-supervised generative adversarial networks (SGAN), incorporating features from both labeled and unlabeled data. Our SGAN model incorporated features from clinical guidelines and predictive scores from other computational tools. We also performed comparative analysis to study factors that influence prediction accuracy, such as using different algorithms, types of features, and training sample size, to provide more insights into variant prioritization. We found that SGAN can achieve competitive performances with small labeled training samples by incorporating unlabeled samples, which is a unique advantage compared to traditional machine learning methods. We also found that manually curated samples can achieve a more stable predictive performance than publicly available datasets. CONCLUSIONS: By incorporating much larger samples of unlabeled data, the SGAN method can improve the ability to detect novel oncogenic variants, compared to other machine-learning algorithms that use only labeled datasets. SGAN can be potentially used to predict the pathogenicity of more complex variants such as structural variants or non-coding variants, with the availability of more training samples and informative features.
Affiliation(s)
- Zilin Ren
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Quan Li
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, ON, M5G2C1, Canada
- Kajia Cao
  - Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Marilyn M Li
  - Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Yunyun Zhou
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
6
Predicting the Prognostic Value of POLI Expression in Different Cancers via a Machine Learning Approach. Int J Mol Sci 2022; 23:ijms23158571. [PMID: 35955705 PMCID: PMC9369001 DOI: 10.3390/ijms23158571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Revised: 07/22/2022] [Accepted: 07/25/2022] [Indexed: 11/17/2022] Open
Abstract
Translesion synthesis (TLS) is a cell signaling pathway that facilitates the tolerance of replication stress. Increased TLS activity, particularly the elevated expression of TLS polymerases, has been linked to resistance to cancer chemotherapeutics and significantly altered patient outcomes. Building upon current knowledge, we found that the expression of one of these TLS polymerases (POLI) is associated with significant differences in cervical and pancreatic cancer survival. These data led us to hypothesize that POLI expression is associated with cancer survival more broadly. However, when cancers were grouped by cancer type, POLI expression did not have a significant prognostic value. We presented a binary cancer random forest classifier using 396 genes that influence the prognostic characteristics of POLI in cervical and pancreatic cancer, selected via graphical least absolute shrinkage and selection operator. The classifier was then used to cluster patients with bladder, breast, colorectal, head and neck, liver, lung, ovary, melanoma, stomach, and uterus cancer into groups in which high POLI expression was associated with worsened survival (Group I) or with improved survival (Group II). This approach allowed us to identify cancers where POLI expression is a significant prognostic factor for survival (p = 0.028 in Group I and p = 0.0059 in Group II). Multiple independent validation approaches, including the gene ontology enrichment analysis and visualization tool and network visualization, support the classification scheme. The functions of the selected genes involving mitochondrial translational elongation, Wnt signaling pathway, and tumor necrosis factor-mediated signaling pathway support their association with TLS and replication stress. Our multidisciplinary approach provides a novel way of identifying tumors where increased TLS polymerase expression is associated with significant differences in cancer survival.
7
Duan YY, Qin J, Qiu WQ, Li SY, Li C, Liu AS, Chen X, Zhang CX. Performance of a generative adversarial network using ultrasound images to stage liver fibrosis and predict cirrhosis based on a deep-learning radiomics nomogram. Clin Radiol 2022; 77:e723-e731. [PMID: 35811157 DOI: 10.1016/j.crad.2022.06.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/15/2021] [Revised: 05/31/2022] [Accepted: 06/07/2022] [Indexed: 12/18/2022]
Abstract
AIM: To investigate the performance of a generative adversarial network (GAN) model for staging liver fibrosis and its radiomics-based nomogram for predicting cirrhosis. MATERIALS AND METHODS: This two-centre retrospective study included 434 patients for whom input data of ultrasound images and histopathological data (obtained within 1 month of ultrasound examinations) were assigned to the training cohort (249 patients), the internal cohort (92 patients), and the external cohort (93 patients). A data augmentation method based on a GAN model was used. The discriminative performance was evaluated for classifying fibrosis of S4 and ≥S3. Deep-learning radiomics features were extracted for the prediction of cirrhosis (S4). To perform feature reduction and selection, the least absolute shrinkage and selection operator (LASSO) algorithm was applied. Radiomics scores, along with clinical factors, were incorporated into a nomogram using multivariable logistic regression analysis. The performance of the models was estimated with respect to discrimination power, calibration, and clinical benefits. RESULTS: The area under the receiver operating characteristic curve (AUC) values of the GAN were 0.832/0.762 (≥S3) and 0.867/0.835 (S4) for internal/external test sets, respectively. The radiomics nomogram that integrated radiomics scores and clinical factors showed good calibration and discrimination ability of 0.922 (AUC) in the training dataset, 0.896 in the internal dataset, and 0.861 in the external dataset. Decision curve analysis (DCA) demonstrated that the nomogram outperformed radiologists and haematological indices in terms of clinical benefit. CONCLUSIONS: The GAN model could be applied to discriminate fibrosis stages, and a favourable predictive accuracy for diagnosing cirrhosis was achieved using a deep-learning radiomics nomogram.
Affiliation(s)
- Y-Y Duan
  - Department of Ultrasound, The First Affiliated Hospital of Anhui Medical University, No. 218 Jixi Road, Shushan District, Hefei 230022, Anhui Province, China
- J Qin
  - Department of Ultrasound, The First Affiliated Hospital of Anhui Medical University, No. 218 Jixi Road, Shushan District, Hefei 230022, Anhui Province, China
- W-Q Qiu
  - Department of Ultrasound, The First Affiliated Hospital of Anhui Medical University, No. 218 Jixi Road, Shushan District, Hefei 230022, Anhui Province, China
- S-Y Li
  - Department of Ultrasound, The Affiliated Yantai Yuhuangding Hospital of Qingdao University, No. 20 Yuhuangdingdong Road, Zhifu District, Yantai 264099, Shandong Province, China
- C Li
  - Department of Biomedical Engineering, Hefei University of Technology, No. 193 Tunxi Road, Baohe District, Hefei 230009, Anhui Province, China
- A-S Liu
  - Department of Ultrasound, The First Affiliated Hospital of Anhui University of Chinese Medicine, No. 117 Meishan Road, Shushan District, Hefei 230022, Anhui Province, China
- X Chen
  - Department of Electronic Engineering and Information Science, University of Science and Technology of China, No. 93 Jinzhai Road, Baohe District, Hefei 230026, Anhui Province, China
- C-X Zhang
  - Department of Ultrasound, The First Affiliated Hospital of Anhui Medical University, No. 218 Jixi Road, Shushan District, Hefei 230022, Anhui Province, China.
8
Liu C, Dai Y, Yu K, Zhang ZK. Enhancing Cancer Driver Gene Prediction by Protein-Protein Interaction Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2231-2240. [PMID: 33656997 DOI: 10.1109/tcbb.2021.3063532] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/12/2023]
Abstract
With the advances in gene sequencing technologies, millions of somatic mutations have been reported in the past decades, but mining cancer driver genes with oncogenic mutations from these data remains a critical and challenging area of research. In this study, we proposed a network-based classification method for identifying cancer driver genes by merging multiple types of biological information. In this method, we construct a cancer-specific genetic network from the human protein-protein interactome (PPI) to mine the network structure attributes, and combine biological information such as mutation frequency and differential expression of genes to achieve accurate prediction of cancer driver genes. Across seven different cancer types, the proposed algorithm always achieves high prediction accuracy, which is superior to the existing advanced methods. In the analysis of the predicted results, about 40 percent of the top 10 candidate genes overlap with the Cancer Gene Census database. Interestingly, the feature comparison indicates that the network-based features are still more important than the biological features, including the mutation frequency and genetic differential expression. Further analyses also show that the integration of network structure attributes and biological information is valuable for predicting new cancer driver genes.
9
A hybrid approach for lung cancer diagnosis using optimized random forest classification and K-means visualization algorithm. Health and Technology 2022. [DOI: 10.1007/s12553-022-00679-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/11/2022]
10
Weighted Gene Coexpression Network Analysis Identifies TBC1D10C as a New Prognostic Biomarker for Breast Cancer. Anal Cell Pathol 2022; 2022:5259187. [PMID: 35425695 PMCID: PMC9005324 DOI: 10.1155/2022/5259187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2021] [Revised: 11/30/2021] [Accepted: 03/15/2022] [Indexed: 12/09/2022] Open
Abstract
Background: Immune checkpoint inhibitors are a promising therapeutic strategy for breast cancer (BRCA) patients. The tumor microenvironment (TME) can downregulate the immune response to cancer therapy. Our study aimed to find a TME-related biomarker to identify patients who might respond to immunotherapy. Method: We downloaded raw data from several databases including TCGA and MDACC to identify TME hub genes associated with overall survival (OS) and the progression-free interval (PFI) by WGCNA. Correlations between hub genes and either tumor-infiltrating immune cells or immune checkpoints were assessed by ssGSEA. Result: TME-related green and black modules were selected by WGCNA to further screen hub genes. Random forest and univariate and multivariate Cox regressions were applied to screen hub genes (MYO1G, TBC1D10C, SELPLG, and LRRC15) and construct a nomogram to predict the survival of BRCA patients. The C-index for the nomogram was 0.713. A DCA of the predictive model revealed that the net benefit of the nomogram was significantly higher than others and the calibration curve demonstrated a good performance by the nomogram. Only TBC1D10C was correlated with both OS and the PFI (both p values < 0.05). TBC1D10C also had a high positive association with tumor-infiltrating immune cells and common immune checkpoints (PD-1, CTLA-4, and TIGIT). Conclusion: We constructed a TME-related gene signature model to predict the survival probability of BRCA patients. We also identified a hub gene, TBC1D10C, which was correlated with both OS and the PFI and had a high positive association with tumor-infiltrating immune cells and common immune checkpoints. TBC1D10C may be a new biomarker to select patients who may benefit from immunotherapy.
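To make the C-index quoted in this abstract more concrete, here is a minimal sketch of Harrell's concordance index for right-censored survival data. It is an illustration under stated assumptions and not the study's code: the function name `concordance_index` and the four-patient toy data are hypothetical. A pair of patients is treated as comparable when the one with the shorter follow-up time experienced the event, and it is concordant when that patient also has the higher predicted risk.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly by risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # shorter survival, higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (not from the study): follow-up time, event flag, risk score.
times = [5, 10, 12, 20]
events = [1, 1, 0, 1]          # 1 = event observed, 0 = censored
risks = [0.9, 0.4, 0.6, 0.1]   # higher score = higher predicted risk
c_index = concordance_index(times, events, risks)
print(f"C-index = {c_index:.3f}")
```

As with AUC, 0.5 indicates no discrimination and 1.0 perfect ranking, so a value of 0.713 sits between the two.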
11
Xiao F, Zhou Z, Song X, Gan M, Long J, Verkhivker G, Hu G. Dissecting mutational allosteric effects in alkaline phosphatases associated with different Hypophosphatasia phenotypes: An integrative computational investigation. PLoS Comput Biol 2022; 18:e1010009. [PMID: 35320273 PMCID: PMC8979438 DOI: 10.1371/journal.pcbi.1010009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2021] [Revised: 04/04/2022] [Accepted: 03/10/2022] [Indexed: 11/18/2022] Open
Abstract
Hypophosphatasia (HPP) is a rare inherited disorder characterized by defective bone mineralization and is highly variable in its clinical phenotype. The disease occurs due to various loss-of-function mutations in ALPL, the gene encoding tissue-nonspecific alkaline phosphatase (TNSALP). In this work, a data-driven and biophysics-based approach is proposed for the large-scale analysis of ALPL mutations, from nonpathogenic to severe HPPs. By using a pipeline of synergistic approaches including sequence-structure analysis, network modeling, elastic network models and atomistic simulations, we characterized allosteric signatures and effects of the ALPL mutations on protein dynamics and function. Statistical analysis of molecular features computed for the ALPL mutations showed a significant difference between the control, mild and severe HPP phenotypes. Molecular dynamics simulations coupled with protein structure network analysis were employed to analyze the effect of single-residue variation on conformational dynamics of TNSALP dimers, and the developed machine learning model suggested that the topological network parameters could serve as a robust indicator of severe mutations. The results indicated that the severity of disease-associated mutations is often linked with mutation-induced modulation of allosteric communications in the protein. This study suggested that ALPL mutations associated with mild and more severe HPPs can exert markedly distinct effects on the protein stability and long-range network communications. By linking the disease phenotypes with dynamic and allosteric molecular signatures, the proposed integrative computational approach made it possible to characterize and quantify the allosteric effects of ALPL mutations and the role of allostery in the pathogenesis of HPPs.
Affiliation(s)
- Fei Xiao
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Ziyun Zhou
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Xingyu Song
  - Department of Chemistry, Multiscale Research Institute of Complex Systems and Institute of Biomedical Sciences, Fudan University, Shanghai, China
- Mi Gan
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Jie Long
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Gennady Verkhivker
  - Department of Computational and Data Sciences, Chapman University, One University Drive, Orange, California, United States of America
  - Department of Biomedical and Pharmaceutical Sciences, Chapman University Pharmacy School 9401 Jeronimo Rd, Irvine, California, United States of America
- Guang Hu
  - Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
12
Nagy M, Radakovich N, Nazha A. Machine Learning in Oncology: What Should Clinicians Know? JCO Clin Cancer Inform 2021; 4:799-810. [PMID: 32926637 DOI: 10.1200/cci.20.00049] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Indexed: 12/16/2022] Open
Abstract
The volume and complexity of scientific and clinical data in oncology have grown markedly over recent years, including but not limited to the realms of electronic health data, radiographic and histologic data, and genomics. This growth holds promise for a deeper understanding of malignancy and, accordingly, more personalized and effective oncologic care. Such goals require, however, the development of new methods to fully make use of the wealth of available data. Improvements in computer processing power and algorithm development have positioned machine learning, a branch of artificial intelligence, to play a prominent role in oncology research and practice. This review provides an overview of the basics of machine learning and highlights current progress and challenges in applying this technology to cancer diagnosis, prognosis, and treatment recommendations, including a discussion of current takeaways for clinicians.
Affiliation(s)
- Matthew Nagy
- Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, OH
- Nathan Radakovich
- Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, OH
- Aziz Nazha
- Center for Clinical Artificial Intelligence, Cleveland Clinic, Cleveland, OH; Department of Hematology and Medical Oncology, Cleveland Clinic, Cleveland, OH
|
13
|
Computational studies of anaplastic lymphoma kinase mutations reveal common mechanisms of oncogenic activation. Proc Natl Acad Sci U S A 2021; 118:2019132118. [PMID: 33674381 PMCID: PMC7958353 DOI: 10.1073/pnas.2019132118] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022] Open
Abstract
High-risk tumors are genomically heterogeneous, harboring gene amplifications and mutations. The activation status of mutated proteins in cancer can profoundly impact disease progression, patient response, and drug sensitivity. Yet, outside of a few hotspot mutations, functional studies of clinically observed mutations are not commonly pursued. We report a combined experimental profiling and computational analysis of the effects of clinically observed and “test” mutations in the kinase domain of anaplastic lymphoma kinase (ALK), a known oncogenic driver in pediatric neuroblastoma. We find that the activation status of the mutated protein is a good indicator of the transforming ability in NIH 3T3 cells. We also report biophysical as well as data-driven models with predictive power to profile these mutant kinases in silico. Kinases play important roles in diverse cellular processes, including signaling, differentiation, proliferation, and metabolism. They are frequently mutated in cancer and are the targets of a large number of specific inhibitors. Surveys of cancer genome atlases reveal that kinase domains, which consist of 300 amino acids, can harbor numerous (150 to 200) single-point mutations across different patients in the same disease. This preponderance of mutations (some activating, some silent) in a known target protein makes clinical decisions for enrolling patients in drug trials challenging, since the relevance of the target and its drug sensitivity often depend on the mutational status in a given patient. We show through computational studies using molecular dynamics (MD) as well as enhanced sampling simulations that the experimentally determined activation status of a mutated kinase can be predicted effectively by identifying a hydrogen bonding fingerprint in the activation loop and the αC-helix regions, despite the fact that mutations in cancer patients occur throughout the kinase domain.
In our study, we find that the predictive power of MD is superior to a purely data-driven machine learning model involving biochemical features that we implemented, even though MD utilized far fewer features (in fact, just one) in an unsupervised setting. Moreover, the MD results provide key insights into convergent mechanisms of activation, primarily involving differential stabilization of a hydrogen bond network that engages residues of the activation loop and αC-helix in the active-like conformation (in >70% of the mutations studied, regardless of the location of the mutation).
|
14
|
Rogers MF, Gaunt TR, Campbell C. Prediction of driver variants in the cancer genome via machine learning methodologies. Brief Bioinform 2021; 22:bbaa250. [PMID: 33094325 PMCID: PMC8293831 DOI: 10.1093/bib/bbaa250] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/22/2020] [Revised: 09/04/2020] [Accepted: 09/06/2020] [Indexed: 01/18/2023] Open
Abstract
Sequencing technologies have led to the identification of many variants in the human genome which could act as disease-drivers. As a consequence, a variety of bioinformatics tools have been proposed for predicting which variants may drive disease, and which may be causatively neutral. After briefly reviewing generic tools, we focus on a subset of these methods specifically geared toward predicting which variants in the human cancer genome may act as enablers of unregulated cell proliferation. We consider the resultant view of the cancer genome indicated by these predictors and discuss ways in which these types of prediction tools may be progressed by further research.
Affiliation(s)
- Tom R Gaunt
- MRC Integrative Epidemiology Unit, University of Bristol
- Colin Campbell
- University of Bristol with interests in machine learning and medical bioinformatics
|
15
|
Banerjee S, Raman K, Ravindran B. Sequence Neighborhoods Enable Reliable Prediction of Pathogenic Mutations in Cancer Genomes. Cancers (Basel) 2021; 13:cancers13102366. [PMID: 34068918 PMCID: PMC8156421 DOI: 10.3390/cancers13102366] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/01/2021] [Accepted: 04/30/2021] [Indexed: 12/11/2022] Open
Abstract
Simple Summary: Cancer is caused by the accumulation of somatic mutations, some of which are responsible for the disease’s progression (drivers) while others are functionally neutral (passengers). Although several methods have been developed to distinguish between the two classes of mutations, very few have concentrated on using the neighborhood nucleotide sequences as potential discrimination features. In this study, we show that the neighborhood of driver mutations is significantly different from that of passengers. We further develop a novel machine learning tool, NBDriver, which is highly efficient at identifying pathogenic variants from multiple independent test datasets. Efficient and accurate identification of novel pathogenic variants from sequenced cancer genomes would help facilitate more effective therapies tailored to patients’ mutational profiles.
Abstract: Identifying cancer-causing mutations from sequenced cancer genomes holds much promise for targeted therapy and precision medicine. “Driver” mutations are primarily responsible for cancer progression, while “passengers” are functionally neutral. Although several computational approaches have been developed for distinguishing between driver and passenger mutations, very few have concentrated on using the raw nucleotide sequences surrounding a particular mutation as potential features for building predictive models. Using experimentally validated cancer mutation data in this study, we explored various string-based feature representation techniques to incorporate information on the neighborhood bases immediately 5′ and 3′ from each mutated position. Density estimation methods showed significant distributional differences between the neighborhood bases surrounding driver and passenger mutations. Binary classification models derived using repeated cross-validation experiments provided comparable performances across all window sizes.
Integrating sequence features derived from raw nucleotide sequences with other genomic, structural, and evolutionary features resulted in the development of a pan-cancer mutation effect prediction tool, NBDriver, which was highly efficient in identifying pathogenic variants from five independent validation datasets. An ensemble predictor obtained by combining the predictions from NBDriver with three other commonly used driver prediction tools (FATHMM (cancer), CONDEL, and MutationTaster) significantly outperformed existing pan-cancer models in prioritizing a literature-curated list of driver and passenger mutations. Using the list of true positive mutation predictions derived from NBDriver, we identified a list of 138 known driver genes with functional evidence from various sources. Overall, our study underscores the efficacy of using raw nucleotide sequences as features to distinguish between driver and passenger mutations from sequenced cancer genomes.
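To make the idea of neighborhood sequence features concrete, here is a minimal sketch of one possible string-based representation: it takes the bases within a fixed window 5′ and 3′ of a mutated position and counts overlapping k-mers. The function names, the toy sequence, and the choice of k-mer counts are illustrative assumptions and are not taken from NBDriver.

```python
from collections import Counter


def neighborhood(seq: str, pos: int, window: int) -> tuple[str, str]:
    """Return up to `window` bases 5' and 3' of the 0-based index `pos`."""
    start = max(0, pos - window)
    upstream = seq[start:pos]
    downstream = seq[pos + 1 : pos + 1 + window]
    return upstream, downstream


def kmer_counts(fragment: str, k: int) -> Counter:
    """Count overlapping k-mers in a fragment (empty if the fragment is shorter than k)."""
    return Counter(fragment[i : i + k] for i in range(len(fragment) - k + 1))


# Hypothetical example: a made-up sequence with the mutated base at index 6.
seq = "ACGTACGTTGCA"
up, down = neighborhood(seq, pos=6, window=3)

# Combine 2-mer counts from both flanks into one feature dictionary.
features = kmer_counts(up, 2) + kmer_counts(down, 2)
```

Varying `window` in a sketch like this corresponds to the different window sizes the abstract mentions; the resulting counts could then be fed to any binary classifier.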
Affiliation(s)
- Shayantan Banerjee
- Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), Indian Institute of Technology (IIT) Madras, Chennai 600 036, India
- Initiative for Biological Systems Engineering, Indian Institute of Technology (IIT) Madras, Chennai 600 036, India
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai 600 036, India
- Karthik Raman
- Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), Indian Institute of Technology (IIT) Madras, Chennai 600 036, India
- Initiative for Biological Systems Engineering, Indian Institute of Technology (IIT) Madras, Chennai 600 036, India
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai 600 036, India
- Correspondence: (K.R.); (B.R.)
- Balaraman Ravindran
- Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), Indian Institute of Technology (IIT) Madras, Chennai 600 036, India
- Initiative for Biological Systems Engineering, Indian Institute of Technology (IIT) Madras, Chennai 600 036, India
- Department of Computer Science and Engineering, Indian Institute of Technology (IIT) Madras, Chennai 600 036, India
- Correspondence: (K.R.); (B.R.)
|
16
|
Auslander N, Gussow AB, Koonin EV. Incorporating Machine Learning into Established Bioinformatics Frameworks. Int J Mol Sci 2021; 22:2903. [PMID: 33809353 PMCID: PMC8000113 DOI: 10.3390/ijms22062903] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 02/15/2021] [Revised: 03/08/2021] [Accepted: 03/10/2021] [Indexed: 12/23/2022] Open
Abstract
The exponential growth of biomedical data in recent years has urged the application of numerous machine learning techniques to address emerging problems in biology and clinical research. By enabling the automatic feature extraction, selection, and generation of predictive models, these methods can be used to efficiently study complex biological systems. Machine learning techniques are frequently integrated with bioinformatic methods, as well as curated databases and biological networks, to enhance training and validation, identify the best interpretable features, and enable feature and model investigation. Here, we review recently developed methods that incorporate machine learning within the same framework with techniques from molecular evolution, protein structure analysis, systems biology, and disease genomics. We outline the challenges posed for machine learning, and, in particular, deep learning in biomedicine, and suggest unique opportunities for machine learning techniques integrated with established bioinformatics approaches to overcome some of these challenges.
Affiliation(s)
- Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
17
|
Radhakrishnan R. A survey of multiscale modeling: Foundations, historical milestones, current status, and future prospects. AIChE J 2021; 67:e17026. [PMID: 33790479 PMCID: PMC7988612 DOI: 10.1002/aic.17026] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/08/2020] [Revised: 08/09/2020] [Accepted: 08/13/2020] [Indexed: 01/14/2023]
Abstract
Research problems in the domains of the physical, engineering, and biological sciences often span multiple time and length scales, owing to the complexity of information transfer underlying mechanisms. Multiscale modeling (MSM) and high-performance computing (HPC) have emerged as indispensable tools for tackling such complex problems. We review the foundations, historical developments, and current paradigms in MSM. A paradigm shift in MSM implementations is being fueled by the rapid advances and emerging paradigms in HPC at the dawn of exascale computing. Moreover, amidst the explosion of data science, engineering, and medicine, machine learning (ML) integrated with MSM is poised to enhance the capabilities of standard MSM approaches significantly, particularly in the face of increasing problem complexity. The potential to blend MSM, HPC, and ML presents opportunities for unbound innovation and promises to represent the future of MSM and explainable ML that will likely define the fields in the 21st century.
Affiliation(s)
- Ravi Radhakrishnan
- Department of Chemical and Biomolecular Engineering, Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, USA
- Department of Bioengineering, Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, USA
|
18
|
Yang Y, Yang J, Liang Y, Liao B, Zhu W, Mo X, Huang K. Identification and Validation of Efficacy of Immunological Therapy for Lung Cancer From Histopathological Images Based on Deep Learning. Front Genet 2021; 12:642981. [PMID: 33633793 PMCID: PMC7900553 DOI: 10.3389/fgene.2021.642981] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/17/2020] [Accepted: 01/18/2021] [Indexed: 12/26/2022] Open
Abstract
Cancer immunotherapy, a novel treatment against cancer metastasis and recurrence, has become a promising and effective option for cancer treatment. At present, programmed death 1 (PD-1) and programmed cell death-ligand 1 (PD-L1) treatment for lung cancer is primarily recognized as immune checkpoint inhibitor (ICI) therapy with an anti-tumor effect; however, its efficacy remains uncertain. Tumor mutation burden (TMB) was subsequently recognized as a promising predictive marker for immune therapy, but measuring it is invasive and costly. Therefore, discovering more immune-related biomarkers that can guide immunotherapy is a crucial step in its development. In our study, we proposed a deep convolutional neural network (CNN)-based framework, DeepLRHE, which can efficiently analyze immunologically stained pathological images of lung cancer tissues and identify and explore pathogenesis that can inform immunological treatment in the clinic. We used 180 whole slice images (WSIs) of lung cancer downloaded from TCGA for model training and validation. After two rounds of cross-validation, we compared the area under the curve (AUC) across multiple mutant genes: TP53 had the highest AUC, reaching 0.87, and the AUCs for EGFR, DNMT3A, PBRM1, and STK11 ranged from 0.71 to 0.84. The results showed that deep learning can be used to assist health professionals with targeted therapy as well as immunotherapy, and thereby improve disease prognosis.
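Because the results above are reported as AUC values, a small sketch of how an AUC can be computed may help. It uses the rank interpretation of the ROC AUC: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are made-up illustrative values, not data from the study, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative toy data: 1 = gene mutated, 0 = wild type (assumed labels).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; library implementations typically sort the scores instead.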
Affiliation(s)
- Yachao Yang
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Intelligence Education (Hainan Normal University) Ministry of Education, Haikou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Jialiang Yang
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China; Geneis (Beijing) Co., Ltd., Beijing, China
- Yuebin Liang
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China; Geneis (Beijing) Co., Ltd., Beijing, China
- Bo Liao
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Intelligence Education (Hainan Normal University) Ministry of Education, Haikou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Wen Zhu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Intelligence Education (Hainan Normal University) Ministry of Education, Haikou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Xiaofei Mo
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China; Geneis (Beijing) Co., Ltd., Beijing, China
- Kaimei Huang
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Intelligence Education (Hainan Normal University) Ministry of Education, Haikou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China
|
19
|
López-Cortés XA, Matamala F, Maldonado C, Mora-Poblete F, Scapim CA. A Deep Learning Approach to Population Structure Inference in Inbred Lines of Maize. Front Genet 2020; 11:543459. [PMID: 33329691 PMCID: PMC7732446 DOI: 10.3389/fgene.2020.543459] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/17/2020] [Accepted: 10/19/2020] [Indexed: 11/16/2022] Open
Abstract
Analysis of population genetic variation and structure is a common practice for genome-wide studies, including association mapping, ecology, and evolution studies in several crop species. In this study, machine learning (ML) clustering methods, K-means (KM) and hierarchical clustering (HC), in combination with non-linear and linear dimensionality reduction techniques, deep autoencoder (DeepAE) and principal component analysis (PCA), were used to infer population structure and individual assignment of maize inbred lines, i.e., dent field corn (n = 97) and popcorn (n = 86). The results revealed that the HC method in combination with DeepAE-based data preprocessing (DeepAE-HC) was the most effective method to assign individuals to clusters (with 96% of correct individual assignments), whereas DeepAE-KM, PCA-HC, and PCA-KM correctly assigned 92, 89, and 81% of the lines, respectively. These findings were consistent with both Silhouette Coefficient (SC) and Davies-Bouldin validation indexes. Notably, DeepAE-HC also had better accuracy than the Bayesian clustering method implemented in InStruct. The results of this study showed that deep learning (DL)-based dimensional reduction combined with ML clustering methods is a useful tool to determine genetically differentiated groups and to assign individuals into subpopulations in genome-wide studies without having to consider previous genetic assumptions.
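The Silhouette Coefficient mentioned above can be illustrated with a short, dependency-free sketch. For each point it compares the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b) and averages (b - a) / max(a, b) over all points. The points and labels are invented toy values, and the singleton and zero-distance conventions used here are assumptions of this sketch rather than details from the study.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # convention assumed here: a singleton contributes 0
        # Mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so divide by len - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Smallest mean distance to any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != l
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy 2-D points forming two well-separated groups (illustrative only).
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = silhouette(points, [0, 0, 1, 1])  # labels match the visible groups
bad = silhouette(points, [0, 1, 0, 1])   # labels cut across the groups
```

Values near 1 indicate compact, well-separated clusters, while values at or below 0 indicate assignments that cut across the natural grouping.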
Affiliation(s)
- Felipe Matamala
- Department of Computer Sciences and Industries, Catholic University of the Maule, Talca, Chile
- Carlos Maldonado
- Instituto de Ciencias Agroalimentarias, Animales y Ambientales, Universidad de O’Higgins, San Fernando, Chile
|
20
|
Lin W, Hasenstab K, Moura Cunha G, Schwartzman A. Comparison of handcrafted features and convolutional neural networks for liver MR image adequacy assessment. Sci Rep 2020; 10:20336. [PMID: 33230152 PMCID: PMC7683555 DOI: 10.1038/s41598-020-77264-y] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 05/04/2020] [Accepted: 11/02/2020] [Indexed: 02/07/2023] Open
Abstract
We propose a random forest classifier for identifying adequacy of liver MR images using handcrafted (HC) features and deep convolutional neural networks (CNNs), and analyze the relative role of these two components in relation to the training sample size. The HC features, specifically developed for this application, include Gaussian mixture models, Euler characteristic curves and texture analysis. Using HC features outperforms the CNN for smaller sample sizes and with increased interpretability. On the other hand, with enough training data, the combined classifier outperforms the models trained with HC features or CNN features alone. These results illustrate the added value of HC features with respect to CNNs, especially when insufficient data is available, as is often found in clinical studies.
Affiliation(s)
- Wenyi Lin
- Division of Biostatistics, Department of Family Medicine and Public Health, University of California San Diego, La Jolla, 92093, USA
- Kyle Hasenstab
- Department of Mathematics and Statistics, San Diego State University, San Diego, CA, 92182, USA
- Guilherme Moura Cunha
- Liver Imaging Group, Department of Radiology, University of California San Diego, La Jolla, 92093, USA
- Armin Schwartzman
- Division of Biostatistics, Department of Family Medicine and Public Health, University of California San Diego, La Jolla, 92093, USA; Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, 92093, USA
|
21
|
Ter-Levonian AS, Koshechkin KA. Review of Machine Learning Technologies and Neural Networks in Drug Synergy Combination pharmacological research. RESEARCH RESULTS IN PHARMACOLOGY 2020. [DOI: 10.3897/rrpharmacology.6.49591] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/12/2023] Open
Abstract
Introduction: Nowadays an increase in the amount of information creates the need to replace and update data processing technologies. One of the tasks of clinical pharmacology is to create the right combination of drugs for the treatment of a particular disease. It takes months and even years to create a treatment regimen. Using machine learning (in silico) allows predicting how to get the right combination of drugs and skip the experimental steps in a study that take a lot of time and financial expenses. Gradual preparation is needed for the Deep Learning of Drug Synergy, starting from creating a base of drugs, their characteristics and ways of interacting.
Aim: Our review aims to draw attention to the prospect of the introduction of Deep Learning technology to predict possible combinations of drugs for the treatment of various diseases.
Materials and methods: Literature review of articles based on the PUBMED project and related bibliographic resources over the past 5 years (2015–2019).
Results and discussion: In the analyzed articles, Machine or Deep Learning completed the assigned tasks. It was able to determine the most appropriate combinations for the treatment of certain diseases, select the necessary regimen and doses. In addition, using this technology, new combinations have been identified that may be further involved in preclinical studies.
Conclusions: From the analysis of the articles, we obtained evidence of the positive effects of Deep Learning to select “key” combinations for further stages of preclinical research.
|
22
|
Singh VK, Maurya NS, Mani A, Yadav RS. Machine learning method using position-specific mutation based classification outperforms one hot coding for disease severity prediction in haemophilia 'A'. Genomics 2020; 112:5122-5128. [PMID: 32927010 DOI: 10.1016/j.ygeno.2020.09.020] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/23/2020] [Revised: 08/10/2020] [Accepted: 09/08/2020] [Indexed: 01/20/2023]
Abstract
Haemophilia is an X-linked genetic disorder whose most common types, A and B, result from the absence or deficiency of protein factors VIII and IX, respectively. Severity of the disease depends on the mutation. Available Machine Learning (ML) methods that predict mutational severity using traditional encoding approaches generally have high time complexity and compromised accuracy. In this study, a Haemophilia 'A' patient mutation dataset containing 7784 mutations was processed by the proposed Position-Specific Mutation (PSM) technique and by One-Hot Encoding (OHE) to predict the disease severity. The datasets processed by the PSM and OHE methods were analyzed and used to train various ML algorithms to classify the mutation severity level. Surprisingly, PSM outperformed OHE in terms of both time efficiency and accuracy, with training and prediction time improvements in the range of approximately 91 to 98% and 80 to 99%, respectively. The severity prediction accuracy also improved when PSM was used with different ML algorithms.
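For readers unfamiliar with the baseline, the following is a minimal sketch of one-hot encoding applied to a single amino-acid substitution. It shows why OHE feature vectors grow quickly: each residue alone takes 20 binary columns. The alphabet ordering, the function names, and the example substitution are assumptions made for illustration; the abstract does not specify how the PSM encoding is built, so it is not reproduced here.

```python
# The 20 standard amino acids in one-letter code; the ordering is an arbitrary choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue: str) -> list[int]:
    """Return a 20-element binary vector with a single 1 at the residue's index."""
    vec = [0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(residue)] = 1  # raises ValueError for unknown residues
    return vec


def encode_substitution(wild_type: str, mutant: str) -> list[int]:
    """Concatenate the one-hot vectors of the wild-type and mutant residues."""
    return one_hot(wild_type) + one_hot(mutant)


# Hypothetical example substitution: arginine (R) replaced by histidine (H).
vec = encode_substitution("R", "H")
```

Encoding the sequence position in the same way would add one further column per possible position, which is one plausible source of the time cost that the abstract attributes to traditional encodings.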
Affiliation(s)
- Vikalp Kumar Singh
- Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, UP 211004, India
- Neha Shree Maurya
- Department of Biotechnology, Motilal Nehru National Institute of Technology Allahabad, UP 211004, India
- Ashutosh Mani
- Department of Biotechnology, Motilal Nehru National Institute of Technology Allahabad, UP 211004, India
- Rama Shankar Yadav
- Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, UP 211004, India
|
23
|
Rangaswamy U, Dharshini SAP, Yesudhas D, Gromiha MM. VEPAD - Predicting the effect of variants associated with Alzheimer's disease using machine learning. Comput Biol Med 2020; 124:103933. [PMID: 32828070 DOI: 10.1016/j.compbiomed.2020.103933] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/03/2020] [Revised: 07/25/2020] [Accepted: 07/25/2020] [Indexed: 12/26/2022]
Abstract
Introduction: Alzheimer's disease (AD) is a complex and heterogeneous disease that affects neuronal cells over time and it is prevalent among all neurodegenerative diseases. Next Generation Sequencing (NGS) techniques are widely used for developing high-throughput screening methods to identify biomarkers and variants, which help early diagnosis and treatments.
Objective: The primary purpose of this study is to develop a classification model using machine learning for predicting the deleterious effect of variants with respect to AD.
Methods: We have constructed a set of 20,401 deleterious and 37,452 control variants from Genome-Wide Association Study (GWAS) and Genotype-Tissue Expression (GTEx) portals, respectively. Recursive feature elimination using cross-validation (RFECV) followed by a forward feature selection method was utilized to select the important features and a random forest classifier was used for distinguishing between deleterious and neutral variants.
Results: Our method showed an accuracy of 81.21% on 10-fold cross-validation and 70.63% on a test set of 5785 variants. The same test set was used to compare the performance of CADD and FATHMM and their accuracies are in the range of 54%-62%.
Conclusion: Our model is freely available as the Variant Effect Predictor for Alzheimer's Disease (VEPAD) at http://web.iitm.ac.in/bioinfo2/vepad/. VEPAD can be used to predict the effect of new variants associated with AD.
Affiliation(s)
- Uday Rangaswamy
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, India
- S Akila Parvathy Dharshini
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, India
- Dhanusha Yesudhas
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, India
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, India; School of Computing, Tokyo Tech World Research Hub Initiative (WRHI), Institute of Innovative Research, Tokyo Institute of Technology, Midori-ku, Kanagawa, 226-8503, Yokohama, Japan.
|
24
|
Verkhivker GM, Agajanian S, Hu G, Tao P. Allosteric Regulation at the Crossroads of New Technologies: Multiscale Modeling, Networks, and Machine Learning. Front Mol Biosci 2020; 7:136. [PMID: 32733918 PMCID: PMC7363947 DOI: 10.3389/fmolb.2020.00136] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 05/01/2020] [Accepted: 06/08/2020] [Indexed: 12/12/2022] Open
Abstract
Allosteric regulation is a common mechanism employed by complex biomolecular systems for regulation of activity and adaptability in the cellular environment, serving as an effective molecular tool for cellular communication. As an intrinsic but elusive property, allostery is a ubiquitous phenomenon where binding or disturbing of a distal site in a protein can functionally control its activity and is considered as the "second secret of life." The fundamental biological importance and complexity of these processes require a multi-faceted platform of synergistically integrated approaches for prediction and characterization of allosteric functional states, atomistic reconstruction of allosteric regulatory mechanisms and discovery of allosteric modulators. The unifying theme and overarching goal of allosteric regulation studies in recent years have been integration between emerging experimental and computational approaches and technologies to advance quantitative characterization of allosteric mechanisms in proteins. Despite significant advances, the quantitative characterization and reliable prediction of functional allosteric states, interactions, and mechanisms continue to present highly challenging problems in the field. In this review, we discuss simulation-based multiscale approaches, experiment-informed Markovian models, and network modeling of allostery and information-theoretical approaches that can describe the thermodynamics and hierarchy of allosteric states and the molecular basis of allosteric mechanisms. The wealth of structural and functional information along with diversity and complexity of allosteric mechanisms in therapeutically important protein families have provided a well-suited platform for development of data-driven research strategies. Data-centric integration of chemistry, biology and computer science using artificial intelligence technologies has gained significant momentum and is at the forefront of many cross-disciplinary efforts.
We discuss new developments in the machine learning field and the emergence of deep learning and deep reinforcement learning applications in modeling of molecular mechanisms and allosteric proteins. The experiment-guided integrated approaches empowered by recent advances in multiscale modeling, network science, and machine learning can lead to more reliable prediction of allosteric regulatory mechanisms and discovery of allosteric modulators for therapeutically important protein targets.
Affiliation(s)
- Gennady M. Verkhivker
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, United States
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA, United States
- Steve Agajanian
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, United States
| | - Guang Hu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Peng Tao
- Department of Chemistry, Center for Drug Discovery, Design, and Delivery (CD4), Center for Scientific Computation, Southern Methodist University, Dallas, TX, United States
| |
|
25
|
Wang J, Deng F, Zeng F, Shanahan AJ, Li WV, Zhang L. Predicting long-term multicategory cause of death in patients with prostate cancer: random forest versus multinomial model. Am J Cancer Res 2020; 10:1344-1355. [PMID: 32509383 PMCID: PMC7269775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2020] [Accepted: 04/07/2020] [Indexed: 06/11/2023] Open
Abstract
The majority of patients with prostate cancer die of non-cancer causes of death (COD). It is thus important to accurately predict multi-category COD in these patients. Random forest (RF), a popular machine learning model, has been shown to be useful for predicting binary cancer-specific deaths. However, its accuracy for predicting multi-category COD in cancer patients is unclear. We included patients in the Surveillance, Epidemiology, and End Results-18 cancer registry program with prostate cancer diagnosed in 2004 (followed up through 2016). They were randomly divided into training and testing sets of equal size. We evaluated the prediction accuracies of RF and conventional statistical/multinomial models for 6-category COD by data-encoding type using the 2-fold cross-validation approach. Among 49,864 prostate cancer patients, 29,611 (59.4%) were alive at the end of follow-up, and 5,448 (10.9%) died of cardiovascular disease, 4,607 (9.2%) of prostate cancer, 3,681 (7.4%) of non-prostate cancer, 717 (1.4%) of infection, and 5,800 (11.6%) of other causes. We predicted 6-category COD among these patients with a mean accuracy of 59.1% (n=240, 95% CI, 58.7%-59.4%) in RF models with one-hot encoding, and 50.4% (95% CI, 49.7%-51.0%) in multinomial models. Tumor characteristics, prostate-specific antigen level, and diagnosis confirmation method were important in both RF and multinomial models. In RF models, no statistically significant differences were found between the accuracies of the training and cross-validation phases, or between those of categorical and one-hot encoding. We here report that RF models can outperform multinomial logistic models (absolute accuracy difference, 8.7%) in predicting long-term 6-category COD among prostate cancer patients, while pathology diagnosis itself and tumor pathology remain important factors.
Affiliation(s)
- Jianwei Wang
  - Department of Urology, Beijing Jishuitan Hospital, The Fourth Medical College of Peking University, Beijing, China
- Fei Deng
  - School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Fuqing Zeng
  - Department of Urology, Wuhan Union Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Wei Vivian Li
  - Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Piscataway, NJ, USA
- Lanjing Zhang
  - Department of Pathology, Princeton Medical Center, Plainsboro, NJ, USA
  - Department of Biological Sciences, Rutgers University, Newark, NJ, USA
  - Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA
  - Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ, USA
|
26
|
Fully automated plaque characterization in intravascular OCT images using hybrid convolutional and lumen morphology features. Sci Rep 2020; 10:2596. [PMID: 32054895 PMCID: PMC7018759 DOI: 10.1038/s41598-020-59315-6] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 10/01/2019] [Accepted: 01/17/2020] [Indexed: 11/28/2022] Open
Abstract
For intravascular OCT (IVOCT) images, we developed an automated atherosclerotic plaque characterization method that used a hybrid learning approach, combining deep-learning convolutional features with hand-crafted lumen morphological features. Processing was done on innate A-line units with the labels fibrolipidic (fibrous tissue followed by lipidous tissue), fibrocalcific (fibrous tissue followed by calcification), or other. We trained and tested on an expansive data set (6,556 images) and performed an active-learning relabeling step to improve noisy ground truth labels. A conditional random field was an important post-processing step to reduce classification errors. Sensitivities/specificities were 84.8%/97.8% and 91.4%/95.7% for fibrolipidic and fibrocalcific plaques, respectively. Over lesions, en face classification maps showed automated results that agreed favorably with their manually labeled counterparts. Adding lumen morphological features gave a statistically significant improvement (p < 0.05) compared to classification with convolutional features alone. Automated assessments of clinically relevant plaque attributes (arc angle and length) compared favorably to those from manual labels. Our hybrid approach gave statistically improved results compared to previous A-line classification methods using deep learning or hand-crafted features alone. This plaque characterization approach is fully automated, robust, and promising for live-time treatment planning and research applications.
|
27
|
Matsuzaka Y, Uesawa Y. DeepSnap-Deep Learning Approach Predicts Progesterone Receptor Antagonist Activity With High Performance. Front Bioeng Biotechnol 2020; 7:485. [PMID: 32039185 PMCID: PMC6987043 DOI: 10.3389/fbioe.2019.00485] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/01/2019] [Accepted: 12/30/2019] [Indexed: 12/16/2022] Open
Abstract
The progesterone receptor (PR) is an important therapeutic target for many malignancies and endocrine disorders because of its role in controlling ovulation and pregnancy via the reproductive cycle. The modulation of PR activity using its agonists and antagonists is therefore receiving increasing interest as a novel treatment strategy. However, clinical trials using PR modulators have not yet yielded conclusive evidence. Recently, increasing evidence from several fields has shown that chemical compounds, including agonists and antagonists, can be classified using recent improvements in deep learning (DL) with deep neural networks. We therefore recently proposed a novel DL-based quantitative structure-activity relationship (QSAR) strategy that uses transfer learning to build prediction models for agonists and antagonists. In this study, we constructed prediction models of PR antagonists using this approach, referred to as the DeepSnap-DL method, which takes images captured from three-dimensional (3D) chemical structures at multiple angles as input data for DL classification. The DeepSnap-DL method predicted PR antagonists with high performance after optimization of some parameters and adjustment of the images from 3D structures. Furthermore, a comparison of the prediction models from this approach with conventional machine learning (ML) methods indicated that the DeepSnap-DL method outperformed these ML methods. The models built by DeepSnap-DL would therefore be a powerful tool not only in the QSAR field for predicting physiological and agonist/antagonist activities, toxicity, and molecular binding, but also for identifying biological or pathological phenomena.
Affiliation(s)
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, Tokyo, Japan
|