1
Tzavella K, Diaz A, Olsen C, Vranken W. Combining evolution and protein language models for an interpretable cancer driver mutation prediction with D2Deep. Brief Bioinform 2024; 26:bbae664. [PMID: 39708841 DOI: 10.1093/bib/bbae664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 09/15/2024] [Accepted: 12/07/2024] [Indexed: 12/23/2024] Open
Abstract
The mutations driving cancer are being increasingly exposed through tumor-specific genomic data. However, differentiating between cancer-causing driver mutations and random passenger mutations remains challenging. State-of-the-art homology-based predictors contain built-in biases and are often ill-suited to the intricacies of cancer biology. Protein language models have successfully addressed various biological problems but have not yet been tested on the challenging task of cancer driver mutation prediction at a large scale. Additionally, they often fail to offer result interpretation, hindering their effective use in clinical settings. The AI-based D2Deep method we introduce here addresses these challenges by combining two powerful elements: (i) a nonspecialized protein language model that captures the makeup of all protein sequences and (ii) protein-specific evolutionary information that encompasses functional requirements for a particular protein. D2Deep relies exclusively on sequence information, outperforms state-of-the-art predictors, and captures intricate epistatic changes throughout the protein caused by mutations. These epistatic changes correlate with known mutations in the clinical setting and can be used for the interpretation of results. The model is trained on a balanced, somatic training set and so effectively mitigates biases related to hotspot mutations compared to state-of-the-art techniques. The versatility of D2Deep is illustrated by its performance on non-cancer mutation prediction, where most variants still lack known consequences. D2Deep predictions and confidence scores are available via https://tumorscope.be/d2deep to help with clinical interpretation and mutation prioritization.
Affiliation(s)
- Konstantina Tzavella
- Interuniversity Institute of Bioinformatics (IB2), Université Libre de Bruxelles, Vrije Universiteit Brussel (ULB-VUB), Triomflaan, Brussels 1050, Belgium
- Adrian Diaz
- Interuniversity Institute of Bioinformatics (IB2), Université Libre de Bruxelles, Vrije Universiteit Brussel (ULB-VUB), Triomflaan, Brussels 1050, Belgium
- Catharina Olsen
- Interuniversity Institute of Bioinformatics (IB2), Université Libre de Bruxelles, Vrije Universiteit Brussel (ULB-VUB), Triomflaan, Brussels 1050, Belgium
- Brussels Interuniversity Genomics High Throughput Core (BRIGHTcore), Vrije Universiteit Brussel (VUB), Université Libre de Bruxelles (ULB), Laarbeeklaan 101, Brussels 1090, Belgium
- Clinical Sciences, Research Group Genetics, Reproduction and Development (GRAD), Vrije Universiteit Brussel (VUB), Universitair Ziekenhuis Brussel (UZ Brussel), Laarbeeklaan 101, Brussels 1090, Belgium
- Wim Vranken
- Interuniversity Institute of Bioinformatics (IB2), Université Libre de Bruxelles, Vrije Universiteit Brussel (ULB-VUB), Triomflaan, Brussels 1050, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel (VUB), Pleinlaan 2, Brussels 1050, Belgium
- Chemistry Department, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050, Belgium
- AI Lab, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050, Belgium
- Biomedical sciences, Vrije Universiteit Brussel, Laarbeeklaan 101, Brussels 1090, Belgium
2
Ahmad RM, Ali BR, Al-Jasmi F, Al Dhaheri N, Al Turki S, Kizhakkedath P, Mohamad MS. AI-derived comparative assessment of the performance of pathogenicity prediction tools on missense variants of breast cancer genes. Hum Genomics 2024; 18:99. [PMID: 39256852 PMCID: PMC11389290 DOI: 10.1186/s40246-024-00667-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 08/22/2024] [Indexed: 09/12/2024] Open
Abstract
Single nucleotide variants (SNVs) can exert substantial and extremely variable impacts on various cellular functions, making accurate predictions of their consequences challenging, albeit crucial, especially in clinical settings such as oncology. Laboratory-based experimental methods for assessing these effects are time-consuming and often impractical, highlighting the importance of in-silico tools for variant impact prediction. However, the performance metrics of currently available tools on breast cancer missense variants from benchmarking databases have not been thoroughly investigated, creating a knowledge gap in the accurate prediction of pathogenicity. In this study, the benchmarking datasets ClinVar and HGMD were used to evaluate 21 Artificial Intelligence (AI)-derived in-silico tools. Missense variants in breast cancer genes were extracted from ClinVar and HGMD professional v2023.1. The HGMD dataset focused on pathogenic variants only; to ensure balance, benign variants for the same genes were included from the ClinVar database. Interestingly, our analysis of both datasets revealed variants across genes with varying penetrance levels like low and moderate in addition to high, reinforcing the value of disease-specific tools. The top-performing tools identified on the ClinVar dataset were MutPred (Accuracy = 0.73), MetaRNN (Accuracy = 0.72), ClinPred (Accuracy = 0.71), Meta-SVM, REVEL, and Fathmm-XF (Accuracy = 0.70). On the HGMD dataset, they were ClinPred (Accuracy = 0.72), MetaRNN (Accuracy = 0.71), CADD (Accuracy = 0.69), Fathmm-MKL (Accuracy = 0.68), and Fathmm-XF (Accuracy = 0.67). These findings offer clinicians and researchers valuable insights for selecting, improving, and developing effective in-silico tools for breast cancer pathogenicity prediction. Bridging this knowledge gap contributes to advancing precision medicine and enhancing diagnostic and therapeutic approaches for breast cancer patients with potential implications for other conditions.
Affiliation(s)
- Rahaf M Ahmad
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Bassam R Ali
- Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Fatma Al-Jasmi
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Division of Metabolic Genetics, Department of Pediatrics, Tawam Hospital, Al Ain, United Arab Emirates
- Noura Al Dhaheri
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Division of Metabolic Genetics, Department of Pediatrics, Tawam Hospital, Al Ain, United Arab Emirates
- Saeed Al Turki
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Praseetha Kizhakkedath
- Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Mohd Saberi Mohamad
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Center for Engineering Computational Intelligence, Faculty of Engineering and Technology, Multimedia University, Melaka, Malaysia
3
Passi G, Lieberman S, Zahdeh F, Murik O, Renbaum P, Beeri R, Linial M, May D, Levy-Lahad E, Schneidman-Duhovny D. Discovering predisposing genes for hereditary breast cancer using deep learning. Brief Bioinform 2024; 25:bbae346. [PMID: 39038933 PMCID: PMC11262808 DOI: 10.1093/bib/bbae346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 04/18/2024] [Accepted: 07/04/2024] [Indexed: 07/24/2024] Open
Abstract
Breast cancer (BC) is the most common malignancy affecting Western women today. It is estimated that as many as 10% of BC cases can be attributed to germline variants. However, the genetic basis of the majority of familial BC cases has yet to be identified. Discovering predisposing genes contributing to familial BC is challenging due to their presumed rarity, low penetrance, and complex biological mechanisms. Here, we focused on an analysis of rare missense variants in a cohort of 12 families of Middle Eastern origins characterized by a high incidence of BC cases. We devised a novel, high-throughput, variant analysis pipeline adapted for family studies, which aims to analyze variants at the protein level by employing state-of-the-art machine learning models and three-dimensional protein structural analysis. Using our pipeline, we analyzed 1218 rare missense variants that are shared between affected family members and classified 80 genes as candidate pathogenic. Among these genes, we found significant functional enrichment in peroxisomal and mitochondrial biological pathways which segregated across seven families in the study and covered diverse ethnic groups. We present multiple evidence that peroxisomal and mitochondrial pathways play an important, yet underappreciated, role in both germline BC predisposition and BC survival.
Affiliation(s)
- Gal Passi
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Sari Lieberman
- The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- Faculty of Medicine, The Hebrew University of Jerusalem, Ein Kerem PO Box 12271 Jerusalem 9112102, Israel
- Fouad Zahdeh
- The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- Omer Murik
- The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- Paul Renbaum
- The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- Rachel Beeri
- The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- Michal Linial
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, Jerusalem 91904, Israel
- Dalit May
- The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- Clalit Health Services, Jerusalem, Israel
- Ephrat Levy-Lahad
- The Fuld Family Medical Genetics Institute, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- The Eisenberg R&D Authority, Shaare Zedek Medical Center, 12 Bayit St., Jerusalem 9103101, Israel
- Faculty of Medicine, The Hebrew University of Jerusalem, Ein Kerem PO Box 12271 Jerusalem 9112102, Israel
- Dina Schneidman-Duhovny
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
4
Pukhta IR, Rout RK. Identification and segregation of genes with improved recurrent neural network trained with optimal gene level and mutation level features. Comput Methods Biomech Biomed Engin 2024:1-16. [PMID: 38424698 DOI: 10.1080/10255842.2024.2311322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 01/20/2024] [Indexed: 03/02/2024]
Abstract
Even though many different approaches have been employed to address the complex mutational heterogeneity of cancer, finding driver genes is still problematic since other genomic factors cannot be fully integrated for combined analyses. This research paper presents a novel gene identification and segregation model with five key processes (a) pre-processing, (b) treatment of class imbalances, (c) feature extraction, (d) feature selection, and (e) gene classification. To increase the data quality, the gathered initial information is first pre-processed utilizing data cleaning and data normalization. This turns the raw data into something that is both useful and effective. In actuality, the sample is skewed against drivers because passenger mutation markers appear in proportionally less instances than drivers do. To address the Class Imbalance Problem, improved K-Means + SMOTE are applied to the preprocessed data. The most crucial characteristics, including those at the gene and mutation levels, are then extracted from the balanced dataset. To lessen the computational load in terms of time, the best features from the retrieved features are selected using Forensic interpretation tailored hunger food search optimization (FIHFSO). The ideal features are used to train the deep learning classifier that conducts the separation procedure. In this research, an Improved Recurrent Neural Network (I-RNN) is used to make a final decision about genes. At 90% of learning percentage, the accuracy of the proposed method achieves 0.98% of 0.83, 0.81, 0.65, 0.80, 0.92 and 0.63% which is compared to the other methods like HGS, FBIO, AOA, AO, GOA and PRO respectively.
Affiliation(s)
- Irfan Rashid Pukhta
- Assistant Professor, Department of Computer Science and Engineering, National Institute of Technology, Srinagar, Jammu and Kashmir 190006, India
- Ranjeet Kumar Rout
- Assistant Professor, Department of Computer Science and Engineering, National Institute of Technology, Srinagar, Jammu and Kashmir 190006, India
5
Youssef R, Maniar R, Khan J, Mesa H. Metabolic Interplay in the Tumor Microenvironment: Implications for Immune Function and Anticancer Response. Curr Issues Mol Biol 2023; 45:9753-9767. [PMID: 38132455 PMCID: PMC10742411 DOI: 10.3390/cimb45120609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 11/26/2023] [Accepted: 11/29/2023] [Indexed: 12/23/2023] Open
Abstract
Malignant tumors exhibit rapid growth and high metabolic rates, similar to embryonic stem cells, and depend on aerobic glycolysis, known as the "Warburg effect". This understanding has enabled the use of radiolabeled glucose analogs in tumor staging and therapeutic response assessment via PET scans. Traditional treatments like chemotherapy and radiotherapy target rapidly dividing cells, causing significant toxicity. Despite immunotherapy's impact on solid tumor treatment, gaps remain, leading to research on cancer cell evasion of immune response and immune tolerance induction via interactions with the tumor microenvironment (TME). The TME, consisting of immune cells, fibroblasts, vessels, and the extracellular matrix, regulates tumor progression and therapy responses. TME-targeted therapies aim to transform this environment from supporting tumor growth to impeding it and fostering an effective immune response. This review examines the metabolic disparities between immune cells and cancer cells, their impact on immune function and therapeutic targeting, the TME components, and the complex interplay between cancer cells and nontumoral cells. The success of TME-targeted therapies highlights their potential to achieve better cancer control or even a cure.
Affiliation(s)
- Reem Youssef
- Department of Laboratory Medicine and Pathology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Rohan Maniar
- Division of Hematology/Oncology, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Jaffar Khan
- Department of Laboratory Medicine and Pathology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Hector Mesa
- Department of Laboratory Medicine and Pathology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
6
Xu X, Li Y, Chen T, Hou C, Yang L, Zhu P, Zhang Y, Li T. VIPpred: a novel model for predicting variant impact on phosphorylation events driving carcinogenesis. Brief Bioinform 2023; 25:bbad480. [PMID: 38156562 PMCID: PMC10782907 DOI: 10.1093/bib/bbad480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Revised: 11/14/2023] [Accepted: 12/03/2023] [Indexed: 12/30/2023] Open
Abstract
Disrupted protein phosphorylation due to genetic variation is a widespread phenomenon that triggers oncogenic transformation of healthy cells. However, few relevant phosphorylation disruption events have been verified due to limited biological experimental methods. Because of the lack of reliable benchmark datasets, current bioinformatics methods primarily use sequence-based traits to study variant impact on phosphorylation (VIP). Here, we increased the number of experimentally supported VIP events from less than 30 to 740 by manually curating and reanalyzing multi-omics data from 916 patients provided by the Clinical Proteomic Tumor Analysis Consortium. To predict VIP events in cancer cells, we developed VIPpred, a machine learning method characterized by multidimensional features that exhibits robust performance across different cancer types. Our method provided a pan-cancer landscape of VIP events, which are enriched in cancer-related pathways and cancer driver genes. We found that variant-induced increases in phosphorylation events tend to inhibit the protein degradation of oncogenes and promote tumor suppressor protein degradation. Our work provides new insights into phosphorylation-related cancer biology as well as novel avenues for precision therapy.
Affiliation(s)
- Xiaofeng Xu
- Department of Medical Bioinformatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Ying Li
- Department of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, Shanxi 030024, China
- Taoyu Chen
- Department of Medical Bioinformatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Chao Hou
- Department of Medical Bioinformatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Liang Yang
- Department of Medical Bioinformatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Peiyu Zhu
- Department of Medical Bioinformatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Yi Zhang
- Department of Medical Bioinformatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Tingting Li
- Department of Medical Bioinformatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Key Laboratory for Neuroscience, Ministry of Education/National Health Commission of China, Peking University, Beijing 100191, China
7
Ostroverkhova D, Przytycka TM, Panchenko AR. Cancer driver mutations: predictions and reality. Trends Mol Med 2023:S1471-4914(23)00067-9. [PMID: 37076339 DOI: 10.1016/j.molmed.2023.03.007] [Citation(s) in RCA: 49] [Impact Index Per Article: 24.5] [Received: 02/03/2023] [Revised: 03/17/2023] [Accepted: 03/23/2023] [Indexed: 04/21/2023]
Abstract
Cancer cells accumulate many genetic alterations throughout their lifetime, but only a few of them drive cancer progression, termed driver mutations. Driver mutations may vary between cancer types and patients, can remain latent for a long time and become drivers at particular cancer stages, or may drive oncogenesis only in conjunction with other mutations. The high mutational, biochemical, and histological tumor heterogeneity makes driver mutation identification very challenging. In this review we summarize recent efforts to identify driver mutations in cancer and annotate their effects. We underline the success of computational methods to predict driver mutations in finding novel cancer biomarkers, including in circulating tumor DNA (ctDNA). We also report on the boundaries of their applicability in clinical research.
Affiliation(s)
- Daria Ostroverkhova
- Department of Pathology and Molecular Medicine, Queen's University, Kingston, ON, Canada
- Teresa M Przytycka
- National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD, USA
- Anna R Panchenko
- Department of Pathology and Molecular Medicine, Queen's University, Kingston, ON, Canada
- Department of Biology and Molecular Sciences, Queen's University, Kingston, ON, Canada
- School of Computing, Queen's University, Kingston, ON, Canada
- Ontario Institute of Cancer Research, Toronto, ON, Canada
8
Pati SK, Gupta MK, Shai R, Banerjee A, Ghosh A. Missing value estimation of microarray data using Sim-GAN. Knowl Inf Syst 2022. [DOI: 10.1007/s10115-022-01718-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
9
Ozturk K, Carter H. Predicting functional consequences of mutations using molecular interaction network features. Hum Genet 2022; 141:1195-1210. [PMID: 34432150 PMCID: PMC8873243 DOI: 10.1007/s00439-021-02329-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/20/2021] [Accepted: 07/31/2021] [Indexed: 12/13/2022]
Abstract
Variant interpretation remains a central challenge for precision medicine. Missense variants are particularly difficult to understand as they change only a single amino acid in a protein sequence yet can have large and varied effects on protein activity. Numerous tools have been developed to identify missense variants with putative disease consequences from protein sequence and structure. However, biological function arises through higher order interactions among proteins and molecules within cells. We therefore sought to capture information about the potential of missense mutations to perturb protein interaction networks by integrating protein structure and interaction data. We developed 16 network-based annotations for missense mutations that provide orthogonal information to features classically used to prioritize variants. We then evaluated them in the context of a proven machine-learning framework for variant effect prediction across multiple benchmark datasets to demonstrate their potential to improve variant classification. Interestingly, network features resulted in larger performance gains for classifying somatic mutations than for germline variants, possibly due to different constraints on what mutations are tolerated at the cellular versus organismal level. Our results suggest that modeling variant potential to perturb context-specific interactome networks is a fruitful strategy to advance in silico variant effect prediction.
Affiliation(s)
- Kivilcim Ozturk
- Division of Medical Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Hannah Carter
- Division of Medical Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Moores Cancer Center, University of California San Diego, La Jolla, CA, USA
10
Yazar M, Ozbek P. Assessment of 13 in silico pathogenicity methods on cancer-related variants. Comput Biol Med 2022; 145:105434. [DOI: 10.1016/j.compbiomed.2022.105434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Revised: 03/04/2022] [Accepted: 03/20/2022] [Indexed: 11/03/2022]
11
Raimondi D, Codicè F, Orlando G, Schymkowitz J, Rousseau F, Moreau Y. HPMPdb: a machine learning-ready database of protein molecular phenotypes associated to human missense variants. Curr Res Struct Biol 2022; 4:167-174. [PMID: 35669450 PMCID: PMC9166469 DOI: 10.1016/j.crstbi.2022.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Revised: 03/24/2022] [Accepted: 04/25/2022] [Indexed: 11/10/2022] Open
Abstract
Current human Single Amino acid Variants (SAVs) databases provide a link between SAVs and their effect on the carrier individual phenotype, often dividing them into Deleterious/Neutral variants. This is a very coarse-grained description of the genotype-to-phenotype relationship because it relies on unrealistic assumptions such as the perfect Mendelian behavior of each SAV and considers only dichotomic phenotypes. Moreover, the link between the effect of a SAV on a protein (its molecular phenotype) and the individual phenotype is often very complex, because multiple levels of biological abstraction connect the protein and individual level phenotypes. Here we present HPMPdb, a manually curated database containing human SAVs associated with the detailed description of the molecular phenotype they cause on the affected proteins. With particular regard to machine learning (ML), this database can be used to let researchers go beyond the existing Deleterious/Neutral prediction paradigm, allowing them to build molecular phenotype predictors instead. Our class labels describe in a succinct way the effects that each SAV has on 15 protein molecular phenotypes, such as protein-protein interaction, small molecules binding, function, post-translational modifications (PTMs), sub-cellular localization, mimetic PTM, folding and protein expression. Moreover, we provide researchers with all necessary means to reproducibly train and test their models on our database. The webserver and the data described in this paper are available at hpmp.esat.kuleuven.be.
Current variant-effect predictors perform a coarse-grained modeling and rely on unrealistic assumptions. The link between the effect of a variant and the individual phenotype is complex. It would be more intuitive to predict the molecular phenotype that each variant causes on the carrier protein.
HPMP is a manually curated database containing human variants associated with the molecular phenotype they cause on the affected proteins. We manually translated variants from Uniprot into 15 Machine Learning-ready labels describing the affected protein molecular phenotype. The goal of HPMP is to allow researchers to go beyond the existing variant-effect prediction paradigm and allow them to build molecular phenotype predictors instead. The webserver and the data described in this paper are available at hpmp.esat.kuleuven.be
12
Clever Hans effect found in a widely used brain tumour MRI dataset. Med Image Anal 2022; 77:102368. [DOI: 10.1016/j.media.2022.102368] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/26/2021] [Revised: 12/19/2021] [Accepted: 01/10/2022] [Indexed: 12/11/2022]
13
Andrades R, Recamonde-Mendoza M. Machine learning methods for prediction of cancer driver genes: a survey paper. Brief Bioinform 2022; 23:6551145. [PMID: 35323900 DOI: 10.1093/bib/bbac062] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/06/2021] [Revised: 02/06/2022] [Accepted: 02/08/2022] [Indexed: 12/21/2022] Open
Abstract
Identifying the genes and mutations that drive the emergence of tumors is a critical step to improving our understanding of cancer and identifying new directions for disease diagnosis and treatment. Despite the large volume of genomics data, the precise detection of driver mutations and their carrying genes, known as cancer driver genes, from the millions of possible somatic mutations remains a challenge. Computational methods play an increasingly important role in discovering genomic patterns associated with cancer drivers and developing predictive models to identify these elements. Machine learning (ML), including deep learning, has been the engine behind many of these efforts and provides excellent opportunities for tackling remaining gaps in the field. Thus, this survey aims to perform a comprehensive analysis of ML-based computational approaches to identify cancer driver mutations and genes, providing an integrated, panoramic view of the broad data and algorithmic landscape within this scientific problem. We discuss how the interactions among data types and ML algorithms have been explored in previous solutions and outline current analytical limitations that deserve further attention from the scientific community. We hope that by helping readers become more familiar with significant developments in the field brought by ML, we may inspire new researchers to address open problems and advance our knowledge towards cancer driver discovery.
Affiliation(s)
- Renan Andrades
- Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre/RS, Brazil
- Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre/RS, Brazil
- Mariana Recamonde-Mendoza
- Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre/RS, Brazil
- Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre/RS, Brazil
14
Javed Z, Khan K, Herrera-Bravo J, Naeem S, Iqbal MJ, Sadia H, Qadri QR, Raza S, Irshad A, Akbar A, Reiner Ž, Al-Harrasi A, Al-Rawahi A, Satmbekova D, Butnariu M, Bagiu IC, Bagiu RV, Sharifi-Rad J. Genistein as a regulator of signaling pathways and microRNAs in different types of cancers. Cancer Cell Int 2021; 21:388. [PMID: 34289845 PMCID: PMC8296701 DOI: 10.1186/s12935-021-02091-8] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 04/19/2021] [Accepted: 07/13/2021] [Indexed: 12/18/2022] Open
Abstract
Cancers are complex diseases orchestrated by a plethora of extrinsic and intrinsic factors. Research spanning several decades has provided a better understanding of the complex molecular interactions responsible for the multifaceted nature of cancer. Recent advances in next-generation sequencing and functional genomics have brought us closer to unravelling the complexities of the tumor microenvironment (tumor heterogeneity) and the deregulated signaling cascades responsible for the proliferation and survival of tumor cells. Phytochemicals have begun to emerge as potent beneficial substances that target deregulated signaling pathways. The isoflavonoid genistein is an essential phytochemical involved in the regulation of key biological processes, including those in different types of cancer. Emerging preclinical evidence has shown its anti-cancer, anti-inflammatory and anti-oxidant properties, and the substance is being tested in various phases of clinical trials. Comprehensive preclinical and clinical trial data are providing insight into genistein as a modulator of various signaling pathways at both the transcriptional and translational levels. In this review we explain the mechanistic regulation of several key cellular pathways by genistein. We also address in detail the various microRNAs regulated by genistein in different types of cancer. Moreover, the application of nano-formulations to increase the efficiency of genistein is discussed. Understanding the pleiotropic potential of genistein to regulate key cellular pathways, together with the development of efficient drug delivery systems, will bring us a step closer to designing better chemotherapeutics.
Affiliation(s)
- Zeeshan Javed
- Office of Research Innovation and Commercialization (ORIC), Lahore Garrison University, Sector-C, DHA Phase-VI, Lahore, Pakistan
- Khushbukhat Khan
- Department of Healthcare Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, 44000, Pakistan
- Jesús Herrera-Bravo
- Departamento de Ciencias Básicas, Facultad de Ciencias, Universidad Santo Tomas, Santiago, Chile; Center of Molecular Biology and Pharmacogenetics, Scientific and Technological Bioresource Nucleus, Universidad de La Frontera, 4811230, Temuco, Chile
- Sajid Naeem
- School of Life Sciences, Lanzhou University, Lanzhou, 730000, People's Republic of China
- Muhammad Javed Iqbal
- Department of Biotechnology, Faculty of Sciences, University of Sialkot, Sialkot, Pakistan
- Haleema Sadia
- Department of Biotechnology, BUITEMS, Quetta, Pakistan
- Qamar Raza Qadri
- Institute of Biochemistry and Biotechnology, University of Veterinary and Animal Sciences, Lahore, Punjab, Pakistan
- Shahid Raza
- Office of Research Innovation and Commercialization (ORIC), Lahore Garrison University, Sector-C, DHA Phase-VI, Lahore, Pakistan
- Asma Irshad
- Department of Life Sciences, University of Management Sciences, Lahore, Pakistan
- Ali Akbar
- Department of Microbiology, University of Balochistan, Quetta, Pakistan
- Željko Reiner
- Department of Internal Medicine, University Hospital Centre Zagreb, School of Medicine, University of Zagreb, Zagreb, Croatia
- Ahmed Al-Harrasi
- Natural and Medical Sciences Research Centre, University of Nizwa, Birkat Almouz, Nizwa, 616, Oman
- Ahmed Al-Rawahi
- Natural and Medical Sciences Research Centre, University of Nizwa, Birkat Almouz, Nizwa, 616, Oman
- Dinara Satmbekova
- High School of Medicine, Al-Farabi Kazakh National University, Almaty, Kazakhstan
- Monica Butnariu
- Banat's University of Agricultural Sciences and Veterinary Medicine "King Michael I of Romania" from Timisoara, Timisoara, Romania
- Iulia Cristina Bagiu
- Victor Babes University of Medicine and Pharmacy of Timisoara, Discipline of Microbiology, Timisoara, Romania; Multidisciplinary Research Center on Antimicrobial Resistance, Timisoara, Romania
- Radu Vasile Bagiu
- Victor Babes University of Medicine and Pharmacy of Timisoara, Discipline of Microbiology, Timisoara, Romania; Preventive Medicine Study Center, Timisoara, Romania
- Javad Sharifi-Rad
- Phytochemistry Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
|