1
Beust C, Valdeolivas A, Baptista A, Brière G, Lévy N, Ozisik O, Baudot A. The Molecular Landscape of Premature Aging Diseases Defined by Multilayer Network Exploration. Adv Biol (Weinh) 2024; 8:e2400134. [PMID: 39123285 DOI: 10.1002/adbi.202400134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Revised: 06/26/2024] [Indexed: 08/12/2024]
Abstract
Premature Aging (PA) diseases are rare genetic disorders that mimic some aspects of physiological aging at an early age. Various causative genes of PA diseases have been identified in recent years, providing insights into some dysfunctional cellular processes. However, the identification of PA genes also revealed significant genetic heterogeneity and highlighted the gaps in the understanding of PA-associated molecular mechanisms. Furthermore, many patients remain undiagnosed. Overall, the current lack of knowledge about PA diseases hinders the development of effective diagnosis and therapies and poses significant challenges to improving patient care. Here, a network-based approach to systematically unravel the cellular functions disrupted in PA diseases is presented. Using a network community identification algorithm, a vast multilayer network of biological interactions is explored to extract the communities of 67 PA diseases from their 132 associated genes. These communities can be grouped into six distinct clusters, each reflecting specific cellular functions: DNA repair, cell cycle, transcription regulation, inflammation, cell communication, and vesicle-mediated transport. It is proposed that these clusters collectively represent the landscape of the molecular mechanisms that are perturbed in PA diseases, providing a framework for better understanding their pathogenesis. Intriguingly, most clusters also exhibited a significant enrichment in genes associated with physiological aging, suggesting a potential overlap between the molecular underpinnings of PA diseases and natural aging.
Affiliation(s)
- Cécile Beust
  - Aix Marseille Univ, INSERM, Marseille Medical Genetics (MMG), Marseille, France
- Alberto Valdeolivas
  - Aix Marseille Univ, INSERM, Marseille Medical Genetics (MMG), Marseille, France
- Anthony Baptista
  - Aix Marseille Univ, INSERM, Marseille Medical Genetics (MMG), Marseille, France
- Galadriel Brière
  - Aix Marseille Univ, INSERM, Marseille Medical Genetics (MMG), Marseille, France
  - Aix Marseille Univ, CNRS, I2M, Marseille, France
- Nicolas Lévy
  - Aix Marseille Univ, INSERM, Marseille Medical Genetics (MMG), Marseille, France
- Ozan Ozisik
  - Aix Marseille Univ, INSERM, Marseille Medical Genetics (MMG), Marseille, France
- Anaïs Baudot
  - Aix Marseille Univ, INSERM, Marseille Medical Genetics (MMG), Marseille, France
  - CNRS, Marseille, France
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
2
Eshel YD, Sharaha U, Beck G, Cohen-Logasi G, Lapidot I, Huleihel M, Mordechai S, Kapelushnik J, Salman A. Monitoring the efficacy of antibiotic therapy in febrile pediatric oncology patients with bacteremia using infrared spectroscopy of white blood cells-based machine learning. Talanta 2024; 270:125619. [PMID: 38199122 DOI: 10.1016/j.talanta.2023.125619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Revised: 12/29/2023] [Accepted: 12/30/2023] [Indexed: 01/12/2024]
Abstract
Bacteremia refers to the presence of bacteria in the bloodstream, which can lead to a serious and potentially life-threatening condition. Oncology patients undergoing cancer treatment have a higher risk of developing bacteremia due to a weakened immune system resulting from the disease itself or the treatments they receive. Prompt and accurate detection of bacterial infections and monitoring the effectiveness of antibiotic therapy are essential for enhancing patient outcomes and preventing the development and dissemination of multidrug-resistant bacteria. Traditional methods of infection monitoring, such as blood cultures and clinical observations, are time-consuming, labor-intensive, and often subject to limitations. This manuscript presents an innovative application of infrared spectroscopy of leucocytes, combined with machine learning, to diagnose the etiology of infection as bacterial and simultaneously monitor the efficacy of antibiotic therapy in febrile pediatric oncology patients with bacteremia. Effective monitoring makes it possible to promptly identify any indications of treatment failure, which in turn indirectly serves to limit the progression of antibiotic resistance. The logistic regression (LR) classifier was able to differentiate the samples as bacterial or control within an hour of receiving the blood samples, with a success rate of over 95%. Additionally, initial findings indicate that employing infrared spectroscopy of white blood cells (WBCs) along with machine learning is viable for monitoring the success of antibiotic therapy. Our follow-up results demonstrate an accuracy of 87.5% in assessing the effectiveness of the antibiotic treatment.
Affiliation(s)
- Yotam D Eshel
  - Department of Hematology and Oncology, Saban Pediatric Medical Center Soroka University Medical Center and Faculty of Health Sciences, Beer-Sheva, 84105, Israel
- Uraib Sharaha
  - Department of Microbiology, Immunology, and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel; Department of Biology, Science and Technology College, Hebron University, Hebron, P760, Palestine
- Guy Beck
  - Department of Hematology and Oncology, Saban Pediatric Medical Center Soroka University Medical Center and Faculty of Health Sciences, Beer-Sheva, 84105, Israel
- Gal Cohen-Logasi
  - Department of Green Engineering, SCE-Sami Shamoon College of Engineering, Beer-Sheva, 84100, Israel
- Itshak Lapidot
  - Department of Electrical and Electronics Engineering, ACLP-Afeka Center for Language Processing, Afeka Tel-Aviv Academic College of Engineering, Tel-Aviv, 69107, Israel; LIA Avignon Université, 339 Chemin des Meinajaries, Avignon, 84000, France
- Mahmoud Huleihel
  - Department of Microbiology, Immunology, and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel
- Shaul Mordechai
  - Department of Physics, Ben-Gurion University, Beer-Sheva, 84105, Israel
- Joseph Kapelushnik
  - Department of Hematology and Oncology, Saban Pediatric Medical Center Soroka University Medical Center and Faculty of Health Sciences, Beer-Sheva, 84105, Israel
- Ahmad Salman
  - Department of Physics, SCE-Sami Shamoon College of Engineering, Beer-Sheva, 84100, Israel
3
Szatmári EZ, Csordás A, Kerepesi C. Unique Patterns in Amino Acid Sequences of Aging-Related Proteins. Adv Biol (Weinh) 2024; 8:e2300436. [PMID: 37880927 DOI: 10.1002/adbi.202300436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 10/14/2023] [Indexed: 10/27/2023]
Abstract
Aging has strong genetic components and the list of genes that may regulate the aging process is collected in the GenAge database. There may be characteristic patterns in the amino acid sequences of aging-related proteins that distinguish them from other proteins and this information will lead to a better understanding of the aging process. To test this hypothesis, human protein sequences are extracted from the UniProt database and the relative frequency of every amino acid residue in aging-related proteins and the remaining proteins is calculated. The main observation is that the mean relative frequency of aspartic acid (D) is consistently higher, while the mean relative frequencies of tryptophan (W) and leucine (L) are consistently lower in aging-related proteins compared to the non-aging-related proteins for the human and four examined model organisms. It is also observed that the mean relative frequency of aspartic acid is higher, while the mean relative frequency of tryptophan is lower in pro-longevity proteins compared to anti-longevity proteins in model organisms. Finally, it is found that aging-related proteins tend to be longer than non-aging-related proteins. It is hoped that this analysis initiates further computational and experimental research to explore the underlying mechanisms of these findings.
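The per-residue comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names and the toy sequences are hypothetical, and the sketch assumes that the "mean relative frequency" is the per-protein residue fraction averaged over a group of proteins.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def relative_frequencies(sequence):
    """Return the fraction of each standard residue in one protein sequence."""
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def mean_relative_frequency(sequences, residue):
    """Average the per-protein relative frequency of one residue over a group."""
    per_protein = [relative_frequencies(s)[residue] for s in sequences]
    return sum(per_protein) / len(per_protein)


# Hypothetical toy sequences for illustration only, not real UniProt entries.
aging_related = ["MDDKDE", "ADDLG"]
other_proteins = ["MWLLKA", "LLWGA"]

for residue in "DWL":
    print(
        residue,
        round(mean_relative_frequency(aging_related, residue), 3),
        round(mean_relative_frequency(other_proteins, residue), 3),
    )
```

With real data, the two lists would hold sequences of GenAge-listed proteins and of the remaining proteome, and any difference in means would still need a statistical test before it could be called consistent.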
Affiliation(s)
- Eszter Zita Szatmári
  - Institute for Computer Science and Control (SZTAKI), Hungarian Research Network (HUN-REN), Budapest, 1111, Hungary
  - Department of Applied Analysis and Computational Mathematics, Eötvös Loránd University (ELTE), Pázmány Péter sétány 1/C, Budapest, 1117, Hungary
- Csaba Kerepesi
  - Institute for Computer Science and Control (SZTAKI), Hungarian Research Network (HUN-REN), Budapest, 1111, Hungary
4
Manyilov VD, Ilyinsky NS, Nesterov SV, Saqr BMGA, Dayhoff GW, Zinovev EV, Matrenok SS, Fonin AV, Kuznetsova IM, Turoverov KK, Ivanovich V, Uversky VN. Chaotic aging: intrinsically disordered proteins in aging-related processes. Cell Mol Life Sci 2023; 80:269. [PMID: 37634152 PMCID: PMC11073068 DOI: 10.1007/s00018-023-04897-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/16/2023] [Revised: 07/03/2023] [Accepted: 07/24/2023] [Indexed: 08/29/2023]
Abstract
The development of aging is associated with the disruption of key cellular processes manifested as well-established hallmarks of aging. Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) have no stable tertiary structure, which enables them to act as configurable hubs in signaling cascades and to regulate many processes, potentially including those related to aging. There is a need to clarify the roles of IDPs/IDRs in aging. The dataset of 1702 aging-related proteins was collected from established aging databases and experimental studies. There is a noticeable presence of IDPs/IDRs, accounting for about 36% of the aging-related dataset, which is, however, less than the disorder content of the whole human proteome (about 40%). A Gene Ontology analysis of the aging proteome used here reveals an abundance of IDPs/IDRs in one-third of aging-associated processes, especially in genome regulation. Signaling pathways associated with aging also contain IDPs/IDRs on different hierarchical levels, revealing the importance of the "structure-function continuum" in aging. Protein-protein interaction network analysis showed that IDPs are present in different clusters associated with different aging hallmarks. The protein cluster enriched in IDPs simultaneously has high liquid-liquid phase separation (LLPS) probability, "nuclear" localization, and DNA-associated functions related to the aging hallmarks of genomic instability, telomere attrition, epigenetic alterations, and stem cell exhaustion. Intrinsic disorder, LLPS, and aggregation propensity should be considered as features that could be markers of pathogenic proteins. Overall, our analyses indicate that IDPs/IDRs play significant roles in aging-associated processes, particularly in the regulation of DNA functioning. IDP aggregation, which can lead to loss of function and toxicity, could be critically harmful to the cell.
A structure-based analysis of aging and the identification of proteins that are particularly susceptible to disturbances can enhance our understanding of the molecular mechanisms of aging and open up new avenues for slowing it down.
Affiliation(s)
- Vladimir D Manyilov
  - Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy Pereulok, 9, Dolgoprudny, 141700, Russia
- Nikolay S Ilyinsky
  - Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy Pereulok, 9, Dolgoprudny, 141700, Russia
- Semen V Nesterov
  - Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy Pereulok, 9, Dolgoprudny, 141700, Russia
  - Institute of Cytology, Russian Academy of Sciences, Saint Petersburg, 194064, Russia
- Baraa M G A Saqr
  - Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy Pereulok, 9, Dolgoprudny, 141700, Russia
- Guy W Dayhoff
  - Department of Chemistry, University of South Florida, Tampa, FL, USA
- Egor V Zinovev
  - Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy Pereulok, 9, Dolgoprudny, 141700, Russia
- Simon S Matrenok
  - Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy Pereulok, 9, Dolgoprudny, 141700, Russia
- Alexander V Fonin
  - Institute of Cytology, Russian Academy of Sciences, Saint Petersburg, 194064, Russia
- Irina M Kuznetsova
  - Institute of Cytology, Russian Academy of Sciences, Saint Petersburg, 194064, Russia
- Valentin Ivanovich
  - Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy Pereulok, 9, Dolgoprudny, 141700, Russia
- Vladimir N Uversky
  - Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy Pereulok, 9, Dolgoprudny, 141700, Russia
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd., MDC07, Tampa, FL, 33612, USA
5
Marino N, Putignano G, Cappilli S, Chersoni E, Santuccione A, Calabrese G, Bischof E, Vanhaelen Q, Zhavoronkov A, Scarano B, Mazzotta AD, Santus E. Towards AI-driven longevity research: An overview. Frontiers in Aging 2023; 4:1057204. [PMID: 36936271 PMCID: PMC10018490 DOI: 10.3389/fragi.2023.1057204] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/29/2022] [Accepted: 02/06/2023] [Indexed: 03/06/2023]
Abstract
While in the past technology has mostly been utilized to store information about the structural configuration of proteins and molecules for research and medical purposes, Artificial Intelligence is nowadays able to learn from the existing data how to predict and model properties and interactions, revealing important knowledge about complex biological processes, such as aging. Modern technologies, moreover, can rely on a broader set of information, including information derived from next-generation sequencing (e.g., proteomics, lipidomics, and other omics), to understand the interactions between the human body and the external environment. This is especially relevant as external factors have been shown to have a key role in aging. As the field of computational systems biology keeps improving and new biomarkers of aging are being developed, artificial intelligence promises to become a major ally of aging research.
Affiliation(s)
- Nicola Marino
  - Women's Brain Project (WBP), Gunterhausen, Switzerland
- Simone Cappilli
  - Dermatology, Catholic University of the Sacred Heart, Rome, Italy
  - UOC of Dermatology, Department of Abdominal and Endocrine Metabolic Medical and Surgical Sciences, A. Gemelli University Hospital Foundation-IRCCS, Rome, Italy
- Emmanuele Chersoni
  - Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China
- Giuliana Calabrese
  - Department of Translational Medicine and Surgery, Catholic University of the Sacred Heart, Rome, Italy
- Evelyne Bischof
  - Insilico Medicine Hong Kong Ltd., New Territories, Hong Kong SAR, China
- Quentin Vanhaelen
  - Insilico Medicine Hong Kong Ltd., New Territories, Hong Kong SAR, China
- Alex Zhavoronkov
  - Insilico Medicine Hong Kong Ltd., New Territories, Hong Kong SAR, China
- Bryan Scarano
  - Department of Translational Medicine and Surgery, Catholic University of the Sacred Heart, Rome, Italy
- Alessandro D. Mazzotta
  - Department of Digestive, Oncological and Metabolic Surgery, Institute Mutualiste Montsouris, Paris, France
  - Biorobotics Institute, Scuola Superiore Sant'Anna, Pisa, Italy
6
Bairakdar MD, Tewari A, Truttmann MC. A meta-analysis of RNA-Seq studies to identify novel genes that regulate aging. Exp Gerontol 2023; 173:112107. [PMID: 36731807 PMCID: PMC10653729 DOI: 10.1016/j.exger.2023.112107] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/10/2022] [Revised: 01/17/2023] [Accepted: 01/23/2023] [Indexed: 02/04/2023]
Abstract
Aging is a ubiquitous biological process that limits the maximal lifespan of most organisms. Significant efforts by many groups have identified mechanisms that, when triggered by natural or artificial stimuli, are sufficient to either enhance or decrease maximal lifespan. Previous aging studies using the nematode Caenorhabditis elegans (C. elegans) generated a wealth of publicly available transcriptomics datasets linking changes in gene expression to lifespan regulation. However, a comprehensive comparison of these datasets across studies in the context of aging biology is missing. Here, we carry out a systematic meta-analysis of over 1200 bulk RNA sequencing (RNASeq) samples obtained from 74 peer-reviewed publications on aging-related transcriptomic changes in C. elegans. Using both differential expression analyses and machine learning approaches, we mine the pooled data for novel pro-longevity genes. We find that both approaches identify known and propose novel pro-longevity genes. Further, we find that inter-lab experimental variance complicates the application of machine learning algorithms, a limitation that was not solved using bulk RNA-Seq batch correction and normalization techniques. Taken as a whole, our results indicate that machine learning approaches may hold promise for the identification of genes that regulate aging but will require more sophisticated batch correction strategies or standardized input data to reliably identify novel pro-longevity genes.
Affiliation(s)
- Mohamad D Bairakdar
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA
- Ambuj Tewari
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA; Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
- Matthias C Truttmann
  - Department of Molecular & Integrative Physiology, University of Michigan, Ann Arbor, MI, 48109, USA; Geriatrics Center, University of Michigan, Ann Arbor, MI 48109, USA
7
Sharaha U, Abu-Aqil G, Suleiman M, Riesenberg K, Lapidot I, Huleihel M, Salman A. Rapid determination of Proteus mirabilis susceptibility to antibiotics using infrared spectroscopy in tandem with random forest. Journal of Biophotonics 2023; 16:e202200198. [PMID: 36169094 DOI: 10.1002/jbio.202200198] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/28/2022] [Revised: 09/24/2022] [Accepted: 09/26/2022] [Indexed: 06/16/2023]
Abstract
Bacterial infections cause serious illnesses that are treated with antibiotics. Currently used methods for detecting bacterial antibiotic susceptibility consume 48-72 h, leading to overuse of antibiotics. Thus, many bacterial species have acquired resistance to a broad range of available antibiotics. There is an urgent need to develop efficient methods for rapid determination of bacterial susceptibility to antibiotics. The combination of machine learning and Fourier-transform infrared (FTIR) spectroscopy has generated a promising diagnostic approach in medicine and biology. Our main goal is to examine the potential of FTIR spectroscopy to determine the susceptibility of urinary tract infection-Proteus mirabilis to a specific range of antibiotics, within about 20 min after 24 h culture and identification. We measured the infrared spectra of 489 different P. mirabilis isolates and used random forest to analyze this spectral database. A classification success rate of ~84% was achieved in differentiating between the resistant and sensitive isolates based on their susceptibility to ceftazidime, ceftriaxone, cefuroxime, cefuroxime axetil, cephalexin, ciprofloxacin, gentamicin, and sulfamethoxazole antibiotics in a time span of 24 h instead of 48 h.
Affiliation(s)
- Uraib Sharaha
  - Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- George Abu-Aqil
  - Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Manal Suleiman
  - Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Klaris Riesenberg
  - Internal Medicine E, Soroka University Medical Center, Beer-Sheva, Israel
- Itshak Lapidot
  - Department of Electrical and Electronics Engineering, ACLP-Afeka Center for Language Processing, Afeka Tel-Aviv Academic College of Engineering, Tel-Aviv, Israel
- Mahmoud Huleihel
  - Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Ahmad Salman
  - Department of Physics, SCE - Shamoon College of Engineering, Beer-Sheva, Israel
8
Li Q, Newaz K, Milenković T. Towards future directions in data-integrative supervised prediction of human aging-related genes. Bioinformatics Advances 2022; 2:vbac081. [PMID: 36699345 PMCID: PMC9710570 DOI: 10.1093/bioadv/vbac081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Revised: 09/23/2022] [Accepted: 10/31/2022] [Indexed: 11/13/2022]
Abstract
Motivation: Identification of human genes involved in the aging process is critical due to the incidence of many diseases with age. A state-of-the-art approach for this purpose infers a weighted dynamic aging-specific subnetwork by mapping gene expression (GE) levels at different ages onto the protein-protein interaction network (PPIN). Then, it analyzes this subnetwork in a supervised manner by training a predictive model to learn how network topologies of known aging- versus non-aging-related genes change across ages. Finally, it uses the trained model to predict novel aging-related gene candidates. However, the best current subnetwork resulting from this approach still yields suboptimal prediction accuracy. This could be because it was inferred using outdated GE and PPIN data. Here, we evaluate whether analyzing a weighted dynamic aging-specific subnetwork inferred from newer GE and PPIN data improves prediction accuracy upon analyzing the best current subnetwork inferred from outdated data. Results: Unexpectedly, we find that not to be the case. To understand this, we perform aging-related pathway and Gene Ontology term enrichment analyses. We find that the suboptimal prediction accuracy, regardless of which GE or PPIN data is used, may be caused by the current knowledge about which genes are aging-related being incomplete, or by the current methods for inferring or analyzing an aging-specific subnetwork being unable to capture all of the aging-related knowledge. These findings can potentially guide future directions towards improving supervised prediction of aging-related genes via -omics data integration. Availability and implementation: All data and code are available at Zenodo, DOI: 10.5281/zenodo.6995045. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Qi Li
  - Department of Computer Science and Engineering, Lucy Family Institute for Data & Society, and Eck Institute for Global Health (EIGH), University of Notre Dame, Notre Dame, IN 46556, USA
- Khalique Newaz
  - Department of Computer Science and Engineering, Lucy Family Institute for Data & Society, and Eck Institute for Global Health (EIGH), University of Notre Dame, Notre Dame, IN 46556, USA; Center for Data and Computing in Natural Sciences (CDCS), Institute for Computational Systems Biology, Universität Hamburg, Hamburg 20146, Germany
9
Watanabe N, Yamamoto M, Murata M, Vavricka CJ, Ogino C, Kondo A, Araki M. Comprehensive Machine Learning Prediction of Extensive Enzymatic Reactions. J Phys Chem B 2022; 126:6762-6770. [PMID: 36053051 DOI: 10.1021/acs.jpcb.2c03287] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/04/2023]
Abstract
New enzyme functions exist within the increasing number of unannotated protein sequences. Novel enzyme discovery is necessary to expand the pathways that can be accessed by metabolic engineering for the biosynthesis of functional compounds. Accordingly, various machine learning models have been developed to predict enzymatic reactions. However, the ability to predict unknown reactions that are not included in the training data has not been clarified. In order to cover uncertain and unknown reactions, a wider range of reaction types must be demonstrated by the models. Here, we establish 16 expanded enzymatic reaction prediction models developed using various machine learning algorithms, including deep neural networks. Improvements in prediction performance over that of our previous study indicate that the updated methods are more effective for the prediction of enzymatic reactions. Overall, the deep neural network model trained with combined substrate-enzyme-product information exhibits the highest prediction accuracy, with Macro F1 scores up to 0.966 and with robust prediction of unknown enzymatic reactions that are not included in the training data. This model can predict more extensive enzymatic reactions in comparison to previously reported models. This study will facilitate the discovery of new enzymes for the production of useful substances.
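For readers unfamiliar with the metric quoted in this abstract, the sketch below shows one common way a macro-averaged F1 score is computed: the per-class F1 values are averaged with equal weight, so rare classes count as much as frequent ones. The function name and the class labels are hypothetical, and nothing here reproduces the study's models or data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical reaction-class labels for illustration only.
true_classes = ["oxidation", "oxidation", "transfer", "transfer", "hydrolysis"]
predicted = ["oxidation", "transfer", "transfer", "transfer", "hydrolysis"]

print(round(macro_f1(true_classes, predicted), 3))
```

In this toy case the per-class F1 values are 2/3, 0.8 and 1.0, so the macro average is their plain mean.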
Affiliation(s)
- Naoki Watanabe
  - Department of Chemical Science and Engineering, Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501, Japan
- Masaki Yamamoto
  - Graduate School of Medicine, Kyoto University, 54 Kawahara-cho, Shogoin Sakyo-ku, Kyoto 606-8507, Japan
- Masahiro Murata
  - Graduate School of Medicine, Kyoto University, 54 Kawahara-cho, Shogoin Sakyo-ku, Kyoto 606-8507, Japan
- Christopher J Vavricka
  - Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
- Chiaki Ogino
  - Department of Chemical Science and Engineering, Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501, Japan
- Akihiko Kondo
  - Department of Chemical Science and Engineering, Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501, Japan; Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
- Michihiro Araki
  - Graduate School of Medicine, Kyoto University, 54 Kawahara-cho, Shogoin Sakyo-ku, Kyoto 606-8507, Japan; Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan; National Institutes of Biomedical Innovation, Health and Nutrition, National Institute of Health and Nutrition, 1-23-1 Toyama, Shinjuku-ku, Tokyo 162-8638, Japan; National Cerebral and Cardiovascular Center, 6-1 Kishibe-Shinmachi, Suita, Osaka 564-8565, Japan
10
Li Q, Milenkovic T. Supervised Prediction of Aging-Related Genes From a Context-Specific Protein Interaction Subnetwork. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2484-2498. [PMID: 33929964 DOI: 10.1109/tcbb.2021.3076961] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Human aging is linked to many prevalent diseases. The aging process is highly influenced by genetic factors. Hence, it is important to identify human aging-related genes. We focus on supervised prediction of such genes. Gene expression-based methods for this purpose study genes in isolation from each other. While protein-protein interaction (PPI) network-based methods for this purpose account for interactions between genes' protein products, current PPI network data are context-unspecific, spanning different biological conditions. Instead, here, we focus on an aging-specific subnetwork of the entire PPI network, obtained by integrating aging-specific gene expression data and PPI network data. The potential of such data integration has been recognized but mostly in the context of cancer. So, we are the first to propose a supervised learning framework for predicting aging-related genes from an aging-specific PPI subnetwork. In a systematic and comprehensive evaluation, we find that in many of the evaluation tests: (i) using an aging-specific subnetwork indeed yields more accurate aging-related gene predictions than using the entire network, and (ii) predictive methods from our framework that have not previously been used for supervised prediction of aging-related genes outperform existing prominent methods for the same purpose. These results justify the need for our framework.
11
Oliviero G, Kovalchuk S, Rogowska-Wrzesinska A, Schwämmle V, Jensen ON. Distinct and diverse chromatin-proteomes of ageing mouse organs reveal protein signatures that correlate with physiological functions. eLife 2022; 11:73524. [PMID: 35259090 PMCID: PMC8933006 DOI: 10.7554/elife.73524] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/10/2021] [Accepted: 03/07/2022] [Indexed: 11/13/2022] Open
Abstract
Temporal molecular changes in ageing mammalian organs are of relevance to disease aetiology because many age-related diseases are linked to changes in the transcriptional and epigenetic machinery that regulate gene expression. We performed quantitative proteome analysis of chromatin-enriched protein extracts to investigate the dynamics of the chromatin proteomes of the mouse brain, heart, lung, kidney, liver, and spleen at 3, 5, 10, and 15 months of age. Each organ exhibited a distinct chromatin proteome and sets of unique proteins. The brain and spleen chromatin proteomes were the most extensive, diverse, and heterogenous among the six organs. The spleen chromatin proteome appeared static during the lifespan, presenting a young phenotype that reflects the permanent alertness state and important role of this organ in physiological defence and immunity. We identified a total of 5928 proteins, including 2472 nuclear or chromatin-associated proteins across the six mouse organs. Up to 3125 proteins were quantified in each organ, demonstrating distinct and organ-specific temporal protein expression timelines and regulation at the post-translational level. Bioinformatics meta-analysis of these chromatin proteomes revealed distinct physiological and ageing-related features for each organ. Our results demonstrate the efficiency of organelle-specific proteomics for in vivo studies of a model organism and consolidate the hypothesis that chromatin-associated proteins are involved in distinct and specific physiological functions in ageing organs.
Affiliation(s)
- Giorgio Oliviero
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Sergey Kovalchuk
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Ole N Jensen
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
|
12
|
Németh Á, Daróczy B, Juhász L, Fülöp P, Harangi M, Paragh G. Assessment of Associations Between Serum Lipoprotein (a) Levels and Atherosclerotic Vascular Diseases in Hungarian Patients With Familial Hypercholesterolemia Using Data Mining and Machine Learning. Front Genet 2022; 13:849197. [PMID: 35222552 PMCID: PMC8864223 DOI: 10.3389/fgene.2022.849197] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/05/2022] [Accepted: 01/24/2022] [Indexed: 11/30/2022]
Abstract
Background and aims: Premature mortality due to atherosclerotic vascular disease is very high in Hungary in comparison with international prevalence rates, though the estimated prevalence of familial hypercholesterolemia (FH) is in line with the data of other European countries. Previous studies have shown that high lipoprotein(a) [Lp(a)] levels are associated with an increased risk of atherosclerotic vascular diseases in patients with FH. We aimed to assess the associations of serum Lp(a) levels and such vascular diseases in FH using data mining methods and machine learning techniques in the Northern Great Plain region of Hungary. Methods: Medical records of 590,500 patients were included in our study. Based on the data from previously diagnosed FH patients using the Dutch Lipid Clinic Network scores (≥7 was evaluated as probable or definite FH), we trained machine learning models to identify FH patients. Results: We identified 459 patients with FH and 221 of them had data available on Lp(a). Patients with FH had significantly higher Lp(a) levels compared to non-FH subjects [236 (92.5; 698.5) vs. 167 (80.2; 431.5) mg/L, p < .01]. Also, 35.3% of FH patients had Lp(a) levels >500 mg/L. Atherosclerotic complications were significantly more frequent in FH patients compared to patients without FH (46.6 vs. 13.9%). However, contrary to several other previous studies, we could not find significant associations between serum Lp(a) levels and atherosclerotic vascular diseases in the studied Hungarian FH patient group. Conclusion: The extremely high burden of vascular disease is mainly explained by the unhealthy lifestyle of our patients (i.e., high prevalence of smoking, unhealthy diet and physical inactivity resulting in obesity and hypertension). The lack of associations between serum Lp(a) levels and atherosclerotic vascular diseases in Hungarian FH patients may be due to the high prevalence of these risk factors, which mask the deleterious effect of Lp(a).
Affiliation(s)
- Ákos Németh
  - Department of Internal Medicine, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
  - Doctoral School of Health Sciences, Faculty of Public Health, University of Debrecen, Debrecen, Hungary
- Bálint Daróczy
  - Institute for Computer Science and Control, Hungarian Academy of Sciences (MTA SZTAKI), Budapest, Hungary
  - Université Catholique de Louvain, INMA, Louvain-la-Neuve, Belgium
- Lilla Juhász
  - Department of Internal Medicine, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
  - Doctoral School of Health Sciences, Faculty of Public Health, University of Debrecen, Debrecen, Hungary
- Péter Fülöp
  - Department of Internal Medicine, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- Mariann Harangi
  - Department of Internal Medicine, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- György Paragh
  - Department of Internal Medicine, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
  - *Correspondence: György Paragh
|
13
|
DNA Methylation Biomarkers-Based Human Age Prediction Using Machine Learning. Computational Intelligence and Neuroscience 2022; 2022:8393498. [PMID: 35111213 PMCID: PMC8803417 DOI: 10.1155/2022/8393498] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/30/2021] [Revised: 11/20/2021] [Accepted: 12/22/2021] [Indexed: 12/28/2022]
Abstract
Purpose. Age can be an important clue in uncovering the identity of persons that left biological evidence at crime scenes. With the availability of DNA methylation data, several age prediction models are developed by using statistical and machine learning methods. From epigenetic studies, it has been demonstrated that there is a close association between aging and DNA methylation. Most of the existing studies focused on healthy samples, whereas diseases may have a significant impact on human age. Therefore, in this article, an age prediction model is proposed using DNA methylation biomarkers for healthy and diseased samples. Methods. The dataset contains 454 healthy samples and 400 diseased samples from publicly available sources with age (1–89 years old). Six CpG sites are identified from this data having a high correlation with age using Pearson’s correlation coefficient. In this work, the age prediction model is developed using four different machine learning techniques, namely, Multiple Linear Regression, Support Vector Regression, Gradient Boosting Regression, and Random Forest Regression. Separate models are designed for healthy and diseased data. The data are split randomly into 80 : 20 ratios for training and testing, respectively. Results. Among all the techniques, the model designed using Random Forest Regression shows the best performance, and Gradient Boosting Regression is the second best model. In the case of healthy samples, the model achieved a MAD of 2.51 years for training data and 4.85 for testing data. Also, for diseased samples, a MAD of 3.83 years is obtained for training and 9.53 years for testing. Conclusion. These results showed that the proposed model can predict age for healthy and diseased samples.
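The feature selection and evaluation steps named in this abstract (Pearson correlation with age, a random 80:20 split, and MAD) can be illustrated with a minimal standard-library sketch. It is not the authors' code: the site names and methylation values are hypothetical toy data, MAD is assumed here to mean the mean absolute deviation between actual and predicted age, and the regression models themselves are omitted.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_k_sites(methylation, ages, k):
    """Rank CpG sites by |r| with age and return the k strongest site names."""
    ranked = sorted(
        methylation,
        key=lambda site: abs(pearson(methylation[site], ages)),
        reverse=True,
    )
    return ranked[:k]


def split_80_20(n_samples, seed=0):
    """Shuffle sample indices and split them 80:20 into train and test."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    cut = (n_samples * 4) // 5
    return idx[:cut], idx[cut:]


def mean_absolute_deviation(actual, predicted):
    """Mean absolute difference between actual and predicted ages (assumed MAD)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data: three made-up sites measured in five samples.
ages = [10, 20, 30, 40, 50]
methylation = {
    "cg_a": [0.1, 0.2, 0.3, 0.4, 0.5],  # rises with age
    "cg_b": [0.5, 0.1, 0.4, 0.2, 0.3],  # weakly related to age
    "cg_c": [0.9, 0.7, 0.5, 0.3, 0.1],  # falls with age
}

selected = top_k_sites(methylation, ages, k=2)
train_idx, test_idx = split_80_20(len(ages))
```

Ranking by the absolute value of the correlation keeps sites whose methylation falls with age as well as those where it rises; a fitted regressor would then be scored with `mean_absolute_deviation` on the held-out indices.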
|
14
|
Yadav NS, Kumar P, Singh I. Structural and functional analysis of protein. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00026-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
15
|
da Silva PN, Plastino A, Fabris F, Freitas AA. A Novel Feature Selection Method for Uncertain Features: An Application to the Prediction of Pro-/Anti-Longevity Genes. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2230-2238. [PMID: 32324561 DOI: 10.1109/tcbb.2020.2988450] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Understanding the ageing process is a very challenging problem for biologists. To help in this task, there has been a growing use of classification methods (from machine learning) to learn models that predict whether a gene influences the process of ageing or promotes longevity. One type of predictive feature often used for learning such classification models is Protein-Protein Interaction (PPI) features. One important property of PPI features is their uncertainty, i.e., a given feature (PPI annotation) is often associated with a confidence score, which is usually ignored by conventional classification methods. Hence, we propose the Lazy Feature Selection for Uncertain Features (LFSUF) method, which is tailored for coping with the uncertainty in PPI confidence scores. In addition, following the lazy learning paradigm, LFSUF selects features for each instance to be classified, making the feature selection process more flexible. We show that our LFSUF method achieves better predictive accuracy when compared to other feature selection methods that either do not explicitly take PPI confidence scores into account or deal with uncertainty globally rather than using a per-instance approach. Also, we interpret the results of the classification process using the features selected by LFSUF, showing that the number of selected features is significantly reduced, assisting the interpretability of the results. The datasets used in the experiments and the program code of the LFSUF method are freely available on the web at http://github.com/pablonsilva/FSforUncertainFeatureSpaces.
|
16
|
Li Q, Newaz K, Milenković T. Improved supervised prediction of aging-related genes via weighted dynamic network analysis. BMC Bioinformatics 2021; 22:520. [PMID: 34696741 PMCID: PMC8543111 DOI: 10.1186/s12859-021-04439-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/14/2021] [Accepted: 10/12/2021] [Indexed: 12/22/2022]
Abstract
BACKGROUND This study focuses on the task of supervised prediction of aging-related genes from -omics data. Unlike gene expression methods for this task that capture aging-specific information but ignore interactions between genes (i.e., their protein products), or protein-protein interaction (PPI) network methods for this task that account for PPIs but the PPIs are context-unspecific, we recently integrated the two data types into an aging-specific PPI subnetwork, which yielded more accurate aging-related gene predictions. However, a dynamic aging-specific subnetwork did not improve prediction performance compared to a static aging-specific subnetwork, despite the aging process being dynamic. This could be because the dynamic subnetwork was inferred using a naive Induced subgraph approach. Instead, we recently inferred a dynamic aging-specific subnetwork using a methodologically more advanced notion of network propagation (NP), which improved upon Induced dynamic aging-specific subnetwork in a different task, that of unsupervised analyses of the aging process. RESULTS Here, we evaluate whether our existing NP-based dynamic subnetwork will improve upon the dynamic as well as static subnetwork constructed by the Induced approach in the considered task of supervised prediction of aging-related genes. The existing NP-based subnetwork is unweighted, i.e., it gives equal importance to each of the aging-specific PPIs. Because accounting for aging-specific edge weights might be important, we additionally propose a weighted NP-based dynamic aging-specific subnetwork. We demonstrate that a predictive machine learning model trained and tested on the weighted subnetwork yields higher accuracy when predicting aging-related genes than predictive models run on the existing unweighted dynamic or static subnetworks, regardless of whether the existing subnetworks were inferred using NP or the Induced approach. 
CONCLUSIONS Our proposed weighted dynamic aging-specific subnetwork and its corresponding predictive model could guide with higher confidence than the existing data and models the discovery of novel aging-related gene candidates for future wet lab validation.
Affiliation(s)
- Qi Li
  - Department of Computer Science and Engineering, Center for Network and Data Science (CNDS), and Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA
- Khalique Newaz
  - Department of Computer Science and Engineering, Center for Network and Data Science (CNDS), and Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA
- Tijana Milenković
  - Department of Computer Science and Engineering, Center for Network and Data Science (CNDS), and Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA
|
17
|
Freitas AA. Investigating the role of Simpson's paradox in the analysis of top-ranked features in high-dimensional bioinformatics datasets. Brief Bioinform 2021; 21:421-428. [PMID: 30629111 DOI: 10.1093/bib/bby126] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/25/2018] [Revised: 11/16/2018] [Accepted: 12/04/2018] [Indexed: 01/05/2023]
Abstract
An important problem in bioinformatics consists of identifying the most important features (or predictors), among a large number of features in a given classification dataset. This problem is often addressed by using a machine learning-based feature ranking method to identify a small set of top-ranked predictors (i.e. the most relevant features for classification). The large number of studies in this area has, however, an important limitation: they ignore the possibility that the top-ranked predictors occur in an instance of Simpson's paradox, where the positive or negative association between a predictor and a class variable reverses sign upon conditioning on each of the values of a third (confounder) variable. In this work, we review and investigate the role of Simpson's paradox in the analysis of top-ranked predictors in high-dimensional bioinformatics datasets, in order to avoid the potential danger of misinterpreting an association between a predictor and the class variable. We perform computational experiments using four well-known feature ranking methods from the machine learning field and five high-dimensional datasets of ageing-related genes, where the predictors are Gene Ontology terms. The results show that occurrences of Simpson's paradox involving top-ranked predictors are much more common for one of the feature ranking methods.
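The sign reversal described in this abstract can be checked mechanically for a binary predictor, a binary class, and a categorical confounder. The following is a minimal sketch under those assumptions, not the paper's procedure: the function names are hypothetical, the counts are illustrative toy numbers rather than data from the study, and the association measure (a difference of class proportions) is one simple choice among several.

```python
from collections import defaultdict


def risk_difference(rows):
    """P(label=1 | feature=1) - P(label=1 | feature=0) for (feature, label) pairs."""
    with_feature = [label for feature, label in rows if feature == 1]
    without_feature = [label for feature, label in rows if feature == 0]
    return (sum(with_feature) / len(with_feature)
            - sum(without_feature) / len(without_feature))


def simpsons_paradox(records):
    """records: (feature, label, confounder) triples with binary feature and label.

    Returns True when the pooled association has one sign and the association
    inside every confounder stratum has the opposite sign.
    """
    records = list(records)
    pooled = risk_difference([(f, y) for f, y, _ in records])
    strata = defaultdict(list)
    for f, y, c in records:
        strata[c].append((f, y))
    within = [risk_difference(rows) for rows in strata.values()]
    if pooled > 0:
        return all(d < 0 for d in within)
    if pooled < 0:
        return all(d > 0 for d in within)
    return False


def expand(feature, confounder, positives, total):
    """Turn a count summary into individual (feature, label, confounder) records."""
    return ([(feature, 1, confounder)] * positives
            + [(feature, 0, confounder)] * (total - positives))


# Illustrative counts only: the feature looks harmful when pooled,
# yet helpful inside each confounder stratum.
toy = (expand(1, "small", 81, 87) + expand(1, "large", 192, 263)
       + expand(0, "small", 234, 270) + expand(0, "large", 55, 80))

paradox_found = simpsons_paradox(toy)
```

The sketch assumes every stratum contains records with and without the feature; a stratum missing either group would raise a `ZeroDivisionError` and would need to be skipped or handled explicitly.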
|
18
|
Analysis and Prediction of Adverse Reaction of Drugs with Machine Learning Models for Tracking the Severity. Arabian Journal for Science and Engineering 2021. [DOI: 10.1007/s13369-021-05999-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022]
|
19
|
Mercatelli D, Pedace E, Veltri P, Giorgi FM, Guzzi PH. Exploiting the molecular basis of age and gender differences in outcomes of SARS-CoV-2 infections. Comput Struct Biotechnol J 2021; 19:4092-4100. [PMID: 34306570 PMCID: PMC8271029 DOI: 10.1016/j.csbj.2021.07.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/25/2021] [Revised: 07/06/2021] [Accepted: 07/06/2021] [Indexed: 12/15/2022]
Abstract
Motivation: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection (coronavirus disease, 2019; COVID-19) is associated with adverse outcomes in patients. It has been observed that lethality seems to be related to the age of patients. While ageing has been extensively demonstrated to be accompanied by some modifications at the gene expression level, a possible link with COVID-19 manifestation still needs to be investigated at the molecular level. Objectives: This study aims to shed light on a possible link between the increased COVID-19 lethality and the molecular changes that occur in elderly people. Methods: We considered public datasets of ageing-related genes and their expression at the tissue level. We selected human proteins interacting with viral ones that are known to be related to the ageing process. Finally, we investigated changes in the expression level of coding genes at the tissue, gender and age level. Results: We observed a significant intersection between some SARS-CoV-2 interactors and ageing-related genes, suggesting that those genes are particularly affected by COVID-19 infection. Our analysis showed that virus infection particularly involves ageing molecular mechanisms centred around proteins EEF2, NPM1, HMGA1, HMGA2, APEX1, CHEK1, PRKDC, and GPX4. We found that HMGA1 and NPM1 have different expressions in the lung of males, while HMGA1, APEX1, CHEK1, EEF2, and NPM1 present changes in expression in males due to ageing effects. Conclusion: Our study generated a mechanistic framework to clarify the correlation between COVID-19 incidence in elderly patients and molecular mechanisms of ageing. We also provide testable hypotheses for future investigation and pharmacological solutions tailored to specific age ranges.
Affiliation(s)
- Pierangelo Veltri
  - University of Catanzaro, Department of Medical and Surgical Sciences, Italy
- Pietro Hiram Guzzi
  - University of Catanzaro, Department of Medical and Surgical Sciences, Italy
|
20
|
Application of artificial intelligence for detection of chemico-biological interactions associated with oxidative stress and DNA damage. Chem Biol Interact 2021; 345:109533. [PMID: 34051207 DOI: 10.1016/j.cbi.2021.109533] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/11/2021] [Revised: 05/17/2021] [Accepted: 05/24/2021] [Indexed: 12/16/2022]
Abstract
In recent years, various AI-based methods have been developed in order to uncover chemico-biological interactions associated with DNA damage and oxidative stress. Various decision trees, Bayesian networks, random forests, logistic regression models, support vector machines as well as deep learning tools have great potential in the area of molecular biology and toxicology, and it is estimated that in the future, they will greatly contribute to our understanding of molecular and cellular mechanisms associated with DNA damage and repair. In this concise review, we discuss recent attempts to build machine learning tools for assessment of radiation-induced DNA damage as well as algorithms that can analyze the data from the most frequently used DNA damage assays in molecular biology. We also review recent works on the detection of antioxidant proteins with machine learning, and the use of AI-related methods for prediction and evaluation of noncoding DNA sequences. Finally, we discuss previously published research on the potential application of machine learning tools in aging research.
|
21
|
Analysis of aging-related protein interactome and cross-network module comparisons across tissues provide new insights into aging. Comput Biol Chem 2021; 92:107506. [PMID: 34020164 DOI: 10.1016/j.compbiolchem.2021.107506] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/30/2020] [Revised: 04/09/2021] [Accepted: 05/05/2021] [Indexed: 11/22/2022]
Abstract
Delaying the human aging process and thus eliminating the risk factors for age-related diseases is one of the prime objectives. While various aging-associated genes and proteins have been characterized, which provide a significant understanding of the human aging process, significant success in regulating aging has not yet been achieved. Understanding how aging proteins interact with each other and also with other proteins could provide important insights into the underlying mechanisms governing the aging process. Therefore, in this work, gene expression information was added to the static aging-related protein interactome to understand the network-based relationships among aging-related essential (AE) proteins, aging-related non-essential (ANE) proteins, and housekeeping proteins that could regulate or influence aging. Comprehensive analyses provided various systems-level insights into the regulatory characteristics of aging; for example, (i) network-based correlation analysis predicted functional relationships among AE proteins and ANE proteins; (ii) network variability analysis predicted aging to affect different tissues in strikingly different ways by differentially regulating various regulatory interactions; (iii) cross-network comparisons identified two aging-related modules to be significantly conserved across most of the tissues. Overall, the findings obtained during this study could be helpful for researchers to delay, prevent, or even reverse various aspects of aging.
|
22
|
Kim SK, Goughnour PC, Lee EJ, Kim MH, Chae HJ, Yun GY, Kim YR, Choi JW. Identification of drug combinations on the basis of machine learning to maximize anti-aging effects. PLoS One 2021; 16:e0246106. [PMID: 33507975 PMCID: PMC7843016 DOI: 10.1371/journal.pone.0246106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/03/2020] [Accepted: 01/13/2021] [Indexed: 11/19/2022]
Abstract
Aging is a multifactorial process that involves numerous genetic changes, so identifying anti-aging agents is quite challenging. Age-associated genetic factors must be better understood to search appropriately for anti-aging agents. We utilized an aging-related gene expression pattern-trained machine learning system that can implement reversible changes in aging by linking combinatory drugs. In silico gene expression pattern-based drug repositioning strategies, such as connectivity map, have been developed as a method for unique drug discovery. However, these strategies have limitations such as lists that differ for input and drug-inducing genes or constraints to compare experimental cell lines to target diseases. To address this issue and improve the prediction success rate, we modified the original version of expression profiles with a stepwise-filtered method. We utilized a machine learning system called a deep neural network (DNN). Here we report that combinational drug pairs using differentially expressed genes (DEG) had a more enhanced anti-aging effect compared with single independent treatments on leukemia cells. This study shows potential drug combinations to retard the effects of aging with higher efficacy using innovative machine learning techniques.
Affiliation(s)
- Sun Kyung Kim
  - College of Pharmacy, Kyung Hee University, Seoul, Republic of Korea
- Eui Jin Lee
  - College of Pharmacy, Kyung Hee University, Seoul, Republic of Korea
- Myeong Hyun Kim
  - Center for Research and Development, Oncocross Ltd., Seoul, Republic of Korea
- Hee Jin Chae
  - Center for Research and Development, Oncocross Ltd., Seoul, Republic of Korea
- Gwang Yeul Yun
  - Center for Research and Development, Oncocross Ltd., Seoul, Republic of Korea
- Yi Rang Kim
  - Center for Research and Development, Oncocross Ltd., Seoul, Republic of Korea
  - Department of Hematology/Oncology, Yuseong Sun Hospital, Daejeon, Republic of Korea
  - * E-mail: (YRK); (JWC)
- Jin Woo Choi
  - College of Pharmacy, Kyung Hee University, Seoul, Republic of Korea
  - Department of Life and Nano-pharmaceutical Sciences, Kyung Hee University, Seoul, Republic of Korea
  - * E-mail: (YRK); (JWC)
|
23
|
Agbaria AH, Beck G, Lapidot I, Rich DH, Kapelushnik J, Mordechai S, Salman A, Huleihel M. Diagnosis of inaccessible infections using infrared microscopy of white blood cells and machine learning algorithms. Analyst 2020; 145:6955-6967. [PMID: 32852502 DOI: 10.1039/d0an00752h] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/14/2022]
Abstract
Physicians subjectively diagnose the etiology of inaccessible infections where sampling is not feasible (such as pneumonia, sinusitis, cholecystitis, peritonitis) as bacterial or viral. The diagnosis is based on their experience with some medical markers like blood counts and medical symptoms since it is harder to obtain swabs and reliable laboratory results for most cases. In this study, infrared spectroscopy with machine learning algorithms was used for the rapid and objective diagnosis of the etiology of inaccessible infections and enabled an assessment of the error for the subjective diagnosis of the etiology of these infections by physicians. Our approach allows for diagnoses of the etiology of both accessible and inaccessible infections based on an analysis of the innate immune system response through infrared spectroscopy measurements of white blood cell (WBC) samples. In the present study, we examined 343 individuals involving 113 controls, 89 inaccessible bacterial infections, 54 accessible bacterial infections, 60 inaccessible viral infections, and 27 accessible viral infections. Using our approach, the results show that it is possible to differentiate between controls and infections (combined bacterial and viral) with 95% accuracy, and to diagnose the etiology of accessible infections as bacterial or viral with >94% sensitivity and >90% specificity within one hour after the collection of the blood sample, with an error rate <6%. Based on our approach, the error rate of the physicians' subjective diagnosis of the etiology of inaccessible infections was found to be >23%.
Affiliation(s)
- Adam H Agbaria
  - Department of Physics, Ben-Gurion University, Beer-Sheva 84105, Israel
|
24
|
Soleimani Zakeri NS, Pashazadeh S, MotieGhader H. Gene biomarker discovery at different stages of Alzheimer using gene co-expression network approach. Sci Rep 2020; 10:12210. [PMID: 32699331 PMCID: PMC7376049 DOI: 10.1038/s41598-020-69249-8] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 12/09/2019] [Accepted: 07/08/2020] [Indexed: 12/24/2022]
Abstract
Alzheimer's disease (AD) is a chronic neurodegenerative disorder. It is the most common type of dementia, remains incurable, and destroys brain cells irreversibly. In this study, a systems biology approach was adopted to discover novel micro-RNA and gene-based biomarkers for the diagnosis of Alzheimer's disease. The gene expression data from three AD stages (Normal, Mild Cognitive Impairment, and Alzheimer) were used to reconstruct co-expression networks. After preprocessing and normalization, Weighted Gene Co-Expression Network Analysis (WGCNA) was used on a total of 329 samples, including 145 samples of Alzheimer stage, 80 samples of Mild Cognitive Impairment (MCI) stage, and 104 samples of the Normal stage. Next, three gene-miRNA bipartite networks were reconstructed by comparing the changes in module groups. Then, the functional enrichment analyses of extracted genes of three bipartite networks and miRNAs were done, respectively. Finally, a detailed analysis of the authentic studies was performed to discuss the obtained biomarkers. The outcomes proposed novel genes, including MBOAT1, ARMC7, RABL2B, HNRNPUL1, LAMTOR1, PLAGL2, CREBRF, LCOR, and MRI1, and novel miRNAs comprising miR-615-3p, miR-4722-5p, miR-4768-3p, miR-1827, miR-940 and miR-30b-3p, which were related to AD. These biomarkers were proposed to be related to AD for the first time and should be examined in future clinical studies.
Affiliation(s)
- Saeid Pashazadeh
  - Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Habib MotieGhader
  - Department of Computer Engineering, Gowgan Educational Center, Tabriz Branch, Islamic Azad University, Tabriz, Iran
|
25
|
Mendik P, Dobronyi L, Hári F, Kerepesi C, Maia-Moço L, Buszlai D, Csermely P, Veres DV. Translocatome: a novel resource for the analysis of protein translocation between cellular organelles. Nucleic Acids Res 2020; 47:D495-D505. [PMID: 30380112 PMCID: PMC6324082 DOI: 10.1093/nar/gky1044] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/15/2018] [Accepted: 10/25/2018] [Indexed: 01/02/2023]
Abstract
Here we present Translocatome, the first dedicated database of human translocating proteins (URL: http://translocatome.linkgroup.hu). The core of the Translocatome database is the manually curated data set of 213 human translocating proteins listing the source of their experimental validation, several details of their translocation mechanism, their local compartmentalized interactome, as well as their involvement in signalling pathways and disease development. In addition, using the well-established and widely used gradient boosting machine learning tool, XGBoost, Translocatome provides translocation probability values for 13 066 human proteins identifying 1133 and 3268 high- and low-confidence translocating proteins, respectively. The database has user-friendly search options with a UniProt autocomplete quick search and advanced search for proteins filtered by their localization, UniProt identifiers, translocation likelihood or data complexity. Download options of search results, manually curated and predicted translocating protein sets are available on its website. The update of the database is helped by its manual curation framework and connection to the previously published ComPPI compartmentalized protein–protein interaction database (http://comppi.linkgroup.hu). As shown by the application examples of merlin (NF2) and tumor protein 63 (TP63) Translocatome allows a better comprehension of protein translocation as a systems biology phenomenon and can be used as a discovery-tool in the protein translocation field.
Affiliation(s)
- Péter Mendik
  - Department of Medical Chemistry, Semmelweis University, Budapest, Hungary
- Levente Dobronyi
  - Department of Medical Chemistry, Semmelweis University, Budapest, Hungary
- Ferenc Hári
  - Department of Medical Chemistry, Semmelweis University, Budapest, Hungary
- Csaba Kerepesi
  - Institute for Computer Science and Control (MTA SZTAKI), Hungarian Academy of Sciences, Budapest, Hungary
  - Institute of Mathematics, Eötvös Loránd University, Budapest, Hungary
- Leonardo Maia-Moço
  - Department of Medical Chemistry, Semmelweis University, Budapest, Hungary
  - Cancer Biology and Epigenetics Group, Research Center of Portuguese Oncology Institute of Porto, Portugal
- Donát Buszlai
  - Department of Medical Chemistry, Semmelweis University, Budapest, Hungary
- Peter Csermely
  - Department of Medical Chemistry, Semmelweis University, Budapest, Hungary
- Daniel V Veres
  - Department of Medical Chemistry, Semmelweis University, Budapest, Hungary
  - Turbine Ltd., Budapest, Hungary
|
26
|
Salman A, Lapidot I, Shufan E, Agbaria AH, Porat Katz BS, Mordechai S. Potential of infrared microscopy to differentiate between dementia with Lewy bodies and Alzheimer's diseases using peripheral blood samples and machine learning algorithms. Journal of Biomedical Optics 2020; 25:1-15. [PMID: 32329265 PMCID: PMC7177186 DOI: 10.1117/1.jbo.25.4.046501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2020] [Accepted: 04/09/2020] [Indexed: 06/11/2023]
Abstract
SIGNIFICANCE Accurate and objective identification of Alzheimer's disease (AD) and dementia with Lewy bodies (DLB) is of major clinical importance due to the current lack of low-cost and noninvasive diagnostic tools to differentiate between the two. Developing an approach for such identification can have a great impact in the field of dementia diseases as it would offer physicians a routine objective test to support their diagnoses. The problem is especially acute because these two dementias have some common symptoms and characteristics, which can lead to misdiagnosis of DLB as AD and vice versa, mainly at their early stages. AIM The aim is to evaluate the potential of mid-infrared (IR) spectroscopy in tandem with machine learning algorithms as a sensitive method to detect minor changes in the biochemical structures that accompany the development of AD and DLB based on a simple peripheral blood test, thus improving the diagnostic accuracy of differentiation between DLB and AD. APPROACH IR microspectroscopy was used to examine white blood cells and plasma isolated from 56 individuals: 26 controls, 20 AD patients, and 10 DLB patients. The measured spectra were analyzed via machine learning. RESULTS Our encouraging results show that it is possible to differentiate between dementia (AD and DLB) and controls with an ∼86% success rate and between DLB and AD patients with a success rate of better than 93%. CONCLUSIONS The success of this method makes it possible to suggest a new, simple, and powerful tool for the mental health professional, with the potential to improve the reliability and objectivity of diagnoses of both AD and DLB.
Affiliation(s)
- Ahmad Salman
  - Shamoon College of Engineering, Department of Physics, Beer-Sheva, Israel
- Itshak Lapidot
  - Afeka Tel-Aviv Academic College of Engineering, Afeka Center for Language Processing, Department of Electrical and Electronics Engineering, Tel-Aviv, Israel
- Elad Shufan
  - Shamoon College of Engineering, Department of Physics, Beer-Sheva, Israel
- Adam H. Agbaria
  - Ben-Gurion University of the Negev, Department of Physics, Faculty of Natural Sciences, Beer-Sheva, Israel
- Bat-Sheva Porat Katz
  - The Hebrew University of Jerusalem, School of Nutritional Sciences, The Robert H. Smith Faculty of Agriculture, Food, and Environment, Rehovot, Israel
  - Kaplan Medical Center, Rehovot, Israel
- Shaul Mordechai
  - Ben-Gurion University of the Negev, Department of Physics, Faculty of Natural Sciences, Beer-Sheva, Israel
|
27
|
Hamadeh L, Imran S, Bencsik M, Sharpe GR, Johnson MA, Fairhurst DJ. Machine Learning Analysis for Quantitative Discrimination of Dried Blood Droplets. Sci Rep 2020; 10:3313. [PMID: 32094359 PMCID: PMC7040018 DOI: 10.1038/s41598-020-59847-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 09/27/2019] [Accepted: 01/24/2020] [Indexed: 01/30/2023] Open
Abstract
One of the most interesting everyday natural phenomena is the formation of different patterns after the evaporation of liquid droplets on a solid surface. The analysis of dried patterns from blood droplets has recently gained a lot of attention, experimentally and theoretically, due to its potential application in diagnostic medicine and forensic science. This paper presents evidence that images of dried blood droplets have a signature revealing the exhaustion level of the person, and discloses an entirely novel approach to studying human dried blood droplet patterns. We took blood samples from 30 healthy young male volunteers before and after exhaustive exercise, which is well known to cause large changes to blood chemistry. We objectively and quantitatively analysed 1800 images of dried blood droplets, developing sophisticated image processing analysis routines and optimising a multivariate statistical machine learning algorithm. We looked for statistically relevant correlations between the patterns in the dried blood droplets and exercise-induced changes in blood chemistry. We also analysed the various measured physiological parameters. We found that when our machine learning algorithm, which optimises a statistical model combining Principal Component Analysis (PCA) as an unsupervised learning method and Linear Discriminant Analysis (LDA) as a supervised learning method, is applied to the logarithmic power spectrum of the images, it can provide up to 95% prediction accuracy in discriminating the physiological conditions, i.e., before or after physical exercise. This correlation is strongest when all ten images taken per volunteer per condition are averaged, rather than treated individually. Having demonstrated proof-of-principle, this method can be applied to identify diseases.
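The pipeline named in this abstract (logarithmic power spectrum, then PCA, then LDA) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the images are synthetic striped patterns, the function names (`log_power_spectrum`, `fit_pca`, `fit_lda`, `run_demo`) are hypothetical, the number of PCA components is fixed at 3 rather than optimised, and the LDA step is a plain two-class Fisher discriminant.

```python
import numpy as np


def log_power_spectrum(image):
    """Flattened logarithmic power spectrum of a 2-D greyscale image."""
    spectrum = np.fft.fft2(image)
    # The small constant avoids log(0) for empty frequency bins.
    return np.log(np.abs(spectrum) ** 2 + 1e-12).ravel()


def fit_pca(features, n_components):
    """Unsupervised step: feature mean and leading principal directions."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]


def fit_lda(scores, labels):
    """Supervised step: two-class Fisher discriminant on the PCA scores."""
    a, b = scores[labels == 0], scores[labels == 1]
    # Pooled within-class scatter, lightly regularised for stability.
    within = (a - a.mean(axis=0)).T @ (a - a.mean(axis=0))
    within += (b - b.mean(axis=0)).T @ (b - b.mean(axis=0))
    within += 1e-6 * np.eye(within.shape[0])
    direction = np.linalg.solve(within, b.mean(axis=0) - a.mean(axis=0))
    threshold = 0.5 * (a.mean(axis=0) + b.mean(axis=0)) @ direction
    return direction, threshold


def predict(images, mean, components, direction, threshold):
    """Label each image 0 or 1 using the fitted PCA and LDA parameters."""
    features = np.array([log_power_spectrum(img) for img in images])
    scores = (features - mean) @ components.T
    return (scores @ direction > threshold).astype(int)


def synthetic_images(rng, n_per_class, size=16):
    """Hypothetical stand-in data: stripe frequency differs between classes."""
    x = np.arange(size)
    images, labels = [], []
    for label, freq in enumerate((2, 5)):
        for _ in range(n_per_class):
            phase = rng.uniform(0, 2 * np.pi)
            stripes = np.sin(2 * np.pi * freq * x / size + phase)
            noise = 0.2 * rng.normal(size=(size, size))
            images.append(np.tile(stripes, (size, 1)) + noise)
            labels.append(label)
    return np.array(images), np.array(labels)


def run_demo(seed=0):
    """Train on synthetic images and return held-out accuracy."""
    rng = np.random.default_rng(seed)
    train_x, train_y = synthetic_images(rng, 20)
    test_x, test_y = synthetic_images(rng, 10)
    features = np.array([log_power_spectrum(img) for img in train_x])
    mean, components = fit_pca(features, n_components=3)
    direction, threshold = fit_lda((features - mean) @ components.T, train_y)
    predicted = predict(test_x, mean, components, direction, threshold)
    return float((predicted == test_y).mean())
```

Calling `run_demo()` returns the fraction of held-out synthetic images classified correctly. Averaging several images per subject before classification, as the abstract describes, would amount to averaging their feature vectors before the `fit_pca` step.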
Affiliation(s)
- Lama Hamadeh
  - Department of Physics and Mathematics, School of Science and Technology, Nottingham Trent University, Nottingham, Clifton Campus, NG11 8NS, United Kingdom
- Samia Imran
  - Department of Physics and Mathematics, School of Science and Technology, Nottingham Trent University, Nottingham, Clifton Campus, NG11 8NS, United Kingdom
- Martin Bencsik
  - Department of Physics and Mathematics, School of Science and Technology, Nottingham Trent University, Nottingham, Clifton Campus, NG11 8NS, United Kingdom
- Graham R Sharpe
  - Exercise and Health Research Group, Sport, Health and Performance Enhancement (SHAPE) Research Centre, School of Science and Technology, Nottingham Trent University, Clifton Campus, NG11 8NS, United Kingdom
- Michael A Johnson
  - Exercise and Health Research Group, Sport, Health and Performance Enhancement (SHAPE) Research Centre, School of Science and Technology, Nottingham Trent University, Clifton Campus, NG11 8NS, United Kingdom
- David J Fairhurst
  - Department of Physics and Mathematics, School of Science and Technology, Nottingham Trent University, Nottingham, Clifton Campus, NG11 8NS, United Kingdom
|
28
|
Agbaria AH, Rosen GB, Lapidot I, Rich DH, Mordechai S, Kapelushnik J, Huleihel M, Salman A. Rapid diagnosis of infection etiology in febrile pediatric oncology patients using infrared spectroscopy of leukocytes. Journal of Biophotonics 2020; 13:e201900215. [PMID: 31566906 DOI: 10.1002/jbio.201900215] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 05/31/2019] [Revised: 08/27/2019] [Accepted: 09/15/2019] [Indexed: 06/10/2023]
Abstract
Rapid diagnosis of the etiology of infection is highly important for effective treatment of infected patients. Bacterial and viral infections are serious diseases that can cause death in many cases. The human immune system deals with many viral and bacterial infections that cause no symptoms and pass quietly without treatment. However, oncology patients undergoing chemotherapy have a very weak immune system caused by leukopenia, and even a minor pathogen infection threatens their lives. For this reason, physicians tend to immediately prescribe several types of antibiotics for febrile pediatric oncology patients (FPOPs). Uncontrolled use of antibiotics is one of the major contributors to the development of resistant bacteria. Therefore, for oncology patients, a rapid and objective diagnosis of the etiology of the infection is extremely critical. Current identification methods are time-consuming (>24 h). In this study, the potential of mid-infrared spectroscopy in tandem with machine learning algorithms is evaluated for rapid and objective diagnosis of the etiology of infections in FPOPs using simple peripheral blood samples. Our results show that infrared spectroscopy enables the diagnosis of the etiology of infection as bacterial or viral within 70 minutes after the collection of the blood sample with 93% sensitivity and 88% specificity.
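For readers less familiar with the two figures quoted at the end of this abstract, sensitivity and specificity can be computed from paired true and predicted labels as in the sketch below. The function name, the toy labels, and the choice of "bacterial" as the positive class are assumptions made for illustration; the abstract does not state which class the authors treated as positive, and the numbers here are not the study's data.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="bacterial"):
    """Return (sensitivity, specificity) for a two-class problem.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    The positive class is an assumption supplied by the caller.
    """
    tp = fn = tn = fp = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        else:
            if guess == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in true_labels")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels, not taken from the study.
truth = ["bacterial"] * 4 + ["viral"] * 4
guess = ["bacterial", "bacterial", "bacterial", "viral",
         "viral", "viral", "viral", "bacterial"]
sens, spec = sensitivity_specificity(truth, guess)
```

In this toy case three of four bacterial samples and three of four viral samples are labelled correctly, so both values are 0.75.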
Affiliation(s)
- Adam H Agbaria
  - Department of Physics, Ben-Gurion University, Beer-Sheva, Israel
- Guy Beck Rosen
  - Department of Hematology, Soroka University Medical Center, Beer-Sheva, Israel
- Itshak Lapidot
  - Department of Electrical and Electronics Engineering, ACLP-Afeka Center for Language Processing, Afeka Tel-Aviv Academic College of Engineering, Tel-Aviv, Israel
- Daniel H Rich
  - Department of Physics, Ben-Gurion University, Beer-Sheva, Israel
- Shaul Mordechai
  - Department of Physics, Ben-Gurion University, Beer-Sheva, Israel
- Joseph Kapelushnik
  - Department of Hematology, Soroka University Medical Center, Beer-Sheva, Israel
- Mahmoud Huleihel
  - Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Ahmad Salman
  - Department of Physics, SCE-Sami Shamoon College of Engineering, Beer-Sheva, Israel
|
29
|
Pazos Obregón F, Palazzo M, Soto P, Guerberoff G, Yankilevich P, Cantera R. An improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach. BMC Genomics 2019; 20:1011. [PMID: 31870293 PMCID: PMC6929295 DOI: 10.1186/s12864-019-6380-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/16/2019] [Accepted: 12/09/2019] [Indexed: 11/25/2022] Open
Abstract
Background: Assembly and function of neuronal synapses require the coordinated expression of a yet undetermined set of genes. Previously, we had trained an ensemble machine learning model to assign a probability of having synaptic function to every protein-coding gene in Drosophila melanogaster. This approach resulted in the publication of a catalogue of 893 genes which we postulated to be highly enriched in genes with a still undocumented synaptic function. Since then, the scientific community has experimentally identified 79 new synaptic genes. Here we use these new empirical data to evaluate our original prediction. We also implement a series of changes to the training scheme of our model and, using the new data, demonstrate that this improves its predictive power. Finally, we added the new synaptic genes to the training set and trained a new model, obtaining a new, enhanced catalogue of putative synaptic genes. Results: The retrospective analysis demonstrates that our original catalogue was significantly enriched in new synaptic genes. When the changes to the training scheme were implemented using the original training set, we obtained even higher enrichment. Finally, applying the new training scheme with a training set including the 79 new synaptic genes resulted in an enhanced catalogue of putative synaptic genes. Here we present this new catalogue and announce that a regularly updated version will be available online at: http://synapticgenes.bnd.edu.uy Conclusions: We show that training an ensemble of machine learning classifiers solely with the whole-body temporal transcription profiles of known synaptic genes resulted in a catalogue with a significant enrichment in undiscovered synaptic genes. Using new empirical data provided by the scientific community, we validated our original approach, improved our model and obtained an arguably more precise prediction. This approach reduces the number of genes to be tested through hypothesis-driven experimentation and will facilitate our understanding of neuronal function. Availability: http://synapticgenes.bnd.edu.uy
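The abstract reports that the 893-gene catalogue was significantly enriched in the 79 newly identified synaptic genes but does not state which statistic was used. One common way to quantify such an overlap is a one-sided hypergeometric test, sketched below. The catalogue size (893) and the number of new genes (79) come from the abstract; the population size (13,000 candidate genes) and the overlap (20 genes) are hypothetical placeholders chosen only to make the example run, and the function name is an assumption.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """One-sided p-value P(X >= observed) for a hypergeometric overlap.

    population: total number of candidate genes
    successes:  genes in the population that carry the label of interest
    draws:      size of the catalogue drawn from the population
    observed:   labelled genes actually found in the catalogue
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    # Python integers are exact, so the ratio is formed only at the end.
    return tail / total


# 893 and 79 are from the abstract; 13000 and 20 are hypothetical.
p_value = hypergeom_enrichment_p(population=13000, successes=79,
                                 draws=893, observed=20)
```

With these placeholder numbers the overlap expected by chance is about 893 × 79 / 13000 ≈ 5.4 genes, so an observed overlap of 20 would correspond to a very small p-value.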
Affiliation(s)
- Flavio Pazos Obregón
  - Neurodevelopmental Biology Department, Instituto de Investigaciones Biológicas Clemente Estable, Montevideo, Uruguay
- Martín Palazzo
  - Instituto de Investigación en Biomedicina de Buenos Aires (IBioBA), CONICET - Partner Institute of the Max Planck Society, Buenos Aires, Argentina
- Pablo Soto
  - Neurodevelopmental Biology Department, Instituto de Investigaciones Biológicas Clemente Estable, Montevideo, Uruguay
- Gustavo Guerberoff
  - Instituto de Matemática y Estadística "Prof. Ing. Rafael Laguardia", Facultad de Ingeniería, UDELAR, Montevideo, Uruguay
- Patricio Yankilevich
  - Instituto de Investigación en Biomedicina de Buenos Aires (IBioBA), CONICET - Partner Institute of the Max Planck Society, Buenos Aires, Argentina
- Rafael Cantera
  - Neurodevelopmental Biology Department, Instituto de Investigaciones Biológicas Clemente Estable, Montevideo, Uruguay
|
30
|
Bonetta R, Valentino G. Machine learning techniques for protein function prediction. Proteins 2019; 88:397-413. [PMID: 31603244 DOI: 10.1002/prot.25832] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 03/19/2019] [Revised: 07/05/2019] [Accepted: 09/17/2019] [Indexed: 12/17/2022]
Abstract
Proteins play important roles in living organisms, and their function is directly linked with their structure. Due to the growing gap between the number of proteins being discovered and their functional characterization (in particular as a result of experimental limitations), reliable prediction of protein function through computational means has become crucial. This paper reviews the machine learning techniques used in the literature, following their evolution from simple algorithms such as logistic regression to more advanced methods like support vector machines and modern deep neural networks. Hyperparameter optimization methods adopted to boost prediction performance are presented. In parallel, the metamorphosis in the features used by these algorithms, from classical physicochemical properties and amino acid composition up to text-derived features from biomedical literature and learned feature representations using autoencoders, is also reviewed, together with feature selection and dimensionality reduction techniques. The success stories in the application of these techniques to both general and specific protein function prediction are discussed.
Affiliation(s)
- Rosalin Bonetta
  - Centre for Molecular Medicine and Biobanking, University of Malta, Msida, Malta
- Gianluca Valentino
  - Department of Communications and Computer Engineering, University of Malta, Msida, Malta
|
31
|
Kruempel JC, Howington MB, Leiser SF. Computational tools for geroscience. Translational Medicine of Aging 2019; 3:132-143. [PMID: 33241167 PMCID: PMC7685266 DOI: 10.1016/j.tma.2019.11.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] Open
Abstract
The rapid progress of the past three decades has led the geroscience field near a point where human interventions in aging are plausible. Advances across scientific areas, such as high-throughput "-omics" approaches, have led to an exponentially increasing quantity of data available for biogerontologists. To best translate the lifespan- and healthspan-extending interventions discovered by basic scientists into preventative medicine, it is imperative that the current data are comprehensively utilized to generate testable hypotheses about translational interventions. Building a translational pipeline for geroscience will require both systematic efforts to identify interventions that extend healthspan across taxa and diagnostics that can identify patients who may benefit from interventions prior to the onset of an age-related morbidity. Databases and computational tools that organize and analyze both the wealth of information available on basic biogerontology research and clinical data on aging populations will be critical in developing such a pipeline. Here, we review the current landscape of databases and computational resources available for translational aging research. We discuss key platforms and tools available for aging research, with a focus on how each tool can be used in concert with hypothesis-driven experiments to move closer to human interventions in aging.
Affiliation(s)
- Joseph C.P. Kruempel
  - Molecular & Integrative Physiology Department, University of Michigan, Ann Arbor, MI, 48109, USA
- Marshall B. Howington
  - Cellular and Molecular Biology Program, University of Michigan, Ann Arbor, MI, 48109, USA
- Scott F. Leiser
  - Molecular & Integrative Physiology Department, University of Michigan, Ann Arbor, MI, 48109, USA
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI, 48109, USA
|
32
|
Li Q, Milenkovic T. Supervised prediction of aging-related genes from a context-specific protein interaction subnetwork. 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2019:130-137. [DOI: 10.1109/bibm47256.2019.8983063] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/03/2025]
|
33
|
Zhu J, Zhao Q, Katsevich E, Sabatti C. Exploratory Gene Ontology Analysis with Interactive Visualization. Sci Rep 2019; 9:7793. [PMID: 31127124 PMCID: PMC6534545 DOI: 10.1038/s41598-019-42178-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 10/26/2018] [Accepted: 03/14/2019] [Indexed: 12/17/2022] Open
Abstract
The Gene Ontology (GO) is a central resource for functional-genomics research. Scientists rely on the functional annotations in the GO for hypothesis generation and couple it with high-throughput biological data to enhance interpretation of results. At the same time, the sheer number of concepts (>30,000) and relationships (>70,000) presents a challenge: it can be difficult to draw a comprehensive picture of how certain concepts of interest might relate to the rest of the ontology structure. Here we present new visualization strategies to facilitate the exploration and use of the information in the GO. We rely on a novel graphical display and software architecture that allow significant interaction. To illustrate the potential of our strategies, we provide examples from high-throughput genomic analyses, including chromatin immunoprecipitation experiments and genome-wide association studies. The scientist can also use our visualizations to identify gene sets that likely experience coordinated changes in their expression and use them to simulate biologically-grounded single cell RNA sequencing data, or conduct power studies for differential gene expression studies using our built-in pipeline. Our software and documentation are available at http://aegis.stanford.edu.
Affiliation(s)
- Junjie Zhu
  - Department of Electrical Engineering, Stanford University, Stanford, CA, USA
- Qian Zhao
  - Department of Statistics, Stanford University, Stanford, CA, USA
- Eugene Katsevich
  - Department of Statistics, Stanford University, Stanford, CA, USA
- Chiara Sabatti
  - Department of Statistics, Stanford University, Stanford, CA, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
|
34
|
Mirza B, Wang W, Wang J, Choi H, Chung NC, Ping P. Machine Learning and Integrative Analysis of Biomedical Big Data. Genes (Basel) 2019; 10:E87. [PMID: 30696086 PMCID: PMC6410075 DOI: 10.3390/genes10020087] [Citation(s) in RCA: 163] [Impact Index Per Article: 27.2] [Received: 12/02/2018] [Revised: 01/08/2019] [Accepted: 01/21/2019] [Indexed: 12/11/2022] Open
Abstract
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
Affiliation(s)
- Bilal Mirza
  - NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA
- Wei Wang
  - NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Scalable Analytics Institute (ScAi), University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Jie Wang
  - NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA
- Howard Choi
  - NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Neo Christopher Chung
  - NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland
- Peipei Ping
  - NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Scalable Analytics Institute (ScAi), University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Medicine (Cardiology), University of California Los Angeles, Los Angeles, CA 90095, USA
|