1
Yin X, Pu Y, Yuan S, Pache L, Churas C, Weston S, Riva L, Simons LM, Cisneros WJ, Clausen T, De Jesus PD, Kim HN, Fuentes D, Whitelock J, Esko J, Lord M, Mena I, García-Sastre A, Hultquist JF, Frieman MB, Ideker T, Pratt D, Martin-Sancho L, Chanda SK. Global siRNA Screen Reveals Critical Human Host Factors of SARS-CoV-2 Multicycle Replication. bioRxiv 2024:2024.07.10.602835. [PMID: 39026801 PMCID: PMC11257544 DOI: 10.1101/2024.07.10.602835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
Abstract
Defining the subset of cellular factors governing SARS-CoV-2 replication can provide critical insights into viral pathogenesis and identify targets for host-directed antiviral therapies. While a number of genetic screens have previously reported SARS-CoV-2 host dependency factors, these approaches relied on utilizing pooled genome-scale CRISPR libraries, which are biased towards the discovery of host proteins impacting early stages of viral replication. To identify host factors involved throughout the SARS-CoV-2 infectious cycle, we conducted an arrayed genome-scale siRNA screen. Resulting data were integrated with published datasets to reveal pathways supported by orthogonal datasets, including transcriptional regulation, epigenetic modifications, and MAPK signalling. The identified proviral host factors were mapped into the SARS-CoV-2 infectious cycle, including 27 proteins that were determined to impact assembly and release. Additionally, a subset of proteins were tested across other coronaviruses revealing 17 potential pan-coronavirus targets. Further studies illuminated a role for the heparan sulfate proteoglycan perlecan in SARS-CoV-2 viral entry, and found that inhibition of the non-canonical NF-kB pathway through targeting of BIRC2 restricts SARS-CoV-2 replication both in vitro and in vivo. These studies provide critical insight into the landscape of virus-host interactions driving SARS-CoV-2 replication as well as valuable targets for host-directed antivirals.
Affiliation(s)
- Xin Yin
  - State Key Laboratory of Veterinary Biotechnology, Harbin Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Harbin, China
- Yuan Pu
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, USA
- Shuofeng Yuan
  - Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Lars Pache
  - Immunity and Pathogenesis Program, Infectious and Inflammatory Disease Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- Christopher Churas
  - Department of Medicine, University of California San Diego, La Jolla, USA
- Stuart Weston
  - Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, USA
- Laura Riva
  - Calibr-Skaggs at Scripps Research Institute, La Jolla, USA
- Lacy M. Simons
  - Division of Infectious Diseases, Departments of Medicine and Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, USA
- William J. Cisneros
  - Division of Infectious Diseases, Departments of Medicine and Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, USA
- Thomas Clausen
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, USA
- Paul D. De Jesus
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, USA
- Ha Na Kim
  - Molecular Surface Interaction Laboratory, Mark Wainwright Analytical Centre, UNSW Sydney, Sydney, New South Wales, Australia
- Daniel Fuentes
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, USA
- John Whitelock
  - Molecular Surface Interaction Laboratory, Mark Wainwright Analytical Centre, UNSW Sydney, Sydney, New South Wales, Australia
- Jeffrey Esko
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, USA
- Megan Lord
  - Molecular Surface Interaction Laboratory, Mark Wainwright Analytical Centre, UNSW Sydney, Sydney, New South Wales, Australia
- Ignacio Mena
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, USA
- Adolfo García-Sastre
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, USA
  - Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, USA
  - Department of Medicine, Division of Infectious Diseases, Icahn School of Medicine at Mount Sinai, New York, USA
  - The Tisch Institute, Icahn School of Medicine at Mount Sinai, New York, USA
  - Department of Pathology, Molecular and Cell-Based Medicine, Icahn School of Medicine at Mount Sinai, New York, USA
  - The Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, USA
- Judd F. Hultquist
  - Division of Infectious Diseases, Departments of Medicine and Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, USA
- Matthew B. Frieman
  - Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, USA
- Trey Ideker
  - Department of Medicine, University of California San Diego, La Jolla, USA
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, USA
- Dexter Pratt
  - Department of Medicine, University of California San Diego, La Jolla, USA
- Laura Martin-Sancho
  - Department of Infectious Disease, Imperial College London, London, United Kingdom
- Sumit K Chanda
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, USA
2
Pang W, Chen M, Qin Y. Prediction of anticancer drug sensitivity using an interpretable model guided by deep learning. BMC Bioinformatics 2024; 25:182. [PMID: 38724920 PMCID: PMC11080240 DOI: 10.1186/s12859-024-05669-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 01/22/2024] [Indexed: 05/13/2024] Open
Abstract
BACKGROUND The prediction of drug sensitivity plays a crucial role in improving the therapeutic effect of drugs. However, testing the effectiveness of drugs is challenging due to the complex mechanism of drug reactions and the lack of interpretability in most machine learning and deep learning methods. Therefore, it is imperative to establish an interpretable model that receives various cell line and drug feature data to learn drug response mechanisms and achieve stable predictions between available datasets. RESULTS This study proposes a new and interpretable deep learning model, DrugGene, which integrates gene expression, gene mutation, gene copy number variation of cancer cells, and chemical characteristics of anticancer drugs to predict their sensitivity. This model comprises two different branches of neural networks, where the first involves a hierarchical structure of biological subsystems that uses the biological processes of human cells to form a visual neural network (VNN) and an interpretable deep neural network for human cancer cells. DrugGene receives genotype input from the cell line and detects changes in the subsystem states. We also employ a traditional artificial neural network (ANN) to capture the chemical structural features of drugs. DrugGene generates final drug response predictions by combining VNN and ANN and integrating their outputs into a fully connected layer. The experimental results using drug sensitivity data extracted from the Cancer Drug Sensitivity Genome Database and the Cancer Treatment Response Portal v2 reveal that the proposed model is better than existing prediction methods. Therefore, our model achieves higher accuracy, learns the reaction mechanisms between anticancer drugs and cell lines from various features, and interprets the model's predicted results. 
CONCLUSIONS Our method utilizes biological pathways to construct neural networks, which can use genotypes to monitor changes in the state of network subsystems, thereby interpreting the prediction results in the model and achieving satisfactory prediction accuracy. This will help explore new directions in cancer treatment. More available code resources can be downloaded for free from GitHub ( https://github.com/pangweixiong/DrugGene ).
Affiliation(s)
- Weixiong Pang
  - College of Information Technology, Shanghai Ocean University, Hucheng Ring Road, Shanghai, China
  - Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China
- Ming Chen
  - College of Information Technology, Shanghai Ocean University, Hucheng Ring Road, Shanghai, China
  - Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China
- Yufang Qin
  - College of Information Technology, Shanghai Ocean University, Hucheng Ring Road, Shanghai, China.
  - Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China.
3
Herbst K, Wang T, Forchielli EJ, Thommes M, Paschalidis IC, Segrè D. Multi-Attribute Subset Selection enables prediction of representative phenotypes across microbial populations. Commun Biol 2024; 7:407. [PMID: 38570615 PMCID: PMC10991586 DOI: 10.1038/s42003-024-06093-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Accepted: 03/22/2024] [Indexed: 04/05/2024] Open
Abstract
The interpretation of complex biological datasets requires the identification of representative variables that describe the data without critical information loss. This is particularly important in the analysis of large phenotypic datasets (phenomics). Here we introduce Multi-Attribute Subset Selection (MASS), an algorithm which separates a matrix of phenotypes (e.g., yield across microbial species and environmental conditions) into predictor and response sets of conditions. Using mixed integer linear programming, MASS expresses the response conditions as a linear combination of the predictor conditions, while simultaneously searching for the optimally descriptive set of predictors. We apply the algorithm to three microbial datasets and identify environmental conditions that predict phenotypes under other conditions, providing biologically interpretable axes for strain discrimination. MASS could be used to reduce the number of experiments needed to identify species or to map their metabolic capabilities. The generality of the algorithm allows addressing subset selection problems in areas beyond biology.
Affiliation(s)
- Konrad Herbst
  - Bioinformatics Program, Boston University, Boston, MA, USA
  - Biological Design Center, Boston University, Boston, MA, USA
- Taiyao Wang
  - Division of Systems Engineering, Boston University, Boston, MA, USA
- Elena J Forchielli
  - Biological Design Center, Boston University, Boston, MA, USA
  - Department of Biology, Boston University, Boston, MA, USA
- Meghan Thommes
  - Biological Design Center, Boston University, Boston, MA, USA
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA
- Ioannis Ch Paschalidis
  - Division of Systems Engineering, Boston University, Boston, MA, USA.
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA.
  - Faculty of Computing and Data Science, Boston University, Boston, MA, USA.
  - Department of Electrical and Computer Engineering, Boston University, Boston, MA, USA.
- Daniel Segrè
  - Bioinformatics Program, Boston University, Boston, MA, USA.
  - Biological Design Center, Boston University, Boston, MA, USA.
  - Department of Biology, Boston University, Boston, MA, USA.
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA.
  - Faculty of Computing and Data Science, Boston University, Boston, MA, USA.
4
Hefny ZA, Ji B, Elsemman IE, Nielsen J, Van Dijck P. Transcriptomic meta-analysis to identify potential antifungal targets in Candida albicans. BMC Microbiol 2024; 24:66. [PMID: 38413885 PMCID: PMC10898158 DOI: 10.1186/s12866-024-03213-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 02/06/2024] [Indexed: 02/29/2024] Open
Abstract
BACKGROUND Candida albicans is a fungal pathogen causing human infections. Here we investigated differential gene expression patterns and functional enrichment in C. albicans strains grown under different conditions. METHODS A systematic GEO database search identified 239 "Candida albicans" datasets, of which 14 were selected after rigorous criteria application. Retrieval of raw sequencing data from the ENA database was accompanied by essential metadata extraction from dataset descriptions and original articles. Pre-processing via the tailored nf-core pipeline for C. albicans involved alignment, gene/transcript quantification, and diverse quality control measures. Quality assessment via PCA and DESeq2 identified significant genes (FDR ≤ 0.05, log2 fold change ≥ 1 or ≤ -1), while topGO conducted GO term enrichment analysis. Exclusions were made based on data quality and strain relevance, resulting in the selection of seven datasets from the SC5314 strain background for in-depth investigation. RESULTS The meta-analysis of seven selected studies unveiled a substantial number of genes exhibiting significant up-regulation (24,689) and down-regulation (18,074). These differentially expressed genes were further categorized into 2,497 significantly up-regulated and 2,573 significantly down-regulated Gene Ontology (GO) IDs. GO term enrichment analysis clustered these terms into distinct groups, providing insights into the functional implications. Three target gene lists were compiled based on previous studies, focusing on central metabolism, ion homeostasis, and pathogenicity. Frequency analysis revealed genes with higher occurrence within the identified GO clusters, suggesting their potential as antifungal targets. Notably, the genes TPS2, TPS1, RIM21, PRA1, SAP4, and SAP6 exhibited higher frequencies within the clusters. Through frequency analysis within the GO clusters, several key genes emerged as potential targets for antifungal therapies.
These include RSP5, GLC7, SOD2, SOD5, SOD1, SOD6, SOD4, SOD3, and RIM101 which exhibited higher occurrence within the identified clusters. CONCLUSION This comprehensive study significantly advances our understanding of the dynamic nature of gene expression in C. albicans. The identification of genes with enhanced potential as antifungal drug targets underpins their value for future interventions. The highlighted genes, including TPS2, TPS1, RIM21, PRA1, SAP4, SAP6, RSP5, GLC7, SOD2, SOD5, SOD1, SOD6, SOD4, SOD3, and RIM101, hold promise for the development of targeted antifungal therapies.
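The significance filter named in the METHODS (FDR ≤ 0.05 and a log2 fold change of at least 1 in either direction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `GeneResult` record, the `classify` helper, and the gene names and values are all hypothetical, and real DESeq2 output would supply these quantities in its `padj` and `log2FoldChange` columns.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    """Hypothetical record for one gene from a differential expression test."""
    gene: str
    log2_fold_change: float
    fdr: float  # adjusted p-value


def classify(result, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Label a gene 'up', 'down', or 'ns' (not significant).

    A gene is significant when FDR <= fdr_cutoff and the log2 fold change
    is >= lfc_cutoff (up) or <= -lfc_cutoff (down).
    """
    # Written as "not (<=)" so that a missing (NaN) FDR is treated as ns.
    if not (result.fdr <= fdr_cutoff):
        return "ns"
    if result.log2_fold_change >= lfc_cutoff:
        return "up"
    if result.log2_fold_change <= -lfc_cutoff:
        return "down"
    return "ns"


# Made-up values chosen to exercise each branch, including the boundaries.
results = [
    GeneResult("geneA", 2.3, 0.001),   # strong up-regulation
    GeneResult("geneB", -1.0, 0.05),   # exactly on both cutoffs
    GeneResult("geneC", 0.4, 0.0001),  # significant but small effect
    GeneResult("geneD", 3.1, 0.20),    # large effect but not significant
]
labels = {r.gene: classify(r) for r in results}
print(labels)
```

Because both thresholds are inclusive, a gene sitting exactly on the cutoffs (like the hypothetical `geneB`) is counted as differentially expressed.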
Affiliation(s)
- Zeinab Abdelmoghis Hefny
  - Laboratory of Molecular Cell Biology, Department of Biology, Katholieke Universiteit Leuven, Kasteelpark Arenberg 31, Leuven, B-3001, Belgium
- Boyang Ji
  - BioInnovation Institute, Ole Maaløes Vej 3, Copenhagen, DK2200, Denmark
- Ibrahim E Elsemman
  - Department of Information Systems, Faculty of Computers and Information, Assiut University, Assiut, 2071515, Egypt
- Jens Nielsen
  - BioInnovation Institute, Ole Maaløes Vej 3, Copenhagen, DK2200, Denmark.
  - Department of Life Sciences, Chalmers University of Technology, Kemivägen 10, SE41296, Gothenburg, SE41296, Sweden.
- Patrick Van Dijck
  - Laboratory of Molecular Cell Biology, Department of Biology, Katholieke Universiteit Leuven, Kasteelpark Arenberg 31, Leuven, B-3001, Belgium.
5
Esteban-Medina M, Loucera C, Rian K, Velasco S, Olivares-González L, Rodrigo R, Dopazo J, Peña-Chilet M. The mechanistic functional landscape of retinitis pigmentosa: a machine learning-driven approach to therapeutic target discovery. J Transl Med 2024; 22:139. [PMID: 38321543 PMCID: PMC10848380 DOI: 10.1186/s12967-024-04911-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 01/20/2024] [Indexed: 02/08/2024] Open
Abstract
BACKGROUND Retinitis pigmentosa is the prevailing genetic cause of blindness in developed nations with no effective treatments. In the pursuit of unraveling the intricate dynamics underlying this complex disease, mechanistic models emerge as a tool of proven efficiency rooted in systems biology, to elucidate the interplay between RP genes and their mechanisms. The integration of mechanistic models and drug-target interactions under the umbrella of machine learning methodologies provides a multifaceted approach that can boost the discovery of novel therapeutic targets, facilitating further drug repurposing in RP. METHODS By mapping Retinitis Pigmentosa-related genes (obtained from Orphanet, OMIM and HPO databases) onto KEGG signaling pathways, a collection of signaling functional circuits encompassing Retinitis Pigmentosa molecular mechanisms was defined. Next, a mechanistic model of the so-defined disease map, where the effects of interventions can be simulated, was built. Then, an explainable multi-output random forest regressor was trained using normal tissue transcriptomic data to learn causal connections between targets of approved drugs from DrugBank and the functional circuits of the mechanistic disease map. The involvement of selected target genes was validated in rd10 mice, a murine model of Retinitis Pigmentosa. RESULTS A mechanistic functional map of Retinitis Pigmentosa was constructed resulting in 226 functional circuits belonging to 40 KEGG signaling pathways. The method predicted 109 targets of approved drugs in use with a potential effect over circuits corresponding to nine hallmarks identified. Five of those targets were selected and experimentally validated in rd10 mice: Gabre, Gabra1 (GABARα1 protein), Slc12a5 (KCC2 protein), Grin1 (NR1 protein) and Glr2a. As a result, we provide a resource to evaluate the potential impact of drug target genes in Retinitis Pigmentosa.
CONCLUSIONS The possibility of building actionable disease models in combination with machine learning algorithms to learn causal drug-disease interactions opens new avenues for boosting drug discovery. Such mechanistically-based hypotheses can guide and accelerate the experimental validations prioritizing drug target candidates. In this work, a mechanistic model describing the functional disease map of Retinitis Pigmentosa was developed, identifying five promising therapeutic candidates targeted by approved drugs. Further experimental validation will demonstrate the efficiency of this approach for a systematic application to other rare diseases.
Affiliation(s)
- Marina Esteban-Medina
  - Andalusian Platform for Computational Medicine, Andalusian Public Foundation Progress and Health-FPS, Seville, Spain
  - Systems and Computational Medicine Group, Institute of Biomedicine of Seville, IBiS, University Hospital Virgen del Rocío/CSIC/University of Seville, 41013, Seville, Spain
- Carlos Loucera
  - Andalusian Platform for Computational Medicine, Andalusian Public Foundation Progress and Health-FPS, Seville, Spain
  - Systems and Computational Medicine Group, Institute of Biomedicine of Seville, IBiS, University Hospital Virgen del Rocío/CSIC/University of Seville, 41013, Seville, Spain
- Kinza Rian
  - Andalusian Platform for Computational Medicine, Andalusian Public Foundation Progress and Health-FPS, Seville, Spain
  - Systems and Computational Medicine Group, Institute of Biomedicine of Seville, IBiS, University Hospital Virgen del Rocío/CSIC/University of Seville, 41013, Seville, Spain
- Sheyla Velasco
  - Group of Pathophysiology and Therapies for Vision Disorders, Príncipe Felipe Research Center (CIPF), 46012, Valencia, Spain
- Lorena Olivares-González
  - Group of Pathophysiology and Therapies for Vision Disorders, Príncipe Felipe Research Center (CIPF), 46012, Valencia, Spain
- Regina Rodrigo
  - Group of Pathophysiology and Therapies for Vision Disorders, Príncipe Felipe Research Center (CIPF), 46012, Valencia, Spain
  - Biomedical Research Networking Center in Rare Diseases (CIBERER), Health Institute Carlos III, 28029, Madrid, Spain
  - Department of Physiology, University of Valencia (UV), 46100, Burjassot, Spain
  - Department of Anatomy and Physiology, Catholic University of Valencia San Vicente Mártir, 46001, Valencia, Spain
  - Joint Research Unit on Endocrinology, Nutrition and Clinical Dietetics UV-IIS La Fe, 46026, Valencia, Spain
- Joaquin Dopazo
  - Andalusian Platform for Computational Medicine, Andalusian Public Foundation Progress and Health-FPS, Seville, Spain.
  - Systems and Computational Medicine Group, Institute of Biomedicine of Seville, IBiS, University Hospital Virgen del Rocío/CSIC/University of Seville, 41013, Seville, Spain.
  - Biomedical Research Networking Center in Rare Diseases (CIBERER), Health Institute Carlos III, 28029, Madrid, Spain.
- Maria Peña-Chilet
  - Andalusian Platform for Computational Medicine, Andalusian Public Foundation Progress and Health-FPS, Seville, Spain.
  - Systems and Computational Medicine Group, Institute of Biomedicine of Seville, IBiS, University Hospital Virgen del Rocío/CSIC/University of Seville, 41013, Seville, Spain.
  - Biomedical Research Networking Center in Rare Diseases (CIBERER), Health Institute Carlos III, 28029, Madrid, Spain.
  - BigData, AI, Biostatistics & Bioinformatics Platform, Health Research Institute La Fe (IISLaFe), 46026, Valencia, Spain.
6
Daalman WKG, Sweep E, Laan L. A tractable physical model for the yeast polarity predicts epistasis and fitness. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220044. [PMID: 37004720 PMCID: PMC10067261 DOI: 10.1098/rstb.2022.0044] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 04/04/2023] Open
Abstract
Accurate phenotype prediction based on genetic information has numerous societal applications, such as crop design or cellular factories. Epistasis, when biological components interact, complicates modelling phenotypes from genotypes. Here we show an approach to mitigate this complication for polarity establishment in budding yeast, where mechanistic information is abundant. We coarse-grain molecular interactions into a so-called mesotype, which we combine with gene expression noise into a physical cell cycle model. First, we show with computer simulations that the mesotype allows validation of the most current biochemical polarity models by quantitatively matching doubling times. Second, the mesotype elucidates epistasis emergence as exemplified by evaluating the predicted mutational effect of key polarity protein Bem1p when combined with known interactors or under different growth conditions. This example also illustrates how unlikely evolutionary trajectories can become more accessible. The tractability of our biophysically justifiable approach inspires a road-map towards bottom-up modelling complementary to statistical inferences. This article is part of the theme issue ‘Interdisciplinary approaches to predicting evolutionary biology’.
Affiliation(s)
- Els Sweep
  - Department of Bionanoscience, TU Delft, 2629 HZ Delft, The Netherlands
- Liedewij Laan
  - Department of Bionanoscience, TU Delft, 2629 HZ Delft, The Netherlands
7
Sanders LM, Scott RT, Yang JH, Qutub AA, Garcia Martin H, Berrios DC, Hastings JJA, Rask J, Mackintosh G, Hoarfrost AL, Chalk S, Kalantari J, Khezeli K, Antonsen EL, Babdor J, Barker R, Baranzini SE, Beheshti A, Delgado-Aparicio GM, Glicksberg BS, Greene CS, Haendel M, Hamid AA, Heller P, Jamieson D, Jarvis KJ, Komarova SV, Komorowski M, Kothiyal P, Mahabal A, Manor U, Mason CE, Matar M, Mias GI, Miller J, Myers JG, Nelson C, Oribello J, Park SM, Parsons-Wingerter P, Prabhu RK, Reynolds RJ, Saravia-Butler A, Saria S, Sawyer A, Singh NK, Snyder M, Soboczenski F, Soman K, Theriot CA, Van Valen D, Venkateswaran K, Warren L, Worthey L, Zitnik M, Costes SV. Biological research and self-driving labs in deep space supported by artificial intelligence. Nat Mach Intell 2023. [DOI: 10.1038/s42256-023-00618-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2023]
8
Lu C, Zaucha J, Gam R, Fang H, Smithers B, Oates ME, Bernabe-Rubio M, Williams J, Zelenka N, Pandurangan AP, Tandon H, Shihab H, Kalaivani R, Sung M, Sardar AJ, Tzovoras BG, Danovi D, Gough J. Hypothesis-free phenotype prediction within a genetics-first framework. Nat Commun 2023; 14:919. [PMID: 36808136 PMCID: PMC9938118 DOI: 10.1038/s41467-023-36634-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2022] [Accepted: 02/10/2023] [Indexed: 02/19/2023] Open
Abstract
Cohort-wide sequencing studies have revealed that the largest category of variants is those deemed 'rare', even for the subset located in coding regions (99% of known coding variants are seen in less than 1% of the population). Associative methods give some understanding of how rare genetic variants influence disease and organism-level phenotypes. But here we show that additional discoveries can be made through a knowledge-based approach using protein domains and ontologies (function and phenotype) that considers all coding variants regardless of allele frequency. We describe an ab initio, genetics-first method making molecular knowledge-based interpretations for exome-wide non-synonymous variants for phenotypes at the organism and cellular level. By using this reverse approach, we identify plausible genetic causes for developmental disorders that have eluded other established methods and present molecular hypotheses for the causal genetics of 40 phenotypes generated from a direct-to-consumer genotype cohort. This system offers a chance to extract further discovery from genetic data after standard tools have been applied.
Affiliation(s)
- Chang Lu
  - MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge, CB2 0QH, UK
- Jan Zaucha
  - Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Rihab Gam
  - MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge, CB2 0QH, UK
- Hai Fang
  - Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK
  - Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Centre for Translational Medicine at Shanghai, Ruijin Hospital affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Ben Smithers
  - Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Matt E Oates
  - Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Miguel Bernabe-Rubio
  - Centre for Gene Therapy and Regenerative Medicine, King's College London, Guy's Hospital, Floor 28, Tower Wing, Great Maze Pond, London, SE1 9RT, UK
- James Williams
  - Centre for Gene Therapy and Regenerative Medicine, King's College London, Guy's Hospital, Floor 28, Tower Wing, Great Maze Pond, London, SE1 9RT, UK
- Natalie Zelenka
  - Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Arun Prasad Pandurangan
  - MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge, CB2 0QH, UK
- Himani Tandon
  - MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge, CB2 0QH, UK
- Hashem Shihab
  - Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Raju Kalaivani
  - MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge, CB2 0QH, UK
- Minkyung Sung
  - MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge, CB2 0QH, UK
- Adam J Sardar
  - Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Davide Danovi
  - Centre for Gene Therapy and Regenerative Medicine, King's College London, Guy's Hospital, Floor 28, Tower Wing, Great Maze Pond, London, SE1 9RT, UK
- Julian Gough
  - MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge, CB2 0QH, UK.
  - Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK.
9
Gervits A, Sharan R. Predicting genetic interactions, cell line dependencies and drug sensitivities with variational graph auto-encoder. Front Bioinform 2022; 2:1025783. [DOI: 10.3389/fbinf.2022.1025783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 11/21/2022] [Indexed: 12/03/2022] Open
Abstract
Large scale cancer genomics data provide crucial information about the disease and reveal points of intervention. However, systematic data have been collected in specific cell lines and their collection is laborious and costly. Hence, there is a need to develop computational models that can predict such data for any genomic context of interest. Here we develop novel models that build on variational graph auto-encoders and can integrate diverse types of data to provide high quality predictions of genetic interactions, cell line dependencies and drug sensitivities, outperforming previous methods. Our models, data and implementation are available at: https://github.com/aijag/drugGraphNet.
10
Al-Anzi BF, Khajah M, Fakhraldeen SA. Predicting and explaining the impact of genetic disruptions and interactions on organismal viability. Bioinformatics 2022; 38:4088-4099. [PMID: 35861390 PMCID: PMC9438956 DOI: 10.1093/bioinformatics/btac519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 06/30/2022] [Accepted: 07/20/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Existing computational models can predict single- and double-mutant fitness but they do have limitations. First, they are often tested via evaluation metrics that are inappropriate for imbalanced datasets. Second, all of them only predict a binary outcome (viable or not, and negatively interacting or not). Third, most are uninterpretable black box machine learning models. RESULTS Budding yeast datasets were used to develop high-performance Multinomial Regression (MN) models capable of predicting the impact of single, double and triple genetic disruptions on viability. These models are interpretable and give realistic non-binary predictions and can predict negative genetic interactions (GIs) in triple-gene knockouts. They are based on a limited set of gene features and their predictions are influenced by the probability of target gene participating in molecular complexes or pathways. Furthermore, the MN models have utility in other organisms such as fission yeast, fruit flies and humans, with the single gene fitness MN model being able to distinguish essential genes necessary for cell-autonomous viability from those required for multicellular survival. Finally, our models exceed the performance of previous models, without sacrificing interpretability. AVAILABILITY AND IMPLEMENTATION All code and processed datasets used to generate results and figures in this manuscript are available at our Github repository at https://github.com/KISRDevelopment/cell_viability_paper. The repository also contains a link to the GI prediction website that lets users search for GIs using the MN models. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
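The first limitation named above, evaluation metrics that mislead on imbalanced datasets, can be illustrated with a short sketch. It is not taken from the paper: the labels are made up, and balanced accuracy (the mean of per-class recall) is used here only as one common alternative to plain accuracy, not as the metric the authors chose.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical imbalanced data: 9 viable mutants, 1 lethal mutant.
y_true = ["viable"] * 9 + ["lethal"]
# A trivial model that always predicts the majority class.
y_pred = ["viable"] * 10

print(accuracy(y_true, y_pred))           # high, despite missing every lethal case
print(balanced_accuracy(y_true, y_pred))  # exposes the missed minority class
```

In this toy case the majority-class predictor scores 0.9 on accuracy but only 0.5 on balanced accuracy, which is the kind of gap that makes plain accuracy a poor choice when one class dominates.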
Affiliation(s)
- Saja A Fakhraldeen
  - Ecosystem-based Management of Marine Resources Program, Kuwait Institute for Scientific Research, Safat, 13109, Kuwait
11
Wang J, Zhang Q, Han J, Zhao Y, Zhao C, Yan B, Dai C, Wu L, Wen Y, Zhang Y, Leng D, Wang Z, Yang X, He S, Bo X. Computational methods, databases and tools for synthetic lethality prediction. Brief Bioinform 2022; 23:6555403. [PMID: 35352098 PMCID: PMC9116379 DOI: 10.1093/bib/bbac106] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/09/2021] [Revised: 02/15/2022] [Accepted: 03/02/2022] [Indexed: 12/17/2022]
Abstract
Synthetic lethality (SL) occurs between two genes when the inactivation of either gene alone has no effect on cell survival but the inactivation of both genes results in cell death. SL-based therapy has become one of the most promising targeted cancer therapies in the last decade as PARP inhibitors have achieved great success in the clinic. The key to exploiting SL-based cancer therapy is the identification of robust SL pairs. Although many wet-lab-based methods have been developed to screen SL pairs, known SL pairs are less than 0.1% of all potential pairs because of the large number of human gene combinations. Computational prediction methods complement wet-lab-based methods to effectively reduce the search space of SL pairs. In this paper, we review the recent applications of computational methods and commonly used databases for SL prediction. First, we introduce the concept of SL and its screening methods. Second, various SL-related data resources are summarized. Then, computational methods including statistical-based methods, network-based methods, classical machine learning methods and deep learning methods for SL prediction are summarized. In particular, we elaborate on the negative sampling methods applied in these models. Next, representative tools for SL prediction are introduced. Finally, the challenges and future work for SL prediction are discussed.
Affiliation(s)
- Jing Wang
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Qinglong Zhang
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Junshan Han
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Yanpeng Zhao
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Caiyun Zhao
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Bowei Yan
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Chong Dai
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Lianlian Wu
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Yuqi Wen
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Yixin Zhang
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Dongjin Leng
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Zhongming Wang
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaoxi Yang
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Song He
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaochen Bo
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
12
A BioID-Derived Proximity Interactome for SARS-CoV-2 Proteins. Viruses 2022; 14:v14030611. [PMID: 35337019 PMCID: PMC8951556 DOI: 10.3390/v14030611] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 02/10/2022] [Revised: 03/09/2022] [Accepted: 03/12/2022] [Indexed: 12/11/2022]
Abstract
The novel coronavirus SARS-CoV-2 is responsible for the ongoing COVID-19 pandemic and has caused a major health and economic burden worldwide. Understanding how SARS-CoV-2 viral proteins behave in host cells can reveal underlying mechanisms of pathogenesis and assist in development of antiviral therapies. Here, the cellular impact of expressing SARS-CoV-2 viral proteins was studied by global proteomic analysis, and proximity biotinylation (BioID) was used to map the SARS-CoV-2 virus–host interactome in human lung cancer-derived cells. Functional enrichment analyses revealed previously reported and unreported cellular pathways that are associated with SARS-CoV-2 proteins. We have established a website to host the proteomic data to allow for public access and continued analysis of host–viral protein associations and whole-cell proteomes of cells expressing the viral–BioID fusion proteins. Furthermore, we identified 66 high-confidence interactions by comparing this study with previous reports, providing a strong foundation for future follow-up studies. Finally, we cross-referenced candidate interactors with the CLUE drug library to identify potential therapeutics for drug-repurposing efforts. Collectively, these studies provide a valuable resource to uncover novel SARS-CoV-2 biology and inform development of antivirals.
13
Shen JP. Artificial intelligence, molecular subtyping, biomarkers, and precision oncology. Emerg Top Life Sci 2021; 5:747-756. [PMID: 34881776 PMCID: PMC8786277 DOI: 10.1042/etls20210212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2021] [Revised: 11/23/2021] [Accepted: 11/24/2021] [Indexed: 11/17/2022]
Abstract
A targeted cancer therapy is only useful if there is a way to accurately identify the tumors that are susceptible to that therapy. Thus, the rapid expansion in the number of available targeted cancer treatments has been accompanied by a robust effort to subdivide the traditional histological and anatomical tumor classifications into molecularly defined subtypes. This review highlights the history of the paired evolution of targeted therapies and biomarkers, reviews currently used methods for subtype identification, and discusses challenges to the implementation of precision oncology as well as possible solutions.
Affiliation(s)
- John Paul Shen
  - Department of Gastrointestinal Medical Oncology, University of Texas MD Anderson Cancer Center, Houston, USA
14
Caudai C, Galizia A, Geraci F, Le Pera L, Morea V, Salerno E, Via A, Colombo T. AI applications in functional genomics. Comput Struct Biotechnol J 2021; 19:5762-5790. [PMID: 34765093 PMCID: PMC8566780 DOI: 10.1016/j.csbj.2021.10.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/16/2021] [Revised: 10/05/2021] [Accepted: 10/05/2021] [Indexed: 12/13/2022]
Abstract
We review the current applications of artificial intelligence (AI) in functional genomics. The recent explosion of AI follows the remarkable achievements made possible by "deep learning", along with a burst of "big data" that can meet its hunger. Biology is about to overtake astronomy as the paradigmatic producer of big data. This has been made possible by huge advancements in the field of high throughput technologies, applied to determine how the individual components of a biological system work together to accomplish different processes. The disciplines contributing to this bulk of data are collectively known as functional genomics. They comprise studies of: i) the information contained in the DNA (genomics); ii) the modifications that DNA can reversibly undergo (epigenomics); iii) the RNA transcripts originated by a genome (transcriptomics); iv) the ensemble of chemical modifications decorating different types of RNA transcripts (epitranscriptomics); v) the products of protein-coding transcripts (proteomics); and vi) the small molecules produced from cell metabolism (metabolomics) present in an organism or system at a given time, in physiological or pathological conditions. After reviewing the main applications of AI in functional genomics, we discuss important accompanying issues, including ethical, legal and economic issues and the importance of explainability.
Affiliation(s)
- Claudia Caudai
  - CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Antonella Galizia
  - CNR, Institute of Applied Mathematics and Information Technologies (IMATI), Genoa, Italy
- Filippo Geraci
  - CNR, Institute for Informatics and Telematics (IIT), Pisa, Italy
- Loredana Le Pera
  - CNR, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Bari, Italy
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Veronica Morea
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Emanuele Salerno
  - CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Allegra Via
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Teresa Colombo
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
15
May DG, Martin-Sancho L, Anschau V, Liu S, Chrisopulos RJ, Scott KL, Halfmann CT, Peña RD, Pratt D, Campos AR, Roux KJ. A BioID-derived proximity interactome for SARS-CoV-2 proteins. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2021. [PMID: 34580671 PMCID: PMC8475972 DOI: 10.1101/2021.09.17.460814] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/10/2023]
Abstract
The novel coronavirus SARS-CoV-2 is responsible for the ongoing COVID-19 pandemic and has caused a major health and economic burden worldwide. Understanding how SARS-CoV-2 viral proteins behave in host cells can reveal underlying mechanisms of pathogenesis and assist in development of antiviral therapies. Here we use BioID to map the SARS-CoV-2 virus-host interactome using human lung cancer derived A549 cells expressing individual SARS-CoV-2 viral proteins. Functional enrichment analyses revealed previously reported and unreported cellular pathways that are in association with SARS-CoV-2 proteins. We have also established a website to host the proteomic data to allow for public access and continued analysis of host-viral protein associations and whole-cell proteomes of cells expressing the viral-BioID fusion proteins. Collectively, these studies provide a valuable resource to potentially uncover novel SARS-CoV-2 biology and inform development of antivirals.
16
Tanaka H, Kreisberg JF, Ideker T. Genetic dissection of complex traits using hierarchical biological knowledge. PLoS Comput Biol 2021; 17:e1009373. [PMID: 34534210 PMCID: PMC8480841 DOI: 10.1371/journal.pcbi.1009373] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/29/2020] [Revised: 09/29/2021] [Accepted: 08/23/2021] [Indexed: 11/18/2022]
Abstract
Despite the growing constellation of genetic loci linked to common traits, these loci have yet to account for most heritable variation, and most act through poorly understood mechanisms. Recent machine learning (ML) systems have used hierarchical biological knowledge to associate genetic mutations with phenotypic outcomes, yielding substantial predictive power and mechanistic insight. Here, we use an ontology-guided ML system to map single nucleotide variants (SNVs) focusing on 6 classic phenotypic traits in natural yeast populations. The 29 identified loci are largely novel and account for ~17% of the phenotypic variance, versus <3% for standard genetic analysis. Representative results show that sensitivity to hydroxyurea is linked to SNVs in two alternative purine biosynthesis pathways, and that sensitivity to copper arises through failure to detoxify reactive oxygen species in fatty acid metabolism. This work demonstrates a knowledge-based approach to amplifying and interpreting signals in population genetic studies.
Genome-wide association studies (GWAS) have identified many important loci for common diseases and other traits. However, the loci identified by these studies are almost always many steps away from an understanding of underlying biological mechanisms. Here we develop an approach using hierarchical biological knowledge to identify genes and pathways responsible for phenotypic traits. Variants identified by the new method could explain a substantially greater fraction of heritability than previously reported. Moreover, we identified mechanistic pathways by which each causal variant affects cellular function. For example, we find that sensitivity to hydroxyurea is tied to genetic variants in two alternative purine biosynthesis pathways, and that sensitivity to copper arises through failure to detoxify reactive oxygen species in fatty acid metabolism. The new approach is a potentially transformative concept for understanding the genetic drivers of phenotypic variance, with potential applications in understanding traits in biomedicine and agriculture.
Affiliation(s)
- Hidenori Tanaka
  - Department of Medicine, University of California San Diego, La Jolla, California, United States of America
- Jason F. Kreisberg
  - Department of Medicine, University of California San Diego, La Jolla, California, United States of America
  - * E-mail: (JFK); (TI)
- Trey Ideker
  - Department of Medicine, University of California San Diego, La Jolla, California, United States of America
  - * E-mail: (JFK); (TI)
17
Yu CY, Mitrofanova A. Mechanism-Centric Approaches for Biomarker Detection and Precision Therapeutics in Cancer. Front Genet 2021; 12:687813. [PMID: 34408770 PMCID: PMC8365516 DOI: 10.3389/fgene.2021.687813] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/30/2021] [Accepted: 06/28/2021] [Indexed: 12/18/2022]
Abstract
Biomarker discovery is at the heart of personalized treatment planning and cancer precision therapeutics, encompassing disease classification and prognosis, prediction of treatment response, and therapeutic targeting. However, many biomarkers represent passenger rather than driver alterations, limiting their utilization as functional units for therapeutic targeting. We suggest that identification of driver biomarkers through mechanism-centric approaches, which take into account upstream and downstream regulatory mechanisms, is fundamental to the discovery of functionally meaningful markers. Here, we examine computational approaches that identify mechanism-centric biomarkers elucidated from gene co-expression networks, regulatory networks (e.g., transcriptional regulation), protein-protein interaction (PPI) networks, and molecular pathways. We discuss their objectives, advantages over gene-centric approaches, and known limitations. Future directions highlight the importance of input and model interpretability, method and data integration, and the role of recently introduced technological advantages, such as single-cell sequencing, which are central for effective biomarker discovery and time-cautious precision therapeutics.
Affiliation(s)
- Christina Y. Yu
  - Department of Biomedical and Health Informatics, School of Health Professions, Rutgers, The State University of New Jersey, Newark, NJ, United States
- Antonina Mitrofanova
  - Department of Biomedical and Health Informatics, School of Health Professions, Rutgers, The State University of New Jersey, Newark, NJ, United States
  - Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ, United States
18
Kulmanov M, Smaili FZ, Gao X, Hoehndorf R. Semantic similarity and machine learning with ontologies. Brief Bioinform 2021; 22:bbaa199. [PMID: 33049044 PMCID: PMC8293838 DOI: 10.1093/bib/bbaa199] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 05/07/2020] [Revised: 08/03/2020] [Accepted: 08/04/2020] [Indexed: 12/13/2022]
Abstract
Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge and they are employed in almost every major biological database. Recently, ontologies have increasingly been used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview of the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.
Affiliation(s)
- Xin Gao
  - Computational Bioscience Research Center and lead of the Structural and Functional Bioinformatics Group at King Abdullah University of Science and Technology
19
Shu J, Li Y, Wang S, Xi B, Ma J. Disease gene prediction with privileged information and heteroscedastic dropout. Bioinformatics 2021; 37:i410-i417. [PMID: 34252957 PMCID: PMC8275341 DOI: 10.1093/bioinformatics/btab310] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 04/24/2021] [Indexed: 11/19/2022]
Abstract
Motivation: Recently, machine learning models have achieved tremendous success in prioritizing candidate genes for genetic diseases. These models are able to accurately quantify the similarity among diseases and genes based on the intuition that similar genes are more likely to be associated with similar diseases. However, the genetic features these methods rely on are often hard to collect due to high experimental cost and various other technical limitations. Existing solutions to this problem significantly increase the risk of overfitting and decrease the generalizability of the models.
Results: In this work, we propose a graph neural network (GNN) version of the Learning under Privileged Information paradigm to predict new disease gene associations. Unlike previous gene prioritization approaches, our model does not require the genetic features to be the same at training and test stages. If a genetic feature is hard to measure and therefore missing at the test stage, our model could still efficiently incorporate its information during the training process. To implement this, we develop a Heteroscedastic Gaussian Dropout algorithm, where the dropout probability of the GNN model is determined by another GNN model with a mirrored GNN architecture. To evaluate our method, we compared it with four state-of-the-art methods on the Online Mendelian Inheritance in Man dataset to prioritize candidate disease genes. Extensive evaluations show that our model could improve the prediction accuracy when all the features are available compared to other methods. More importantly, our model could make very accurate predictions when >90% of the features are missing at the test stage.
Availability and implementation: Our method is realized with Python 3.7 and PyTorch 1.5.0, and method and data are freely available at: https://github.com/juanshu30/Disease-Gene-Prioritization-with-Privileged-Information-and-Heteroscedastic-Dropout.
Affiliation(s)
- Juan Shu
  - Department of Statistics, Purdue University, West Lafayette, IN 47906, USA
- Yu Li
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong 999077, China
- Sheng Wang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Bowei Xi
  - Department of Statistics, Purdue University, West Lafayette, IN 47906, USA
- Jianzhu Ma
  - Institute for Artificial Intelligence, Peking University, Beijing 100871, China
20
Martin-Sancho L, Lewinski MK, Pache L, Stoneham CA, Yin X, Becker ME, Pratt D, Churas C, Rosenthal SB, Liu S, Weston S, De Jesus PD, O'Neill AM, Gounder AP, Nguyen C, Pu Y, Curry HM, Oom AL, Miorin L, Rodriguez-Frandsen A, Zheng F, Wu C, Xiong Y, Urbanowski M, Shaw ML, Chang MW, Benner C, Hope TJ, Frieman MB, García-Sastre A, Ideker T, Hultquist JF, Guatelli J, Chanda SK. Functional landscape of SARS-CoV-2 cellular restriction. Mol Cell 2021; 81:2656-2668.e8. [PMID: 33930332 PMCID: PMC8043580 DOI: 10.1016/j.molcel.2021.04.008] [Citation(s) in RCA: 111] [Impact Index Per Article: 37.0] [Received: 09/17/2020] [Revised: 02/01/2021] [Accepted: 04/07/2021] [Indexed: 12/21/2022]
Abstract
A deficient interferon (IFN) response to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection has been implicated as a determinant of severe coronavirus disease 2019 (COVID-19). To identify the molecular effectors that govern IFN control of SARS-CoV-2 infection, we conducted a large-scale gain-of-function analysis that evaluated the impact of human IFN-stimulated genes (ISGs) on viral replication. A limited subset of ISGs were found to control viral infection, including endosomal factors inhibiting viral entry, RNA binding proteins suppressing viral RNA synthesis, and a highly enriched cluster of endoplasmic reticulum (ER)/Golgi-resident ISGs inhibiting viral assembly/egress. These included broad-acting antiviral ISGs and eight ISGs that specifically inhibited SARS-CoV-2 and SARS-CoV-1 replication. Among the broad-acting ISGs was BST2/tetherin, which impeded viral release and is antagonized by SARS-CoV-2 Orf7a protein. Overall, these data illuminate a set of ISGs that underlie innate immune control of SARS-CoV-2/SARS-CoV-1 infection, which will facilitate the understanding of host determinants that impact disease severity and offer potential therapeutic strategies for COVID-19.
Affiliation(s)
- Laura Martin-Sancho
  - Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Mary K Lewinski
  - Department of Medicine, University of California San Diego, and the VA San Diego Healthcare System, San Diego, CA 92161, USA
- Lars Pache
  - Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Charlotte A Stoneham
  - Department of Medicine, University of California San Diego, and the VA San Diego Healthcare System, San Diego, CA 92161, USA
- Xin Yin
  - Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Mark E Becker
  - Department of Cell and Developmental Biology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Dexter Pratt
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Christopher Churas
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Sara B Rosenthal
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Sophie Liu
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Stuart Weston
  - Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Paul D De Jesus
  - Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Alan M O'Neill
  - Department of Dermatology, University of California San Diego, La Jolla, CA 92093, USA
- Anshu P Gounder
  - Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Courtney Nguyen
  - Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Yuan Pu
  - Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Heather M Curry
  - Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Aaron L Oom
  - Department of Medicine, University of California San Diego, and the VA San Diego Healthcare System, San Diego, CA 92161, USA
- Lisa Miorin
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029-5674, USA; Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029-5674, USA
- Ariel Rodriguez-Frandsen
  - Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Fan Zheng
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Chunxiang Wu
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06510, USA
- Yong Xiong
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06510, USA
- Matthew Urbanowski
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029-5674, USA
- Megan L Shaw
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029-5674, USA; Department of Medical Biosciences, University of the Western Cape, Cape Town 7535, South Africa
- Max W Chang
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Christopher Benner
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Thomas J Hope
  - Department of Cell and Developmental Biology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Matthew B Frieman
  - Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Adolfo García-Sastre
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029-5674, USA; Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029-5674, USA; Department of Medicine, Division of Infectious Diseases, Icahn School of Medicine at Mount Sinai, New York, NY 10029-5674, USA; The Tisch Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029-5674, USA
- Trey Ideker
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Judd F Hultquist
  - Division of Infectious Diseases, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- John Guatelli
  - Department of Medicine, University of California San Diego, and the VA San Diego Healthcare System, San Diego, CA 92161, USA
- Sumit K Chanda
  - Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
21
Schaffer LV, Ideker T. Mapping the multiscale structure of biological systems. Cell Syst 2021; 12:622-635. [PMID: 34139169 PMCID: PMC8245186 DOI: 10.1016/j.cels.2021.05.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/05/2021] [Revised: 05/04/2021] [Accepted: 05/14/2021] [Indexed: 01/14/2023]
Abstract
Biological systems are by nature multiscale, consisting of subsystems that factor into progressively smaller units in a deeply hierarchical structure. At any level of the hierarchy, an ever-increasing diversity of technologies can be applied to characterize the corresponding biological units and their relations, resulting in large networks of physical or functional proximities (e.g., proximities of amino acids within a protein, of proteins within a complex, or of cell types within a tissue). Here, we review general concepts and progress in using network proximity measures as a basis for creation of multiscale hierarchical maps of biological systems. We discuss the functionalization of these maps to create predictive models, including those useful in translation of genotype to phenotype, along with strategies for model visualization and challenges faced by multiscale modeling in the near future. Collectively, these approaches enable a unified hierarchical approach to biological data, with application from the molecular to the macroscopic.
Affiliation(s)
- Leah V Schaffer
  - Division of Genetics, Department of Medicine, University of California San Diego, San Diego, La Jolla, CA 92093, USA
- Trey Ideker
  - Division of Genetics, Department of Medicine, University of California San Diego, San Diego, La Jolla, CA 92093, USA
22
Rian K, Hidalgo MR, Çubuk C, Falco MM, Loucera C, Esteban-Medina M, Alamo-Alvarez I, Peña-Chilet M, Dopazo J. Genome-scale mechanistic modeling of signaling pathways made easy: A bioconductor/cytoscape/web server framework for the analysis of omic data. Comput Struct Biotechnol J 2021; 19:2968-2978. [PMID: 34136096 PMCID: PMC8170118 DOI: 10.1016/j.csbj.2021.05.022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/27/2021] [Revised: 04/21/2021] [Accepted: 05/11/2021] [Indexed: 12/13/2022]
Abstract
Genome-scale mechanistic models of pathways are gaining importance for genomic data interpretation because they provide a natural link between genotype measurements (transcriptomics or genomics data) and the phenotype of the cell (its functional behavior). Moreover, mechanistic models can be used to predict the potential effect of interventions, including drug inhibitions. Here, we present the implementation of a mechanistic model of cell signaling for the interpretation of transcriptomic data as an R/Bioconductor package, a Cytoscape plugin and a web tool with enhanced functionality which includes building interpretable predictors, estimation of the effect of perturbations and assessment of the effect of mutations in complex scenarios.
Affiliation(s)
- Kinza Rian
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
  - Laboratory of Innovative Technologies (LTI), National School of Applied Sciences in Tangier, UAE, Morocco
- Marta R. Hidalgo
  - Bioinformatics and Biostatistics Unit, Centro de Investigación Príncipe Felipe (CIPF), 46012 Valencia, Spain
- Cankut Çubuk
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
- Matias M. Falco
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
  - Bioinformatics in RareDiseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla 41013, Spain
- Carlos Loucera
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
  - Computational Systems Medicine. Institute of Biomedicine of Seville (IBiS), Sevilla 41013, Spain
- Marina Esteban-Medina
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
  - Computational Systems Medicine. Institute of Biomedicine of Seville (IBiS), Sevilla 41013, Spain
- Inmaculada Alamo-Alvarez
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
  - Computational Systems Medicine. Institute of Biomedicine of Seville (IBiS), Sevilla 41013, Spain
- María Peña-Chilet
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
  - Bioinformatics in RareDiseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla 41013, Spain
  - Computational Systems Medicine. Institute of Biomedicine of Seville (IBiS), Sevilla 41013, Spain
- Joaquín Dopazo
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla 41013, Spain
  - Bioinformatics in RareDiseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla 41013, Spain
  - Computational Systems Medicine. Institute of Biomedicine of Seville (IBiS), Sevilla 41013, Spain
  - Functional Genomics Node (INB-ELIXIR-es), Sevilla, Spain

23
Díaz-Santiago E, Claros MG, Yahyaoui R, de Diego-Otero Y, Calvo R, Hoenicka J, Palau F, Ranea JAG, Perkins JR. Decoding Neuromuscular Disorders Using Phenotypic Clusters Obtained From Co-Occurrence Networks. Front Mol Biosci 2021; 8:635074. [PMID: 34046427 PMCID: PMC8147726 DOI: 10.3389/fmolb.2021.635074] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/29/2020] [Accepted: 02/15/2021] [Indexed: 12/19/2022]
Abstract
Neuromuscular disorders (NMDs) represent an important subset of rare diseases associated with elevated morbidity and mortality whose diagnosis can take years. Here we present a novel approach using systems biology to produce functionally-coherent phenotype clusters that provide insight into the cellular functions and phenotypic patterns underlying NMDs, using the Human Phenotype Ontology as a common framework. Gene and phenotype information was obtained for 424 NMDs in OMIM and 126 NMDs in Orphanet, and 335 and 216 phenotypes were identified as typical for NMDs, respectively. ‘Elevated serum creatine kinase’ was the most specific to NMDs, in agreement with the clinical test of elevated serum creatinine kinase that is conducted on NMD patients. The approach to obtain co-occurring NMD phenotypes was validated based on co-mention in PubMed abstracts. A total of 231 (OMIM) and 150 (Orphanet) clusters of highly connected co-occurrent NMD phenotypes were obtained. In parallel, a tripartite network based on phenotypes, diseases and genes was used to associate NMD phenotypes with functions, an approach also validated by literature co-mention, with KEGG pathways showing proportionally higher overlap than Gene Ontology and Reactome. Phenotype-function pairs were crossed with the co-occurrent NMD phenotype clusters to obtain 40 (OMIM) and 72 (Orphanet) functionally coherent phenotype clusters. As expected, many of these overlapped with known diseases and confirmed existing knowledge. Other clusters revealed interesting new findings, indicating informative phenotypes for differential diagnosis, providing deeper knowledge of NMDs, and pointing towards specific cell dysfunction caused by pleiotropic genes. 
This work is an example of reproducible research that i) can help better understand NMDs and support their diagnosis by providing a new tool that exploits existing information to obtain novel clusters of functionally-related phenotypes, and ii) takes us another step towards personalised medicine for NMDs.
Affiliation(s)
- Elena Díaz-Santiago
  - Department of Molecular Biology and Biochemistry, Universidad de Málaga, Málaga, Spain
- M Gonzalo Claros
  - Department of Molecular Biology and Biochemistry, Universidad de Málaga, Málaga, Spain
  - CIBER de Enfermedades Raras (CIBERER), Madrid, Spain
  - Institute of Biomedical Research in Malaga (IBIMA), IBIMA-RARE, Málaga, Spain
  - Institute for Mediterranean and Subtropical Horticulture "La Mayora" (IHSM-UMA-CSIC), Málaga, Spain
- Raquel Yahyaoui
  - Institute of Biomedical Research in Malaga (IBIMA), IBIMA-RARE, Málaga, Spain
  - Laboratory of Metabolopathies and Neonatal Screening, Málaga Regional University Hospital, Málaga, Spain
- Rocío Calvo
  - Institute of Biomedical Research in Malaga (IBIMA), IBIMA-RARE, Málaga, Spain
  - Laboratory of Metabolopathies and Neonatal Screening, Málaga Regional University Hospital, Málaga, Spain
- Janet Hoenicka
  - CIBER de Enfermedades Raras (CIBERER), Madrid, Spain
  - Sant Joan de Déu Hospital and Research Institute, Barcelona, Spain
- Francesc Palau
  - CIBER de Enfermedades Raras (CIBERER), Madrid, Spain
  - Sant Joan de Déu Hospital and Research Institute, Barcelona, Spain
  - Hospital Clínic and University of Barcelona School of Medicine and Health Sciences, Barcelona, Spain
- Juan A G Ranea
  - Department of Molecular Biology and Biochemistry, Universidad de Málaga, Málaga, Spain
  - CIBER de Enfermedades Raras (CIBERER), Madrid, Spain
  - Institute of Biomedical Research in Malaga (IBIMA), IBIMA-RARE, Málaga, Spain
- James R Perkins
  - Department of Molecular Biology and Biochemistry, Universidad de Málaga, Málaga, Spain
  - CIBER de Enfermedades Raras (CIBERER), Madrid, Spain
  - Institute of Biomedical Research in Malaga (IBIMA), IBIMA-RARE, Málaga, Spain

24
Ruiz C, Zitnik M, Leskovec J. Identification of disease treatment mechanisms through the multiscale interactome. Nat Commun 2021; 12:1796. [PMID: 33741907 PMCID: PMC7979814 DOI: 10.1038/s41467-021-21770-8] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Received: 05/22/2020] [Accepted: 02/04/2021] [Indexed: 12/12/2022]
Abstract
Most diseases disrupt multiple proteins, and drugs treat such diseases by restoring the functions of the disrupted proteins. How drugs restore these functions, however, is often unknown as a drug's therapeutic effects are not limited to the proteins that the drug directly targets. Here, we develop the multiscale interactome, a powerful approach to explain disease treatment. We integrate disease-perturbed proteins, drug targets, and biological functions into a multiscale interactome network. We then develop a random walk-based method that captures how drug effects propagate through a hierarchy of biological functions and physical protein-protein interactions. On three key pharmacological tasks, the multiscale interactome predicts drug-disease treatment, identifies proteins and biological functions related to treatment, and predicts genes that alter a treatment's efficacy and adverse reactions. Our results indicate that physical interactions between proteins alone cannot explain treatment since many drugs treat diseases by affecting the biological functions disrupted by the disease rather than directly targeting disease proteins or their regulators. We provide a general framework for explaining treatment, even when drugs seem unrelated to the diseases they are recommended for.
Affiliation(s)
- Camilo Ruiz
  - Computer Science Department, Stanford University, Stanford, CA, USA
  - Bioengineering Department, Stanford University, Stanford, CA, USA
- Marinka Zitnik
  - Biomedical Informatics Department, Harvard University, Boston, MA, USA
- Jure Leskovec
  - Computer Science Department, Stanford University, Stanford, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA

25
Kryazhimskiy S. Emergence and propagation of epistasis in metabolic networks. eLife 2021; 10:e60200. [PMID: 33527897 PMCID: PMC7924954 DOI: 10.7554/elife.60200] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 06/19/2020] [Accepted: 02/01/2021] [Indexed: 12/11/2022]
Abstract
Epistasis is often used to probe functional relationships between genes, and it plays an important role in evolution. However, we lack theory to understand how functional relationships at the molecular level translate into epistasis at the level of whole-organism phenotypes, such as fitness. Here, I derive two rules for how epistasis between mutations with small effects propagates from lower- to higher-level phenotypes in a hierarchical metabolic network with first-order kinetics and how such epistasis depends on topology. Most importantly, weak epistasis at a lower level may be distorted as it propagates to higher levels. Computational analyses show that epistasis in more realistic models likely follows similar, albeit more complex, patterns. These results suggest that pairwise inter-gene epistasis should be common, and it should generically depend on the genetic background and environment. Furthermore, the epistasis coefficients measured for high-level phenotypes may not be sufficient to fully infer the underlying functional relationships.
Affiliation(s)
- Sergey Kryazhimskiy
  - Division of Biological Sciences, University of California, San Diego, La Jolla, United States

26
Rian K, Esteban-Medina M, Hidalgo MR, Çubuk C, Falco MM, Loucera C, Gunyel D, Ostaszewski M, Peña-Chilet M, Dopazo J. Mechanistic modeling of the SARS-CoV-2 disease map. BioData Min 2021; 14:5. [PMID: 33478554 PMCID: PMC7817765 DOI: 10.1186/s13040-021-00234-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/21/2020] [Accepted: 01/05/2021] [Indexed: 12/13/2022]
Abstract
Here we present a web interface that implements a comprehensive mechanistic model of the SARS-CoV-2 disease map. In this framework, the detailed activity of the human signaling circuits related to the viral infection, covering from the entry and replication mechanisms to the downstream consequences as inflammation and antigenic response, can be inferred from gene expression experiments. Moreover, the effect of potential interventions, such as knock-downs, or drug effects (currently the system models the effect of more than 8000 DrugBank drugs) can be studied. This freely available tool not only provides an unprecedentedly detailed view of the mechanisms of viral invasion and the consequences in the cell but has also the potential of becoming an invaluable asset in the search for efficient antiviral treatments.
Affiliation(s)
- Kinza Rian
  - Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
- Marina Esteban-Medina
  - Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013, Sevilla, Spain
- Marta R Hidalgo
  - Bioinformatics and Biostatistics Unit, Centro de Investigación Príncipe Felipe (CIPF), 46012, Valencia, Spain
- Cankut Çubuk
  - Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
- Matias M Falco
  - Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
  - Bioinformatics in RareDiseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla, Spain
- Carlos Loucera
  - Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013, Sevilla, Spain
- Devrim Gunyel
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4367, Belvaux, Luxembourg
- Marek Ostaszewski
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4367, Belvaux, Luxembourg
- María Peña-Chilet
  - Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013, Sevilla, Spain
  - Bioinformatics in RareDiseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla, Spain
- Joaquín Dopazo
  - Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013, Sevilla, Spain
  - Bioinformatics in RareDiseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Sevilla, Spain
  - Functional Genomics Node (INB-ELIXIR-es), Sevilla, Spain

27
Fan J, Li XC, Crovella M, Leiserson MDM. Matrix (factorization) reloaded: flexible methods for imputing genetic interactions with cross-species and side information. Bioinformatics 2020; 36:i866-i874. [PMID: 33381837 DOI: 10.1093/bioinformatics/btaa818] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 09/09/2020] [Indexed: 01/02/2023]
Abstract
MOTIVATION: Mapping genetic interactions (GIs) can reveal important insights into cellular function and has potential translational applications. There has been great progress in developing high-throughput experimental systems for measuring GIs (e.g. with double knockouts) as well as in defining computational methods for inferring (imputing) unknown interactions. However, existing computational methods for imputation have largely been developed for and applied in baker's yeast, even as experimental systems have begun to allow measurements in other contexts. Importantly, existing methods face a number of limitations in requiring specific side information and with respect to computational cost. Further, few have addressed how GIs can be imputed when data are scarce.
RESULTS: In this article, we address these limitations by presenting a new imputation framework, called Extensible Matrix Factorization (EMF). EMF is a framework of composable models that flexibly exploit cross-species information in the form of GI data across multiple species, and arbitrary side information in the form of kernels (e.g. from protein-protein interaction networks). We perform a rigorous set of experiments on these models in matched GI datasets from baker's and fission yeast. These include the first such experiments on genome-scale GI datasets in multiple species in the same study. We find that EMF models that exploit side and cross-species information improve imputation, especially in data-scarce settings. Further, we show that EMF outperforms the state-of-the-art deep learning method, even when using strictly less data, and incurs orders of magnitude less computational cost.
AVAILABILITY: Implementations of models and experiments are available at: https://github.com/lrgr/EMF.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jason Fan
  - Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742
- Xuan Cindy Li
  - Program in Computational Biology, Bioinformatics, and Genomics, University of Maryland, College Park, MD 20742, USA
- Mark Crovella
  - Department of Computer Science, Boston University, MA, 02215, USA
- Mark D M Leiserson
  - Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742

28
The Path towards Predicting Evolution as Illustrated in Yeast Cell Polarity. Cells 2020; 9:cells9122534. [PMID: 33255231 PMCID: PMC7760196 DOI: 10.3390/cells9122534] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2020] [Revised: 11/18/2020] [Accepted: 11/21/2020] [Indexed: 01/14/2023]
Abstract
A bottom-up route towards predicting evolution relies on a deep understanding of the complex network that proteins form inside cells. In a rapidly expanding panorama of experimental possibilities, the most difficult question is how to conceptually approach the disentangling of such complex networks. These can exhibit varying degrees of hierarchy and modularity, which obfuscate certain protein functions that may prove pivotal for adaptation. Using the well-established polarity network in budding yeast as a case study, we first organize current literature to highlight protein entrenchments inside polarity. Following three examples, we see how alternating between experimental novelties and subsequent emerging design strategies can construct a layered understanding, potent enough to reveal evolutionary targets. We show that if you want to understand a cell’s evolutionary capacity, such as possible future evolutionary paths, seemingly unimportant proteins need to be mapped and studied. Finally, we generalize this research structure to be applicable to other systems of interest.

29
Ren X, Wang S, Huang T. Decipher the connections between proteins and phenotypes. Biochimica et Biophysica Acta - Proteins and Proteomics 2020; 1868:140503. [PMID: 32707349 DOI: 10.1016/j.bbapap.2020.140503] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/31/2020] [Revised: 06/30/2020] [Accepted: 07/16/2020] [Indexed: 10/23/2022]
Abstract
As the outward-most representation of life, phenotype is the fundamental basis with which humans understand life and disease. But with the advent of molecular and sequencing technique and research, a growing portion of science research focuses primarily on the molecular level of life. Our understanding in molecular variations and mechanisms can only be fully utilized when they are translated into the phenotypic level. In this study, we constructed similarity network for phenotype ontology, and then applied network analysis methods to discover phenotype/disease clusters. Then, we used machine learning models to predict protein-phenotype associations. Each protein was characterized by the functional profiles of its interaction neighbors on the protein-protein interaction network. Our methods can not only predict protein-phenotype associations, but also reveal the underlying mechanisms from protein to phenotype.
Affiliation(s)
- Xiaohui Ren
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Steven Wang
  - Department of Molecular Biology, Columbia University, New York, USA
- Tao Huang
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China

30
G2G: A web-server for the prediction of human synthetic lethal interactions. Comput Struct Biotechnol J 2020; 18:1028-1031. [PMID: 32419903 PMCID: PMC7215103 DOI: 10.1016/j.csbj.2020.04.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/20/2020] [Revised: 04/18/2020] [Accepted: 04/19/2020] [Indexed: 12/04/2022]
Abstract
Genetic interactions (GIs) are fundamental to our understanding of biological processes in the cell. While GIs have been systematically mapped in yeast, there is scarce information about them in humans. Recently, we have suggested a state-of-the-art hierarchical method that leverages gene ontology information for predicting GIs in yeast. Here, we adapt this method and apply it for the first time to predict GIs in human. We introduce a web service called G2G for this task that is available at http://bnet.cs.tau.ac.il/g2g/.

31
Sailem HZ, Rittscher J, Pelkmans L. KCML: a machine-learning framework for inference of multi-scale gene functions from genetic perturbation screens. Mol Syst Biol 2020; 16:e9083. [PMID: 32141232 PMCID: PMC7059140 DOI: 10.15252/msb.20199083] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/27/2019] [Revised: 02/01/2020] [Accepted: 02/06/2020] [Indexed: 12/13/2022]
Abstract
Characterising context-dependent gene functions is crucial for understanding the genetic bases of health and disease. To date, inference of gene functions from large-scale genetic perturbation screens is based on ad hoc analysis pipelines involving unsupervised clustering and functional enrichment. We present Knowledge- and Context-driven Machine Learning (KCML), a framework that systematically predicts multiple context-specific functions for a given gene based on the similarity of its perturbation phenotype to those with known function. As a proof of concept, we test KCML on three datasets describing phenotypes at the molecular, cellular and population levels and show that it outperforms traditional analysis pipelines. In particular, KCML identified an abnormal multicellular organisation phenotype associated with the depletion of olfactory receptors, and TGFβ and WNT signalling genes in colorectal cancer cells. We validate these predictions in colorectal cancer patients and show that olfactory receptors expression is predictive of worse patient outcomes. These results highlight KCML as a systematic framework for discovering novel scale-crossing and context-dependent gene functions. KCML is highly generalisable and applicable to various large-scale genetic perturbation screens.
Affiliation(s)
- Heba Z Sailem
  - Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, UK
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Jens Rittscher
  - Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, UK
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Lucas Pelkmans
  - Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland

32
Il'yasova D, Kinev AV. Editorial: Using Cells in Epidemiological Studies to Characterize Individual Response to Environmental Hazards. Front Public Health 2019; 7:284. [PMID: 31632944 PMCID: PMC6783490 DOI: 10.3389/fpubh.2019.00284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2019] [Accepted: 09/18/2019] [Indexed: 11/15/2022]
Affiliation(s)
- Dora Il'yasova
  - Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, United States

33
Cao Y, Sun Y, Karimi M, Chen H, Moronfoye O, Shen Y. Predicting pathogenicity of missense variants with weakly supervised regression. Hum Mutat 2019; 40:1579-1592. [PMID: 31144781 PMCID: PMC6744350 DOI: 10.1002/humu.23826] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/11/2019] [Revised: 05/23/2019] [Accepted: 05/27/2019] [Indexed: 12/27/2022]
Abstract
Quickly growing genetic variation data of unknown clinical significance demand computational methods that can reliably predict clinical phenotypes and deeply unravel molecular mechanisms. On the platform enabled by the Critical Assessment of Genome Interpretation (CAGI), we develop a novel "weakly supervised" regression (WSR) model that not only predicts precise clinical significance (probability of pathogenicity) from inexact training annotations (class of pathogenicity) but also infers underlying molecular mechanisms in a variant-specific manner. Compared to multiclass logistic regression, a representative multiclass classifier, our kernelized WSR improves the performance for the ENIGMA Challenge set from 0.72 to 0.97 in binary area under the receiver operating characteristic curve (AUC) and from 0.64 to 0.80 in ordinal multiclass AUC. WSR model interpretation and protein structural interpretation reach consensus in corroborating the most probable molecular mechanisms by which some pathogenic BRCA1 variants confer clinical significance, namely metal-binding disruption for p.C44F and p.C47Y, protein-binding disruption for p.M18T, and structure destabilization for p.S1715N.
Affiliation(s)
- Yue Cao
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
- Yuanfei Sun
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
- Mostafa Karimi
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
- Haoran Chen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
- Oluwaseyi Moronfoye
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States

34
Pratapa A, Adames N, Kraikivski P, Franzese N, Tyson JJ, Peccoud J, Murali TM. CrossPlan: systematic planning of genetic crosses to validate mathematical models. Bioinformatics 2018; 34:2237-2244. [PMID: 29432533 DOI: 10.1093/bioinformatics/bty072] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/19/2017] [Accepted: 02/07/2018] [Indexed: 12/27/2022]
Abstract
Motivation: Mathematical models of cellular processes can systematically predict the phenotypes of novel combinations of multi-gene mutations. Searching for informative predictions and prioritizing them for experimental validation is challenging since the number of possible combinations grows exponentially in the number of mutations. Moreover, keeping track of the crosses needed to make new mutants and planning sequences of experiments is unmanageable when the experimenter is deluged by hundreds of potentially informative predictions to test.
Results: We present CrossPlan, a novel methodology for systematically planning genetic crosses to make a set of target mutants from a set of source mutants. We base our approach on a generic experimental workflow used in performing genetic crosses in budding yeast. We prove that the CrossPlan problem is NP-complete. We develop an integer-linear-program (ILP) to maximize the number of target mutants that we can make under certain experimental constraints. We apply our method to a comprehensive mathematical model of the protein regulatory network controlling cell division in budding yeast. We also extend our solution to incorporate other experimental conditions such as a delay factor that decides the availability of a mutant and genetic markers to confirm gene deletions. The experimental flow that underlies our work is quite generic and our ILP-based algorithm is easy to modify. Hence, our framework should be relevant in plant and animal systems as well.
Availability and implementation: CrossPlan code is freely available under GNU General Public Licence v3.0 at https://github.com/Murali-group/crossplan.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aditya Pratapa
  - Department of Computer Science, Virginia Tech, Blacksburg, USA
- Neil Adames
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, USA
- Pavel Kraikivski
  - Department of Biological Sciences, Virginia Tech, Blacksburg, USA
- John J Tyson
  - Department of Biological Sciences, Virginia Tech, Blacksburg, USA
- Jean Peccoud
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, USA
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, USA

35
Esteban-Medina M, Peña-Chilet M, Loucera C, Dopazo J. Exploring the druggable space around the Fanconi anemia pathway using machine learning and mechanistic models. BMC Bioinformatics 2019; 20:370. [PMID: 31266445 PMCID: PMC6604281 DOI: 10.1186/s12859-019-2969-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 05/06/2019] [Accepted: 06/25/2019] [Indexed: 02/07/2023]
Abstract
BACKGROUND: In spite of the abundance of genomic data, predictive models that describe phenotypes as a function of gene expression or mutations are difficult to obtain because they are affected by the curse of dimensionality, given the disbalance between samples and candidate genes. And this is especially dramatic in scenarios in which the availability of samples is difficult, such as the case of rare diseases.
RESULTS: The application of multi-output regression machine learning methodologies to predict the potential effect of external proteins over the signaling circuits that trigger Fanconi anemia related cell functionalities, inferred with a mechanistic model, allowed us to detect over 20 potential therapeutic targets.
CONCLUSIONS: The use of artificial intelligence methods for the prediction of potentially causal relationships between proteins of interest and cell activities related with disease-related phenotypes opens promising avenues for the systematic search of new targets in rare diseases.
Affiliation(s)
- Marina Esteban-Medina
- Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio, 41013 Sevilla, Spain
- María Peña-Chilet
- Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio, 41013 Sevilla, Spain
- Bioinformatics in Rare Diseases (BiER). Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío, 41013 Sevilla, Spain
- Carlos Loucera
- Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio, 41013 Sevilla, Spain
- Joaquín Dopazo
- Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio, 41013 Sevilla, Spain
- Bioinformatics in Rare Diseases (BiER). Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío, 41013 Sevilla, Spain
- INB-ELIXIR-es, FPS, Hospital Virgen del Rocío, 42013 Sevilla, Spain
36
Alter C, Ding Z, Flögel U, Scheller J, Schrader J. A2bR-dependent signaling alters immune cell composition and enhances IL-6 formation in the ischemic heart. Am J Physiol Heart Circ Physiol 2019; 317:H190-H200. [DOI: 10.1152/ajpheart.00029.2019] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/22/2022]
Abstract
Although the cardioprotective effect of adenosine is undisputed, the role of the adenosine A2b receptor (A2bR) in ischemic cardiac remodeling is not defined. In this study we aimed to unravel the role A2bR plays in modulating the immune response and the healing mechanisms after myocardial infarction. Genetic and pharmacological (PSB603) inactivation of A2bR as well as activation of A2bR with BAY60-6583 does not alter cardiac remodeling of the infarcted (50-min left anterior descending artery occlusion/reperfusion) murine heart. Flow cytometry of immune cell subsets identified a significant increase in B cells, NK cells, CD8 and CD4 T cells, as well as FoxP3-expressing regulatory T cells in the injured heart in A2bR-deficient mice. Analysis of T-cell function revealed that expression and secretion of interleukin (IL)-2, interferon (IFN)γ, and tumor necrosis factor (TNF)α by T cells is under A2bR control. In addition, we found substantial cellular heterogeneity in the response of immune cells and cardiomyocytes to A2bR deficiency: in the absence of A2bR, expression of IL-6 was greatly reduced in cardiomyocytes and immune cells except T cells, and expression of IL-1β was strongly reduced in cardiomyocytes, granulocytes, and B cells, as determined by quantitative PCR. Our findings indicate that A2bR signaling in the ischemic heart triggers substantial changes in cardiac immune cell composition of the lymphoid lineage and induces a profound cell type-specific downregulation of IL-6 and IL-1β. This suggests the presence of a targetable adenosine–A2bR–IL-6 axis triggered by adenosine formed by the ischemic heart.
NEW & NOTEWORTHY Genetic deletion and pharmacological inactivation/activation of A2bR does not alter cardiac remodeling after MI but is associated with compensatory upregulation of various pro- and anti-inflammatory immune cell subsets (B cells, NK cells, CD8 and CD4 T cells, regulatory T cells). In the inflamed heart, A2bR modulates the expression of IL-2, IFNγ, TNFα in T cells and of IL-6 in cardiomyocytes, monocytes, granulocytes and B cells. This suggests an important adenosine–IL-6 axis, which is controlled by A2bR via local adenosine.
Affiliation(s)
- Christina Alter
- Department of Molecular Cardiology, University Düsseldorf, Medical Faculty, Düsseldorf, Germany
- Zhaoping Ding
- Department of Molecular Cardiology, University Düsseldorf, Medical Faculty, Düsseldorf, Germany
- Ulrich Flögel
- Department of Molecular Cardiology, University Düsseldorf, Medical Faculty, Düsseldorf, Germany
- Jürgen Scheller
- Institute of Biochemistry and Molecular Biology II, University Düsseldorf, Medical Faculty, Heinrich-Heine University, Düsseldorf, Germany
- Jürgen Schrader
- Department of Molecular Cardiology, University Düsseldorf, Medical Faculty, Düsseldorf, Germany
37
Abstract
Classically, phenotype is what is observed, and genotype is the genetic makeup. Statistical studies aim to project phenotypic likelihoods of genotypic patterns. The traditional genotype-to-phenotype theory embraces the view that the encoded protein shape together with gene expression level largely determines the resulting phenotypic trait. Here, we point out that the molecular biology revolution at the turn of the century explained that the gene encodes not one but ensembles of conformations, which in turn spell all possible gene-associated phenotypes. The significance of a dynamic ensemble view is in understanding the linkage between genetic change and the gained observable physical or biochemical characteristics. Thus, despite the transformative shift in our understanding of the basis of protein structure and function, the literature still commonly relates to the classical genotype-phenotype paradigm. This is important because an ensemble view clarifies how even seemingly small genetic alterations can lead to pleiotropic traits in adaptive evolution and in disease, why cellular pathways can be modified in monogenic and polygenic traits, and how the environment may tweak protein function.
Affiliation(s)
- Ruth Nussinov
- Cancer and Inflammation Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, Maryland, United States of America
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Chung-Jung Tsai
- Cancer and Inflammation Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, Maryland, United States of America
- Hyunbum Jang
- Cancer and Inflammation Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, Maryland, United States of America
38
Fan J, Cannistra A, Fried I, Lim T, Schaffner T, Crovella M, Hescott B, Leiserson MDM. Functional protein representations from biological networks enable diverse cross-species inference. Nucleic Acids Res 2019; 47:e51. [PMID: 30847485 PMCID: PMC6511848 DOI: 10.1093/nar/gkz132] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 07/27/2018] [Revised: 01/09/2019] [Accepted: 02/18/2019] [Indexed: 12/31/2022] Open
Abstract
Transferring knowledge between species is key for many biological applications, but is complicated by divergent and convergent evolution. Many current approaches for this problem leverage sequence and interaction network data to transfer knowledge across species, exemplified by network alignment methods. While these techniques do well, they are limited in scope, creating metrics to address one specific problem or task. We take a different approach by creating an environment where multiple knowledge transfer tasks can be performed using the same protein representations. Specifically, our kernel-based method, MUNK, integrates sequence and network structure to create functional protein representations, embedding proteins from different species in the same vector space. First we show proteins in different species that are close in MUNK-space are functionally similar. Next, we use these representations to share knowledge of synthetic lethal interactions between species. Importantly, we find that the results using MUNK-representations are at least as accurate as existing algorithms for these tasks. Finally, we generalize the notion of a phenolog ('orthologous phenotype') to use functionally similar proteins (i.e. those with similar representations). We demonstrate the utility of this broadened notion by using it to identify known phenologs and novel non-obvious ones supported by current research.
Affiliation(s)
- Jason Fan
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, USA
- Inbar Fried
- University of North Carolina Medical School, USA
- Tim Lim
- Department of Computer Science, Boston University, USA
- Mark Crovella
- Department of Computer Science, Boston University, USA
- Benjamin Hescott
- College of Computer and Information Science, Northeastern University, USA
- Mark D M Leiserson
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, USA
39
Grimbs A, Klosik DF, Bornholdt S, Hütt MT. A system-wide network reconstruction of gene regulation and metabolism in Escherichia coli. PLoS Comput Biol 2019; 15:e1006962. [PMID: 31050661 PMCID: PMC6519848 DOI: 10.1371/journal.pcbi.1006962] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/25/2018] [Revised: 05/15/2019] [Accepted: 03/18/2019] [Indexed: 11/19/2022] Open
Abstract
Genome-scale metabolic models have become a fundamental tool for examining metabolic principles. However, metabolism is not solely characterized by the underlying biochemical reactions and catalyzing enzymes, but is also affected by regulatory events. Since the pioneering work of Covert and co-workers as well as Shlomi and co-workers, it has been debated how regulation and metabolism synergistically characterize a coherent cellular state. The first approaches started from metabolic models, which were extended by the regulation of the genes encoding the catalyzing enzymes. By now, bioinformatics databases in principle allow addressing the challenge of integrating regulation and metabolism on a system-wide level. Collecting information from several databases, we provide a network representation of the integrated gene regulatory and metabolic system for Escherichia coli, including major cellular processes, from metabolic processes via protein modification to a variety of regulatory events. Besides transcriptional regulation, we also take into account regulation of translation, enzyme activities and reactions. Our network model provides novel topological characterizations of system components based on their positions in the network. We show that network characteristics suggest a representation of the integrated system as three network domains (regulatory, metabolic and interface networks) instead of two. This new three-domain representation reveals the structural centrality of components with known high functional relevance. This integrated network can serve as a platform for understanding coherent cellular states as active subnetworks and for elucidating crossover effects between metabolism and gene regulation.
Affiliation(s)
- Anne Grimbs
- Computational Systems Biology, Department of Life Sciences & Chemistry, Jacobs University, Bremen, Germany
- David F. Klosik
- Institute for Theoretical Physics, University of Bremen, Bremen, Germany
- Stefan Bornholdt
- Institute for Theoretical Physics, University of Bremen, Bremen, Germany
- Marc-Thorsten Hütt
- Computational Systems Biology, Department of Life Sciences & Chemistry, Jacobs University, Bremen, Germany
40
Benstead-Hume G, Chen X, Hopkins SR, Lane KA, Downs JA, Pearl FMG. Predicting synthetic lethal interactions using conserved patterns in protein interaction networks. PLoS Comput Biol 2019; 15:e1006888. [PMID: 30995217 PMCID: PMC6488098 DOI: 10.1371/journal.pcbi.1006888] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 02/13/2018] [Revised: 04/29/2019] [Accepted: 02/18/2019] [Indexed: 11/30/2022] Open
Abstract
In response to a need for improved treatments, a number of promising novel targeted cancer therapies are being developed that exploit human synthetic lethal interactions. This is facilitating personalised medicine strategies in cancers where specific tumour suppressors have become inactivated. Mainly due to the constraints of the experimental procedures, relatively few human synthetic lethal interactions have been identified. Here we describe SLant (Synthetic Lethal analysis via Network topology), a computational systems approach to predicting human synthetic lethal interactions that works by identifying and exploiting conserved patterns in protein interaction network topology both within and across species. SLant outperforms previous attempts to classify human SSL interactions, and experimental validation of the model's predictions suggests it may provide useful guidance for future SSL screenings and ultimately aid targeted cancer therapy development.
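As a rough illustration of what "patterns in protein interaction network topology" can mean for a pair of genes, the sketch below computes a few generic pairwise features on a toy undirected network. The abstract does not list SLant's actual features, so the feature set (degrees, shared neighbours, Jaccard overlap, shortest-path length), the function names, and the toy edges are all assumptions made for illustration only.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from edge pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search distance in edges; None if target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def pair_features(graph, a, b):
    """Hypothetical topology features for a gene pair (not SLant's published set)."""
    neighbours_a = graph.get(a, set())
    neighbours_b = graph.get(b, set())
    shared = neighbours_a & neighbours_b
    union = neighbours_a | neighbours_b
    return {
        "degree_a": len(neighbours_a),
        "degree_b": len(neighbours_b),
        "shared_neighbours": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
        "path_length": shortest_path_length(graph, a, b),
    }


# Toy interaction network with made-up node names.
toy_graph = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
print(pair_features(toy_graph, "A", "D"))
```

Feature dictionaries like this one could be fed to any standard classifier trained on known interacting and non-interacting pairs; which classifier and which features work best is an empirical question that this sketch does not address.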
Affiliation(s)
- Graeme Benstead-Hume
- Bioinformatics Lab, School of Life Sciences, University of Sussex, Falmer, Brighton, United Kingdom
- Xiangrong Chen
- Bioinformatics Lab, School of Life Sciences, University of Sussex, Falmer, Brighton, United Kingdom
- Suzanna R. Hopkins
- Division of Cancer Biology, Institute of Cancer Research, Chester Beatty Laboratories, London, United Kingdom
- Karen A. Lane
- Division of Cancer Biology, Institute of Cancer Research, Chester Beatty Laboratories, London, United Kingdom
- Jessica A. Downs
- Division of Cancer Biology, Institute of Cancer Research, Chester Beatty Laboratories, London, United Kingdom
- Frances M. G. Pearl
- Bioinformatics Lab, School of Life Sciences, University of Sussex, Falmer, Brighton, United Kingdom
41
Yu MK, Ma J, Ono K, Zheng F, Fong SH, Gary A, Chen J, Demchak B, Pratt D, Ideker T. DDOT: A Swiss Army Knife for Investigating Data-Driven Biological Ontologies. Cell Syst 2019; 8:267-273.e3. [PMID: 30878356 PMCID: PMC7042149 DOI: 10.1016/j.cels.2019.02.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/23/2018] [Revised: 12/08/2018] [Accepted: 02/08/2019] [Indexed: 01/08/2023]
Abstract
Systems biology requires not only genome-scale data but also methods to integrate these data into interpretable models. Previously, we developed approaches that organize omics data into a structured hierarchy of cellular components and pathways, called a "data-driven ontology." Such hierarchies recapitulate known cellular subsystems and discover new ones. To broadly facilitate this type of modeling, we report the development of a software library called the Data-Driven Ontology Toolkit (DDOT), consisting of a Python package (https://github.com/idekerlab/ddot) to assemble and analyze ontologies and a web application (http://hiview.ucsd.edu) to visualize them. Using DDOT, we programmatically assemble a compendium of ontologies for 652 diseases by integrating gene-disease mappings with a gene similarity network derived from omics data. For example, the ontology for Fanconi anemia describes known and novel disease mechanisms in its hierarchy of 194 genes and 74 subsystems. DDOT provides an easy interface to share ontologies online at the Network Data Exchange.
Affiliation(s)
- Michael Ku Yu
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA; Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, La Jolla, CA 92093, USA; Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
- Jianzhu Ma
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Keiichiro Ono
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Fan Zheng
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Samson H Fong
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA; Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Aaron Gary
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Jing Chen
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Barry Demchak
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Dexter Pratt
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Trey Ideker
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA; Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, La Jolla, CA 92093, USA; Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
42
Capriotti E, Ozturk K, Carter H. Integrating molecular networks with genetic variant interpretation for precision medicine. Wiley Interdisciplinary Reviews: Systems Biology and Medicine 2018; 11:e1443. [PMID: 30548534 PMCID: PMC6450710 DOI: 10.1002/wsbm.1443] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 08/15/2018] [Revised: 10/23/2018] [Accepted: 10/30/2018] [Indexed: 02/01/2023]
Abstract
More reliable and cheaper sequencing technologies have revealed the vast mutational landscapes characteristic of many phenotypes. The analysis of such genetic variants has led to successful identification of altered proteins underlying many Mendelian disorders. Nevertheless the simple one‐variant one‐phenotype model valid for many monogenic diseases does not capture the complexity of polygenic traits and disorders. Although experimental and computational approaches have improved detection of functionally deleterious variants and important interactions between gene products, the development of comprehensive models relating genotype and phenotypes remains a challenge in the field of genomic medicine. In this context, a new view of the pathologic state as significant perturbation of the network of interactions between biomolecules is crucial for the identification of biochemical pathways associated with complex phenotypes. Seminal studies in systems biology combined the analysis of genetic variation with protein–protein interaction networks to demonstrate that even as biological systems evolve to be robust to genetic variation, their topologies create disease vulnerabilities. More recent analyses model the impact of genetic variants as changes to the “wiring” of the interactome to better capture heterogeneity in genotype–phenotype relationships. These studies lay the foundation for using networks to predict variant effects at scale using machine‐learning or algorithmic approaches. A wealth of databases and resources for the annotation of genotype–phenotype relationships have been developed to support developments in this area. This overview describes how study of the molecular interactome has generated insights linking the organization of biological systems to disease mechanism, and how this information can enable precision medicine. This article is categorized under:
- Translational, Genomic, and Systems Medicine > Translational Medicine
- Biological Mechanisms > Cell Signaling
- Models of Systems Properties and Processes > Mechanistic Models
- Analytical and Computational Methods > Computational Methods
Affiliation(s)
- Emidio Capriotti
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
- Kivilcim Ozturk
- Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, California
- Hannah Carter
- Department of Medicine and Institute for Genomic Medicine, University of California, San Diego, La Jolla, California
43
Stupnikov A, O'Reilly PG, McInerney CE, Roddy AC, Dunne PD, Gilmore A, Ellis HP, Flannery T, Healy E, McIntosh SA, Savage K, Kurian KM, Emmert-Streib F, Prise KM, Salto-Tellez M, McArt DG. Impact of Variable RNA-Sequencing Depth on Gene Expression Signatures and Target Compound Robustness: Case Study Examining Brain Tumor (Glioma) Disease Progression. JCO Precis Oncol 2018; 2. [PMID: 30324181 PMCID: PMC6186166 DOI: 10.1200/po.18.00014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/11/2022] Open
Abstract
Purpose: Gene expression profiling can uncover biologic mechanisms underlying disease and is important in drug development. RNA sequencing (RNA-seq) is routinely used to assess gene expression, but costs remain high. Sample multiplexing reduces RNA-seq costs; however, multiplexed samples have lower cDNA sequencing depth, which can hinder accurate differential gene expression detection. The impact of altered sequencing depth on RNA-seq–based downstream analyses is not known; one such analysis is gene expression connectivity mapping, which is used to identify potential therapeutic compounds for repurposing.
Methods: In this study, published RNA-seq profiles from patients with brain tumor (glioma) were assembled into two disease progression gene signature contrasts for astrocytoma. Available treatments for glioma have limited effectiveness, rendering this a disease of poor clinical outcome. Gene signatures were subsampled to simulate sequencing alterations and analyzed in connectivity mapping to investigate target compound robustness.
Results: Data loss to gene signatures led to the loss, gain, and consistent identification of significant connections. The most accurate gene signature contrast with consistent patient gene expression profiles was more resilient to data loss and identified robust target compounds. Target compounds lost included candidate compounds of potential clinical utility in glioma (eg, suramin, dasatinib). Lost connections may have been linked to low-abundance genes in the gene signature that closely characterized the disease phenotype. Consistently identified connections may have been related to highly expressed abundant genes that were ever-present in gene signatures, despite data reductions. Potential noise surrounding findings included false-positive connections that were gained as a result of gene signature modification with data loss.
Conclusion: Findings highlight the necessity for gene signature accuracy for connectivity mapping, which should improve the clinical utility of future target compound discoveries.
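One simple way to picture how reduced sequencing depth affects low-abundance genes is to thin read counts at random. The sketch below keeps each read independently with a fixed probability (binomial thinning). This is an assumed stand-in for illustration: the abstract does not say how the gene signatures were subsampled, and the gene names, counts, and function names here are hypothetical.

```python
import random


def downsample_counts(counts, fraction, seed=0):
    """Binomially thin per-gene read counts: keep each read with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so repeated runs give the same subsample
    return {
        gene: sum(1 for _ in range(n) if rng.random() < fraction)
        for gene, n in counts.items()
    }


def detected_genes(counts, min_count=1):
    """Return the set of genes whose count is at least `min_count`."""
    return {gene for gene, n in counts.items() if n >= min_count}


# Hypothetical toy counts, not data from the study.
toy_counts = {"GENE_HIGH": 5000, "GENE_MID": 200, "GENE_LOW": 3}

for fraction in (1.0, 0.5, 0.1):
    thinned = downsample_counts(toy_counts, fraction, seed=42)
    print(fraction, thinned, sorted(detected_genes(thinned, min_count=2)))
```

With thinning of this kind, highly expressed genes tend to remain detectable at low fractions while genes with only a few reads can drop below a detection threshold, which is the pattern the Results paragraph describes for lost and consistently identified connections.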
Affiliation(s)
- Alexey Stupnikov
- Queen's University Belfast; Johns Hopkins University, Baltimore, MD
- Hayley P Ellis
- Brain Tumour Research Centre, University of Bristol, Bristol, United Kingdom
- Tom Flannery
- Belfast Health and Social Care Trust, Belfast, United Kingdom
- Estelle Healy
- Belfast Health and Social Care Trust, Belfast, United Kingdom
- Kathreena M Kurian
- Brain Tumour Research Centre, University of Bristol, Bristol, United Kingdom
- Manuel Salto-Tellez
- Queen's University Belfast; Belfast Health and Social Care Trust, Belfast, United Kingdom
44
Willsey AJ, Morris MT, Wang S, Willsey HR, Sun N, Teerikorpi N, Baum TB, Cagney G, Bender KJ, Desai TA, Srivastava D, Davis GW, Doudna J, Chang E, Sohal V, Lowenstein DH, Li H, Agard D, Keiser MJ, Shoichet B, von Zastrow M, Mucke L, Finkbeiner S, Gan L, Sestan N, Ward ME, Huttenhain R, Nowakowski TJ, Bellen HJ, Frank LM, Khokha MK, Lifton RP, Kampmann M, Ideker T, State MW, Krogan NJ. The Psychiatric Cell Map Initiative: A Convergent Systems Biological Approach to Illuminating Key Molecular Pathways in Neuropsychiatric Disorders. Cell 2018; 174:505-520. [PMID: 30053424 PMCID: PMC6247911 DOI: 10.1016/j.cell.2018.06.016] [Citation(s) in RCA: 81] [Impact Index Per Article: 13.5] [Received: 02/13/2018] [Revised: 05/07/2018] [Accepted: 06/08/2018] [Indexed: 12/11/2022]
Abstract
Although gene discovery in neuropsychiatric disorders, including autism spectrum disorder, intellectual disability, epilepsy, schizophrenia, and Tourette disorder, has accelerated, resulting in a large number of molecular clues, it has proven difficult to generate specific hypotheses without the corresponding datasets at the protein complex and functional pathway level. Here, we describe one path forward: an initiative aimed at mapping the physical and genetic interaction networks of these conditions and then using these maps to connect the genomic data to neurobiology and, ultimately, the clinic. These efforts will include a team of geneticists, structural biologists, neurobiologists, systems biologists, and clinicians, leveraging a wide array of experimental approaches and creating a collaborative infrastructure necessary for long-term investigation. This initiative will ultimately intersect with parallel studies that focus on other diseases, as there is a significant overlap with genes implicated in cancer, infectious disease, and congenital heart defects.
Affiliation(s)
- A Jeremy Willsey
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA; Institute for Neurodegenerative Diseases, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA; Quantitative Biosciences Institute (QBI), University of California, San Francisco, San Francisco, CA 94143, USA.
- Montana T Morris
- Institute for Neurodegenerative Diseases, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Sheng Wang
- Institute for Neurodegenerative Diseases, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Helen R Willsey
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Nawei Sun
- Institute for Neurodegenerative Diseases, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Nia Teerikorpi
- Institute for Neurodegenerative Diseases, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA; Tetrad Graduate Program, University of California, San Francisco, San Francisco, CA 94143, USA
- Tierney B Baum
- Institute for Neurodegenerative Diseases, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Gerard Cagney
- School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin 4, Ireland
- Kevin J Bender
- Department of Neurology, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Tejal A Desai
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, San Francisco, CA 94143, USA; Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Deepak Srivastava
- Gladstone Institutes, San Francisco, CA 94158, USA; Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143, USA
- Graeme W Davis
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143, USA; Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, San Francisco, CA 94143, USA
- Jennifer Doudna
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA; Department of Chemistry, University of California, Berkeley, Berkeley, CA 94720, USA; Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, 94720, USA; Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA; MBIB Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Edward Chang
- Department of Neurological Surgery, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Vikaas Sohal
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA; Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, San Francisco, CA 94143, USA
- Daniel H Lowenstein
- Department of Neurology, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Hao Li
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, San Francisco, CA 94143, USA; Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143, USA
- David Agard
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, San Francisco, CA 94143, USA; Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143, USA
- Michael J Keiser
- Institute for Neurodegenerative Diseases, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA; Quantitative Biosciences Institute (QBI), University of California, San Francisco, San Francisco, CA 94143, USA; Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94143, USA
- Brian Shoichet
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, San Francisco, CA 94143, USA; Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94143, USA
- Mark von Zastrow
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA; Quantitative Biosciences Institute (QBI), University of California, San Francisco, San Francisco, CA 94143, USA; Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94143, USA
- Lennart Mucke
- Department of Neurology, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA; Gladstone Institutes, San Francisco, CA 94158, USA
- Steven Finkbeiner
- Department of Neurology, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA; Gladstone Institutes, San Francisco, CA 94158, USA; Department of Physiology, University of California, San Francisco, San Francisco, CA 94143, USA
- Li Gan
- Department of Neurology, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA; Gladstone Institutes, San Francisco, CA 94158, USA
- Nenad Sestan
- Department of Neuroscience and Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT 06510, USA
| | - Michael E Ward
- National Institute of Neurological Disorders and Stroke, NIH, Bethesda, MD 20892, USA
| | - Ruth Huttenhain
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, San Francisco, CA 94143, USA; Gladstone Institutes, San Francisco, CA 94158, USA; Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Tomasz J Nowakowski
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Anatomy, University of California, San Francisco, San Francisco, CA 94143, USA; The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Hugo J Bellen
- Departments of Molecular and Human Genetics and Neuroscience, Neurological Research Institute at TCH, Baylor College of Medicine, Houston, TX 77030, USA; Howard Hughes Medical Institute, Baylor College of Medicine, Houston, TX 77030, USA
| | - Loren M Frank
- Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Physiology, University of California, San Francisco, San Francisco, CA 94143, USA; Howard Hughes Medical Institute, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Mustafa K Khokha
- Pediatric Genomics Discovery Program, Departments of Pediatrics and Genetics, Yale University School of Medicine, New Haven, CT 06510, USA
| | - Richard P Lifton
- Laboratory of Human Genetics and Genomics, The Rockefeller University, New York, NY 10065, USA
| | - Martin Kampmann
- Institute for Neurodegenerative Diseases, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA; Quantitative Biosciences Institute (QBI), University of California, San Francisco, San Francisco, CA 94143, USA; Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143, USA; Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
| | - Trey Ideker
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Matthew W State
- Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA; Quantitative Biosciences Institute (QBI), University of California, San Francisco, San Francisco, CA 94143, USA
| | - Nevan J Krogan
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, San Francisco, CA 94143, USA; Gladstone Institutes, San Francisco, CA 94158, USA; Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94143, USA; Helen Diller Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94143, USA.
|
45
|
Hutt DM, Loguercio S, Roth DM, Su AI, Balch WE. Correcting the F508del-CFTR variant by modulating eukaryotic translation initiation factor 3-mediated translation initiation. J Biol Chem 2018; 293:13477-13495. [PMID: 30006345 DOI: 10.1074/jbc.ra118.003192] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/28/2018] [Revised: 07/05/2018] [Indexed: 12/31/2022] Open
Abstract
Inherited and somatic rare diseases result from >200,000 genetic variants leading to loss- or gain-of-toxic function, often caused by protein misfolding. Many of these misfolded variants fail to properly interact with other proteins. Understanding the link between factors mediating the transcription, translation, and protein folding of these disease-associated variants remains a major challenge in cell biology. Herein, we utilized the cystic fibrosis transmembrane conductance regulator (CFTR) protein as a model and performed a proteomics-based high-throughput screen (HTS) to identify pathways and components affecting the folding and function of the most common cystic fibrosis-associated mutation, the F508del variant of CFTR. Using a shortest-path algorithm we developed, we mapped HTS hits to the CFTR interactome to provide functional context to the targets and identified the eukaryotic translation initiation factor 3a (eIF3a) as a central hub for the biogenesis of CFTR. Of note, siRNA-mediated silencing of eIF3a reduced the polysome-to-monosome ratio in F508del-expressing cells, which, in turn, decreased the translation of CFTR variants, leading to increased CFTR stability, trafficking, and function at the cell surface. This finding suggested that eIF3a is involved in mediating the impact of genetic variations in CFTR on the folding of this protein. We posit that the number of ribosomes on a CFTR mRNA transcript is inversely correlated with the stability of the translated polypeptide. Polysome-based translation challenges the capacity of the proteostasis environment to balance message fidelity with protein folding, leading to disease. We suggest that this deficit can be corrected through control of translation initiation.
Affiliation(s)
- Andrew I Su
- Integrative Structural and Computational Biology and
| | - William E Balch
- From the Departments of Molecular Medicine and .,the Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, California 92037
|
46
|
Cri-du-Chat Syndrome interactome network: Correlating genotypic variations to associated phenotypes. GENE REPORTS 2018. [DOI: 10.1016/j.genrep.2018.03.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/11/2022]
|
47
|
Abstract
Motivation: Understanding functions of proteins in specific human tissues is essential for insights into disease diagnostics and therapeutics, yet prediction of tissue-specific cellular function remains a critical challenge for biomedicine. Results: Here, we present OhmNet, a hierarchy-aware unsupervised node feature learning approach for multi-layer networks. We build a multi-layer network, where each layer represents molecular interactions in a different human tissue. OhmNet then automatically learns a mapping of proteins, represented as nodes, to a neural embedding-based low-dimensional space of features. OhmNet encourages sharing of similar features among proteins with similar network neighborhoods and among proteins activated in similar tissues. The algorithm generalizes prior work, which generally ignores relationships between tissues, by modeling tissue organization with a rich multiscale tissue hierarchy. We use OhmNet to study multicellular function in a multi-layer protein interaction network of 107 human tissues. In 48 tissues with known tissue-specific cellular functions, OhmNet provides more accurate predictions of cellular function than alternative approaches, and also generates more accurate hypotheses about tissue-specific protein actions. We show that taking into account the tissue hierarchy leads to improved predictive power. Remarkably, we also demonstrate that it is possible to leverage the tissue hierarchy in order to effectively transfer cellular functions to a functionally uncharacterized tissue. Overall, OhmNet moves from flat networks to multiscale models able to predict a range of phenotypes spanning cellular subsystems. Availability and implementation: Source code and datasets are available at http://snap.stanford.edu/ohmnet.
Affiliation(s)
- Marinka Zitnik
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA
|
48
|
Ma J, Yu MK, Fong S, Ono K, Sage E, Demchak B, Sharan R, Ideker T. Using deep learning to model the hierarchical structure and function of a cell. Nat Methods 2018; 15:290-298. [PMID: 29505029 PMCID: PMC5882547 DOI: 10.1038/nmeth.4627] [Citation(s) in RCA: 206] [Impact Index Per Article: 34.3] [Received: 07/19/2017] [Accepted: 02/07/2018] [Indexed: 01/20/2023]
Abstract
Although artificial neural networks simulate a variety of human functions, their internal structures are hard to interpret. In the life sciences, extensive knowledge of cell biology provides an opportunity to design visible neural networks (VNNs) which couple the model’s inner workings to those of real systems. Here we develop DCell, a VNN embedded in the hierarchical structure of 2526 subsystems comprising a eukaryotic cell (http://d-cell.ucsd.edu/). Trained on several million genotypes, DCell simulates cellular growth nearly as accurately as laboratory observations. During simulation, genotypes induce patterns of subsystem activities, enabling in-silico investigations of the molecular mechanisms underlying genotype-phenotype associations. These mechanisms can be validated and many are unexpected; some are governed by Boolean logic. Cumulatively, 80% of the importance for growth prediction is captured by 484 subsystems (21%), reflecting the emergence of a complex phenotype. DCell provides a foundation for decoding the genetics of disease, drug resistance, and synthetic life.
Affiliation(s)
- Jianzhu Ma
- Department of Medicine, University of California San Diego, La Jolla, California, USA
| | - Michael Ku Yu
- Department of Medicine, University of California San Diego, La Jolla, California, USA.,Program in Bioinformatics, University of California San Diego, La Jolla, California, USA
| | - Samson Fong
- Department of Medicine, University of California San Diego, La Jolla, California, USA.,Department of Bioengineering, University of California San Diego, La Jolla, California, USA
| | - Keiichiro Ono
- Department of Medicine, University of California San Diego, La Jolla, California, USA
| | - Eric Sage
- Department of Medicine, University of California San Diego, La Jolla, California, USA
| | - Barry Demchak
- Department of Medicine, University of California San Diego, La Jolla, California, USA
| | - Roded Sharan
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
| | - Trey Ideker
- Department of Medicine, University of California San Diego, La Jolla, California, USA.,Program in Bioinformatics, University of California San Diego, La Jolla, California, USA.,Department of Bioengineering, University of California San Diego, La Jolla, California, USA
|
49
|
Ignatius Pang CN, Goel A, Wilkins MR. Investigating the Network Basis of Negative Genetic Interactions in Saccharomyces cerevisiae with Integrated Biological Networks and Triplet Motif Analysis. J Proteome Res 2018; 17:1014-1030. [DOI: 10.1021/acs.jproteome.7b00649] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/28/2022]
Affiliation(s)
- Chi Nam Ignatius Pang
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
| | - Apurv Goel
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
| | - Marc R. Wilkins
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
|
50
|
Hansen J, Meretzky D, Woldesenbet S, Stolovitzky G, Iyengar R. A flexible ontology for inference of emergent whole cell function from relationships between subcellular processes. Sci Rep 2017; 7:17689. [PMID: 29255142 PMCID: PMC5735158 DOI: 10.1038/s41598-017-16627-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 04/12/2017] [Accepted: 11/15/2017] [Indexed: 01/14/2023] Open
Abstract
Whole cell responses arise from coordinated interactions between diverse human gene products functioning within various pathways underlying sub-cellular processes (SCP). Lower level SCPs interact to form higher level SCPs, often in a context specific manner to give rise to whole cell function. We sought to determine if capturing such relationships enables us to describe the emergence of whole cell functions from interacting SCPs. We developed the Molecular Biology of the Cell Ontology based on standard cell biology and biochemistry textbooks and review articles. Currently, our ontology contains 5,384 genes, 753 SCPs and 19,180 expertly curated gene-SCP associations. Our algorithm to populate the SCPs with genes enables extension of the ontology on demand and the adaption of the ontology to the continuously growing cell biological knowledge. Since whole cell responses most often arise from the coordinated activity of multiple SCPs, we developed a dynamic enrichment algorithm that flexibly predicts SCP-SCP relationships beyond the current taxonomy. This algorithm enables us to identify interactions between SCPs as a basis for higher order function in a context dependent manner, allowing us to provide a detailed description of how SCPs together can give rise to whole cell functions. We conclude that this ontology can, from omics data sets, enable the development of detailed SCP networks for predictive modeling of emergent whole cell functions.
Affiliation(s)
- Jens Hansen
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.,SBCNY, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - David Meretzky
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.,SBCNY, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Simeneh Woldesenbet
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.,SBCNY, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.,Department of Life Science, IMC University of Applied Sciences Krems, Krems an der Donau, Austria
| | - Gustavo Stolovitzky
- Thomas J. Watson Research Center, IBM, Yorktown Heights, NY, USA.,Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ravi Iyengar
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.,SBCNY, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
|