1
Larmore M, Palomero OE, Kamat NP, DeCaen PG. A synthetic method to assay polycystin channel biophysics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.06.592666. [PMID: 38766162 PMCID: PMC11100589 DOI: 10.1101/2024.05.06.592666] [Indexed: 05/22/2024]
Abstract
Ion channels are biological transistors that control ionic flux across cell membranes to regulate electrical transmission and signal transduction. They are found in all biological membranes and their conductive states are frequently disrupted in human diseases. Organelle ion channels are among the most resistant to functional and pharmacological interrogation. Traditional channel protein reconstitution methods rely upon exogenous expression and/or purification from endogenous cellular sources which are frequently contaminated by resident ionophores. Here we describe a fully synthetic method to assay the functional properties of the polycystin subfamily of transient receptor potential (TRP) channels that natively traffic to primary cilia and endoplasmic reticulum organelles. Using this method, we characterize their membrane integration, orientation and conductance while comparing these results to their endogenous channel properties. Outcomes define a novel synthetic approach that can be applied broadly to investigate other channels resistant to biophysical analysis and pharmacological characterization.
Affiliation(s)
- Megan Larmore
- Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Orhi Esarte Palomero
- Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Neha P Kamat
- Department of Biomedical Engineering, McCormick School of Engineering and Applied Science, Northwestern University, Evanston, Illinois, USA
- Center for Synthetic Biology, Northwestern University, Evanston, Illinois, USA
- Paul G DeCaen
- Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Center for Synthetic Biology, Northwestern University, Evanston, Illinois, USA
- Chemistry of Life Processes Institute, Northwestern University, Evanston, Illinois, USA
2
Muscò A, Martini D, Digregorio M, Broccoli V, Andreazzoli M. Shedding a Light on Dark Genes: A Comparative Expression Study of PRR12 Orthologues during Zebrafish Development. Genes (Basel) 2024; 15:492. [PMID: 38674426 PMCID: PMC11050278 DOI: 10.3390/genes15040492] [Received: 03/16/2024] [Revised: 04/06/2024] [Accepted: 04/09/2024] [Indexed: 04/28/2024] Open
Abstract
Haploinsufficiency of the PRR12 gene is implicated in a human neuro-ocular syndrome. Although identified as a nuclear protein highly expressed in the embryonic mouse brain, PRR12 molecular function remains elusive. This study explores the spatio-temporal expression of zebrafish PRR12 co-orthologs, prr12a and prr12b, as a first step to elucidate their function. In silico analysis reveals high evolutionary conservation in the DNA-interacting domains for both orthologs, with significant syntenic conservation observed for the prr12b locus. In situ hybridization and RT-qPCR analyses on zebrafish embryos and larvae reveal distinct expression patterns: prr12a is expressed early in zygotic development, mainly in the central nervous system, while prr12b expression initiates during gastrulation, localizing later to dopaminergic telencephalic and diencephalic cell clusters. Both transcripts are enriched in the ganglion cell and inner neural layers of the 72 hpf retina, with prr12b widely distributed in the ciliary marginal zone. In the adult brain, prr12a and prr12b are found in the cerebellum, amygdala and ventral telencephalon, which represent the main areas affected in autistic patients. Overall, this study suggests PRR12's potential involvement in eye and brain development, laying the groundwork for further investigations into PRR12-related neurobehavioral disorders.
Affiliation(s)
- Alessia Muscò
- Cell and Developmental Biology Unit, University of Pisa, 56126 Pisa, Italy (D.M.)
- Davide Martini
- Cell and Developmental Biology Unit, University of Pisa, 56126 Pisa, Italy (D.M.)
- Matteo Digregorio
- Cell and Developmental Biology Unit, University of Pisa, 56126 Pisa, Italy (D.M.)
- Vania Broccoli
- Stem Cell and Neurogenesis Unit, Division of Neuroscience, San Raffaele Scientific Institute, 20132 Milan, Italy
- CNR Institute of Neuroscience, 20132 Milan, Italy
3
Brlek P, Bulić L, Bračić M, Projić P, Škaro V, Shah N, Shah P, Primorac D. Implementing Whole Genome Sequencing (WGS) in Clinical Practice: Advantages, Challenges, and Future Perspectives. Cells 2024; 13:504. [PMID: 38534348 DOI: 10.3390/cells13060504] [Received: 02/06/2024] [Revised: 03/04/2024] [Accepted: 03/11/2024] [Indexed: 03/28/2024] Open
Abstract
The integration of whole genome sequencing (WGS) into all aspects of modern medicine represents the next step in the evolution of healthcare. Using this technology, scientists and physicians can observe the entire human genome comprehensively, generating a plethora of new sequencing data. Modern computational analysis entails advanced algorithms for variant detection, as well as complex models for classification. Data science and machine learning play a crucial role in the processing and interpretation of results, using enormous databases and statistics to discover new and support current genotype-phenotype correlations. In clinical practice, this technology has greatly enabled the development of personalized medicine, approaching each patient individually and in accordance with their genetic and biochemical profile. The most propulsive areas include rare disease genomics, oncogenomics, pharmacogenomics, neonatal screening, and infectious disease genomics. Another crucial application of WGS lies in the field of multi-omics, working towards the complete integration of human biomolecular data. Further technological development of sequencing technologies has led to the birth of third and fourth-generation sequencing, which include long-read sequencing, single-cell genomics, and nanopore sequencing. These technologies, alongside their continued implementation into medical research and practice, show great promise for the future of the field of medicine.
Affiliation(s)
- Petar Brlek
- St. Catherine Specialty Hospital, 10000 Zagreb, Croatia
- International Center for Applied Biological Research, 10000 Zagreb, Croatia
- School of Medicine, Josip Juraj Strossmayer University of Osijek, 31000 Osijek, Croatia
- Luka Bulić
- St. Catherine Specialty Hospital, 10000 Zagreb, Croatia
- Matea Bračić
- St. Catherine Specialty Hospital, 10000 Zagreb, Croatia
- Petar Projić
- International Center for Applied Biological Research, 10000 Zagreb, Croatia
- Nidhi Shah
- Dartmouth Hitchcock Medical Center, Lebanon, NH 03766, USA
- Parth Shah
- Dartmouth Hitchcock Medical Center, Lebanon, NH 03766, USA
- Dragan Primorac
- St. Catherine Specialty Hospital, 10000 Zagreb, Croatia
- International Center for Applied Biological Research, 10000 Zagreb, Croatia
- School of Medicine, Josip Juraj Strossmayer University of Osijek, 31000 Osijek, Croatia
- Medical School, University of Split, 21000 Split, Croatia
- Eberly College of Science, The Pennsylvania State University, State College, PA 16802, USA
- The Henry C. Lee College of Criminal Justice and Forensic Sciences, University of New Haven, West Haven, CT 06516, USA
- REGIOMED Kliniken, 96450 Coburg, Germany
- Medical School, University of Rijeka, 51000 Rijeka, Croatia
- Faculty of Dental Medicine and Health, Josip Juraj Strossmayer University of Osijek, 31000 Osijek, Croatia
- Medical School, University of Mostar, 88000 Mostar, Bosnia and Herzegovina
- National Forensic Sciences University, Gujarat 382007, India
4
Oprea TI, Bologa C, Holmes J, Mathias S, Metzger VT, Waller A, Yang JJ, Leach AR, Jensen LJ, Kelleher KJ, Sheils TK, Mathé E, Avram S, Edwards JS. Overview of the Knowledge Management Center for Illuminating the Druggable Genome. Drug Discov Today 2024; 29:103882. [PMID: 38218214 PMCID: PMC10939799 DOI: 10.1016/j.drudis.2024.103882] [Received: 10/18/2023] [Revised: 12/22/2023] [Accepted: 01/09/2024] [Indexed: 01/15/2024]
Abstract
The Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) project aims to aggregate, update, and articulate protein-centric data knowledge for the entire human proteome, with emphasis on the understudied proteins from the three IDG protein families. KMC collates and analyzes data from over 70 resources to compile the Target Central Resource Database (TCRD), which is the web-based informatics platform (Pharos). These data include experimental, computational, and text-mined information on protein structures, compound interactions, and disease and phenotype associations. Based on this knowledge, proteins are classified into different Target Development Levels (TDLs) for identification of understudied targets. Additional work by the KMC focuses on enriching target knowledge and producing DrugCentral and other data visualization tools for expanding investigation of understudied targets.
Affiliation(s)
- Tudor I Oprea
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Cristian Bologa
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Jayme Holmes
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Stephen Mathias
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Vincent T Metzger
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Anna Waller
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Jeremy J Yang
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Andrew R Leach
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Keith J Kelleher
- National Center for Advancing Translational Sciences (NCATS), NIH, Bethesda, MD, USA
- Timothy K Sheils
- National Center for Advancing Translational Sciences (NCATS), NIH, Bethesda, MD, USA
- Ewy Mathé
- National Center for Advancing Translational Sciences (NCATS), NIH, Bethesda, MD, USA
- Sorin Avram
- Coriolan Dragulescu Institute of Chemistry, Timisoara, Romania
- Jeremy S Edwards
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA; Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, NM, USA.
5
Koutrouli M, Nastou K, Piera Líndez P, Bouwmeester R, Rasmussen S, Martens L, Jensen LJ. FAVA: high-quality functional association networks inferred from scRNA-seq and proteomics data. Bioinformatics 2024; 40:btae010. [PMID: 38192003 PMCID: PMC10868155 DOI: 10.1093/bioinformatics/btae010] [Received: 09/16/2023] [Revised: 12/07/2023] [Accepted: 01/05/2024] [Indexed: 01/10/2024] Open
Abstract
MOTIVATION Protein networks are commonly used for understanding how proteins interact. However, they are typically biased by data availability, favoring well-studied proteins with more interactions. To uncover functions of understudied proteins, we must use data that are not affected by this literature bias, such as single-cell RNA-seq and proteomics. Due to data sparseness and redundancy, functional association analysis becomes complex. RESULTS To address this, we have developed FAVA (Functional Associations using Variational Autoencoders), which compresses high-dimensional data into a low-dimensional space. FAVA infers networks from high-dimensional omics data with much higher accuracy than existing methods, across a diverse collection of real as well as simulated datasets. FAVA can process large datasets with over 0.5 million conditions and has predicted 4210 interactions between 1039 understudied proteins. Our findings showcase FAVA's capability to offer novel perspectives on protein interactions. FAVA functions within the scverse ecosystem, employing AnnData as its input source. AVAILABILITY AND IMPLEMENTATION Source code, documentation, and tutorials for FAVA are accessible on GitHub at https://github.com/mikelkou/fava. FAVA can also be installed and used via pip/PyPI as well as via the scverse ecosystem https://github.com/scverse/ecosystem-packages/tree/main/packages/favapy.
Affiliation(s)
- Mikaela Koutrouli
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Katerina Nastou
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Pau Piera Líndez
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Simon Rasmussen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
6
Kafita D, Nkhoma P, Dzobo K, Sinkala M. Shedding light on the dark genome: Insights into the genetic, CRISPR-based, and pharmacological dependencies of human cancers and disease aggressiveness. PLoS One 2023; 18:e0296029. [PMID: 38117798 PMCID: PMC10732413 DOI: 10.1371/journal.pone.0296029] [Received: 08/31/2023] [Accepted: 12/05/2023] [Indexed: 12/22/2023] Open
Abstract
Investigating the human genome is vital for identifying risk factors and devising effective therapies to combat genetic disorders and cancer. Despite the extensive knowledge of the "light genome", the poorly understood "dark genome" remains understudied. In this study, we integrated data from 20,412 protein-coding genes in Pharos and 8,395 patient-derived tumours from The Cancer Genome Atlas (TCGA) to examine the genetic and pharmacological dependencies in human cancers and their treatment implications. We discovered that dark genes exhibited high mutation rates in certain cancers, similar to light genes. By combining the drug response profiles of cancer cells with cell fitness post-CRISPR-mediated gene knockout, we identified the crucial vulnerabilities associated with both dark and light genes. Our analysis also revealed that tumours harbouring dark gene mutations displayed worse overall and disease-free survival rates than those without such mutations. Furthermore, dark gene expression levels significantly influenced patient survival outcomes. Our findings demonstrated a similar distribution of genetic and pharmacological dependencies across the light and dark genomes, suggesting that targeting the dark genome holds promise for cancer treatment. This study underscores the need for ongoing research on the dark genome to better comprehend the underlying mechanisms of cancer and develop more effective therapies.
Affiliation(s)
- Doris Kafita
- Department of Biomedical Sciences, University of Zambia, School of Health Sciences, Lusaka, Zambia
- Panji Nkhoma
- Department of Biomedical Sciences, University of Zambia, School of Health Sciences, Lusaka, Zambia
- Kevin Dzobo
- Department of Medicine, Division of Dermatology, Hair and Skin Research Laboratory, Wound and Keloid Scarring Research Unit, The South African Medical Research Council, University of Cape Town, Cape Town, South Africa
- Musalula Sinkala
- Department of Biomedical Sciences, University of Zambia, School of Health Sciences, Lusaka, Zambia
- Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine and Department of Integrative Biomedical Sciences, University of Cape Town, Computational Biology Division, Cape Town, South Africa
7
Cunningham M, Pins D, Dezső Z, Torrent M, Vasanthakumar A, Pandey A. PINNED: identifying characteristics of druggable human proteins using an interpretable neural network. J Cheminform 2023; 15:64. [PMID: 37468968 PMCID: PMC10354961 DOI: 10.1186/s13321-023-00735-7] [Received: 03/23/2023] [Accepted: 07/10/2023] [Indexed: 07/21/2023] Open
Abstract
The identification of human proteins that are amenable to pharmacologic modulation without significant off-target effects remains an important unsolved challenge. Computational methods have been devised to identify features which distinguish between "druggable" and "undruggable" proteins, finding that protein sequence, tissue and cellular localization, biological role, and position in the protein-protein interaction network are all important discriminant factors. However, many prior efforts to automate the assessment of protein druggability suffer from low performance or poor interpretability. We developed a neural network-based machine learning model capable of generating druggability sub-scores based on each of four distinct categories, combining them to form an overall druggability score. The model achieves an excellent performance in separating drugged and undrugged proteins in the human proteome, with an area under the receiver operating characteristic (AUC) of 0.95. Our use of multiple sub-scores allows the assessment of potential protein targets of interest based on distinct contributors to druggability, leading to a more interpretable and holistic model to identify novel targets.
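As a rough illustration of the evaluation the abstract describes, the sketch below combines four per-category sub-scores into one overall score and computes the area under the ROC curve on toy data. The plain mean used to combine sub-scores, the category ordering, and all numbers are assumptions made for illustration; the abstract does not state how PINNED combines its sub-scores, and none of the values come from the paper.

```python
def overall_score(sub_scores):
    """Combine per-category sub-scores into one score.

    ASSUMPTION: a plain mean, chosen only for illustration.
    """
    return sum(sub_scores) / len(sub_scores)


def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: label 1 = drugged, 0 = undrugged; four
# sub-scores per protein (assumed order: sequence, localization,
# biological role, network position).
toy = [
    (1, [0.9, 0.8, 0.7, 0.6]),
    (1, [0.6, 0.5, 0.4, 0.5]),
    (0, [0.2, 0.3, 0.1, 0.2]),
    (0, [0.7, 0.5, 0.6, 0.6]),
    (1, [0.8, 0.9, 0.9, 0.8]),
    (0, [0.1, 0.2, 0.2, 0.3]),
]
labels = [y for y, _ in toy]
scores = [overall_score(s) for _, s in toy]
auc = roc_auc(labels, scores)
print(f"toy AUC = {auc:.3f}")
```

An AUC of 1.0 would mean every drugged protein outranks every undrugged one, and 0.5 corresponds to random ranking. Keeping the sub-scores separate until the final step is what allows a low overall score to be traced back to a specific category.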
Affiliation(s)
- Michael Cunningham
- Genomics Research Center, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA.
- Danielle Pins
- Information Research, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Zoltán Dezső
- Genomics Research Center, AbbVie Inc., 1000 Gateway Boulevard, South San Francisco, CA, 94080, USA
- Maricel Torrent
- Small Molecule Therapeutics and Platform Technologies, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Aparna Vasanthakumar
- Genomics Research Center, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
- Abhishek Pandey
- Information Research, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL, 60064, USA
8
Lindovsky J, Nichtova Z, Dragano NRV, Pajuelo Reguera D, Prochazka J, Fuchs H, Marschall S, Gailus-Durner V, Sedlacek R, Hrabě de Angelis M, Rozman J, Spielmann N. A review of standardized high-throughput cardiovascular phenotyping with a link to metabolism in mice. Mamm Genome 2023; 34:107-122. [PMID: 37326672 PMCID: PMC10290615 DOI: 10.1007/s00335-023-09997-w] [Received: 04/20/2023] [Accepted: 05/03/2023] [Indexed: 06/17/2023]
Abstract
Cardiovascular diseases cause a high mortality rate worldwide and represent a major burden for health care systems. Experimental rodent models play a central role in cardiovascular disease research by effectively simulating human cardiovascular diseases. Using mice, the International Mouse Phenotyping Consortium (IMPC) aims to target each protein-coding gene and phenotype multiple organ systems in single-gene knockout models by a global network of mouse clinics. In this review, we summarize the current advances of the IMPC in cardiac research and describe in detail the diagnostic requirements of high-throughput electrocardiography and transthoracic echocardiography capable of detecting cardiac arrhythmias and cardiomyopathies in mice. Beyond that, we are linking metabolism to the heart and describing phenotypes that emerge in a set of known genes, when knocked out in mice, such as the leptin receptor (Lepr), leptin (Lep), and Bardet-Biedl syndrome 5 (Bbs5). Furthermore, we are presenting not yet associated loss-of-function genes affecting both metabolism and the cardiovascular system, such as the RING finger protein 10 (Rnf10), F-box protein 38 (Fbxo38), and Dipeptidyl peptidase 8 (Dpp8). These extensive high-throughput data from IMPC mice provide a promising opportunity to explore genetics causing metabolic heart disease with an important translational approach.
Affiliation(s)
- Jiri Lindovsky
- Czech Centre for Phenogenomics, Institute of Molecular Genetics, Czech Academy of Sciences, Prumyslova 595, 252 50 Vestec, Czech Republic
- Zuzana Nichtova
- Czech Centre for Phenogenomics, Institute of Molecular Genetics, Czech Academy of Sciences, Prumyslova 595, 252 50 Vestec, Czech Republic
- Nathalia R. V. Dragano
- Institute of Experimental Genetics, German Mouse Clinic, Helmholtz Center Munich, German Research Center for Environmental Health, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- David Pajuelo Reguera
- Czech Centre for Phenogenomics, Institute of Molecular Genetics, Czech Academy of Sciences, Prumyslova 595, 252 50 Vestec, Czech Republic
- Jan Prochazka
- Czech Centre for Phenogenomics, Institute of Molecular Genetics, Czech Academy of Sciences, Prumyslova 595, 252 50 Vestec, Czech Republic
- Helmut Fuchs
- Institute of Experimental Genetics, German Mouse Clinic, Helmholtz Center Munich, German Research Center for Environmental Health, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Susan Marschall
- Institute of Experimental Genetics, German Mouse Clinic, Helmholtz Center Munich, German Research Center for Environmental Health, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Valerie Gailus-Durner
- Institute of Experimental Genetics, German Mouse Clinic, Helmholtz Center Munich, German Research Center for Environmental Health, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Radislav Sedlacek
- Czech Centre for Phenogenomics, Institute of Molecular Genetics, Czech Academy of Sciences, Prumyslova 595, 252 50 Vestec, Czech Republic
- Martin Hrabě de Angelis
- Institute of Experimental Genetics, German Mouse Clinic, Helmholtz Center Munich, German Research Center for Environmental Health, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Jan Rozman
- Czech Centre for Phenogenomics, Institute of Molecular Genetics, Czech Academy of Sciences, Prumyslova 595, 252 50 Vestec, Czech Republic
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Nadine Spielmann
- Institute of Experimental Genetics, German Mouse Clinic, Helmholtz Center Munich, German Research Center for Environmental Health, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
9
Zhang J, Wang T, Bi J, Ke M, Ren Y, Wang M, Du Z, Liu W, Hu L, Zhang X, Liu X, Wang B, Wu Z, Lv Y, Meng L, Wu R. Overexpression of HSF2 binding protein suppresses endoplasmic reticulum stress via regulating subcellular localization of CDC73 in hepatocytes. Cell Biosci 2023; 13:64. [PMID: 36964632 PMCID: PMC10039577 DOI: 10.1186/s13578-023-01010-w] [Received: 11/27/2022] [Accepted: 03/07/2023] [Indexed: 03/26/2023] Open
Abstract
BACKGROUND Endoplasmic reticulum (ER) stress plays an important role in the occurrence and development of various liver diseases. However, there are no effective prevention and treatment strategies. We aimed to determine the role of heat shock factor 2 binding protein (HSF2BP) in ER stress. METHODS HSF2BP expression in mice and cultured hepatocytes was measured during ER stress induced by tunicamycin, and its importance in ER stress was evaluated in hepatocyte-specific HSF2BP transgenic (TG) and knockout (KO) mice. The effects and mechanisms of HSF2BP on ER stress were further probed in hepatic ischemia-reperfusion (I/R) injury. RESULTS HSF2BP expression was significantly upregulated during tunicamycin-induced ER stress in mice and cultured hepatocytes. Liver injury and ER stress were reduced in HSF2BP-overexpressing mice after tunicamycin treatment, but were aggravated in HSF2BP knockout mice compared to the controls. In hepatic I/R injury, HSF2BP expression was significantly upregulated, and HSF2BP-overexpressing mice had reduced liver injury and inflammation. These improvements were associated with ER stress inhibition. However, these results were reversed in hepatocyte-specific HSF2BP knockout mice. HSF2BP overexpression increased cytoplasmic CDC73 levels and inhibited the JNK signaling pathway. CDC73 knockdown using siRNA eliminated the protection exerted by HSF2BP overexpression in hypoxia/reoxygenation (H/R)-induced ER stress in hepatocytes. CONCLUSION HSF2BP is a previously uncharacterized regulatory factor in ER stress that likely acts by regulating CDC73 subcellular localization. The feasibility of HSF2BP-targeted treatment in ER stress-related liver disease deserves future research.
Affiliation(s)
- Jia Zhang
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Shaanxi Provincial Center for Regenerative Medicine and Surgical Engineering, Center for Regenerative and Reconstructive Medicine, Med-X Institute, First Affiliated Hospital of Xi'an Jiaotong University, 124, 76 West Yanta Road, Xi'an, Shaanxi, 710061, China
- Department of Gastroenterology, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Tao Wang
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Shaanxi Provincial Center for Regenerative Medicine and Surgical Engineering, Center for Regenerative and Reconstructive Medicine, Med-X Institute, First Affiliated Hospital of Xi'an Jiaotong University, 124, 76 West Yanta Road, Xi'an, Shaanxi, 710061, China
- Department of Hepatobiliary Surgery, First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Jianbin Bi
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Shaanxi Provincial Center for Regenerative Medicine and Surgical Engineering, Center for Regenerative and Reconstructive Medicine, Med-X Institute, First Affiliated Hospital of Xi'an Jiaotong University, 124, 76 West Yanta Road, Xi'an, Shaanxi, 710061, China
- Department of Oncology, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Mengyun Ke
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Shaanxi Provincial Center for Regenerative Medicine and Surgical Engineering, Center for Regenerative and Reconstructive Medicine, Med-X Institute, First Affiliated Hospital of Xi'an Jiaotong University, 124, 76 West Yanta Road, Xi'an, Shaanxi, 710061, China
- Yifan Ren
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Shaanxi Provincial Center for Regenerative Medicine and Surgical Engineering, Center for Regenerative and Reconstructive Medicine, Med-X Institute, First Affiliated Hospital of Xi'an Jiaotong University, 124, 76 West Yanta Road, Xi'an, Shaanxi, 710061, China
- Department of General Surgery, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Mengzhou Wang
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Shaanxi Provincial Center for Regenerative Medicine and Surgical Engineering, Center for Regenerative and Reconstructive Medicine, Med-X Institute, First Affiliated Hospital of Xi'an Jiaotong University, 124, 76 West Yanta Road, Xi'an, Shaanxi, 710061, China
- Department of Hepatobiliary Surgery, First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Zhaoqing Du
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Shaanxi Provincial Center for Regenerative Medicine and Surgical Engineering, Center for Regenerative and Reconstructive Medicine, Med-X Institute, First Affiliated Hospital of Xi'an Jiaotong University, 124, 76 West Yanta Road, Xi'an, Shaanxi, 710061, China
- Department of Hepatobiliary Surgery, Shaanxi Provincial People's Hospital, Xi'an, Shaanxi, China
- Wuming Liu
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Shaanxi Provincial Center for Regenerative Medicine and Surgical Engineering, Center for Regenerative and Reconstructive Medicine, Med-X Institute, First Affiliated Hospital of Xi'an Jiaotong University, 124, 76 West Yanta Road, Xi'an, Shaanxi, 710061, China
- Department of Hepatobiliary Surgery, First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Liangshuo Hu
- Department of Hepatobiliary Surgery, First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Xiaogang Zhang
- Department of Hepatobiliary Surgery, First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Xuemin Liu
- Department of Hepatobiliary Surgery, First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Bo Wang
- Department of Hepatobiliary Surgery, First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Zheng Wu
- Department of Hepatobiliary Surgery, First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Yi Lv
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Shaanxi Provincial Center for Regenerative Medicine and Surgical Engineering, Center for Regenerative and Reconstructive Medicine, Med-X Institute, First Affiliated Hospital of Xi'an Jiaotong University, 124, 76 West Yanta Road, Xi'an, Shaanxi, 710061, China
- Department of Hepatobiliary Surgery, First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
| | - Lingzhong Meng
- Anesthesiology and Perioperative Medicine, Mayo Clinic College of Medicine, Rochester, MN, USA
| | - Rongqian Wu
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Shaanxi Provincial Center for Regenerative Medicine and Surgical Engineering, Center for Regenerative and Reconstructive Medicine, Med-X Institute, First Affiliated Hospital of Xi'an Jiaotong University, 124, 76 West Yanta Road, Xi'an, Shaanxi, 710061, China.
| |
|
10
|
Lachmann A, Rizzo KA, Bartal A, Jeon M, Clarke DJB, Ma’ayan A. PrismEXP: gene annotation prediction from stratified gene-gene co-expression matrices. PeerJ 2023; 11:e14927. [PMID: 36874981 PMCID: PMC9979837 DOI: 10.7717/peerj.14927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Accepted: 01/30/2023] [Indexed: 03/03/2023] Open
Abstract
Background: Gene-gene co-expression correlations measured by mRNA-sequencing (RNA-seq) can be used to predict gene annotations based on the co-variance structure within these data. In our prior work, we showed that uniformly aligned RNA-seq co-expression data from thousands of diverse studies is highly predictive of both gene annotations and protein-protein interactions. However, the performance of the predictions varies depending on whether the gene annotations and interactions are cell type and tissue specific or agnostic. Tissue and cell type-specific gene-gene co-expression data can be useful for making more accurate predictions because many genes perform their functions in unique ways in different cellular contexts. However, identifying the optimal tissues and cell types to partition the global gene-gene co-expression matrix is challenging. Results: Here we introduce and validate an approach called PRediction of gene Insights from Stratified Mammalian gene co-EXPression (PrismEXP) for improved gene annotation predictions based on RNA-seq gene-gene co-expression data. Using uniformly aligned data from ARCHS4, we apply PrismEXP to predict a wide variety of gene annotations including pathway membership, Gene Ontology terms, as well as human and mouse phenotypes. Predictions made with PrismEXP outperform predictions made with the global cross-tissue co-expression correlation matrix approach on all tested domains, and training using one annotation domain can be used to predict annotations in other domains. Conclusions: By demonstrating the utility of PrismEXP predictions in multiple use cases we show how PrismEXP can be used to enhance unsupervised machine learning methods to better understand the roles of understudied genes and proteins. To make PrismEXP accessible, it is provided via a user-friendly web interface, a Python package, and an Appyter.
Availability: The PrismEXP web-based application, with pre-computed PrismEXP predictions, is available from: https://maayanlab.cloud/prismexp; PrismEXP is also available as an Appyter: https://appyters.maayanlab.cloud/PrismEXP/; and as a Python package: https://github.com/maayanlab/prismexp.
Affiliation(s)
- Alexander Lachmann
- Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Kaeli A. Rizzo
- Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Alon Bartal
- Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Minji Jeon
- Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Daniel J. B. Clarke
- Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Avi Ma’ayan
- Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
| |
|
11
|
Kumar L, Brenner N, Sledzieski S, Olaosebikan M, Roger LM, Lynn-Goin M, Klein-Seetharaman R, Berger B, Putnam H, Yang J, Lewinski NA, Singh R, Daniels NM, Cowen L, Klein-Seetharaman J. Transfer of knowledge from model organisms to evolutionarily distant non-model organisms: The coral Pocillopora damicornis membrane signaling receptome. PLoS One 2023; 18:e0270965. [PMID: 36735673 PMCID: PMC9897584 DOI: 10.1371/journal.pone.0270965] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/28/2021] [Accepted: 06/21/2022] [Indexed: 02/04/2023] Open
Abstract
With the ease of gene sequencing and the technology available to study and manipulate non-model organisms, the extension of the methodological toolbox required to translate our understanding of model organisms to non-model organisms has become an urgent problem. For example, mining of large coral and their symbiont sequence data is a challenge, but also provides an opportunity for understanding functionality and evolution of these and other non-model organisms. Much more information than for any other eukaryotic species is available for humans, especially related to signal transduction and diseases. However, the coral cnidarian host and human diverged over 700 million years ago and homologies between proteins in the two species are therefore often in the gray zone, or at least often undetectable with traditional BLAST searches. We introduce a two-stage approach to identifying putative coral homologues of human proteins. First, through remote homology detection using Hidden Markov Models, we identify candidate human homologues in the cnidarian genome. However, for many proteins, the human genome alone contains multiple family members with similar or even more divergence in sequence. In the second stage, therefore, we filter the remote homology results based on the functional and structural plausibility of each coral candidate, shortlisting the coral proteins likely to have conserved some of the functions of the human proteins. We demonstrate our approach with a pipeline for mapping membrane receptors in humans to membrane receptors in corals, with specific focus on the stony coral, P. damicornis. More than 1000 human membrane receptors mapped to 335 coral receptors, including 151 G protein coupled receptors (GPCRs).
To validate specific sub-families, we chose opsin proteins, representative GPCRs that confer light sensitivity, and Toll-like receptors, representative non-GPCRs, which function in the immune response, and their ability to communicate with microorganisms. Through detailed structure-function analysis of their ligand-binding pockets and downstream signaling cascades, we selected those candidate remote homologues likely to carry out related functions in the corals. This pipeline may prove generally useful for other non-model organisms, such as to support the growing field of synthetic biology.
Affiliation(s)
- Lokender Kumar
- Department of Chemistry, Colorado School of Mines, Golden, CO, United States of America
| | - Nathanael Brenner
- Department of Chemistry, Colorado School of Mines, Golden, CO, United States of America
| | - Samuel Sledzieski
- MIT Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, United States of America
| | - Monsurat Olaosebikan
- Department of Computer Science, Tufts University, Medford, MA, United States of America
| | - Liza M. Roger
- Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, VA, United States of America
| | - Matthew Lynn-Goin
- Department of Chemistry, Colorado School of Mines, Golden, CO, United States of America
| | | | - Bonnie Berger
- MIT Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, United States of America
| | - Hollie Putnam
- Department of Biological Sciences, University of Rhode Island, South Kingstown, RI, United States of America
| | - Jinkyu Yang
- Department of Aeronautics & Astronautics, University of Washington, Seattle, WA, United States of America
| | - Nastassja A. Lewinski
- Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, VA, United States of America
| | - Rohit Singh
- MIT Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, United States of America
| | - Noah M. Daniels
- Department of Computer Science and Statistics, University of Rhode Island, South Kingstown, RI, United States of America
| | - Lenore Cowen
- Department of Computer Science, Tufts University, Medford, MA, United States of America
| | - Judith Klein-Seetharaman
- Department of Chemistry, Colorado School of Mines, Golden, CO, United States of America
| |
|
12
|
Cai T, Xie L, Zhang S, Chen M, He D, Badkul A, Liu Y, Namballa HK, Dorogan M, Harding WW, Mura C, Bourne PE, Xie L. End-to-end sequence-structure-function meta-learning predicts genome-wide chemical-protein interactions for dark proteins. PLoS Comput Biol 2023; 19:e1010851. [PMID: 36652496 PMCID: PMC9886305 DOI: 10.1371/journal.pcbi.1010851] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/28/2022] [Revised: 01/30/2023] [Accepted: 01/05/2023] [Indexed: 01/19/2023] Open
Abstract
Systematically discovering protein-ligand interactions across the entire human and pathogen genomes is critical in chemical genomics, protein function prediction, drug discovery, and many other areas. However, more than 90% of gene families remain "dark", i.e., their small-molecule ligands are undiscovered due to experimental limitations or human/historical biases. Existing computational approaches typically fail when the dark protein differs from those with known ligands. To address this challenge, we have developed a deep learning framework, called PortalCG, which consists of four novel components: (i) a 3-dimensional ligand binding site enhanced sequence pre-training strategy to encode the evolutionary links between ligand-binding sites across gene families; (ii) an end-to-end pretraining-fine-tuning strategy to reduce the impact of inaccuracy of predicted structures on function predictions by recognizing the sequence-structure-function paradigm; (iii) a new out-of-cluster meta-learning algorithm that extracts and accumulates information learned from predicting ligands of distinct gene families (meta-data) and applies the meta-data to a dark gene family; and (iv) a stress model selection step, using different gene families in the test data from those in the training and development data sets to facilitate model deployment in a real-world scenario. In extensive and rigorous benchmark experiments, PortalCG considerably outperformed state-of-the-art techniques of machine learning and protein-ligand docking when applied to dark gene families, and demonstrated its generalization power for target identifications and compound screenings under out-of-distribution (OOD) scenarios. Furthermore, in an external validation for the multi-target compound screening, the performance of PortalCG surpassed the rational design from medicinal chemists.
Our results also suggest that a differentiable sequence-structure-function deep learning framework, where protein structural information serves as an intermediate layer, could be superior to conventional methodology where predicted protein structures were used for the compound screening. We applied PortalCG to two case studies to exemplify its potential in drug discovery: designing selective dual-antagonists of dopamine receptors for the treatment of opioid use disorder (OUD), and illuminating the understudied human genome for target diseases that do not yet have effective and safe therapeutics. Our results suggested that PortalCG is a viable solution to the OOD problem in exploring understudied regions of protein functional space.
Affiliation(s)
- Tian Cai
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York, United States of America
| | - Li Xie
- Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
| | - Shuo Zhang
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York, United States of America
| | - Muge Chen
- Master Program in Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, New York, United States of America
| | - Di He
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York, United States of America
| | - Amitesh Badkul
- Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
| | - Yang Liu
- Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
| | - Hari Krishna Namballa
- Department of Chemistry, Hunter College, The City University of New York, New York, New York, United States of America
| | - Michael Dorogan
- Department of Chemistry, Hunter College, The City University of New York, New York, New York, United States of America
| | - Wayne W. Harding
- Department of Chemistry, Hunter College, The City University of New York, New York, New York, United States of America
| | - Cameron Mura
- School of Data Science & Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
| | - Philip E. Bourne
- School of Data Science & Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
| | - Lei Xie
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York, United States of America
- Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
- Helen and Robert Appel Alzheimer’s Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, New York, United States of America
| |
|
13
|
Amaral MD. Using the genome to correct the ion transport defect in cystic fibrosis. J Physiol 2022; 601:1573-1582. [PMID: 36068724 DOI: 10.1113/jp282308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Accepted: 08/31/2022] [Indexed: 11/08/2022] Open
Abstract
Key points: Human genome information can help find drugs for human diseases. 'Omics' allow unbiased identification of novel drug targets. High-throughput (HT) approaches provide a global view on disease mechanisms. As a monogenic disease, CF has led the way in multiple 'Omic' studies. 'Multi-omics' integration will generate maximal biological significance. Abstract: Today Biomedicine faces one of its greatest challenges, i.e. treating diseases through their causative dysfunctional processes and not just their symptoms. However, we still lack a global view of the mechanisms and pathways involved in the pathophysiology of most diseases. In fact, a view of disease mechanisms and pathways can be achieved through the holistic studies provided by 'Omic' approaches. Cystic Fibrosis (CF), caused by mutations in the CF transmembrane conductance regulator (CFTR) gene which encodes an anion channel, is paradigmatic for monogenic disorders, namely channelopathies. A high number of 'omics' studies have focussed on CF; namely, several cell-based high-throughput (HT) approaches were developed and applied towards a global mechanistic characterization of CF pathophysiology and the identification of novel and 'unbiased' drug targets. Notwithstanding, it is likely through the integration of all these 'layers' of large datasets into comprehensive disease maps that biological significance can be extracted, so that the enormous potential of these approaches to identifying dysfunctional mechanisms and novel drugs may become a reality. Abstract figure legend: Schematic overview of the 3 main approaches to discovery of new drugs/drug targets.
Affiliation(s)
- Margarida D Amaral
- BioISI - Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, Campo Grande-C8 bdg, Lisboa, 1749-016, Portugal
| |
|
14
|
Abstract
For many years, the laboratory mouse has been the favored model organism to study mammalian development, biology and disease. Among its advantages for these studies are its close concordance with human biology, the syntenic relationship between the mouse and other mammalian genomes, the existence of many inbred strains, its short gestation period, its relatively low cost for housing and husbandry, and the wide array of tools for genome modification, mutagenesis, and for cryopreserving embryos, sperm and eggs. The advent of CRISPR genome modification techniques has considerably broadened the landscape of model organisms available for study, including other mammalian species. However, the mouse remains the most popular and utilized system to model human development, biology, and disease processes. In this review, we will briefly summarize the long history of mice as a preferred mammalian genetic and model system, and review current large-scale mutagenesis efforts using genome modification to produce improved models for mammalian development and disease.
Affiliation(s)
- Thomas Gridley
- Center for Clinical and Translational Research, Maine Medical Center Research Institute, Scarborough, ME, United States.
| | | |
|
15
|
Cai T, Abbu KA, Liu Y, Xie L. DeepREAL: A Deep Learning Powered Multi-scale Modeling Framework for Predicting Out-of-distribution Ligand-induced GPCR Activity. Bioinformatics 2022; 38:2561-2570. [PMID: 35274689 PMCID: PMC9048666 DOI: 10.1093/bioinformatics/btac154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 02/18/2022] [Accepted: 03/10/2022] [Indexed: 11/20/2022] Open
Abstract
Motivation: Drug discovery has witnessed intensive exploration of predictive modeling of drug–target physical interactions over two decades. However, a critical knowledge gap needs to be filled for correlating drug–target interactions with clinical outcomes: predicting genome-wide receptor activities or function selectivity, especially agonist versus antagonist, induced by novel chemicals. Two major obstacles compound the difficulty on this task: known data of receptor activity is far too scarce to train a robust model in light of genome-scale applications, and real-world applications need to deploy a model on data from various shifted distributions. Results: To address these challenges, we have developed an end-to-end deep learning framework, DeepREAL, for multi-scale modeling of genome-wide ligand-induced receptor activities. DeepREAL utilizes self-supervised learning on tens of millions of protein sequences and pre-trained binary interaction classification to solve the data distribution shift and data scarcity problems. Extensive benchmark studies on G-protein coupled receptors (GPCRs), which simulate real-world scenarios, demonstrate that DeepREAL achieves state-of-the-art performances in out-of-distribution settings. DeepREAL can be extended to other gene families beyond GPCRs. Availability and implementation: All data used are downloaded from Pfam (Mistry et al., 2020), GLASS (Chan et al., 2015) and IUPHAR/BPS and the data from reference (Sakamuru et al., 2021). Readers are directed to their official website for original data. Code is available on GitHub: https://github.com/XieResearchGroup/DeepREAL. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tian Cai
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, 10016, USA
| | - Kyra Alyssa Abbu
- Department of Computer Science, Hunter College, The City University of New York, New York, 10065, USA
| | - Yang Liu
- Department of Computer Science, Hunter College, The City University of New York, New York, 10065, USA
| | - Lei Xie
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, 10016, USA
- Department of Computer Science, Hunter College, The City University of New York, New York, 10065, USA
- Helen and Robert Appel Alzheimer's Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, 10021, USA
| |
|
16
|
Cai T, Xie L, Chen M, Liu Y, He D, Zhang S, Mura C, Bourne PE, Xie L. Exploration of Dark Chemical Genomics Space via Portal Learning: Applied to Targeting the Undruggable Genome and COVID-19 Anti-Infective Polypharmacology. RESEARCH SQUARE 2021:rs.3.rs-1109318. [PMID: 34873596 PMCID: PMC8647653 DOI: 10.21203/rs.3.rs-1109318/v1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/15/2022]
Abstract
Advances in biomedicine are largely fueled by exploring uncharted territories of human biology. Machine learning can both enable and accelerate discovery, but faces a fundamental hurdle when applied to unseen data with distributions that differ from previously observed ones, a common dilemma in scientific inquiry. We have developed a new deep learning framework, called Portal Learning, to explore dark chemical and biological space. Three key, novel components of our approach include: (i) end-to-end, step-wise transfer learning, in recognition of biology's sequence-structure-function paradigm, (ii) out-of-cluster meta-learning, and (iii) stress model selection. Portal Learning provides a practical solution to the out-of-distribution (OOD) problem in statistical machine learning. Here, we have implemented Portal Learning to predict chemical-protein interactions on a genome-wide scale. Systematic studies demonstrate that Portal Learning can effectively assign ligands to unexplored gene families (unknown functions), versus existing state-of-the-art methods. Compared with AlphaFold2-based protein-ligand docking, Portal Learning significantly improved the performance by 79% in PR-AUC and 27% in ROC-AUC, respectively. The superior performance of Portal Learning allowed us to target previously "undruggable" proteins and design novel polypharmacological agents for disrupting interactions between SARS-CoV-2 and human proteins. Portal Learning is general-purpose and can be further applied to other areas of scientific inquiry.
Affiliation(s)
- Tian Cai
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, 10016, USA
| | - Li Xie
- Department of Computer Science, Hunter College, The City University of New York, New York, 10065, USA
| | - Muge Chen
- Master Program in Computer Science, Courant Institute of Mathematical Sciences, New York University
| | - Yang Liu
- Department of Computer Science, Hunter College, The City University of New York, New York, 10065, USA
| | - Di He
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, 10016, USA
| | - Shuo Zhang
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, 10016, USA
| | - Cameron Mura
- School of Data Science & Department of Biomedical Engineering, University of Virginia, Virginia, 22903, USA
| | - Philip E. Bourne
- School of Data Science & Department of Biomedical Engineering, University of Virginia, Virginia, 22903, USA
| | - Lei Xie
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, 10016, USA
- Department of Computer Science, Hunter College, The City University of New York, New York, 10065, USA
- Helen and Robert Appel Alzheimer’s Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, 10021, USA
| |
|
17
|
Sheils T, Mathias SL, Siramshetty VB, Bocci G, Bologa CG, Yang JJ, Waller A, Southall N, Nguyen DT, Oprea TI. How to Illuminate the Druggable Genome Using Pharos. Curr Protoc Bioinformatics 2021; 69:e92. [PMID: 31898878 DOI: 10.1002/cpbi.92] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 12/16/2022]
Abstract
Pharos is an integrated web-based informatics platform for the analysis of data aggregated by the Illuminating the Druggable Genome (IDG) Knowledge Management Center, an NIH Common Fund initiative. The current version of Pharos (as of October 2019) spans 20,244 proteins in the human proteome, 19,880 disease and phenotype associations, and 226,829 ChEMBL compounds. This resource not only collates and analyzes data from over 60 high-quality resources to generate these types, but also uses text indexing to find less apparent connections between targets, and has recently begun to collaborate with institutions that generate data and resources. Proteins are ranked according to a knowledge-based classification system, which can help researchers to identify less studied "dark" targets that could be potentially further illuminated. This is an important process for both drug discovery and target validation, as more knowledge can accelerate target identification, and previously understudied proteins can serve as novel targets in drug discovery. Two basic protocols illustrate the levels of detail available for targets and several methods of finding targets of interest. An Alternate Protocol illustrates the difference in available knowledge between less and more studied targets. © 2020 by John Wiley & Sons, Inc. Basic Protocol 1: Search for a target and view details. Alternate Protocol: Search for dark target and view details. Basic Protocol 2: Filter a target list to get refined results.
Affiliation(s)
- Timothy Sheils
- National Center for Advancing Translational Sciences, Rockville, Maryland
| | - Stephen L Mathias
- Department of Internal Medicine, University of New Mexico School of Medicine, Albuquerque, New Mexico
| | | | - Giovanni Bocci
- Department of Internal Medicine, University of New Mexico School of Medicine, Albuquerque, New Mexico
| | - Cristian G Bologa
- Department of Internal Medicine, University of New Mexico School of Medicine, Albuquerque, New Mexico
| | - Jeremy J Yang
- Department of Internal Medicine, University of New Mexico School of Medicine, Albuquerque, New Mexico
| | - Anna Waller
- Department of Pathology, University of New Mexico School of Medicine, Albuquerque, New Mexico
| | - Noel Southall
- National Center for Advancing Translational Sciences, Rockville, Maryland
| | - Dac-Trung Nguyen
- National Center for Advancing Translational Sciences, Rockville, Maryland
| | - Tudor I Oprea
- Department of Internal Medicine, University of New Mexico School of Medicine, Albuquerque, New Mexico
- UNM Comprehensive Cancer Center, Albuquerque, New Mexico
- Department of Rheumatology and Inflammation Research, Institute of Medicine, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| |
|
18
|
Ferrari E, Naponelli V, Bettuzzi S. Lemur Tyrosine Kinases and Prostate Cancer: A Literature Review. Int J Mol Sci 2021; 22:ijms22115453. [PMID: 34064250 PMCID: PMC8196904 DOI: 10.3390/ijms22115453] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/30/2021] [Revised: 05/06/2021] [Accepted: 05/18/2021] [Indexed: 12/16/2022] Open
Abstract
The members of the Lemur Tyrosine Kinases (LMTK1-3) subfamily constitute a group of three membrane-anchored kinases. They are known to influence a wide variety of key cellular events, often affecting cell proliferation and apoptosis. They have been discovered to be involved in cancer, in that they impact various signalling pathways that influence cell proliferation, migration, and invasiveness. Notably, in the context of genome-wide association studies, one member of the LMTK family has been identified as a candidate gene which could contribute to the development of prostate cancer. In this review of published literature, we present evidence on the role of LMTKs in human prostate cancer and model systems, focusing on the complex network of interacting partners involved in signalling cascades that are frequently activated in prostate cancer malignancy. We speculate that the modulators of LMTK enzyme expression and activity would be of high clinical relevance for the design of innovative prostate cancer treatment.
Affiliation(s)
- Elena Ferrari
- Department of Medicine and Surgery, University of Parma, Via Gramsci, 14, 43126 Parma, Italy
- Correspondence: Tel.: +39-0521-033-822
| | - Valeria Naponelli
- Department of Medicine and Surgery, University of Parma, Via Gramsci, 14, 43126 Parma, Italy
- National Institute of Biostructure and Biosystems (INBB), Viale Medaglie d’Oro 305, 00136 Rome, Italy
- Centre for Molecular and Translational Oncology (COMT), University of Parma, Parco Area delle Scienze 11/a, 43124 Parma, Italy
| | - Saverio Bettuzzi
- Department of Medicine and Surgery, University of Parma, Via Gramsci, 14, 43126 Parma, Italy
- National Institute of Biostructure and Biosystems (INBB), Viale Medaglie d’Oro 305, 00136 Rome, Italy
- Centre for Molecular and Translational Oncology (COMT), University of Parma, Parco Area delle Scienze 11/a, 43124 Parma, Italy
| |
|
19
|
Cai T, Lim H, Abbu KA, Qiu Y, Nussinov R, Xie L. MSA-Regularized Protein Sequence Transformer toward Predicting Genome-Wide Chemical-Protein Interactions: Application to GPCRome Deorphanization. J Chem Inf Model 2021; 61:1570-1582. [PMID: 33757283 PMCID: PMC8154251 DOI: 10.1021/acs.jcim.0c01285] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/04/2020] [Indexed: 01/14/2023]
Abstract
Small molecules play a critical role in modulating biological systems. Knowledge of chemical-protein interactions helps address fundamental and practical questions in biology and medicine. However, with the rapid emergence of newly sequenced genes, the endogenous or surrogate ligands of a vast number of proteins remain unknown. Homology modeling and machine learning are two major methods for assigning new ligands to a protein but mostly fail when sequence homology between an unannotated protein and those with known functions or structures is low. In this study, we develop a new deep learning framework to predict chemical binding to evolutionary divergent unannotated proteins, whose ligand cannot be reliably predicted by existing methods. By incorporating evolutionary information into self-supervised learning of unlabeled protein sequences, we develop a novel method, distilled sequence alignment embedding (DISAE), for the protein sequence representation. DISAE can utilize all protein sequences and their multiple sequence alignment (MSA) to capture functional relationships between proteins without the knowledge of their structure and function. Followed by the DISAE pretraining, we devise a module-based fine-tuning strategy for the supervised learning of chemical-protein interactions. In the benchmark studies, DISAE significantly improves the generalizability of machine learning models and outperforms the state-of-the-art methods by a large margin. Comprehensive ablation studies suggest that the use of MSA, sequence distillation, and triplet pretraining critically contributes to the success of DISAE. The interpretability analysis of DISAE suggests that it learns biologically meaningful information. We further use DISAE to assign ligands to human orphan G-protein coupled receptors (GPCRs) and to cluster the human GPCRome by integrating their phylogenetic and ligand relationships. The promising results of DISAE open an avenue for exploring the chemical landscape of entire sequenced genomes.
Affiliation(s)
- Tian Cai
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York 10016, United States
| | - Hansaim Lim
- Ph.D. Program in Biochemistry, The Graduate Center, The City University of New York, New York, New York 10016, United States
| | - Kyra Alyssa Abbu
- Department of Computer Science, Hunter College, The City University of New York, New York, New York 10065, United States
| | - Yue Qiu
- Ph.D. Program in Biology, The Graduate Center, The City University of New York, New York, New York 10016, United States
| | - Ruth Nussinov
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
| | - Lei Xie
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York 10016, United States
- Ph.D. Program in Biochemistry, The Graduate Center, The City University of New York, New York, New York 10016, United States
- Department of Computer Science, Hunter College, The City University of New York, New York, New York 10065, United States
- Ph.D. Program in Biology, The Graduate Center, The City University of New York, New York, New York 10016, United States
- Helen and Robert Appel Alzheimer’s Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, New York 10021, United States
|
20
|
Koulouras G, Frith MC. Significant non-existence of sequences in genomes and proteomes. Nucleic Acids Res 2021; 49:3139-3155. [PMID: 33693858 PMCID: PMC8034619 DOI: 10.1093/nar/gkab139] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2021] [Revised: 02/11/2021] [Accepted: 02/25/2021] [Indexed: 12/22/2022] Open
Abstract
Minimal absent words (MAWs) are minimal-length oligomers absent from a genome or proteome. Although some artificially synthesized MAWs have deleterious effects, there is still a lack of a strategy for the classification of non-occurring sequences as potentially malicious or benign. In this work, by using Markovian models with multiple-testing correction, we reveal significant absent oligomers, which are statistically expected to exist. This suggests that their absence is due to negative selection. We survey genomes and proteomes covering the diversity of life and find thousands of significant absent sequences. Common significant MAWs are often mono- or dinucleotide tracts, or palindromic. Significant viral MAWs are often restriction sites and may indicate unknown restriction motifs. Surprisingly, significant mammal genome MAWs are often present, but rare, in other mammals, suggesting that they are suppressed but not completely forbidden. Significant human MAWs are frequently present in prokaryotes, suggesting immune function, but rarely present in human viruses, indicating viral mimicry of the host. More than one-fourth of human proteins are one substitution away from containing a significant MAW, with the majority of replacements being predicted harmful. We provide a web-based, interactive database of significant MAWs across genomes and proteomes.
Affiliation(s)
- Grigorios Koulouras
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26 Aomi, Koto-ku, Tokyo 135-0064, Japan
| | - Martin C Frith
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Shinjuku-ku, Tokyo, Japan
|
21
|
Horner NR, Venkataraman S, Armit C, Casero R, Brown JM, Wong MD, van Eede MC, Henkelman RM, Johnson S, Teboul L, Wells S, Brown SD, Westerberg H, Mallon AM. LAMA: automated image analysis for the developmental phenotyping of mouse embryos. Development 2021; 148:dev192955. [PMID: 33574040 PMCID: PMC8015254 DOI: 10.1242/dev.192955] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2020] [Accepted: 12/21/2020] [Indexed: 11/20/2022]
Abstract
Advanced 3D imaging modalities, such as micro-computed tomography (micro-CT), have been incorporated into the high-throughput embryo pipeline of the International Mouse Phenotyping Consortium (IMPC). This project generates large volumes of raw data that cannot be immediately exploited without significant resources of personnel and expertise. Thus, rapid automated annotation is crucial to ensure that 3D imaging data can be integrated with other multi-dimensional phenotyping data. We present an automated computational mouse embryo phenotyping pipeline that harnesses the large amount of wild-type control data available in the IMPC embryo pipeline in order to address issues of low mutant sample number as well as incomplete penetrance and variable expressivity. We also investigate the effect of developmental substage on automated phenotyping results. Designed primarily for developmental biologists, our software performs image pre-processing, registration, statistical analysis and segmentation of embryo images. We also present a novel anatomical E14.5 embryo atlas average and, using it with LAMA, show that we can uncover known and novel dysmorphology from two IMPC knockout lines.
Affiliation(s)
- Neil R Horner
- Medical Research Council Harwell Institute, Harwell OX11 0RD, UK
| | - Shanmugasundaram Venkataraman
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine (IGMM), University of Edinburgh, Edinburgh EH4 2XU, UK
| | - Chris Armit
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine (IGMM), University of Edinburgh, Edinburgh EH4 2XU, UK
- BGI Hong Kong, 26/F, Kings Wing Plaza 2, 1 On Kwan Street, Shek Mun, New Territories, Hong Kong
| | - Ramón Casero
- Medical Research Council Harwell Institute, Harwell OX11 0RD, UK
| | - James M Brown
- School of Computer Science, University of Lincoln, Lincoln LN6 7TS
| | - Michael D Wong
- Mouse Imaging Centre, Hospital for Sick Children, Toronto, Ontario M5T 3H7, Canada
| | - Matthijs C van Eede
- Mouse Imaging Centre, Hospital for Sick Children, Toronto, Ontario M5T 3H7, Canada
| | - R Mark Henkelman
- Mouse Imaging Centre, Hospital for Sick Children, Toronto, Ontario M5T 3H7, Canada
| | - Sara Johnson
- Medical Research Council Harwell Institute, Harwell OX11 0RD, UK
| | - Lydia Teboul
- Medical Research Council Harwell Institute, Harwell OX11 0RD, UK
| | - Sara Wells
- Medical Research Council Harwell Institute, Harwell OX11 0RD, UK
| | - Steve D Brown
- Medical Research Council Harwell Institute, Harwell OX11 0RD, UK
| | | | - Ann-Marie Mallon
- Medical Research Council Harwell Institute, Harwell OX11 0RD, UK
|
22
|
Abstract
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.
|
23
|
Avram S, Bologa CG, Holmes J, Bocci G, Wilson TB, Nguyen DT, Curpan R, Halip L, Bora A, Yang JJ, Knockel J, Sirimulla S, Ursu O, Oprea TI. DrugCentral 2021 supports drug discovery and repositioning. Nucleic Acids Res 2021; 49:D1160-D1169. [PMID: 33151287 PMCID: PMC7779058 DOI: 10.1093/nar/gkaa997] [Citation(s) in RCA: 94] [Impact Index Per Article: 31.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2020] [Revised: 10/09/2020] [Accepted: 10/14/2020] [Indexed: 12/18/2022] Open
Abstract
DrugCentral is a public resource (http://drugcentral.org) that serves the scientific community by providing up-to-date drug information, as described in previous papers. The current release includes 109 newly approved (October 2018 through March 2020) active pharmaceutical ingredients in the US, Europe, Japan and other countries; and two molecular entities (e.g. mefuparib) of interest for COVID19. New additions include a set of pharmacokinetic properties for ∼1000 drugs, and a sex-based separation of side effects, processed from FAERS (FDA Adverse Event Reporting System); as well as a drug repositioning prioritization scheme based on the market availability and intellectual property rights for FDA approved drugs. In the context of the COVID19 pandemic, we also incorporated REDIAL-2020, a machine learning platform that estimates anti-SARS-CoV-2 activities, as well as the 'drugs in news' feature offers a brief enumeration of the most interesting drugs at the present moment. The full database dump and data files are available for download from the DrugCentral web portal.
Affiliation(s)
- Sorin Avram
- Department of Computational Chemistry, “Coriolan Dragulescu’’ Institute of Chemistry, 24 Mihai Viteazu Blvd, Timişoara, Timiş, 300223, România
| | - Cristian G Bologa
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- UNM Comprehensive Cancer Center, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Jayme Holmes
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Giovanni Bocci
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Thomas B Wilson
- College of Pharmacy, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Dac-Trung Nguyen
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Ramona Curpan
- Department of Computational Chemistry, “Coriolan Dragulescu’’ Institute of Chemistry, 24 Mihai Viteazu Blvd, Timişoara, Timiş, 300223, România
| | - Liliana Halip
- Department of Computational Chemistry, “Coriolan Dragulescu’’ Institute of Chemistry, 24 Mihai Viteazu Blvd, Timişoara, Timiş, 300223, România
| | - Alina Bora
- Department of Computational Chemistry, “Coriolan Dragulescu’’ Institute of Chemistry, 24 Mihai Viteazu Blvd, Timişoara, Timiş, 300223, România
| | - Jeremy J Yang
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Jeffrey Knockel
- Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, USA
| | - Suman Sirimulla
- Department of Pharmaceutical Sciences, School of Pharmacy, The University of Texas at El Paso, TX 79902, USA
| | - Oleg Ursu
- Computational and Structural Chemistry, Merck & Co., Inc., 2000 Galloping Hill Road, Kenilworth, NJ 07033, USA
| | - Tudor I Oprea
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Computational and Structural Chemistry, Merck & Co., Inc., 2000 Galloping Hill Road, Kenilworth, NJ 07033, USA
- Department of Rheumatology and Inflammation Research, Institute of Medicine, Sahlgrenska Academy at University of Gothenburg, 40530 Gothenburg, Sweden
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
|
24
|
Bateman A, Martin MJ, Orchard S, Magrane M, Agivetova R, Ahmad S, Alpi E, Bowler-Barnett EH, Britto R, Bursteinas B, Bye-A-Jee H, Coetzee R, Cukura A, Da Silva A, Denny P, Dogan T, Ebenezer T, Fan J, Castro LG, Garmiri P, Georghiou G, Gonzales L, Hatton-Ellis E, Hussein A, Ignatchenko A, Insana G, Ishtiaq R, Jokinen P, Joshi V, Jyothi D, Lock A, Lopez R, Luciani A, Luo J, Lussi Y, MacDougall A, Madeira F, Mahmoudy M, Menchi M, Mishra A, Moulang K, Nightingale A, Oliveira CS, Pundir S, Qi G, Raj S, Rice D, Lopez MR, Saidi R, Sampson J, Sawford T, Speretta E, Turner E, Tyagi N, Vasudev P, Volynkin V, Warner K, Watkins X, Zaru R, Zellner H, Bridge A, Poux S, Redaschi N, Aimo L, Argoud-Puy G, Auchincloss A, Axelsen K, Bansal P, Baratin D, Blatter MC, Bolleman J, Boutet E, Breuza L, Casals-Casas C, de Castro E, Echioukh KC, Coudert E, Cuche B, Doche M, Dornevil D, Estreicher A, Famiglietti ML, Feuermann M, Gasteiger E, Gehant S, Gerritsen V, Gos A, Gruaz-Gumowski N, Hinz U, Hulo C, Hyka-Nouspikel N, Jungo F, Keller G, Kerhornou A, Lara V, Le Mercier P, Lieberherr D, Lombardot T, Martin X, Masson P, Morgat A, Neto TB, Paesano S, Pedruzzi I, Pilbout S, Pourcel L, Pozzato M, Pruess M, Rivoire C, Sigrist C, Sonesson K, Stutz A, Sundaram S, Tognolli M, Verbregue L, Wu CH, Arighi CN, Arminski L, Chen C, Chen Y, Garavelli JS, Huang H, Laiho K, McGarvey P, Natale DA, Ross K, Vinayaka CR, Wang Q, Wang Y, Yeh LS, Zhang J, Ruch P, Teodoro D. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res 2021; 49:D480-D489. [PMID: 33237286 PMCID: PMC7778908 DOI: 10.1093/nar/gkaa1100] [Citation(s) in RCA: 3710] [Impact Index Per Article: 1236.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2020] [Revised: 10/21/2020] [Accepted: 11/02/2020] [Indexed: 02/07/2023] Open
Abstract
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.
|
25
|
Sheils TK, Mathias SL, Kelleher KJ, Siramshetty VB, Nguyen DT, Bologa CG, Jensen LJ, Vidović D, Koleti A, Schürer SC, Waller A, Yang JJ, Holmes J, Bocci G, Southall N, Dharkar P, Mathé E, Simeonov A, Oprea TI. TCRD and Pharos 2021: mining the human proteome for disease biology. Nucleic Acids Res 2021; 49:D1334-D1346. [PMID: 33156327 PMCID: PMC7778974 DOI: 10.1093/nar/gkaa993] [Citation(s) in RCA: 91] [Impact Index Per Article: 30.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2020] [Revised: 10/09/2020] [Accepted: 10/14/2020] [Indexed: 12/13/2022] Open
Abstract
In 2014, the National Institutes of Health (NIH) initiated the Illuminating the Druggable Genome (IDG) program to identify and improve our understanding of poorly characterized proteins that can potentially be modulated using small molecules or biologics. Two resources produced from these efforts are: The Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/) and Pharos (https://pharos.nih.gov/), a web interface to browse the TCRD. The ultimate goal of these resources is to highlight and facilitate research into currently understudied proteins, by aggregating a multitude of data sources, and ranking targets based on the amount of data available, and presenting data in machine learning ready format. Since the 2017 release, both TCRD and Pharos have produced two major releases, which have incorporated or expanded an additional 25 data sources. Recently incorporated data types include human and viral-human protein-protein interactions, protein-disease and protein-phenotype associations, and drug-induced gene signatures, among others. These aggregated data have enabled us to generate new visualizations and content sections in Pharos, in order to empower users to find new areas of study in the druggable genome.
Affiliation(s)
- Timothy K Sheils
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Stephen L Mathias
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Keith J Kelleher
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Vishal B Siramshetty
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Dac-Trung Nguyen
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Cristian G Bologa
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
| | - Dušica Vidović
- Institute for Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
| | - Amar Koleti
- Institute for Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
| | - Stephan C Schürer
- Institute for Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
| | - Anna Waller
- UNM Center for Molecular Discovery, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Jeremy J Yang
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Jayme Holmes
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Giovanni Bocci
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Noel Southall
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Poorva Dharkar
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Ewy Mathé
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Anton Simeonov
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Tudor I Oprea
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
- UNM Comprehensive Cancer Center, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Department of Rheumatology and Inflammation Research, Institute of Medicine, Sahlgrenska Academy at University of Gothenburg, 40530 Gothenburg, Sweden
|
26
|
Preuss F, Chatterjee D, Mathea S, Shrestha S, St-Germain J, Saha M, Kannan N, Raught B, Rottapel R, Knapp S. Nucleotide Binding, Evolutionary Insights, and Interaction Partners of the Pseudokinase Unc-51-like Kinase 4. Structure 2020; 28:1184-1196.e6. [PMID: 32814032 DOI: 10.1016/j.str.2020.07.016] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2020] [Revised: 06/17/2020] [Accepted: 07/29/2020] [Indexed: 01/11/2023]
Abstract
Unc-51-like kinase 4 (ULK4) is a pseudokinase that has been linked to the development of several diseases. Even though sequence motifs required for ATP binding in kinases are lacking, ULK4 still tightly binds ATP and the presence of the co-factor is required for structural stability of ULK4. Here, we present a high-resolution structure of a ULK4-ATPγS complex revealing a highly unusual ATP binding mode in which the lack of the canonical VAIK motif lysine is compensated by K39, located N-terminal to αC. Evolutionary analysis suggests that degradation of active site motifs in metazoan ULK4 has co-occurred with an ULK4-specific activation loop, which stabilizes the C helix. In addition, cellular interaction studies using BioID and biochemical validation data revealed high confidence interactors of the pseudokinase and armadillo repeat domains. Many of the identified ULK4 interaction partners were centrosomal and tubulin-associated proteins and several active kinases suggesting interesting regulatory roles for ULK4.
Affiliation(s)
- Franziska Preuss
- Institute for Pharmaceutical Chemistry, Johann Wolfgang Goethe-University, Max-von-Laue-Str. 9, 60438 Frankfurt am Main, Germany; Buchmann Institute for Molecular Life Sciences, Structural Genomics Consortium, Johann Wolfgang Goethe-University, Max-von-Laue-Str. 15, 60438 Frankfurt am Main, Germany
| | - Deep Chatterjee
- Institute for Pharmaceutical Chemistry, Johann Wolfgang Goethe-University, Max-von-Laue-Str. 9, 60438 Frankfurt am Main, Germany; Buchmann Institute for Molecular Life Sciences, Structural Genomics Consortium, Johann Wolfgang Goethe-University, Max-von-Laue-Str. 15, 60438 Frankfurt am Main, Germany
| | - Sebastian Mathea
- Institute for Pharmaceutical Chemistry, Johann Wolfgang Goethe-University, Max-von-Laue-Str. 9, 60438 Frankfurt am Main, Germany; Buchmann Institute for Molecular Life Sciences, Structural Genomics Consortium, Johann Wolfgang Goethe-University, Max-von-Laue-Str. 15, 60438 Frankfurt am Main, Germany
| | - Safal Shrestha
- Institute of Bioinformatics & Department of Biochemistry and Molecular Biology, University of Georgia, 120 Green Street, Athens, GA 30602-7229, USA
| | - Jonathan St-Germain
- Princess Margaret Cancer Centre, University Health Network, Toronto M5G 2C4, Canada
| | - Manipa Saha
- Princess Margaret Cancer Centre, University Health Network, Toronto M5G 2C4, Canada
| | - Natarajan Kannan
- Institute of Bioinformatics & Department of Biochemistry and Molecular Biology, University of Georgia, 120 Green Street, Athens, GA 30602-7229, USA
| | - Brian Raught
- Princess Margaret Cancer Centre, University Health Network, Toronto M5G 2C4, Canada
| | - Robert Rottapel
- Princess Margaret Cancer Centre, University Health Network, Toronto M5G 2C4, Canada; Departments of Medicine, Immunology and Medical Biophysics, University of Toronto, Toronto M5G 1L7, Canada; Division of Rheumatology, St. Michael's Hospital, Toronto M5B 1W8, Canada
| | - Stefan Knapp
- Institute for Pharmaceutical Chemistry, Johann Wolfgang Goethe-University, Max-von-Laue-Str. 9, 60438 Frankfurt am Main, Germany; Buchmann Institute for Molecular Life Sciences, Structural Genomics Consortium, Johann Wolfgang Goethe-University, Max-von-Laue-Str. 15, 60438 Frankfurt am Main, Germany; German Cancer Consortium (DKTK) and Frankfurt Cancer Institute (FCI), 60596 Frankfurt am Main, Germany.
|
27
|
Abstract
Surprisingly we remain ignorant of the function of the majority of genes in the human and mouse genomes. The dark genome is a major obstacle to the interpretation of the function of human genetic variation and its impact on disease. At the same time, pleiotropy, how individual variants influence multiple phenotypes, is key to understanding gene function and the role of genes and genetic networks in disease systems. Both understanding the genetics of disease and developing new therapeutic approaches and advances in precision medicine are all compromised by our limited knowledge of gene function and pleiotropic effects. Illuminating the dark genome and revealing pleiotropy across the genome requires a highly coordinated and international effort to acquire and analyse high-dimensional phenotype data from model organisms. We describe briefly how the International Mouse Phenotyping Consortium is addressing these challenges and the novel features of the pleiotropic landscape that are revealed by functional genomics programmes at genome-wide scale.
Affiliation(s)
| | - Heena V Lad
- MRC Harwell Institute, Harwell, OX11 0RD, UK
|