1
Klimov PB, He Q. Predicting host range expansion in parasitic mites using a global mammalian-acarine dataset. Nat Commun 2024; 15:5431. [PMID: 38926409 PMCID: PMC11208579 DOI: 10.1038/s41467-024-49515-3] [Received: 07/28/2023] [Accepted: 06/07/2024] [Indexed: 06/28/2024]
Abstract
Multi-host parasites pose greater health risks to wildlife, livestock, and humans than single-host parasites, yet our understanding of how ecological and biological factors influence a parasite's host range remains limited. Here, we assemble the largest and most complete dataset on permanently parasitic mammalian mites and build a predictive model that estimates the probability that single-host parasites will become multi-host, while accounting for potentially unobserved host-parasite links and class imbalance. This model identifies statistically significant predictors related to parasites, hosts, climate, and habitat disturbance. The most important predictors include the parasite's contact level with the host immune system and two variables characterizing host phylogenetic similarity and spatial co-distribution. Our model reveals an overrepresentation of mites associated with Rodentia (rodents), Chiroptera (bats), and Carnivora in the multi-host risk group. This highlights both the potential vulnerability of these hosts to parasitic infestations and their risk of serving as reservoirs of parasites for new hosts. In addition, we find independent macroevolutionary evidence supporting our prediction that several single-host species of Notoedres, the bat skin parasites, belong in the multi-host risk group, demonstrating the forecasting potential of our model.
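The abstract states that the model accounts for class imbalance but does not say how. One common remedy, shown here only as an assumed illustration and not necessarily the approach Klimov and He used, is inverse-frequency class weighting, the same formula scikit-learn applies for class_weight="balanced": each class c receives the weight n_samples / (n_classes * n_c). The function name and the labels below are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * n_samples_in_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical labels: 0 = single-host mite species, 1 = multi-host (8:2 imbalance)
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.625, 1: 2.5}

# Each class now contributes the same total weight to a weighted loss.
totals = {cls: weights[cls] * labels.count(cls) for cls in weights}
print(totals)  # {0: 5.0, 1: 5.0}
```

With these weights, the rare multi-host class and the common single-host class contribute equally to a weighted loss, so a classifier cannot score well simply by predicting "single-host" everywhere.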
Affiliation(s)
- Pavel B Klimov
- Lilly Hall of Life Sciences, Purdue University, 915 Mitch Daniels Blvd, West Lafayette, Indiana, 47907, USA.
- Qixin He
- Lilly Hall of Life Sciences, Purdue University, 915 Mitch Daniels Blvd, West Lafayette, Indiana, 47907, USA.

2
Trepte P, Secker C, Olivet J, Blavier J, Kostova S, Maseko SB, Minia I, Silva Ramos E, Cassonnet P, Golusik S, Zenkner M, Beetz S, Liebich MJ, Scharek N, Schütz A, Sperling M, Lisurek M, Wang Y, Spirohn K, Hao T, Calderwood MA, Hill DE, Landthaler M, Choi SG, Twizere JC, Vidal M, Wanker EE. AI-guided pipeline for protein-protein interaction drug discovery identifies a SARS-CoV-2 inhibitor. Mol Syst Biol 2024; 20:428-457. [PMID: 38467836 PMCID: PMC10987651 DOI: 10.1038/s44320-024-00019-8] [Received: 02/09/2023] [Revised: 01/22/2024] [Accepted: 01/23/2024] [Indexed: 03/13/2024]
Abstract
Protein-protein interactions (PPIs) offer great opportunities to expand the druggable proteome and therapeutically tackle various diseases, but remain challenging targets for drug discovery. Here, we provide a comprehensive pipeline that combines experimental and computational tools to identify and validate PPI targets and perform early-stage drug discovery. We have developed a machine learning approach that prioritizes interactions by analyzing quantitative data from binary PPI assays or AlphaFold-Multimer predictions. Using the quantitative assay LuTHy together with our machine learning algorithm, we identified high-confidence interactions among SARS-CoV-2 proteins for which we predicted three-dimensional structures using AlphaFold-Multimer. We employed VirtualFlow to target the contact interface of the NSP10-NSP16 SARS-CoV-2 methyltransferase complex by ultra-large virtual drug screening. Thereby, we identified a compound that binds to NSP10 and inhibits its interaction with NSP16, while also disrupting both the methyltransferase activity of the complex and SARS-CoV-2 replication. Overall, this pipeline will help to prioritize PPI targets to accelerate the discovery of early-stage drug candidates targeting protein complexes and pathways.
Affiliation(s)
- Philipp Trepte
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany.
- Brain Development and Disease, Institute of Molecular Biotechnology of the Austrian Academy of Sciences, 1030, Vienna, Austria.
- Christopher Secker
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany.
- Zuse Institute Berlin, Berlin, Germany.
- Julien Olivet
- Laboratory of Viral Interactomes, Interdisciplinary Cluster for Applied Genoproteomics (GIGA)-Molecular Biology of Diseases, University of Liège, 4000, Liège, Belgium
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Structural Biology Unit, Laboratory of Virology and Chemotherapy, Rega Institute for Medical Research, Department of Microbiology, Immunology and Transplantation, Katholieke Universiteit Leuven, 3000, Leuven, Belgium
- Jeremy Blavier
- Laboratory of Viral Interactomes, Interdisciplinary Cluster for Applied Genoproteomics (GIGA)-Molecular Biology of Diseases, University of Liège, 4000, Liège, Belgium
- Simona Kostova
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Sibusiso B Maseko
- Laboratory of Viral Interactomes, Interdisciplinary Cluster for Applied Genoproteomics (GIGA)-Molecular Biology of Diseases, University of Liège, 4000, Liège, Belgium
- Igor Minia
- RNA Biology and Posttranscriptional Regulation, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, 13125, Berlin, Germany
- Eduardo Silva Ramos
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Patricia Cassonnet
- Département de Virologie, Unité de Génétique Moléculaire des Virus à ARN (GMVR), Institut Pasteur, Centre National de la Recherche Scientifique (CNRS), Université de Paris, Paris, France
- Sabrina Golusik
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Martina Zenkner
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Stephanie Beetz
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Mara J Liebich
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Nadine Scharek
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Anja Schütz
- Protein Production & Characterization, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Marcel Sperling
- Multifunctional Colloids and Coating, Fraunhofer Institute for Applied Polymer Research (IAP), 14476, Potsdam-Golm, Germany
- Michael Lisurek
- Structural Chemistry and Computational Biophysics, Leibniz-Institut für Molekulare Pharmakologie (FMP), 13125, Berlin, Germany
- Yang Wang
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Kerstin Spirohn
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Tong Hao
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Michael A Calderwood
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- David E Hill
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Markus Landthaler
- RNA Biology and Posttranscriptional Regulation, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, 13125, Berlin, Germany
- Institute of Biology, Humboldt-Universität zu Berlin, 13125, Berlin, Germany
- Soon Gang Choi
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA.
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA.
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA.
- Jean-Claude Twizere
- Laboratory of Viral Interactomes, Interdisciplinary Cluster for Applied Genoproteomics (GIGA)-Molecular Biology of Diseases, University of Liège, 4000, Liège, Belgium.
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA.
- TERRA Teaching and Research Center, Gembloux Agro-Bio Tech, University of Liège, 5030, Gembloux, Belgium.
- Laboratory of Algal Synthetic and Systems Biology, Division of Science and Math, New York University Abu Dhabi, Abu Dhabi, UAE.
- Marc Vidal
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA.
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA.
- Erich E Wanker
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany.

3
Xiao D, Lin M, Liu C, Geddes TA, Burchfield J, Parker B, Humphrey SJ, Yang P. SnapKin: a snapshot deep learning ensemble for kinase-substrate prediction from phosphoproteomics data. NAR Genom Bioinform 2023; 5:lqad099. [PMID: 37954574 PMCID: PMC10632189 DOI: 10.1093/nargab/lqad099] [Received: 05/29/2023] [Revised: 09/18/2023] [Accepted: 10/25/2023] [Indexed: 11/14/2023]
Abstract
A major challenge in mass spectrometry-based phosphoproteomics lies in identifying the substrates of kinases, as currently only a small fraction of identified substrates can be confidently linked with a known kinase. Machine learning techniques are promising approaches for leveraging large-scale phosphoproteomics data to computationally predict substrates of kinases. However, the small number of experimentally validated kinase substrates (true positives) and the high data noise in many phosphoproteomics datasets together limit their applicability and utility. Here, we aim to develop advanced kinase-substrate prediction methods to address these challenges. Using a collection of seven large phosphoproteomics datasets, and both traditional and deep learning models, we first demonstrate that a 'pseudo-positive' learning strategy for alleviating small sample size is effective at improving model predictive performance. We next show that a data resampling-based ensemble learning strategy is useful for improving model stability while further enhancing prediction. Lastly, we incorporate these two learning strategies into a 'snapshot' ensemble learning algorithm to create SnapKin, an ensemble deep learning method for predicting substrates of kinases from large-scale phosphoproteomics data. We demonstrate that SnapKin consistently outperforms existing methods in kinase-substrate prediction. SnapKin is freely available at https://github.com/PYangLab/SnapKin.
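The abstract does not give SnapKin's training schedule. As a hedged sketch of the generic snapshot-ensemble idea (Huang et al., 2017) that the name refers to: a single network is trained with a cyclic cosine-annealed learning rate, a snapshot of the model is saved at the end of each cycle, and the snapshots' predictions are averaged. All function names and numbers below are hypothetical; SnapKin's actual implementation is in the linked GitHub repository.

```python
import math


def snapshot_lr(t, total_iters, n_cycles, lr0):
    """Cyclic cosine-annealed learning rate for 0-based iteration t.
    The rate restarts at lr0 at the start of each cycle and decays toward 0."""
    cycle_len = math.ceil(total_iters / n_cycles)
    pos = t % cycle_len
    return lr0 / 2 * (math.cos(math.pi * pos / cycle_len) + 1)


def snapshot_iters(total_iters, n_cycles):
    """Iterations at the end of each cycle, where a model snapshot would be saved."""
    cycle_len = math.ceil(total_iters / n_cycles)
    return [min((c + 1) * cycle_len, total_iters) - 1 for c in range(n_cycles)]


def ensemble_average(snapshot_probs):
    """Average predicted substrate probabilities across the saved snapshots."""
    n = len(snapshot_probs)
    return [sum(p) / n for p in zip(*snapshot_probs)]


print(snapshot_iters(100, 4))                         # [24, 49, 74, 99]
print(snapshot_lr(0, 100, 4, 0.1))                    # 0.1
print(ensemble_average([[0.75, 0.25], [0.25, 0.5]]))  # [0.5, 0.375]
```

The design choice behind snapshot ensembles is cost: the learning-rate restarts push the network into different local minima, so several diverse models come out of a single training run instead of several independent ones.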
Affiliation(s)
- Di Xiao
- Computational Systems Biology Group, Children’s Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia
- Michael Lin
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Chunlei Liu
- Computational Systems Biology Group, Children’s Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia
- Thomas A Geddes
- Computational Systems Biology Group, Children’s Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- School of Environmental and Life Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- James G Burchfield
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- School of Environmental and Life Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Benjamin L Parker
- Centre for Muscle Research, Department of Anatomy and Physiology, School of Biomedical Sciences, Melbourne, VIC 3010, Australia
- Sean J Humphrey
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- School of Environmental and Life Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Murdoch Children’s Research Institute, The Royal Children’s Hospital, Melbourne, VIC, 3052, Australia
- Pengyi Yang
- Computational Systems Biology Group, Children’s Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia

4
Kavaliauskaite G, Madsen JS. Automatic quality control of single-cell and single-nucleus RNA-seq using valiDrops. NAR Genom Bioinform 2023; 5:lqad101. [PMID: 38025048 PMCID: PMC10657416 DOI: 10.1093/nargab/lqad101] [Received: 04/25/2023] [Revised: 10/05/2023] [Accepted: 11/01/2023] [Indexed: 12/01/2023]
Abstract
Single-cell and single-nucleus RNA-sequencing (sxRNA-seq) measures gene expression in individual cells or nuclei, enabling comprehensive characterization of cell types and states. However, isolating cells or nuclei for sxRNA-seq releases contaminating RNA, for example through cell damage and transcript leakage, which can distort biological signals. Thus, identifying barcodes containing high-quality cells or nuclei is a critical analytical step in the processing of sxRNA-seq data. Here, we present valiDrops, an automated method to identify high-quality barcodes and flag dead cells. In valiDrops, barcodes are initially filtered using data-adaptive thresholding on community-standard quality metrics, and subsequently, valiDrops uses a novel clustering-based approach to identify barcodes with distinct biological signals. We benchmark valiDrops and show that biological signals from cell types and states are more distinct, easier to separate and more consistent after filtering by valiDrops compared to existing tools. Finally, we show that valiDrops can predict and flag dead cells with high accuracy. This novel classifier can further improve data quality or be used to identify dead cells to interrogate the biology of cell death. Thus, valiDrops is an effective and easy-to-use method to improve data quality and biological interpretation. Our method is openly available as an R package at www.github.com/madsen-lab/valiDrops.
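valiDrops is described as applying data-adaptive thresholds to community-standard quality metrics, but the abstract does not specify the rule. A common data-adaptive rule, used here purely as an assumed example, flags barcodes whose metric lies more than a few median absolute deviations (MADs) from the median. This sketch uses the unscaled MAD (R's mad() multiplies by 1.4826 by default); the function name, cutoff and data are hypothetical and are not taken from the valiDrops package.

```python
import statistics


def mad_outlier_flags(values, n_mads=3.0, direction="lower"):
    """Flag values lying more than n_mads median absolute deviations (MADs)
    from the median. direction is "lower", "higher" or "both"."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    lower, upper = med - n_mads * mad, med + n_mads * mad
    if direction == "lower":
        return [v < lower for v in values]
    if direction == "higher":
        return [v > upper for v in values]
    return [v < lower or v > upper for v in values]


# Hypothetical log10 UMI counts for six barcodes; the last looks like an empty or damaged droplet.
log_umis = [3.0, 3.1, 2.9, 3.05, 2.95, 1.2]
print(mad_outlier_flags(log_umis))  # [False, False, False, False, False, True]
```

Because the threshold is derived from each dataset's own median and spread, it adapts to runs with different sequencing depths. For metrics where high values indicate poor quality, such as the mitochondrial read fraction, pass direction="higher".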
Affiliation(s)
- Gabija Kavaliauskaite
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense M 5230, Denmark
- Center for Functional Genomics and Tissue Plasticity (ATLAS), Odense M 5230, Denmark
- Jesper Grud Skat Madsen
- Center for Functional Genomics and Tissue Plasticity (ATLAS), Odense M 5230, Denmark
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense M 5230, Denmark
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA

5
Trepte P, Secker C, Kostova S, Maseko SB, Choi SG, Blavier J, Minia I, Ramos ES, Cassonnet P, Golusik S, Zenkner M, Beetz S, Liebich MJ, Scharek N, Schütz A, Sperling M, Lisurek M, Wang Y, Spirohn K, Hao T, Calderwood MA, Hill DE, Landthaler M, Olivet J, Twizere JC, Vidal M, Wanker EE. AI-guided pipeline for protein-protein interaction drug discovery identifies a SARS-CoV-2 inhibitor. bioRxiv 2023:2023.06.14.544560. [PMID: 37398436 PMCID: PMC10312674 DOI: 10.1101/2023.06.14.544560] [Indexed: 07/04/2023]
Abstract
Protein-protein interactions (PPIs) offer great opportunities to expand the druggable proteome and therapeutically tackle various diseases, but remain challenging targets for drug discovery. Here, we provide a comprehensive pipeline that combines experimental and computational tools to identify and validate PPI targets and perform early-stage drug discovery. We have developed a machine learning approach that prioritizes interactions by analyzing quantitative data from binary PPI assays and AlphaFold-Multimer predictions. Using the quantitative assay LuTHy together with our machine learning algorithm, we identified high-confidence interactions among SARS-CoV-2 proteins for which we predicted three-dimensional structures using AlphaFold-Multimer. We employed VirtualFlow to target the contact interface of the NSP10-NSP16 SARS-CoV-2 methyltransferase complex by ultra-large virtual drug screening. Thereby, we identified a compound that binds to NSP10 and inhibits its interaction with NSP16, while also disrupting both the methyltransferase activity of the complex and SARS-CoV-2 replication. Overall, this pipeline will help to prioritize PPI targets to accelerate the discovery of early-stage drug candidates targeting protein complexes and pathways.
Affiliation(s)
- Philipp Trepte
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Brain Development and Disease, Institute of Molecular Biotechnology of the Austrian Academy of Sciences, 1030, Vienna, Austria
- Christopher Secker
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Zuse Institute Berlin, Berlin, Germany
- Simona Kostova
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Sibusiso B. Maseko
- Laboratory of Viral Interactomes, Interdisciplinary Cluster for Applied Genoproteomics (GIGA)-Molecular Biology of Diseases, University of Liège, 4000, Liège, Belgium
- Soon Gang Choi
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Jeremy Blavier
- Laboratory of Viral Interactomes, Interdisciplinary Cluster for Applied Genoproteomics (GIGA)-Molecular Biology of Diseases, University of Liège, 4000, Liège, Belgium
- Igor Minia
- RNA Biology and Posttranscriptional Regulation, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, 13125, Berlin, Germany
- Eduardo Silva Ramos
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Patricia Cassonnet
- Département de Virologie, Unité de Génétique Moléculaire des Virus à ARN (GMVR), Institut Pasteur, Centre National de la Recherche Scientifique (CNRS), Université de Paris, Paris, France
- Sabrina Golusik
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Martina Zenkner
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Stephanie Beetz
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Mara J. Liebich
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Nadine Scharek
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Anja Schütz
- Protein Production & Characterization, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Marcel Sperling
- Multifunctional Colloids and Coating, Fraunhofer Institute for Applied Polymer Research (IAP), 14476, Potsdam-Golm, Germany
- Michael Lisurek
- Structural Chemistry and Computational Biophysics, Leibniz-Institut für Molekulare Pharmakologie (FMP), 13125, Berlin, Germany
- Yang Wang
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Kerstin Spirohn
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Tong Hao
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Michael A. Calderwood
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- David E. Hill
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Markus Landthaler
- RNA Biology and Posttranscriptional Regulation, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, 13125, Berlin, Germany
- Institute of Biology, Humboldt-Universität zu Berlin, 13125, Berlin, Germany
- Julien Olivet
- Laboratory of Viral Interactomes, Interdisciplinary Cluster for Applied Genoproteomics (GIGA)-Molecular Biology of Diseases, University of Liège, 4000, Liège, Belgium
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Structural Biology Unit, Laboratory of Virology and Chemotherapy, Rega Institute for Medical Research, Department of Microbiology, Immunology and Transplantation, Katholieke Universiteit Leuven, 3000, Leuven, Belgium
- Jean-Claude Twizere
- Laboratory of Viral Interactomes, Interdisciplinary Cluster for Applied Genoproteomics (GIGA)-Molecular Biology of Diseases, University of Liège, 4000, Liège, Belgium
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- TERRA Teaching and Research Center, Gembloux Agro-Bio Tech, University of Liège, 5030, Gembloux, Belgium
- Laboratory of Algal Synthetic and Systems Biology, Division of Science and Math, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
- Marc Vidal
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Erich E. Wanker
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany

6
Ruan Y, Wang J, Yu M, Wang F, Wang J, Xu Y, Liu L, Cheng Y, Yang R, Zhang C, Yang Y, Wang J, Wu W, Huang Y, Tian Y, Chen G, Zhang J, Jian R. A multi-omics integrative analysis based on CRISPR screens re-defines the pluripotency regulatory network in ESCs. Commun Biol 2023; 6:410. [PMID: 37059858 PMCID: PMC10104827 DOI: 10.1038/s42003-023-04700-w] [Received: 01/04/2022] [Accepted: 03/13/2023] [Indexed: 04/16/2023]
Abstract
A comprehensive and precise definition of the pluripotency gene regulatory network (PGRN) is crucial for clarifying the regulatory mechanisms in embryonic stem cells (ESCs). Here, after a CRISPR/Cas9-based functional genomics screen and integrative analysis with other functional genomics, transcriptomic, proteomic and epigenomic data, an expanded pluripotency-associated gene set is obtained, and a new PGRN with nine sub-classes is constructed. By integrating DNA binding, epigenetic modification, chromatin conformation, and RNA expression profiles, the PGRN is resolved into six functionally independent transcriptional modules (CORE, MYC, PAF, PRC, PCGF and TBX). Spatiotemporal transcriptomic analyses reveal activated CORE/MYC/PAF module activity and repressed PRC/PCGF/TBX module activity in both mouse ESCs (mESCs) and pluripotent cells of early embryos. Moreover, this module activity pattern is found to be shared by human ESCs (hESCs) and cancers. Thus, our results provide novel insights into the molecular basis of ESC pluripotency.
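The abstract reports activated and repressed module activity without defining how activity is scored. One simple, commonly used summary, assumed here for illustration only, z-scores each gene across samples and averages the z-scores of a module's genes in each sample. The gene names, values and module membership below are hypothetical and do not correspond to the CORE, MYC, PAF, PRC, PCGF or TBX modules.

```python
import statistics


def zscore_rows(expr):
    """Standardise each gene's expression across samples (population SD)."""
    z = {}
    for gene, vals in expr.items():
        mu, sd = statistics.mean(vals), statistics.pstdev(vals)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in vals]
    return z


def module_activity(expr, module_genes):
    """Per-sample activity = mean z-score of the module genes present in expr."""
    z = zscore_rows({g: expr[g] for g in module_genes if g in expr})
    n_samples = len(next(iter(z.values())))
    return [statistics.mean([z[g][j] for g in z]) for j in range(n_samples)]


# Hypothetical expression of three genes in two samples (e.g. ESC vs differentiated)
expr = {"geneA": [10, 2], "geneB": [8, 4], "geneC": [1, 9]}
print(module_activity(expr, ["geneA", "geneB"]))  # [1.0, -1.0]
print(module_activity(expr, ["geneC"]))           # [-1.0, 1.0]
```

Z-scoring before averaging keeps highly expressed genes from dominating the module score, so each gene contributes on the same scale.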
Affiliation(s)
- Yan Ruan
- Laboratory of Stem Cell & Developmental Biology, Department of Histology and Embryology, College of Basic Medical Sciences, Army Medical University, Chongqing, 400038, China
- Jiaqi Wang
- Laboratory of Stem Cell & Developmental Biology, Department of Histology and Embryology, College of Basic Medical Sciences, Army Medical University, Chongqing, 400038, China
- Department of Pathophysiology, College of High Altitude Military Medicine, Army Medical University, Chongqing, 400038, China
- Meng Yu
- Laboratory of Stem Cell & Developmental Biology, Department of Histology and Embryology, College of Basic Medical Sciences, Army Medical University, Chongqing, 400038, China
- Department of Joint Surgery, The First Affiliated Hospital, Army Medical University, Chongqing, 400038, China
- Fengsheng Wang
- Laboratory of Stem Cell & Developmental Biology, Department of Histology and Embryology, College of Basic Medical Sciences, Army Medical University, Chongqing, 400038, China
- State Key Laboratory of NBC Protection for Civilian, Beijing, 102205, China
- Jiangjun Wang
- Laboratory of Stem Cell & Developmental Biology, Department of Histology and Embryology, College of Basic Medical Sciences, Army Medical University, Chongqing, 400038, China
- Department of Cell Biology, College of Basic Medical Sciences, Army Medical University, Chongqing, 400038, China
- Yixiao Xu
- Laboratory of Stem Cell & Developmental Biology, Department of Histology and Embryology, College of Basic Medical Sciences, Army Medical University, Chongqing, 400038, China
- Lianlian Liu
- Laboratory of Stem Cell & Developmental Biology, Department of Histology and Embryology, College of Basic Medical Sciences, Army Medical University, Chongqing, 400038, China
- Yuda Cheng
- Laboratory of Stem Cell & Developmental Biology, Department of Histology and Embryology, College of Basic Medical Sciences, Army Medical University, Chongqing, 400038, China
- Ran Yang
- Laboratory of Stem Cell & Developmental Biology, Department of Histology and Embryology, College of Basic Medical Sciences, Army Medical University, Chongqing, 400038, China
- Department of Pathophysiology, College of High Altitude Military Medicine, Army Medical University, Chongqing, 400038, China
- Chen Zhang
- Laboratory of Stem Cell & Developmental Biology, Department of Histology and Embryology, College of Basic Medical Sciences, Army Medical University, Chongqing, 400038, China
- Yi Yang
- Experimental Center of Basic Medicine, College of Basic Medical Sciences, Army Medical University, Chongqing, 400038, China
- JiaLi Wang
- Laboratory of Stem Cell & Developmental Biology, Department of Histology and Embryology, College of Basic Medical Sciences, Army Medical University, Chongqing, 400038, China
- Wei Wu
- Thoracic Surgery Department, Southwest Hospital, The First Hospital Affiliated to Army Medical University, Chongqing, 400038, China
- Yi Huang
- Biomedical Analysis Center, Army Medical University, Chongqing, 400038, China
- Yanping Tian
- Laboratory of Stem Cell & Developmental Biology, Department of Histology and Embryology, College of Basic Medical Sciences, Army Medical University, Chongqing, 400038, China
- Guangxing Chen
- Department of Joint Surgery, The First Affiliated Hospital, Army Medical University, Chongqing, 400038, China.
- Junlei Zhang
- Laboratory of Stem Cell & Developmental Biology, Department of Histology and Embryology, College of Basic Medical Sciences, Army Medical University, Chongqing, 400038, China.
- Rui Jian
- Laboratory of Stem Cell & Developmental Biology, Department of Histology and Embryology, College of Basic Medical Sciences, Army Medical University, Chongqing, 400038, China.

7
Wang L, You ZH, Huang DS, Li JQ. MGRCDA: Metagraph Recommendation Method for Predicting CircRNA-Disease Association. IEEE Trans Cybern 2023; 53:67-75. [PMID: 34236991 DOI: 10.1109/tcyb.2021.3090756] [Indexed: 06/13/2023]
Abstract
Clinical evidence has begun to accumulate suggesting that circRNAs can be novel therapeutic targets for various diseases and play a critical role in human health. However, because the mechanisms of circRNAs are complex, it is difficult to explore the relationship between disease and circRNA quickly and at scale through wet-lab experiments. In this work, we design a new computational model, MGRCDA, based on metagraph recommendation theory to predict potential circRNA-disease associations. Specifically, we first treat circRNA-disease association prediction as a recommendation-system problem and design a series of metagraphs from the heterogeneous biological networks; we then extract the semantic information of diseases and the Gaussian interaction profile kernel (GIPK) similarity of circRNAs and diseases as network attributes; finally, we use the iterative search of the metagraph recommendation algorithm to score each circRNA-disease pair. On the gold-standard dataset circR2Disease, MGRCDA achieved a prediction accuracy of 92.49% with an area under the ROC curve of 0.9298, which is significantly higher than other state-of-the-art models. Furthermore, among the top 30 disease-related circRNAs recommended by the model, 25 have been verified by the latest published literature. The experimental results show that MGRCDA is feasible and efficient and can recommend reliable candidates for further wet-lab experiments, reducing the scope of those experiments.
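The association-prediction literature usually defines the GIPK similarity named in this abstract as K(i, j) = exp(-γ ||IP(i) - IP(j)||²). Here IP(i) is row i of the binary association matrix, and γ = γ' / (mean squared norm of all rows). The abstract does not give MGRCDA's exact settings. The Python sketch below therefore assumes γ' = 1 and runs on a made-up toy matrix. It illustrates the kernel only, not the metagraph search.

```python
import math


def gip_kernel(adj, gamma_prime=1.0):
    """Gaussian interaction profile (GIP) kernel over the rows of a 0/1 matrix.

    adj: list of equal-length rows; row i is the interaction profile of
    entity i (e.g. one circRNA's known links to every disease).
    gamma_prime: assumed raw bandwidth (1.0 here; not taken from MGRCDA).
    """
    n = len(adj)
    # Normalise the bandwidth by the mean squared norm of all profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in adj) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(adj[i], adj[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


# Hypothetical circRNA x disease association matrix (made-up values).
assoc = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
circ_sim = gip_kernel(assoc)  # circRNA-circRNA similarity
disease_sim = gip_kernel([list(col) for col in zip(*assoc)])  # disease-disease
```

One common way to get the disease-side kernel is to apply the same function to the transposed matrix, as in the last line. Both kernels can then serve as similarity attributes in a heterogeneous network.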
8
Bhatia A, Chug A, Singh AP, Singh D. A hybrid approach for noise reduction-based optimal classifier using genetic algorithm: A case study in plant disease prediction. INTELL DATA ANAL 2022. [DOI: 10.3233/ida-216011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Plant diseases can cause significant losses in agricultural productivity; therefore, their early prediction is much needed. Many machine learning-based plant disease prediction models have been proposed, but these models suffer from noisy class labels that degrade their performance. Noisy class labels arise when positive class labels are improperly assigned to negative samples, or vice versa. Hence, a precise, noise-free plant disease model is required for better prediction. The current study proposes noise reduction-based hybridized classifiers for plant disease prediction. One tomato and four soybean disease datasets were selected for the research. The Adaptive Sampling-based Class Label Noise Reduction (AS-CLNR) method was used along with the Support Vector Machine (SVM) approach for noise reduction. The noise-minimized datasets were fed into Extreme Learning Machine (ELM), Decision Tree (DT), and Random Forest (RF) classifiers, whose parameters were optimized using a Genetic Algorithm (GA), to develop plant disease prediction models. The performance of these models, viz. Hybrid SVM-GA-ELM, Hybrid SVM-GA-DT, and Hybrid SVM-GA-RF, was evaluated using Accuracy, Area under ROC Curve, and F1-Score metrics. These classifiers were then ranked using the statistical Friedman test, in which the Hybrid SVM-GA-RF classifier performed best. Lastly, the Nemenyi test was performed to determine whether significant differences exist between the classifiers. It was found that 33.33% of all pairs of hybrid classifiers show markedly different performance from one another.
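The Friedman and Nemenyi procedure described in this abstract can be sketched in a few lines of Python. The steps are:

- Rank the classifiers on every dataset.
- Average the ranks.
- Compute the Friedman chi-square statistic.
- Compare pairwise average-rank gaps against the Nemenyi critical difference.

The score table below is invented for illustration and is not data from the study. `q_alpha` must be looked up in a studentized-range table for the chosen number of classifiers and significance level (about 2.343 for three classifiers at alpha = 0.05).

```python
import math


def rank_row(scores):
    """Rank classifiers on one dataset: 1 = best (highest score), ties averaged."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend the block of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    return ranks


def friedman(score_table):
    """score_table[d][c]: score of classifier c on dataset d (higher is better).

    Returns the average rank of each classifier and the Friedman chi-square
    statistic (to be compared against chi-square with k - 1 degrees of freedom).
    """
    n, k = len(score_table), len(score_table[0])
    rank_rows = [rank_row(row) for row in score_table]
    avg_ranks = [sum(r[c] for r in rank_rows) / n for c in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )
    return avg_ranks, chi2


def nemenyi_cd(k, n, q_alpha):
    """Nemenyi critical difference for k classifiers over n datasets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


scores = [  # hypothetical accuracies of 3 classifiers on 5 datasets
    [0.80, 0.85, 0.90],
    [0.70, 0.75, 0.80],
    [0.60, 0.72, 0.91],
    [0.81, 0.83, 0.88],
    [0.65, 0.70, 0.95],
]
avg_ranks, chi2 = friedman(scores)
cd = nemenyi_cd(k=3, n=5, q_alpha=2.343)
```

Two classifiers differ significantly under the Nemenyi test when their average ranks differ by more than `cd`.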
Affiliation(s)
- Anshul Bhatia
- University School of Information, Communication and Technology, GGSIP University, Dwarka, New Delhi, India
- Anuradha Chug
- University School of Information, Communication and Technology, GGSIP University, Dwarka, New Delhi, India
- Amit Prakash Singh
- University School of Information, Communication and Technology, GGSIP University, Dwarka, New Delhi, India
- Dinesh Singh
- Division of Plant Pathology, Indian Agricultural Research Institute (IARI), New Delhi, India
9
10
PLUS: Predicting cancer metastasis potential based on positive and unlabeled learning. PLoS Comput Biol 2022; 18:e1009956. [PMID: 35349572 PMCID: PMC8992993 DOI: 10.1371/journal.pcbi.1009956] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/17/2021] [Revised: 04/08/2022] [Accepted: 02/23/2022] [Indexed: 11/19/2022]
Abstract
Metastatic cancer accounts for over 90% of all cancer deaths, and evaluations of metastasis potential are vital for minimizing metastasis-associated mortality and achieving optimal clinical decision-making. Computational assessment of metastasis potential based on large-scale transcriptomic cancer data is challenging because metastasis events are not always clinically detectable. The under-diagnosis of metastasis events results in biased classification labels, and classification tools trained on biased labels may produce inaccurate estimates of metastasis potential. This issue is further complicated by the unknown metastasis prevalence at the population level, the small number of confirmed metastasis cases, and the high dimensionality of the candidate molecular features. Our proposed algorithm, called Positive and unlabeled Learning from Unbalanced cases and Sparse structures (PLUS), is the first to use a positive and unlabeled learning framework to account for the under-detection of metastasis events when building a classifier. PLUS is tailored to metastasis studies: it handles unbalanced instance allocation as well as unknown metastasis prevalence, which other methods do not consider. PLUS achieves superior performance on synthetic datasets compared with other state-of-the-art methods. Applied to The Cancer Genome Atlas Pan-Cancer gene expression data, PLUS generated metastasis potential predictions that agree well with the clinical follow-up data, along with predictive genes that have been validated in independent single-cell RNA-sequencing datasets.
Metastasis is the major cause of cancer-related deaths, and evaluations of metastasis risk are essential for tailored treatment of cancer patients. Existing methods often build a classifier using clinical metastasis diagnoses as binary responses or detect genomic features significantly associated with metastasis-related survival outcomes. However, these methods tend to identify genomic predictors with little consistency across cancer types. Thus, there is an urgent need for a powerful tool that characterizes cancer metastasis potential across a wide span of cancer types. Computational assessment of metastasis potential based on large-scale transcriptomic cancer data is challenging because metastasis events are not always clinically detectable, which results in biased estimates of metastasis potential. Our proposed algorithm, PLUS, treats patients with a metastasis diagnosis as positive instances and the remainder as unlabeled instances, meaning they may be either metastatic or non-metastatic. The classifier produced by PLUS showed concordance between predicted cancer metastasis and observed metastasis survival outcomes in the follow-up data for almost all cancer types considered. The selected genes perform functions consistent with experimental research findings and can cluster single cells by their level of metastasis potential.
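The positive-unlabeled setting described above can be illustrated with the classic Elkan-Noto correction. This is a different and much simpler technique than PLUS:

- Fit a classifier that separates labeled from unlabeled instances.
- Estimate c = P(labeled | positive) as the classifier's mean score on the labeled positives.
- Divide every score by c.

The Python sketch below assumes a plain gradient-descent logistic regression and a made-up one-feature dataset. It is not the PLUS algorithm, which (as its name indicates) also targets unbalanced cases and sparse structures.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, s, lr=0.5, epochs=5000):
    """Batch gradient-descent logistic regression predicting s (1 = labeled)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, s):
            err = sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) - t
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wk - lr * g / n for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def pu_posteriors(X, s):
    """Elkan-Noto style correction of a labeled-vs-unlabeled classifier.

    Returns (raw scores ~ P(s=1|x), corrected P(y=1|x), estimate of c).
    For brevity c is estimated on the training labeled positives; a held-out
    set of labeled positives would be the more careful choice.
    """
    w, b = fit_logistic(X, s)
    raw = [sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) for x in X]
    labeled_scores = [g for g, si in zip(raw, s) if si == 1]
    c = sum(labeled_scores) / len(labeled_scores)  # estimate of P(s=1 | y=1)
    adjusted = [min(1.0, g / c) for g in raw]
    return raw, adjusted, c


# Made-up toy data, one feature: 4 labeled positives, then 3 hidden positives
# and 6 negatives, all of which are marked unlabeled (s = 0).
X = [[2.0], [2.2], [2.4], [2.6], [2.1], [2.3], [2.5],
     [-2.0], [-1.8], [-1.6], [-1.4], [-1.2], [-1.0]]
s = [1, 1, 1, 1] + [0] * 9
raw, adjusted, c = pu_posteriors(X, s)
```

The hidden positives end up with corrected scores near 1, while the true negatives stay low. This is the kind of correction for under-labeled positives that the abstract motivates.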
11
Stupp D, Sharon E, Bloch I, Zitnik M, Zuk O, Tabach Y. Co-evolution based machine-learning for predicting functional interactions between human genes. Nat Commun 2021; 12:6454. [PMID: 34753957 PMCID: PMC8578642 DOI: 10.1038/s41467-021-26792-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/26/2020] [Accepted: 10/09/2021] [Indexed: 12/20/2022]
Abstract
Over the next decade, more than a million eukaryotic species are expected to be fully sequenced. This has the potential to improve our understanding of genotype and phenotype crosstalk, gene function and interactions, and to answer evolutionary questions. Here, we develop a machine-learning approach that uses phylogenetic profiles across 1154 eukaryotic species. This method integrates co-evolution across eukaryotic clades to predict functional interactions between human genes and the context for these interactions. We benchmark our approach, showing a 14% performance increase (auROC) compared with previous methods. Using this approach, we predict functional annotations for less-studied genes. We focus on DNA repair and verify that 9 of the top 50 predicted genes have been identified elsewhere, with others previously prioritized by high-throughput screens. Overall, our approach enables better annotation of function and functional interactions and facilitates the understanding of evolutionary processes underlying co-evolution. The manuscript is accompanied by a webserver available at: https://mlpp.cs.huji.ac.il.
With the rise in the number of fully sequenced eukaryotic species, large-scale phylogenetic profiling can give insights into gene function. Here, the authors describe a machine-learning approach that integrates co-evolution across eukaryotic clades to predict gene function and functional interactions among human genes.
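A phylogenetic profile records one conservation score per sequenced species for a given gene. Genes whose profiles rise and fall together across species are candidate functional partners. The Python sketch below computes one Pearson correlation per clade for a gene pair. This is the kind of per-clade co-evolution feature a downstream classifier could consume. The clade grouping, the toy binary profiles, and the choice of Pearson correlation are illustrative assumptions, not the authors' published pipeline.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def clade_coevolution_features(profile_a, profile_b, clades):
    """One correlation per clade for a gene pair.

    profile_a, profile_b: per-species conservation scores for two genes,
    in the same species order. clades: hypothetical mapping from clade name
    to the species indices it contains.
    """
    return {
        name: pearson([profile_a[i] for i in idx], [profile_b[i] for i in idx])
        for name, idx in clades.items()
    }


clades = {"clade_A": [0, 1, 2], "clade_B": [3, 4, 5]}  # hypothetical grouping
gene_1 = [1, 0, 1, 1, 1, 0]
gene_2 = [1, 0, 1, 0, 0, 1]
feats = clade_coevolution_features(gene_1, gene_2, clades)
flat = clade_coevolution_features([1, 1, 1, 0, 1, 0], gene_2, clades)
```

In this toy example the two genes co-occur perfectly in one clade and are perfectly anti-correlated in the other. A single genome-wide correlation would blur that difference, which is one motivation for clade-level features.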
Affiliation(s)
- Doron Stupp
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada, The Hebrew University of Jerusalem, 9112001, Jerusalem, Israel
- Elad Sharon
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada, The Hebrew University of Jerusalem, 9112001, Jerusalem, Israel
- Idit Bloch
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada, The Hebrew University of Jerusalem, 9112001, Jerusalem, Israel
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard University, Boston, MA, 02115, USA
- Or Zuk
- Department of Statistics and Data Science, The Hebrew University of Jerusalem, Jerusalem, 9190501, Israel.
- Yuval Tabach
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada, The Hebrew University of Jerusalem, 9112001, Jerusalem, Israel.
12
Kim HJ, Kim T, Xiao D, Yang P. Protocol for the processing and downstream analysis of phosphoproteomic data with PhosR. STAR Protoc 2021; 2:100585. [PMID: 34151303 PMCID: PMC8190506 DOI: 10.1016/j.xpro.2021.100585] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]
Abstract
Analysis of phosphoproteomic data requires advanced computational methodologies. To this end, we developed PhosR, a set of tools and methodologies implemented in R to allow the comprehensive analysis of phosphoproteomic data. PhosR enables processing steps such as imputation and normalization, as well as functional analysis such as kinase activity inference and signalome construction. Together, these tools facilitate interpretation and discovery from large-scale phosphoproteomic data sets. For complete details on the use and execution of this protocol, please refer to Kim et al. (2021).
This protocol describes how to run and interpret the results of PhosR. PhosR performs filtering, imputation, and normalization of phosphoproteomic data. PhosR enables kinase-substrate predictions and signalome construction. The step-by-step protocol provides a comprehensive introduction to phospho-data analysis.
Affiliation(s)
- Hani Jieun Kim
- Charles Perkins Centre, School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
- Corresponding author
- Taiyun Kim
- Charles Perkins Centre, School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
- Di Xiao
- Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
- Pengyi Yang
- Charles Perkins Centre, School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
- Corresponding author
13
Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study. MATHEMATICAL AND COMPUTATIONAL APPLICATIONS 2021. [DOI: 10.3390/mca26020040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/26/2022]
Abstract
Integrating gene-level data is useful for predicting the role of genes in biological processes. This problem has typically been approached as supervised classification, which requires large training sets of positive and negative examples. However, training data sets that are too small for supervised approaches can still provide valuable information. We describe a hierarchical mixture model that uses limited positively labeled gene training data for semi-supervised learning. We focus on predicting essential genes, that is, genes required for the survival of an organism under particular conditions. Using cross-validation, we found that including positively labeled samples in a semi-supervised learning framework with the hierarchical mixture model improves the detection of essential genes compared with unsupervised, supervised, and other semi-supervised approaches. Prediction performance also improved when genes are incorrectly assumed to be non-essential. Our comparisons indicate that incorporating even small amounts of existing knowledge improves prediction accuracy and decreases variability in predictions. Although we focused on gene essentiality, the hierarchical mixture model and semi-supervised framework apply generally to problems that involve predicting genes or other features characterized by multiple data types, with a small set of positive labels.
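The Python sketch below is a much simplified illustration of semi-supervised learning from positive labels only. It fits a two-component, one-feature Gaussian mixture by expectation-maximization while clamping the labeled positives to the "essential" component. It is not the hierarchical mixture model from the paper, which combines multiple data types. The single feature, the initialization, and the toy values are assumptions chosen for clarity.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def semi_supervised_gmm(x, labeled_pos, iters=100, var_floor=1e-6):
    """Two-component 1-D Gaussian mixture fitted by EM, with the indices in
    labeled_pos clamped to the positive ("essential") component.

    Returns P(positive | x_i) for every point (exactly 1.0 for labeled ones).
    """
    n = len(x)
    unlabeled = [i for i in range(n) if i not in labeled_pos]
    # Initialise the positive mean from the labels, the other from the rest.
    mu_pos = sum(x[i] for i in labeled_pos) / len(labeled_pos)
    mu_neg = sum(x[i] for i in unlabeled) / len(unlabeled)
    overall = sum(x) / n
    var_pos = var_neg = max(sum((v - overall) ** 2 for v in x) / n, var_floor)
    prior = 0.5  # P(positive) for an unlabeled point
    resp = [0.0] * n
    for _ in range(iters):
        # E-step: labeled positives keep responsibility 1.
        for i, v in enumerate(x):
            if i in labeled_pos:
                resp[i] = 1.0
                continue
            p = prior * normal_pdf(v, mu_pos, var_pos)
            q = (1 - prior) * normal_pdf(v, mu_neg, var_neg)
            resp[i] = p / (p + q) if p + q > 0 else 0.5
        # M-step: responsibility-weighted means and variances.
        w_pos = sum(resp)
        w_neg = max(n - w_pos, 1e-12)
        mu_pos = sum(r * v for r, v in zip(resp, x)) / w_pos
        mu_neg = sum((1 - r) * v for r, v in zip(resp, x)) / w_neg
        var_pos = max(
            sum(r * (v - mu_pos) ** 2 for r, v in zip(resp, x)) / w_pos, var_floor
        )
        var_neg = max(
            sum((1 - r) * (v - mu_neg) ** 2 for r, v in zip(resp, x)) / w_neg,
            var_floor,
        )
        # Re-estimate the prior on unlabeled points only, because the
        # labeled positives were not sampled at random.
        prior = sum(resp[i] for i in unlabeled) / len(unlabeled)
    return resp


# Made-up feature values: indices 5-7 are labeled essential; 8 and 9 are
# unlabeled but lie in the same cluster.
values = [0.1, -0.2, 0.3, 0.0, -0.1, 5.0, 5.2, 4.8, 5.1, 4.9]
posterior = semi_supervised_gmm(values, labeled_pos={5, 6, 7})
```

Clamping even three labeled points anchors which mixture component means "essential". A fully unsupervised mixture would leave that assignment ambiguous.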
14
PhosR enables processing and functional analysis of phosphoproteomic data. Cell Rep 2021; 34:108771. [PMID: 33626354 DOI: 10.1016/j.celrep.2021.108771] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 06/01/2020] [Revised: 12/07/2020] [Accepted: 01/28/2021] [Indexed: 02/08/2023]
Abstract
Mass spectrometry (MS)-based phosphoproteomics has revolutionized our ability to profile phosphorylation-based signaling in cells and tissues on a global scale. To infer the action of kinases and signaling pathways in phosphoproteomic experiments, we present PhosR, a set of tools and methodologies implemented in a suite of R packages facilitating comprehensive analysis of phosphoproteomic data. By applying PhosR to both published and new phosphoproteomic datasets, we demonstrate capabilities in data imputation and normalization by using a set of "stably phosphorylated sites" and in functional analysis for inferring active kinases and signaling pathways. In particular, we introduce a "signalome" construction method for identifying a collection of signaling modules to summarize and visualize the interaction of kinases and their collective actions on signal transduction. Together, our data and findings demonstrate the utility of PhosR in processing and generating biological knowledge from MS-based phosphoproteomic data.
15
16
Kim HJ, Osteil P, Humphrey SJ, Cinghu S, Oldfield AJ, Patrick E, Wilkie EE, Peng G, Suo S, Jothi R, Tam PPL, Yang P. Transcriptional network dynamics during the progression of pluripotency revealed by integrative statistical learning. Nucleic Acids Res 2020; 48:1828-1842. [PMID: 31853542 PMCID: PMC7038952 DOI: 10.1093/nar/gkz1179] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/26/2019] [Revised: 12/02/2019] [Accepted: 12/09/2019] [Indexed: 12/12/2022]
Abstract
The developmental potential of cells, termed pluripotency, is highly dynamic and progresses through a continuum of naive, formative and primed states. Pluripotency progression of mouse embryonic stem cells (ESCs) from naive to formative and primed state is governed by transcription factors (TFs) and their target genes. Genomic techniques have uncovered a multitude of TF binding sites in ESCs, yet a major challenge lies in identifying target genes from functional binding sites and reconstructing dynamic transcriptional networks underlying pluripotency progression. Here, we integrated time-resolved ‘trans-omic’ datasets together with TF binding profiles and chromatin conformation data to identify target genes of a panel of TFs. Our analyses revealed that naive TF target genes are more likely to be TFs themselves than those of formative TFs, suggesting denser hierarchies among naive TFs. We also discovered that formative TF target genes are marked by permissive epigenomic signatures in the naive state, indicating that they are poised for expression prior to the initiation of pluripotency transition to the formative state. Finally, our reconstructed transcriptional networks pinpointed the precise timing from naive to formative pluripotency progression and enabled the spatiotemporal mapping of differentiating ESCs to their in vivo counterparts in developing embryos.
Affiliation(s)
- Hani Jieun Kim
- Charles Perkins Centre, School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia
- School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, NSW 2006, Australia
- Pierre Osteil
- School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, NSW 2006, Australia
- Embryology Unit, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia
- Sean J Humphrey
- Charles Perkins Centre, School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Senthilkumar Cinghu
- Epigenetics & Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA
- Andrew J Oldfield
- Institute of Human Genetics, CNRS, University of Montpellier, Montpellier, France
- Ellis Patrick
- Charles Perkins Centre, School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, NSW 2006, Australia
- Westmead Institute for Medical Research, University of Sydney, Westmead, NSW 2145, Australia
- Emilie E Wilkie
- Embryology Unit, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia
- Guangdun Peng
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China, and Guangzhou Regenerative Medicine and Health Guangdong Laboratory (GRMH-GDL), Guangzhou 510005, China
- Shengbao Suo
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, MA 02215, USA
- Raja Jothi
- Epigenetics & Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA
- Patrick P L Tam
- School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, NSW 2006, Australia
- Embryology Unit, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia
- Pengyi Yang
- Charles Perkins Centre, School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
- Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia
- School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, NSW 2006, Australia
17
Peng L, Liu F, Yang J, Liu X, Meng Y, Deng X, Peng C, Tian G, Zhou L. Probing lncRNA-Protein Interactions: Data Repositories, Models, and Algorithms. Front Genet 2020; 10:1346. [PMID: 32082358 PMCID: PMC7005249 DOI: 10.3389/fgene.2019.01346] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 09/27/2019] [Accepted: 12/09/2019] [Indexed: 12/31/2022]
Abstract
Identifying lncRNA-protein interactions (LPIs) is vital to understanding various key biological processes. Wet-lab experiments have identified some LPIs, but experimental methods are costly and time-consuming. Therefore, computational methods are increasingly used to identify LPI candidates. We introduce relevant data repositories and focus on two types of LPI prediction models: network-based methods and machine learning-based methods. Machine learning-based methods include matrix factorization-based techniques and ensemble learning-based techniques. To evaluate the performance of computational methods, we compared several LPI prediction models using leave-one-out cross-validation (LOOCV) and fivefold cross-validation. The results show that SFPEL-LPI obtained the best AUC. Although computational models have efficiently uncovered some LPI candidates, many limitations remain. We discuss future directions to further boost LPI predictive performance.
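The evaluation protocol mentioned in this abstract, k-fold and leave-one-out cross-validation scored by AUC, can be sketched in plain Python. LOOCV is simply k-fold with k equal to the number of samples. AUC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative. What counts as a sample (for example, a known lncRNA-protein pair versus a sampled non-interacting pair) is an assumption left to the caller. The function names are illustrative.

```python
import random


def kfold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold CV over n samples.

    The indices are shuffled once with a fixed seed; k = n gives LOOCV.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: the fraction of
    (positive, negative) pairs in which the positive scores higher,
    with ties counted as one half."""
    pos = [sc for sc, y in zip(scores, labels) if y == 1]
    neg = [sc for sc, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Fivefold CV over n samples: kfold_indices(n, 5); LOOCV: kfold_indices(n, n).
```

The pairwise AUC formula is quadratic in the number of samples, which is fine for a sketch. A sort-based rank computation gives the same value more efficiently on large benchmarks.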
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Fuxing Liu
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Jialiang Yang
- Department of Sciences, Genesis (Beijing) Co. Ltd., Beijing, China
- Xiaojun Liu
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Yajie Meng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xiaojun Deng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Cheng Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Geng Tian
- Department of Sciences, Genesis (Beijing) Co. Ltd., Beijing, China
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
18
Kim T, Lo K, Geddes TA, Kim HJ, Yang JYH, Yang P. scReClassify: post hoc cell type classification of single-cell RNA-seq data. BMC Genomics 2019; 20:913. [PMID: 31874628 PMCID: PMC6929456 DOI: 10.1186/s12864-019-6305-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/21/2022]
Abstract
Background: Single-cell RNA-sequencing (scRNA-seq) is a fast-emerging technology allowing global transcriptome profiling at the single-cell level. Cell type identification from scRNA-seq data is a critical task in a variety of research areas such as developmental biology, cell reprogramming, and cancer. Typically, cell type identification relies on human inspection using a combination of prior biological knowledge (e.g. marker genes and morphology) and computational techniques (e.g. PCA and clustering). Because our current knowledge is incomplete and the process involves subjectivity, a small number of cells may be mislabelled.
Results: Here, we propose a semi-supervised learning framework, named scReClassify, for ‘post hoc’ cell type identification from scRNA-seq datasets. Starting from an initial cell type annotation with potentially mislabelled cells, scReClassify first performs dimension reduction using PCA and then applies a semi-supervised learning method to learn from the data and reclassify cells that were likely mislabelled initially to their most probable cell types. Using both simulated and real-world experimental datasets that profiled various tissues and biological systems, we demonstrate that scReClassify can accurately identify misclassified cells and reclassify them to their correct cell types.
Conclusions: scReClassify can be used as a post hoc cell type classification tool for scRNA-seq data to fine-tune cell type annotations generated by any cell type classification procedure. It is implemented as an R package and is freely available from https://github.com/SydneyBioX/scReClassify
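scReClassify itself pairs PCA with a semi-supervised classifier. The Python sketch below swaps in a much simpler stand-in, a k-nearest-neighbour majority vote in an already reduced space. It shows the general "post hoc" idea of flagging and relabeling cells whose annotation disagrees with their neighbours. The function name `knn_relabel`, k = 3, and the toy coordinates are hypothetical, and this is not the package's algorithm.

```python
import math
from collections import Counter


def knn_relabel(points, labels, k=3):
    """Reassign cells whose label disagrees with a strict majority of their
    k nearest neighbours (Euclidean distance, excluding the cell itself).

    points: low-dimensional coordinates per cell (e.g. PCA scores).
    Votes always use the original labels, so the pass is order-independent.
    Returns (new_labels, indices_changed).
    """
    new_labels = list(labels)
    changed = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        top_label, top_count = votes.most_common(1)[0]
        if top_label != labels[i] and top_count > k / 2:
            new_labels[i] = top_label
            changed.append(i)
    return new_labels, changed


# Hypothetical 2-D coordinates for 8 cells in two clusters.
coords = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels = ["T", "T", "T", "B", "B", "B", "B", "B"]  # cell 3 is deliberately mislabeled
new_labels, changed = knn_relabel(coords, labels, k=3)
```

Requiring a strict majority keeps borderline cells at their original annotation. That matches the "fine-tune, do not overwrite" spirit of post hoc reclassification.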
Affiliation(s)
- Taiyun Kim
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, 2006, NSW, Australia
- Charles Perkins Centre, The University of Sydney, 2006, NSW, Australia
- Kitty Lo
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, 2006, NSW, Australia
- Charles Perkins Centre, The University of Sydney, 2006, NSW, Australia
- Thomas A Geddes
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, 2006, NSW, Australia
- Charles Perkins Centre, The University of Sydney, 2006, NSW, Australia
- School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, 2006, NSW, Australia
- Hani Jieun Kim
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, 2006, NSW, Australia
- Computational Systems Biology Group, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 2145, NSW, Australia
- Charles Perkins Centre, The University of Sydney, 2006, NSW, Australia
- Jean Yee Hwa Yang
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, 2006, NSW, Australia
- Charles Perkins Centre, The University of Sydney, 2006, NSW, Australia
- Pengyi Yang
- School of Mathematics and Statistics, Faculty of Science, The University of Sydney, 2006, NSW, Australia
- Computational Systems Biology Group, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 2145, NSW, Australia
- Charles Perkins Centre, The University of Sydney, 2006, NSW, Australia